[Xapian-discuss] Xapian documentation

James Aylett james-xapian at tartarus.org
Wed May 10 12:41:50 BST 2006


On Wed, May 10, 2006 at 01:24:06AM +0100, Olly Betts wrote:

> > but I suspect we want
> > to split the "how does Xapian think" up a little so people who don't
> > want to program to Xapian can read it.
> 
> None of the internal documentation really describes it at that level
> though (if anything our internal documentation is a bit too low level!)

Richard thought he needed to fill in the gap a little :-)

> "Overview" gives a reasonable idea of "how Xapian thinks" from a user's
> point of view.  While it could be improved, I think it's along the right
> lines.

Yes, there's a lot of good stuff there. One thing I particularly want
to do is change the slant slightly so it doesn't start "This document
provides an introduction to the native C++ Xapian API" - most of the
information is great for people who will never code to Xapian, and all
of it is good for people who will use the bindings, and I suspect some
people get frightened off by the mention of C++.

> I think the full list of current users is probably too dynamic and
> mostly interesting as a set of links to look at.  A few cherry-picked
> examples to illustrate the size and range of uses would work in a manual
> though.

Good idea.

> The way I see it, for a new visitor, the front page is perhaps their
> first or at least one of their earliest impressions of Xapian.  So
> it's good to have a succinct, clean-looking page.  It needs to explain
> what Xapian is and what it can do for them (and probably also what it
> can't do so that we don't constantly get asked!)

Agreed. 

> It's good to cover the different possible reasons people might be
> looking at Xapian, and to some extent we have this already - there's a
> paragraph for developers and one for people looking for something like
> Omega.

Yes, it just needs repacking slightly, and making more obvious. Layout
is probably of more use than changing the text here - you have to read
all the paragraphs to find the one you want. Some headings would help.

I also feel that the order is wrong. I'd like:

 * Welcome to Xapian. The latest version is <X>, released <Y>

   Rationale for including the latest version: (a) a fair number of
   people will be return visitors, and if they aren't on the mailing
   list they will want to quickly double-check if they are up to date
   (alternatively we could put it in the side menu for this); (b) less
   importantly, I find when visiting a new project that I get a lot of
   confidence if the latest release has a recent date

 * Xapian is an Open Source Probabilistic Information Retrieval
   library, released under the GPL. <then a bit
   explaining that for people who don't know what information retrieval
   is?>. 

   Rationale for removing the bindings and C++ reference: this says
   what Xapian is, with a slightly unpacked explanation of why this is
   good. It's therefore useful for people who will go on to use Omega,
   whereas currently I worry that some people will be put off by the
   time they get to that paragraph.

 * Omega (for the least hands-on class of user)

 * Bindings (for the next step up) and C++ API

If each section has a heading then potential Omega users may actually
jump straight into the right section, saving even more time.

Then, once we have that stuff sorted out, some sort of "Hacking" entry
on the menu for people who want to get involved. (Probably called "Get
involved", since we also want people to help out in ways that don't
involve C++.)

> I know you weren't seriously suggesting that, but as a serious
> point, it's best if the prose flows naturally, so we want to try to
> avoid too much repetition in the sentence structure.

I disagree: the front page is somewhere we *don't* want whole-page
narrative, because we don't want people to have to read the whole lot
to make sense of it. However, we should still be able to largely
avoid repetition, and as you say the text is mostly there; it just
needs jiggling and laying out differently IMHO.

> I've noticed the current front page also provides a readily
> cut-and-paste-able "soundbite" which people can use if they mention Xapian
> in their blog or post it to del.icio.us or whatever:
> 
> http://www.google.com/search?q=%22Xapian+is+an+Open+Source+Probabilistic+Information+Retrieval+library%22

Except a lot of people don't understand the sound bite - I started
drafting the intro for the manual and had to take the sound bite and
then explain it. Most people who would benefit from Omega don't want
to think about what a 'Probabilistic Information Retrieval library' is
- but yes, we definitely still want this at the top of the front page,
as well as the license, because they're crucial pieces of information.

I note that "open source information retrieval library" brings up
Xapian top, but "open source search engine library" brings up Lucene
then Swish-e then Nutch, then Lucene.Net, then OpenGL (for some
reason) *then* Xapian. We're on the first page of results, but because
"search engine" isn't even in the same paragraph as the rest we're
losing Googleheight. If I were looking for an OSS search engine, I wouldn't
get to Xapian until I'd given up on three other systems, two of which
probably could do what I want them to do.

> Then I see the features page as really just there to answer the next
> question: "does it support obscure <feature x>?"  To that end, it
> perhaps deserves a link from the front page prose.

Yes, probably in the paragraph that is "what is (conceptually) Xapian?".

> And to some extent the history page lets people know that Xapian is a
> reasonably mature project (though rereading the page, a couple more
> dates would help!)

I think we need three: start of Open Muscat (umm, can't remember,
1998?), start of Xapian (2001), Webtop (?).

> Once you know what Xapian is and you've decided that it is for you, you
> don't really need to view either of the front page or features again.
> For returning visitors the front page really just serves as a jumping
> off point to whichever page you actually wanted.

Yes. Agree very much that the home page wants to be light for this
reason.

For an example of a project home I like, see
<http://www.drupal.org/>. For one that I dislike, see
<http://www.mamboserver.com/>.

Basically, Drupal gets across what it is, that it's quality, and all
the useful links plus version number, right at the top of the
page. Mambo doesn't manage more than the useful links.

> The only concession is the "latest stable version is N" link, and even
> that's mostly there as a subtle way to convey to new users that this is
> an active project, not one which hasn't made a release this century!

:)

> Looking at this from the other direction, I dislike projects I have to
> download just to read the documentation - it's nice to be able to browse
> it online.  Which is another argument for pulling it all together into
> a more coherent whole, as that whole can be put on the website as well
> as being packaged for downloading.

Yes, we want it all online and available offline too. I similarly
dislike having to download a project just to read its documentation
(you get that a lot in FSF projects).

> > The FAQ can obviously link into the manual as well; better, the FAQ
> > could be part of the manual (this is what we should have done with
> > Zap) so it's distributed in the tarball.
> 
> At least while releases are reasonably regular that makes sense.  If
> they become less frequent, we can always update the manual between
> code releases anyhow.

Indeed.

> > I'd rather use wiki-text, but then we need a way to convert that into
> > a nice book.
> 
> By chance, this cropped up in the latest Debian Weekly News:
> 
> http://lists.debian.org/debian-edu/2006/05/msg00017.html
> 
> Essentially it's a docbook output filter for moinmoin, which you can
> then feed into the usual docbook processing tools.  With a bit of
> scripting you should be able to glue together the docbook from lots
> of different pages and wrap it in a higher level docbook tag (the filter
> seems to output each page as an "article", so you can wrap them all in a
> "part" or "book").  

Cool.
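
As a rough illustration of the gluing step described above, here is a
minimal Python sketch that wraps several per-page DocBook "article"
documents in a single "book" element. It assumes each page has already
been exported as a well-formed XML string whose root is <article>; the
function name wrap_articles_in_book and the sample titles are
hypothetical, and nothing here is taken from the moinmoin filter itself.

```python
import xml.etree.ElementTree as ET


def wrap_articles_in_book(article_xml_strings, title):
    """Combine per-page DocBook <article> documents under one <book> root.

    Assumption: every input string is well-formed XML with an <article>
    root element (which is what the filter is said to produce per page).
    """
    book = ET.Element("book")
    ET.SubElement(book, "title").text = title
    for xml_text in article_xml_strings:
        article = ET.fromstring(xml_text)
        if article.tag != "article":
            raise ValueError("expected <article> root, got <%s>" % article.tag)
        book.append(article)
    # Return the combined document as a text string.
    return ET.tostring(book, encoding="unicode")


if __name__ == "__main__":
    pages = [
        "<article><title>Overview</title></article>",
        "<article><title>FAQ</title></article>",
    ]
    print(wrap_articles_in_book(pages, "Xapian manual"))
```

The result could then be fed to the usual docbook tools; wrapping in
"part" rather than "book" would only need a different root tag.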

> Also linked was an OpenOffice Writer to moinmoin convertor, and the same
> page also has a thing to allow you to use docbook instead of wiki
> markup (I think that would be a mistake for us though):
> 
> http://ooowiki.de/WikiKonverter (in German only)
> http://translate.google.com/translate?hl=en&sl=de&u=ooowiki.de/WikiKonverter
>   (comedy English translation)

Yeah, I think we should stick with Wikitext.

> > However we'd need to know how to do indexing before committing to that
> > - I abhor the idea of writing even a small book's worth of information
> > but not being able to produce a decent index.
> 
> My suggestion would be to add a simple macro.  Are you simply
> looking for the ability to say "include a link to here for the index
> entry 'foo'"?  If so, then a simple custom macro would allow you to
> write "[[Index(foo)]]" and expand it to an empty string for now, but
> we'd have the information in place to handle it in a more sophisticated
> way when exporting to docbook later.
> 
> http://moinmoin.wikiwikiweb.de/HelpOnMacros

That would do. I'd prefer a slightly richer macro,
[[Index(foo,indexalias)]], that also outputs ``foo'' into the main
text, with indexalias optional, because it saves a lot of time. (That's
one thing I disliked about Docbook.)
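
To make the intended behaviour concrete, here is a minimal Python
sketch of the expansion: it replaces each [[Index(foo)]] or
[[Index(foo,indexalias)]] with the plain term and collects the index
entries for later export. It is a standalone text filter, not a
MoinMoin macro plugin; the names INDEX_MACRO and expand_index_macros
are hypothetical, and hooking it into the wiki is left open.

```python
import re

# Matches [[Index(term)]] or [[Index(term,alias)]].
# Assumption: neither term nor alias contains commas or parentheses.
INDEX_MACRO = re.compile(r"\[\[Index\(([^,()]+)(?:,([^()]*))?\)\]\]")


def expand_index_macros(text):
    """Return (expanded_text, entries).

    Each macro is replaced by its term in the running text, and
    entries is a list of (index_heading, term) pairs in document
    order. The heading is the alias if given, else the term itself.
    """
    entries = []

    def replace(match):
        term = match.group(1).strip()
        alias = (match.group(2) or "").strip() or term
        entries.append((alias, term))
        return term

    return INDEX_MACRO.sub(replace, text), entries


if __name__ == "__main__":
    sample = "Use a [[Index(stemmer,stemming)]] before indexing."
    print(expand_index_macros(sample))
```

Keeping the entries as a separate list means the wiki view can ignore
them for now while a later docbook export turns them into real index
terms.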

[disconnected wikis]
> At least the backing is often a directory of flat files.

True. I discussed this with a guy at work, and we decided that the
'best' way of doing it would probably involve doing three-way merges
by hand, again, so we gave up thinking about it :-/

> Another approach would be to have the wiki working in its own SVN
> checkout of the master sources, with a (probably manually assisted)
> checkin periodically.

I'm thinking from a practical point of view that I'm likely to have
the most time to work on documentation when I don't have an internet
connection. All I really need there is to take the flat files and edit
them by hand, perhaps some preview scripts; then I need something to
tell me if the wiki has changed since I last grabbed the flat files.
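
The "has the wiki changed since I last grabbed the flat files" check
could be sketched in Python roughly as follows, assuming the wiki
backing really is a directory of flat files that can be read directly.
The function names snapshot and changed_since are hypothetical, and
this only detects changes; it does not attempt any merging.

```python
import hashlib
import os


def snapshot(directory):
    """Map each file's path (relative to directory) to a SHA-256 digest."""
    digests = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            digests[os.path.relpath(path, directory)] = digest
    return digests


def changed_since(old, new):
    """Return sorted paths added, removed, or modified between snapshots."""
    return sorted(
        path for path in set(old) | set(new) if old.get(path) != new.get(path)
    )
```

A snapshot taken when grabbing the files can be saved (for instance
with json.dump) and compared against a fresh one when back online.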

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


