[Xapian-discuss] Xapian performance on gmane.org compared

Olly Betts olly at survex.com
Fri Aug 28 02:17:18 BST 2009

On Thu, Aug 27, 2009 at 04:06:06PM +0200, Henry wrote:
> I'm trying to get a handle on how best to improve the situation, so  
> having something to compare against would be informative.  I notice  
> that gmane.org has about 70 million articles, yet the same search  
> [co.uk] returns in 4s.  Yes, these are plain text and relatively small  
> docs, but still...

Note that gmane doesn't currently index positional information - the
current search machine doesn't have enough disk space to!

> If I may:
> What DB format is gmane.org using (chert/flint)?

As documented on http://search.gmane.org, it's chert.

> What's the DB size on disk?


> How many search servers is gmane.org using?  Their approx. spec?

One, which also handles indexing - see "rain" in the list here:


I'm in the process of commissioning a replacement server ("plane" above)
with a lot more disk space, but it isn't currently live.

As Richard says, my patch in #394 should help, but note that you can
tune the size of the "pond" by setting POND_SIZE in the environment.
The default is 100000 which was sane for the situation I wrote it for,
but higher or lower might be better (and I'd be interested to hear what
works best for other situations so we can set it sanely automatically).
There's no benefit in setting it higher than the number of documents
matched by the AND query of the terms in the phrase.
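The rule of thumb above can be sketched in a few lines of Python. This is only an illustration, not part of the #394 patch. It assumes you already know how many documents the AND of the phrase's terms matches, for example from a separate AND query. `choose_pond_size` and `export_pond_size` are hypothetical helper names. The sketch also assumes the patched search process reads POND_SIZE from its environment when it runs.

```python
import os

# Default pond size quoted in the post for the #394 patch.
DEFAULT_POND_SIZE = 100000


def choose_pond_size(and_match_count: int, requested: int = DEFAULT_POND_SIZE) -> int:
    """Hypothetical helper: cap a requested POND_SIZE at the number of
    documents matched by the AND of the phrase's terms, since a larger
    pond brings no benefit."""
    if and_match_count < 0:
        raise ValueError("and_match_count must be non-negative")
    if requested < 1:
        raise ValueError("requested pond size must be positive")
    # Keep at least 1 so we never export a zero setting.
    return max(1, min(requested, and_match_count))


def export_pond_size(size: int) -> None:
    # Assumption: the patched search code reads POND_SIZE from the
    # environment, so this must be set before the search runs (or before
    # the search process is started, if it is a separate program).
    os.environ["POND_SIZE"] = str(size)


if __name__ == "__main__":
    # Example: the AND query matches 2,500 documents, so a pond of
    # 100000 would be no better than one of 2500.
    size = choose_pond_size(2500)
    export_pond_size(size)
    print(os.environ["POND_SIZE"])
```

When tuning this in practice, you would still try values below the cap and measure. The cap only rules out settings that cannot help.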


