[Xapian-discuss] Xapian performance on gmane.org compared

Arjen van der Meijden acmmailing at tweakers.net
Thu Aug 27 15:48:44 BST 2009


On 27-8-2009 16:06 Henry wrote:
> Using xapian revision 13300 (chert db).
> Test chert database is about 4GB - 320,000 docs.
> 
> Performance for typical one or more keyword searches is quick.  For  
> example, search for [upload site page] yields the query:
> Xapian::Query((upload:(pos=1) OR site:(pos=2) OR page:(pos=3)))
> Takes a second.
> 
> However, searching for something like [co.uk] is mind-numbingly and  
> _alarmingly_ slow.
> Xapian::Query((co:(pos=1) PHRASE 2 uk:(pos=2)))
> Looks like it interprets this search as a phrase.
> Takes over _40_ seconds.

You could look at the size of the result set for the non-phrase query 
on co and uk (i.e. "co AND uk"). That should give you an idea of how 
many documents are loaded from disk for the initial selection and how 
fast that goes. We've seen pretty bad performance for some phrase 
queries with the flint database, but our machine used to be I/O-bound. 
Since the phrase query also touches another large table (the position 
table), you can't use the AND query as more than a simple baseline.
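
A minimal sketch of that comparison, assuming the Xapian Python 
bindings are installed and that the terms are indexed as plain "co" and 
"uk" (as the query dump above suggests). The function names and the 
database path are made up for illustration; it only reports the 
estimated match count and wall-clock time for the first ten results.

```python
import time


def timed(fn):
    """Run fn() once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def compare_and_vs_phrase(db_path, terms=("co", "uk")):
    """Return {name: (estimated matches, seconds)} for AND and PHRASE."""
    # Imported here so timed() stays usable without the bindings.
    import xapian

    db = xapian.Database(db_path)
    queries = {
        "AND": xapian.Query(xapian.Query.OP_AND, list(terms)),
        "PHRASE": xapian.Query(xapian.Query.OP_PHRASE, list(terms)),
    }
    report = {}
    for name, query in queries.items():
        enquire = xapian.Enquire(db)
        enquire.set_query(query)
        # Fetch only the first ten matches, like a typical results page.
        mset, seconds = timed(lambda: enquire.get_mset(0, 10))
        report[name] = (mset.get_matches_estimated(), seconds)
    return report
```

Calling compare_and_vs_phrase("/path/to/chert-db") (hypothetical path) 
right after a reboot and again once the files are cached should show 
how much of the difference is disk access.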

> I'm trying to get a handle on how best to improve the situation, so  
> having something to compare against would be informative.  I notice  
> that gmane.org has about 70 million articles, yet the same search  
> [co.uk] returns in 4s.  Yes, these are plain text and relatively small  
> docs, but still...

4GB is a "very small" database, i.e. it can fit in an amount of RAM 
that is now becoming common for desktops. How much memory does your 
search machine have? If it has less than 4GB and you can spare a bit of 
money, add more.
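
A quick way to check this, as a rough sketch: sum the sizes of the 
database files and compare with the installed RAM. The helper names are 
invented here, and the RAM lookup relies on os.sysconf names that are 
available on Linux but not everywhere. It compares against total RAM, 
not the part actually free for the OS file cache, so treat a "fits" 
answer as optimistic.

```python
import os


def dir_size_bytes(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def physical_ram_bytes():
    """Installed RAM in bytes, or None if the platform does not report it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def fits_in_ram(db_path):
    """True/False if the database is no larger than RAM, None if unknown."""
    ram = physical_ram_bytes()
    if ram is None:
        return None
    return dir_size_bytes(db_path) <= ram
```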

If there are no other factors in play, and your poor query performance 
is solely or largely caused by insufficient I/O performance, you could 
also install an SSD. In our benchmark, all phrase queries went from 
I/O-limited to CPU-limited, simply because both the RAM and the SSDs in 
our server were easily fast enough to keep up.
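
To get a rough idea of which side of that line a query is on, one 
option is to compare wall-clock time with the CPU time the process 
itself used: a large gap suggests the time went into waiting (usually 
on disk), a small gap suggests the query is CPU-limited. This is only a 
heuristic sketch with a made-up helper name; a second run served from 
the file cache will look CPU-limited even on a slow disk.

```python
import time


def wall_and_cpu(fn):
    """Run fn() once; return (wall seconds, CPU seconds of this process)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu
```

Pass it a callable that runs the slow query, for example a lambda 
wrapping the get_mset() call of the phrase search.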

Best regards,

Arjen


