[Xapian-discuss] xapian performance

Olly Betts olly at survex.com
Thu Nov 16 16:12:56 GMT 2006


On Thu, Nov 16, 2006 at 01:00:01PM -0200, Fernando Nemec wrote:
> As I told you, the improvement in search time for queries like "A B C" was
> great.

That's good.

> On the other hand, I tried searching for "A B" (where A and
> B are very common words) and it took 90 seconds, whereas before the patch
> it took 60 seconds.

I suspect the regression for "A B" is due to using wdf instead of the
true positionlist length when deciding which term to check first.  For
the two-term case we can use the true statistics though.  Actually, we
can use the true statistics to order the two terms with the highest
wdf however many terms the phrase has.  Try this updated patch:

http://www.oligarchy.co.uk/xapian/patches/xapian-experimental-phrase-optimisation-v2.patch
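To illustrate the ordering idea described above, here is a rough sketch in
Python.  It is not the code from the patch, just one reading of the heuristic:
sort the terms by wdf as a cheap estimate, then look up the true positionlist
length only for the two terms with the highest wdf and order those two by it.
The function name order_phrase_terms and the positionlist_length callback are
made up for the illustration and are not part of the Xapian API.

```python
from typing import Callable, Dict, List


def order_phrase_terms(
    wdf: Dict[str, int],
    positionlist_length: Callable[[str], int],
) -> List[str]:
    """Return the phrase terms in the order they should be checked.

    Assumption for this sketch: checking the term with the shortest
    positionlist first is cheapest, and wdf is only an estimate of
    that length.
    """
    # Cheap first pass: order every term by wdf, lowest first.
    terms = sorted(wdf, key=lambda term: wdf[term])

    if len(terms) >= 2:
        # The two terms with the highest wdf sit at the end of the list.
        # Only for these do we pay for the true positionlist length.
        terms[-2:] = sorted(terms[-2:], key=positionlist_length)

    return terms


# Hypothetical statistics for one document.
example_wdf = {"a": 5, "b": 9, "c": 2}
example_true_lengths = {"a": 7, "b": 3, "c": 2}

example_order = order_phrase_terms(example_wdf, example_true_lengths.__getitem__)
```

With exactly two terms, both are "the two terms with the highest wdf", so the
order is decided entirely by the true positionlist lengths, which is the "A B"
case discussed above.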

If anyone else has a large database with positional information, please
give this patch a whirl.  I'd be slightly cautious about using it live
in a production system - correctness shouldn't be an issue, but there
could be performance regressions for some cases still.

Cheers,
    Olly


