[Xapian-discuss] Search performance issues and profiling/debugging

Olly Betts olly at survex.com
Fri Nov 2 19:15:57 GMT 2007


On Fri, Nov 02, 2007 at 01:27:49PM +0200, Ron Kass wrote:
> One thing which I think was somewhat clear was that with BM25
> parameters changed, searching was slower.

Some parameter combinations will be slower than others as they'll affect
how tight the theoretical upper bound on the weight is compared to the
weights we actually see.
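
To illustrate the point about bound tightness, here is a minimal sketch in Python. It uses the textbook BM25 term component, not Xapian's actual BM25Weight code, and the function names are hypothetical. With no statistics about the collection, the only safe upper bound is the limit of the weight as wdf grows without bound, which is idf * (k1 + 1). The sketch shows how much of that bound a fixed document actually reaches for different values of k1.

```python
def bm25_term_weight(wdf, normlen, idf, k1, b):
    """Textbook BM25 term component (an assumption, not Xapian's exact code).

    wdf     -- within-document frequency of the term
    normlen -- document length divided by the average document length
    """
    return idf * (k1 + 1) * wdf / (k1 * ((1 - b) + b * normlen) + wdf)


def loose_upper_bound(idf, k1):
    """Limit of the term weight as wdf grows without bound."""
    return idf * (k1 + 1)


def tightness(wdf, normlen, k1, b):
    """Fraction of the upper bound actually reached (1.0 means tight)."""
    idf = 1.0  # idf cancels out in the ratio
    return bm25_term_weight(wdf, normlen, idf, k1, b) / loose_upper_bound(idf, k1)


if __name__ == "__main__":
    # Same document (wdf=3, average length), different k1 values.
    for k1 in (0.5, 1.0, 2.0, 10.0):
        print(k1, round(tightness(wdf=3, normlen=1.0, k1=k1, b=0.75), 3))
```

In this simplified model a larger k1 leaves the real weights further below the bound, so a matcher that prunes candidates by comparing against the bound has less to work with.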

There's scope for improving this by tracking statistics such as "maximum
document length" and "highest wdf", which is something I want to look at
doing but haven't found time for yet.
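
As a rough sketch of how such a statistic could help (again using the textbook BM25 term component rather than Xapian's own code, with hypothetical function names): the term weight increases with wdf and decreases with normalised document length, so knowing the highest wdf for a term gives a bound that is never looser than idf * (k1 + 1). The sketch assumes only that the normalised length is non-negative.

```python
def bm25_term_weight(wdf, normlen, idf, k1, b):
    """Textbook BM25 term component (an assumption, not Xapian's exact code)."""
    return idf * (k1 + 1) * wdf / (k1 * ((1 - b) + b * normlen) + wdf)


def loose_upper_bound(idf, k1):
    """Bound available with no statistics: the limit as wdf grows without bound."""
    return idf * (k1 + 1)


def bound_with_max_wdf(max_wdf, idf, k1, b):
    """Bound using the highest wdf seen for the term.

    The weight is largest at wdf = max_wdf and normlen = 0, so evaluating
    the formula there bounds every real document from above.
    """
    return bm25_term_weight(max_wdf, 0.0, idf, k1, b)


if __name__ == "__main__":
    idf, b, max_wdf = 1.0, 0.75, 5
    for k1 in (1.0, 10.0):
        print(k1,
              round(loose_upper_bound(idf, k1), 3),
              round(bound_with_max_wdf(max_wdf, idf, k1, b), 3))
```

With b = 1 this particular bound collapses back to the loose one, since the length term then vanishes at normlen = 0. A tighter result in that case would need a length statistic as well.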

We will also need such statistics to implement Divergence from
Randomness weighting schemes, which have the potential to out-perform
BM25:

http://en.wikipedia.org/wiki/Divergence_from_randomness_model

> However, I think part of the problem was with the database files as
> well, as we got a segfault on specific database combinations and not
> with others, but this could be a coincidence.

If it's not related to this BM25Weight bug, it seems most likely that
the segfault is caused by a bug which only manifests in very particular
circumstances.

> Let me know if you managed to hunt down the rogue parameter and we can
> test it again to see the effect.

This patch fixes my testcase:

http://oligarchy.co.uk/xapian/patches/xapian-bm25-nonzero-k2.patch 

Can you try it and see what difference it makes?

Cheers,
    Olly


