[Xapian-discuss] Matches estimate varies with sorting method

Fabrice Colin fabrice.colin at gmail.com
Mon Oct 22 12:09:06 BST 2007


Hi all,

On 10/17/07, Olly Betts <olly at survex.com> wrote:
> On Tue, Oct 16, 2007 at 09:50:29PM +0800, Fabrice Colin wrote:
> > Note that the MSet is obtained with Enquire::get_mset(0, 100, 101),
> > so that probably explains where the 100 comes from.
>
> But this sounds wrong.  If "checkatleast" is 101,
> get_matches_estimated() should only be less if the estimate is exact.
>
> What are the corresponding values of get_matches_min() and
> get_matches_max() in the two cases?
>
> Does this also happen with SVN HEAD?  There have been some
> matcher-related changes, but nothing specifically addressing that I'm
> aware of.
>
> And can you supply a recipe to reproduce this easily?
>
I am attaching two patches for 1.0.3 that mimic what my app does, and replicate
the behaviour I am seeing.

The patch for omindex saves last_mod as a yyyymmdd-formatted string.
The patch for simplesearch uses DateValueRangeProcessor, sets "checkatleast"
to 11, and sorts by value, then by relevance.
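
For readers without the attachments, here is a minimal sketch of the kind of
yyyymmdd formatting described above. It is not the code from the patch:
format_yyyymmdd is a hypothetical helper name, and the use of UTC (via POSIX
gmtime_r) is an assumption, since the timezone the patch uses is not stated
here.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical helper (not taken from the omindex patch): format a Unix
// timestamp as an 8-character "yyyymmdd" string.
// Assumption: the date is taken in UTC; gmtime_r is POSIX, not ISO C++.
std::string format_yyyymmdd(std::time_t t) {
    std::tm tm_buf;
    gmtime_r(&t, &tm_buf);
    char buf[9];  // 8 digits plus the terminating NUL
    std::size_t n = std::strftime(buf, sizeof buf, "%Y%m%d", &tm_buf);
    assert(n == 8);  // holds for four-digit years only
    return std::string(buf, n);
}
```

Because every field is zero-padded and the most significant field comes first,
such strings compare in chronological order as plain byte strings, which is
what a range like "20060101..20071231" relies on.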

I indexed my system's documentation with:
$ xapian-omega-1.0.3/omindex --db /tmp/toto /usr/share/doc

If I then search for documents modified in 2006 and 2007 with:
$ xapian-core-1.0.3/examples/simplesearch /tmp/toto "20060101..20071231"
simplesearch estimates the number of results to be between 8714
and 9508 (get_matches_estimated() returned 8714).

When sorting by relevance instead, all three get_matches_*() methods return 10.

Cheers,

Fabrice
-------------- next part --------------
A non-text attachment was scrubbed...
Name: omindex-yyyymmdd.patch
Type: text/x-patch
Size: 859 bytes
Desc: not available
Url : http://lists.tartarus.org/pipermail/xapian-discuss/attachments/20071022/8fe0550a/omindex-yyyymmdd.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: simplesearch-datevalue.patch
Type: text/x-patch
Size: 1134 bytes
Desc: not available
Url : http://lists.tartarus.org/pipermail/xapian-discuss/attachments/20071022/8fe0550a/simplesearch-datevalue.bin