[Xapian-discuss] Matches estimate varies with sorting method
Olly Betts
olly at survex.com
Wed Oct 17 01:07:36 BST 2007
On Tue, Oct 16, 2007 at 09:50:29PM +0800, Fabrice Colin wrote:
> I found that the figure returned by MSet::get_matches_estimated() varies
> depending on how results are to be sorted.
This in itself isn't a bug - it is after all an estimate!
> For instance, in my index, value 4 contains date and time in the format
> "yyyymmddhhmmss". For the same query, the number of results will be
> estimated to 20000+ when results are first sorted by date and time
> with set_sort_by_value_then_relevance(4) and to only 100 if I use
> set_sort_by_relevance(). The first figure is the correct one.
You're likely to get a more accurate estimate when sorting since the
matcher generally has to consider more documents when sorting.
> Note that the MSet is obtained with Enquire::get_mset(0, 100, 101), so that
> probably explains where the 100 comes from.
But this sounds wrong. If "checkatleast" is 101, get_matches_estimated()
should only be less than 101 if the estimate is exact.
What are the corresponding values of get_matches_lower_bound() and
get_matches_upper_bound() in the two cases?
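To make the expectation above concrete, here is a minimal sketch of the
consistency check being described. It does not use Xapian itself:
MatchBounds and bounds_look_sane() are hypothetical names invented for this
illustration, and the rule encoded is only the one stated in this message
(an estimate below "checkatleast" is only plausible when the lower bound,
estimate and upper bound all agree), not a description of how the matcher
is implemented.

```cpp
// Hypothetical stand-in for the three figures an MSet reports.
// This is NOT a Xapian type; it only exists for this illustration.
struct MatchBounds {
    unsigned lower;      // guaranteed minimum number of matches
    unsigned estimated;  // the matcher's best guess
    unsigned upper;      // guaranteed maximum number of matches
};

// Returns true if the reported figures are plausible for the given
// "checkatleast" value, following the reasoning in this thread:
//   1. lower <= estimated <= upper must always hold;
//   2. if the estimate is below check_at_least, the matcher must have
//      seen every match, so all three figures should be identical.
bool bounds_look_sane(const MatchBounds& b, unsigned check_at_least) {
    if (b.lower > b.estimated || b.estimated > b.upper) {
        return false;  // the bounds do not even bracket the estimate
    }
    if (b.estimated < check_at_least) {
        // Only acceptable when the estimate is exact.
        return b.lower == b.estimated && b.estimated == b.upper;
    }
    return true;
}
```

With made-up figures (not taken from the report), an estimate of 100 with
an upper bound of 20000 and checkatleast 101 would fail this check, while
100/100/100 would pass. That is why the lower and upper bounds for both
sort orders are the interesting numbers to look at.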
Does this also happen with SVN HEAD? There have been some
matcher-related changes, but nothing specifically addressing this that
I'm aware of.
And can you supply a recipe to reproduce this easily?
> The estimate will also be correct with set_sort_by_relevance_then_value(4).
>
> If I am not mistaken, a similar problem was reported, and apparently fixed,
> back in September :
> http://comments.gmane.org/gmane.comp.search.xapian.general/5110
>
> I am using 1.0.3.
That fix would have made it into 1.0.3, so I don't think it can be the
exact same issue.
Cheers,
Olly