[Xapian-discuss] Result count before fetch
Olly Betts
olly at survex.com
Tue May 19 04:28:23 BST 2009
On Sun, May 17, 2009 at 09:09:34PM +0200, Jesper Krogh wrote:
> Ivo Jansch - Ibuildings wrote:
> > To be able to use the Xapian results in a Zend_Paginator, I would like
> > to retrieve the amount of matches _before_ I retrieve them using
> > get_mset. Is this possible?
> >
> > I thought about using $enquire->get_mset(0,1)->get_matches_estimated()
You can actually ask for no matches (i.e. get_mset(0,0)) and get an
estimate without doing much work at all, but it generally won't be as
accurate as you'll get by actually doing a match, and the estimate tends
to improve the more documents you ask for.
You can look at the bounds (get_matches_lower_bound() and
get_matches_upper_bound()) to see how far off it could be. If the query
is a single term and there's no collapsing, matchdecider, or cutoff,
the estimate will always be exact.
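As a rough sketch of reading the bounds: the helper below takes any object
exposing the MSet methods get_matches_lower_bound(), get_matches_estimated()
and get_matches_upper_bound(). The function name describe_estimate and the
returned keys are hypothetical, chosen for illustration only. With the Xapian
Python bindings the argument would typically be the result of
enquire.get_mset(0, 0).

```python
def describe_estimate(mset):
    """Summarise how far off an MSet's estimated match count could be.

    `mset` is assumed to expose the three Xapian MSet count methods.
    The function name and dictionary keys are illustrative only.
    """
    lower = mset.get_matches_lower_bound()
    estimated = mset.get_matches_estimated()
    upper = mset.get_matches_upper_bound()
    return {
        "lower": lower,
        "estimated": estimated,
        "upper": upper,
        # When the bounds coincide, the estimate cannot be wrong.
        "exact": lower == upper,
        # Largest distance from the estimate to either bound.
        "max_error": max(estimated - lower, upper - estimated),
    }
```

If "exact" is true the count can be shown as-is; otherwise a paginator might
prefer to display it as approximate.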
> > but that would mean an extra get_mset call per pageload, which would
> > seem inefficient.
>
> Someone with more insight than me might give a better answer, but I
> think the answer is no, since the estimate is done in the process of
> finding the results.
Indeed - I'm unclear on the scenario, but I'd suggest just reading the
MSet when you want the estimate and storing it in your "Paginator" for
when you want the results.
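A minimal sketch of that idea, assuming an Enquire-like object whose
get_mset(first, maxitems, checkatleast) returns an iterable MSet, as in the
Xapian Python bindings. The class name SearchPage and its parameters are
hypothetical and are not part of Xapian or Zend_Paginator. The MSet is
fetched once, on first use, and serves both the count and the results.

```python
class SearchPage:
    """Fetch one page of matches once; reuse it for the count and the items.

    `enquire` is assumed to behave like a Xapian Enquire object.
    The class and parameter names here are illustrative only.
    """

    def __init__(self, enquire, page, per_page=10, check_at_least=0):
        self._enquire = enquire
        self._first = (page - 1) * per_page  # pages are numbered from 1
        self._per_page = per_page
        self._check_at_least = check_at_least
        self._mset = None  # filled in lazily by _load()

    def _load(self):
        # Run the match only on first use, then keep the MSet around.
        if self._mset is None:
            self._mset = self._enquire.get_mset(
                self._first, self._per_page, self._check_at_least
            )
        return self._mset

    def count(self):
        """Estimated total number of matches for the query."""
        return self._load().get_matches_estimated()

    def items(self):
        """The matches on this page."""
        return list(self._load())
```

Calling count() and then items() triggers a single get_mset call. Passing a
larger check_at_least asks the matcher to consider at least that many
documents, which tends to improve the estimate at the cost of more work.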
> It passes through the index, so if it has got about 10% of the way
> through to find the 10 matches you requested, the estimate would be
> that there are 100 in the total set, although it would not be accurate
> unless you actually requested more than were available, forcing it
> to visit all hits.
It's somewhat more complex than just scaling up like this, but it's true
that for a given query, considering more documents will tend to improve
the estimate.
See also:
http://trac.xapian.org/wiki/FAQ/MoreAccurateEstimates
Cheers,
Olly