[Xapian-discuss] Sort by docid

Olly Betts olly at survex.com
Wed Jun 29 18:02:54 BST 2005


On Wed, Jun 29, 2005 at 12:14:34PM -0400, Marco Tabini wrote:
> The problem at hand is that I'm building a search engine for a mailing list
> and I would like to return matches sorted by date; ordering by docid (since
> the messages are indexed in chronological order) seems to be the simplest
> way to do so, but because I'm running a probabilistic query I don't think I
> can use Enquire::set_docid_order, since that will first sort by relevance
> and then by docid.

The answer is to use Enquire::set_weighting_scheme to set BoolWeight as
the weighting scheme.  This is suggested in the API docs for
set_docid_order but it could be more explicit:

    Note: If you add documents in strict date order, then a boolean search
    with set_docid_order(Xapian::Enquire::DESCENDING) is a very efficient
    way to perform "sort by date, newest first".

So you want:

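    Xapian::Enquire enq(db);  // db is your already-opened Xapian::Database
    // ...
    // Highest docid first, i.e. newest first if indexed in date order.
    enq.set_docid_order(Xapian::Enquire::DESCENDING);
    // Pure boolean match: every weight is 0, so only docid order decides.
    enq.set_weighting_scheme(Xapian::BoolWeight());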

This is the technique I'm using for gmane:  http://rain.gmane.org/

Currently DESCENDING is slower than ASCENDING, because ASCENDING can
terminate early.  I'm going to tweak things so posting lists are run
backwards in the DESCENDING case, which should make it about as fast
as ASCENDING.
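
To illustrate why early termination matters, here is a toy model (not
Xapian's actual matcher).  It assumes a posting list is a vector of docids
in ascending order that can only be walked forwards; Scan, first_n and
last_n are hypothetical names invented for this sketch.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

using DocId = unsigned;

// Result of a scan: the docids kept, plus how many postings were read.
struct Scan {
    std::vector<DocId> ids;
    std::size_t postings_read;
};

// ASCENDING: the first n postings are the answer, so stop after n.
Scan first_n(const std::vector<DocId>& postings, std::size_t n) {
    Scan s{{}, 0};
    for (DocId id : postings) {
        if (s.ids.size() == n) break;
        ++s.postings_read;
        s.ids.push_back(id);
    }
    return s;
}

// DESCENDING with a forward-only walk: the highest n docids sit at the
// end, so every posting is read and only the last n are kept.
Scan last_n(const std::vector<DocId>& postings, std::size_t n) {
    std::deque<DocId> window;
    std::size_t read = 0;
    for (DocId id : postings) {
        ++read;
        window.push_back(id);
        if (window.size() > n) window.pop_front();
    }
    // Emit the highest docid first.
    return Scan{std::vector<DocId>(window.rbegin(), window.rend()), read};
}
```

In this model, asking for 10 results from a list of 1000 postings reads
10 entries in the ascending case and all 1000 in the descending case.
Walking the list backwards would let the descending case stop early too.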

This does mean that you don't get the probabilistic weights, but that's
probably not really a problem.

> I thought about adding the date as a value and then use set_sort_by_value,
> but I wonder about performance (the database contains about one million
> records).

That would be somewhat slower.

Cheers,
    Olly


