[Xapian-discuss] Searching subset of documents

Olly Betts olly at survex.com
Thu Jun 1 16:19:09 BST 2006


On Thu, Jun 01, 2006 at 03:13:54AM -0600, Rusty Conover wrote:
> The subset of documents to be searched can't be neatly defined with
> boolean fields. Currently I'm running a query in an external database
> which returns the Xapian document ids against which the Xapian query
> should be matched.

So is this the result of an SQL query?

> I've written code so that custom decider functions can be passed to
> get_mset() in Search::Xapian, but that doesn't appear to be able to do
> the job, because the decider function isn't passed the document id,
> just the document object itself.  I suppose this is because the
> document id appears to be munged with the number of active databases
> currently being searched, to ensure uniqueness across all databases.

I think it's just an oversight that it doesn't get the docid.  If you're
searching multiple databases, it's easy enough to map the merged docid
back to the database and docid it came from.  But I don't think this is
the best approach for you, unless you're rejecting very few documents.
The MatchDecider is assumed to be expensive, so it is applied to as few
documents as possible, and hence as late in the matcher's processing as
possible.  That means the matcher has already done most of its work on a
document by the time the decider rejects it.
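
The docid mapping mentioned above can be sketched roughly as follows.
This assumes the interleaving scheme Xapian uses when several databases
are searched together, where document d in the i-th database (counting
from 0) of n appears as merged docid (d - 1) * n + i + 1.  The function
name split_merged_docid is made up for illustration and is not part of
the Xapian API.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper (not a Xapian function): undo the docid
// interleaving applied when n_databases are searched together.
// Assumes merged >= 1 and n_databases >= 1.
// Returns {0-based database index, docid within that database}.
std::pair<std::uint32_t, std::uint32_t>
split_merged_docid(std::uint32_t merged, std::uint32_t n_databases)
{
    std::uint32_t db_index = (merged - 1) % n_databases;
    std::uint32_t local_docid = (merged - 1) / n_databases + 1;
    return {db_index, local_docid};
}
```

With three databases, merged docids 1, 2, 3 are docid 1 in databases
0, 1, 2 respectively, and merged docid 4 is docid 2 in database 0.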

> Is there a more efficient way to go about this, where the document  
> list could be filtered before the term matcher goes to work?  Does it  
> really make a difference with regard to order?

You really want to do this as early as you can (i.e. near the root of
the query tree), to avoid having to read sections of postlist which you
aren't going to use (assuming the external source of docids can't do
anything useful with a "skip_to").
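
To illustrate why filtering early helps, here is a minimal standalone
sketch of intersecting two sorted docid lists using skip_to.  It does
not use Xapian's internal PostList classes; SortedDocids and
filter_docids are hypothetical names, and an in-memory vector stands in
for both the query's postlist and the external docid source.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a postlist: sorted docids plus a cursor.
class SortedDocids {
    std::vector<std::uint32_t> ids_;
    std::size_t pos_ = 0;

  public:
    explicit SortedDocids(std::vector<std::uint32_t> ids)
        : ids_(std::move(ids))
    {
        std::sort(ids_.begin(), ids_.end());
    }

    bool at_end() const { return pos_ >= ids_.size(); }
    std::uint32_t docid() const { return ids_[pos_]; }
    void next() { ++pos_; }

    // Move to the first entry >= target without visiting the entries
    // in between (a real postlist could skip whole blocks on disk).
    void skip_to(std::uint32_t target)
    {
        auto first = ids_.begin() + static_cast<std::ptrdiff_t>(pos_);
        auto it = std::lower_bound(first, ids_.end(), target);
        pos_ = static_cast<std::size_t>(it - ids_.begin());
    }
};

// Keep only the docids present in both lists.  Whichever list is
// behind jumps straight to the other's current docid.
std::vector<std::uint32_t> filter_docids(SortedDocids query,
                                         SortedDocids allowed)
{
    std::vector<std::uint32_t> out;
    while (!query.at_end() && !allowed.at_end()) {
        if (query.docid() == allowed.docid()) {
            out.push_back(query.docid());
            query.next();
            allowed.next();
        } else if (query.docid() < allowed.docid()) {
            query.skip_to(allowed.docid());
        } else {
            allowed.skip_to(query.docid());
        }
    }
    return out;
}
```

When the allowed set is small, the query side spends most of its time
in skip_to, so large stretches of its postlist are never examined.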

So you want to be able to have an ExternalSourcePostList which just gets
docids from some external source, and then you can take the query and do:

enquire.set_query(Query(Query::OP_FILTER, query, external_source_postlist));

It'd be handy to have something like this available, and it's not too
hard to implement.  I'm not likely to have time to look at it for a
while, but I can point you in the right direction if you want to look.

Cheers,
    Olly


