[Xapian-discuss] indexing / searching hierarchies

James Aylett james-xapian at tartarus.org
Fri Jan 4 20:30:09 GMT 2008


On Fri, Jan 04, 2008 at 02:21:56PM -0500, Kapil Thangavelu wrote:

> i'd like to index a set of documents arranged in a hierarchy, and perform
> queries to retrieve subsets of documents based on their position within a
> hierarchy...
> 
> ie
> 
>  /australia
>    /mammals
>       /marsupials
>       /dingos
>    /reptiles
>       /snakes
> 
> so i'd like to search against a given sub hierarchy.
> 
> in systems like lucene, i'd index the full document path as a field, and use
> a prefix query when searching against a subset of the hierarchy.

You have a choice of doing the work at index time or at query
time. Index time is preferred. (Query time will work in the way lucene
does, but you have to do a little more work.)

Index time
----------

Generate a term for each level of the hierarchy. You'll want to give
them a prefix, assuming you're following the standard Xapian term
prefix conventions. Say you choose the prefix XH (for hierarchy; X
introduces any 'user' prefix), then you might generate:

XHaustralia
XHaustralia/mammals
XHaustralia/mammals/marsupials

for a single document. And perhaps:

XHaustralia
XHaustralia/reptiles
XHaustralia/reptiles/snakes

for another. Then at search time you search for
'XHaustralia/reptiles', or whatever level you actually want. (You can
use QueryParser::add_boolean_prefix() to map a field name such as
"topic" to XH, so that users can search on topic:australia or
topic:australia/reptiles.)
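
As a rough sketch of the term generation described above (standard
library only, no Xapian calls): hierarchy_terms() is a hypothetical
helper name, and the "XH" default is just the example prefix chosen
above. It assumes '/' is the only separator and that empty path
components can be ignored.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of Xapian): expand a path such as
// "/australia/mammals/marsupials" into one prefixed term per level.
std::vector<std::string> hierarchy_terms(const std::string& path,
                                         const std::string& prefix = "XH") {
    assert(!prefix.empty());  // an empty prefix would collide with ordinary terms
    std::vector<std::string> terms;
    std::string current;  // path accumulated so far, without a leading slash
    std::string::size_type pos = 0;
    while (pos < path.size()) {
        // Skip separators (leading, doubled or trailing slashes).
        if (path[pos] == '/') { ++pos; continue; }
        std::string::size_type end = path.find('/', pos);
        if (end == std::string::npos) end = path.size();
        if (!current.empty()) current += '/';
        current.append(path, pos, end - pos);
        terms.push_back(prefix + current);  // one term per hierarchy level
        pos = end;
    }
    return terms;
}
```

Each returned string would then be added to the document being
indexed, for example with Xapian::Document::add_term().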

Query time
----------

Generate a single term for the position in the hierarchy:

XHaustralia/mammals/marsupials/

Then at search time, you want to OP_FILTER on a query constructed
something like:

Query q(Query::OP_OR,
  db.allterms_begin("XHaustralia/"),
  db.allterms_end("XHaustralia/"));

(the trailing slashes prevent it from matching XHaustralian, if your
hierarchy contains that separately for some reason - there are
obviously other examples which would actually trip you up).
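
To illustrate which terms that prefix range picks up, here is a small
model using only the standard library. A std::set stands in for the
database's sorted term list, and terms_with_prefix() is a hypothetical
name, not a Xapian function. It is only meant to show the effect of
the trailing slash.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for iterating the database's terms that start
// with a given prefix.
std::vector<std::string> terms_with_prefix(const std::set<std::string>& all_terms,
                                           const std::string& prefix) {
    assert(!prefix.empty());  // an empty prefix would match every term
    std::vector<std::string> matches;
    // The set is sorted, so all terms starting with `prefix` form one
    // contiguous run beginning at lower_bound(prefix).
    for (auto it = all_terms.lower_bound(prefix);
         it != all_terms.end() && it->compare(0, prefix.size(), prefix) == 0;
         ++it) {
        matches.push_back(*it);
    }
    return matches;
}
```

With the terms XHaustralia/mammals/marsupials/,
XHaustralia/reptiles/snakes/ and XHaustralian/birds/ in the set, the
prefix "XHaustralia/" selects only the first two, while "XHaustralia"
without the slash also picks up the third. In Xapian itself the
resulting OR query would then be combined with the user's query,
along the lines of Query(Query::OP_FILTER, user_query, q), where
user_query is an assumed name for whatever the user searched for.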

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


