[Xapian-discuss] Evaluating Xapian

Olly Betts olly at survex.com
Mon Jan 24 14:55:19 GMT 2005


On Mon, Jan 24, 2005 at 02:04:09PM +0100, Markus Peter wrote:
> A person is defined by age/birthdate, gender, contact information like
> state, country, city in different fields, as well as freetext documents
> attached to it. It must be possible to freely combine searches for the
> different fields. That means that I can for example search for all male
> persons between 20 and 30 from the USA where the word "foo" occurs in
> the attached freetext documents.
> 
> The questions now are:
> - Can I, and if yes, how would I do it, restrict searches to a specific
> age group?

You can build a Query object as an OP_OR of one term per age in the
range, then apply it to your probabilistic query with OP_FILTER.  In
Perl, that looks something like (untested):

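    use Search::Xapian qw(:ops);  # imports OP_OR, OP_AND, OP_FILTER
    # $db is assumed to be an already opened Search::Xapian::Database.
    my $query = Search::Xapian::Query->new("foo");
    # Match any age from 20 to 30 inclusive...
    my $filter = Search::Xapian::Query->new(OP_OR, map {"XAGE$_"} (20..30));
    # ...and also require the term for "male".
    $filter = Search::Xapian::Query->new(OP_AND, $filter, Search::Xapian::Query->new("XSEXm"));
    my $enquire = $db->enquire(OP_FILTER, $query, $filter);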

This assumes a document is indexed by the term XAGE20 if the person is 20
years old, and by XSEXm if the person is male.
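
To make the term scheme concrete, here is a rough sketch of how those
terms might be generated at index time and at search time.  It is
written in Python only so that it runs on its own without Xapian
installed.  The helper names (age_on, person_terms, age_filter_terms)
are made up for illustration and are not part of any Xapian API.

```python
from datetime import date

def age_on(birthdate, today):
    """Whole years elapsed between birthdate and today."""
    years = today.year - birthdate.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years

def person_terms(birthdate, gender, today):
    """Terms to index a person by (hypothetical helper)."""
    return ["XAGE%d" % age_on(birthdate, today), "XSEX" + gender]

def age_filter_terms(low, high):
    """Terms to OR together for an inclusive age range."""
    return ["XAGE%d" % age for age in range(low, high + 1)]

if __name__ == "__main__":
    print(person_terms(date(1980, 6, 15), "m", date(2005, 1, 24)))
    print(age_filter_terms(20, 30))
```

Note that an age term computed like this goes stale as people get
older, so under this scheme the documents would need reindexing from
time to time.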

> Range search features are also useful for such things like "give me all
> documents matching 'foo bar' which have been modified the last 30
> days". I really really want to avoid adding a separate filtering step
> afterwards for things like that. Omega seems to implement a feature like
> that, but how?

Omega does something slightly more complicated for date ranges to avoid
OR-ing really large lists of terms.  Instead there are terms
representing whole months, and whole years, and these are used whenever
a range includes whole months and/or years, with the ends of the range
being made up of terms covering a single date, if necessary.

If you are creating very large ranges, you might want to experiment
with a similar approach.
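
As a rough illustration of that idea (not Omega's actual code), the
sketch below splits an inclusive date range into whole-year terms,
whole-month terms, and single-day terms for the ragged ends.  It is
plain Python with no Xapian dependency.  The D prefix with YYYYMMDD is
the convention described further down; the M (YYYYMM) and Y (YYYY)
prefixes and the function names are assumptions made for this example.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def first_of_next_month(d):
    """First day of the month after the one containing d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)

def date_range_terms(start, end):
    """Terms covering every day from start to end inclusive, exactly once."""
    terms = []
    d = start
    while d <= end:
        next_year = date(d.year + 1, 1, 1)
        next_month = first_of_next_month(d)
        if d.month == 1 and d.day == 1 and next_year - ONE_DAY <= end:
            # The whole calendar year lies inside the range.
            terms.append("Y%04d" % d.year)
            d = next_year
        elif d.day == 1 and next_month - ONE_DAY <= end:
            # The whole calendar month lies inside the range.
            terms.append("M%04d%02d" % (d.year, d.month))
            d = next_month
        else:
            # Ragged end of the range: fall back to a single-day term.
            terms.append("D%04d%02d%02d" % (d.year, d.month, d.day))
            d += ONE_DAY
    return terms

if __name__ == "__main__":
    print(date_range_terms(date(2004, 12, 30), date(2005, 2, 2)))
```

The resulting terms would then be combined with OP_OR and applied as a
filter, in the same way as the age terms above.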

> - Does anyone have good Perl-based examples for the indexer and the
> searcher as a starting point?

For searching, see:

Search-Xapian-0.8.4.0/examples/simplesearch.pl

I don't have a good example Perl indexer to hand, though the API is
pretty similar to C++, so you can probably use the C++ example.

> - The documentation I read so far is not very explicit on searching
> different fields. The way I currently understood it, I simply make the
> name of the fields I want to support part of the terms I add to the
> documents?

Pretty much.  Conventionally, the prefix you add should be all uppercase.
Multi-letter prefixes should start with "X", and for such prefixes
you need to add a ":" to separate prefix and term if the term itself
starts with a capital letter.
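
The joining rule can be sketched as a small helper.  This is only an
illustration in plain Python; make_term is a hypothetical name, not a
function provided by Xapian, and it only checks for ASCII capitals.

```python
def make_term(prefix, term):
    """Join a prefix and a term following the convention above."""
    multi_letter = len(prefix) > 1
    starts_with_capital = "A" <= term[:1] <= "Z"
    if multi_letter and starts_with_capital:
        # Without the ":" you couldn't tell where the prefix ends.
        return prefix + ":" + term
    return prefix + term

if __name__ == "__main__":
    for prefix, term in [("XAGE", "20"), ("XSEX", "m"), ("XCITY", "Paris")]:
        print(make_term(prefix, term))
```

XCITY is just an example prefix here.  A term such as "Paris" would get
the separator, while "20" and "m" would not.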

Single letter prefixes (i.e. A-W, Y, and Z) are reserved for standard
uses (e.g. D is a date in the format YYYYMMDD - today would be
D20050124) so that applications like Omega know what to do with them,
and so that you can combine searches over arbitrary databases.

But if you don't care about compatibility, you can mostly ignore these
conventions.  In the main library, only the queryparser cares about
them, and only in that it needs to know when to add a ":", and that
it'll use the R prefix for "raw" terms.

At some point, we intend to make the prefix and term combining more
transparent.  Once you know the rules, the current system works very
well, but it looks a bit quirky.

Cheers,
    Olly


