[Xapian-discuss] time bias/sorting

Olly Betts olly at survex.com
Fri Jul 14 15:18:05 BST 2006


On Wed, Jul 12, 2006 at 11:43:39AM +0100, Joss Shaw wrote:
> A term is like a posting but without positional information. You search on
> terms and postings.
> 
> What therefore is a value - are these searched on in the traditional
> sense ('keyword foo bar'), or are they used just to narrow a search
> down - like a boolean operator might.

Hmm, the "Overview" document isn't at all clear on this (and even refers
you to a non-existent Enquire method).  I've just rewritten it to say the
following, which is somewhat better:

  Each document can have the following types of information associated with it:

  * document data - this is an arbitrary block of data accessed using
    Xapian::Document::get_data(). The contents of the document data can be
    whatever you want and in whatever format. Often it contains a URL or other
    external UID, a document title, and an excerpt from the document text. If
    you wish to interoperate with Omega, it should contain name=value pairs,
    one per line (recent versions of Omega also support one field value per
    line, and can assign names to line numbers in the query template).
  * document values - these are arbitrary blocks of data which are stored so
    they can be accessed rapidly during the match process (to allow sorting,
    collapsing of duplicates, etc.). Each block is stored in a numbered slot.
    There's currently no length limit, but you should keep them short for
    efficiency.
  * terms and positional information - terms index the document (like index
    entries in the back of a book); positional information records the word
    offset into the document of each occurrence of a particular term. This is
    used to implement phrase searching and the NEAR operator.
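
To make the three kinds of information concrete, here's a toy model in
plain C++.  It is only a sketch and deliberately does not use the Xapian
API: ToyDocument, has_phrase() and make_example() are made-up names, and
the field names and slot numbers in the example are arbitrary assumptions.
It shows document data as an opaque blob, values in numbered slots, and
terms carrying word positions so that a phrase check is possible.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical and are NOT the Xapian API.
struct ToyDocument {
    // Document data: an opaque blob, only read back when showing a result.
    std::string data;
    // Document values: short blobs in numbered slots, read during matching.
    std::map<unsigned, std::string> values;
    // Terms, each with the word offsets at which it occurs (may be empty).
    std::map<std::string, std::vector<unsigned>> postings;

    void add_posting(const std::string& term, unsigned pos) {
        postings[term].push_back(pos);
    }
};

// True if `first` is immediately followed by `second` somewhere in the
// document.  This is the kind of check that needs positional information.
bool has_phrase(const ToyDocument& doc, const std::string& first,
                const std::string& second) {
    auto a = doc.postings.find(first);
    auto b = doc.postings.find(second);
    if (a == doc.postings.end() || b == doc.postings.end()) return false;
    for (unsigned pa : a->second)
        for (unsigned pb : b->second)
            if (pb == pa + 1) return true;
    return false;
}

// Build a small example document holding all three kinds of information.
ToyDocument make_example() {
    ToyDocument doc;
    // name=value pairs, one per line (field names here are assumptions).
    doc.data = "url=http://example.com/widget\ncaption=Blue widget";
    doc.values[0] = "00000250";     // slot 0: price, zero-padded (assumed)
    doc.values[1] = "example.com";  // slot 1: site name (assumed)
    doc.add_posting("blue", 1);
    doc.add_posting("widget", 2);
    doc.add_posting("sale", 3);
    return doc;
}
```

With this model, has_phrase(doc, "blue", "widget") holds for the example
document but has_phrase(doc, "widget", "blue") does not, since only the
positions distinguish the two.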

  There's some overlap in what you can do with terms and with values.  A
  simple boolean filter (e.g. on document language) is definitely better
  done using a term and OP_FILTER.
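
As a rough illustration of what filtering on a term means, here's a toy
sketch in plain C++.  It does not call Xapian: ScoredDoc and
filter_by_term() are hypothetical names, and the "lang:fr" style of term
is just an assumption for the example.  The point is that the filter
narrows the set of matches without touching their weights.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy model only: hypothetical types, NOT the Xapian API.
struct ScoredDoc {
    int id;
    double weight;                // relevance from the main query
    std::set<std::string> terms;  // all terms indexing the document
};

// Keep only documents indexed by `filter_term`.  Weights are left as they
// were, so the filter restricts the result set without changing ranking.
std::vector<ScoredDoc> filter_by_term(const std::vector<ScoredDoc>& matches,
                                      const std::string& filter_term) {
    std::vector<ScoredDoc> kept;
    for (const ScoredDoc& m : matches)
        if (m.terms.count(filter_term)) kept.push_back(m);
    return kept;
}
```

A yes/no property such as language maps naturally onto "is this term
present?", which is why a term suits this case.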

  Using a value allows you to do things you can't do with terms, such as
  "sort by price", or "show only the best match for each website".  You
  can also perform filtering with a value which is more sophisticated
  than can easily be achieved with terms, for example: find matches
  with a price between $100 and $900.  Omega uses boolean terms to perform
  date range filtering, but this might actually be better done using a
  value (the code in Omega was written before values were added to
  Xapian).
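
The three value-based operations mentioned above can be sketched in plain
C++ like this.  Again this is a toy model, not the Xapian API: Match,
price_between(), sort_by_price() and collapse_on_site() are hypothetical
names, and the price and site fields simply stand in for two value slots.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Toy model only: hypothetical types, NOT the Xapian API.
struct Match {
    int id;
    double weight;     // relevance of the match
    double price;      // stands in for a numeric value slot
    std::string site;  // stands in for a value slot used for collapsing
};

// Range filter: keep matches with lo <= price <= hi.
std::vector<Match> price_between(const std::vector<Match>& in,
                                 double lo, double hi) {
    std::vector<Match> out;
    for (const Match& m : in)
        if (m.price >= lo && m.price <= hi) out.push_back(m);
    return out;
}

// "Sort by price": ascending price, ties broken by descending weight.
std::vector<Match> sort_by_price(std::vector<Match> in) {
    std::stable_sort(in.begin(), in.end(),
                     [](const Match& a, const Match& b) {
                         if (a.price != b.price) return a.price < b.price;
                         return a.weight > b.weight;
                     });
    return in;
}

// "Best match for each website": order by descending weight, then keep
// only the first match seen for each site.
std::vector<Match> collapse_on_site(std::vector<Match> in) {
    std::stable_sort(in.begin(), in.end(),
                     [](const Match& a, const Match& b) {
                         return a.weight > b.weight;
                     });
    std::set<std::string> seen;
    std::vector<Match> out;
    for (const Match& m : in)
        if (seen.insert(m.site).second) out.push_back(m);
    return out;
}
```

Note that real values are blocks of data rather than doubles, so a
numeric price would need to be stored in an encoding whose byte order
matches numeric order for sorting and range checks to behave like this.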

Cheers,
    Olly


