[Xapian-discuss] indexing and queryparsing: UTF-8 and PHP
Peter Karman
peter at peknet.com
Sat Feb 25 19:18:26 GMT 2006
These are good questions.
tata 668 scribbled on 2/25/06 10:54 AM:
> 1) Am I correct when I say that Xapian doesn't provide an indexer
> function?
The Omega project provides a couple of different indexers (omindex and
scriptindex). Omega is a separate project from the Xapian library, but the two
are distributed together, as are the bindings for using Xapian from other
languages (like PHP).
Your questions about how "words" are defined are one reason I prefer Swish-e
(http://swish-e.org) for smaller projects. Swish-e lets you define which
characters constitute a "word" and the indexer splits text strings accordingly.
Also, the indexer is "smart" about word context in HTML and XML and lets you
bias some words more than others (like titles or headings, for example).
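To make the two indexer ideas above concrete, here is a minimal sketch in plain Python. It is not Swish-e's or Xapian's code, and the names `make_tokenizer` and `index_document` are hypothetical. It assumes the text has already been decoded from UTF-8 into a Python str, treats letters, digits and underscore (`\w`) plus a configurable set of extra characters as "word" characters, and multiplies each word's count by a per-field bias (for example, counting title words more heavily than body words).

```python
import re
from collections import Counter


def make_tokenizer(extra_word_chars):
    """Return a function that splits text into lowercase words.

    A word is a run of \\w characters (Unicode letters, digits, underscore)
    plus any characters in extra_word_chars. This is a hypothetical stand-in
    for a configurable "which characters make a word" setting.
    """
    # re.escape keeps characters such as '-' or ']' literal inside the class.
    pattern = re.compile(r"[\w" + re.escape(extra_word_chars) + r"]+")

    def tokenize(text):
        # Expects a decoded str; decode raw bytes first, e.g. data.decode("utf-8").
        return [m.group(0).lower() for m in pattern.finditer(text)]

    return tokenize


def index_document(fields, tokenize, biases):
    """Count words across named fields, weighting each field by its bias.

    fields maps a field name (e.g. "title") to its text; biases maps a field
    name to an integer multiplier. Fields without a bias count once per hit.
    """
    counts = Counter()
    for field, text in fields.items():
        bias = biases.get(field, 1)
        for word in tokenize(text):
            counts[word] += bias
    return counts
```

With `make_tokenizer("-'")`, the string "Xapian's e-mail café" yields `["xapian's", "e-mail", "café"]`, while `make_tokenizer("")` splits "e-mail" into `["e", "mail"]`. Changing the word-character set changes which terms end up in the index, which is why it is worth being able to configure it.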
Since this is the Xapian list and not the Swish-e list, I will say that Xapian
offers some key features Swish-e does not, which is why I am on this list. :) I
am currently working on the next version of Swish-e, which will offer the Xapian
library as a backend, thus combining the best of both worlds: the ease and power
of Swish-e's indexer with the scalability and ranking features of Xapian.
--
Peter Karman . http://peknet.com/ . peter at peknet.com