[Xapian-discuss] indexing and queryparsing: UTF-8 and PHP
tata 668
tata668 at gmail.com
Sat Feb 25 21:34:35 GMT 2006
I don't really know Swish-e, but it seems more HTML- and XML-oriented. I NEVER
index HTML, XML or any other kind of file. I only need to index information
like "member description", which would require a slow MySQL plain-text search
without a dedicated library like Xapian.
I would definitely like to see a function in Xapian that would take a text,
split it into words and index them into the associated Document.
Document::index_text(textToIndex, encoding)
This function would use the same splitting algorithm as the query parser and
it would accept UTF-8 text...
That's my wish! ;-)
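To make the wish concrete, here is a minimal sketch in plain Python of what
such a function might do. It does not use Xapian and is not Xapian's API: the
names index_text and parse_query, the (term, position) return value and the
word rule (runs of Unicode letters, digits and underscores, lowercased) are
all assumptions made for illustration. The only point it shows is that the
indexer and the query parser share one splitter, so a term typed in a query
matches the term that was indexed.

```python
import re

# Assumed word rule (hypothetical): a "word" is a run of Unicode letters,
# digits or underscores. Python's \w is Unicode-aware for str patterns.
_WORD = re.compile(r"\w+")


def index_text(text_to_index, encoding="utf-8"):
    """Split text into lowercased terms and return (term, position) pairs.

    Accepts either str or bytes; bytes are decoded with `encoding` first.
    Positions start at 1 and count words, not characters.
    """
    if isinstance(text_to_index, bytes):
        text_to_index = text_to_index.decode(encoding)
    postings = []
    for position, match in enumerate(_WORD.finditer(text_to_index), start=1):
        postings.append((match.group().lower(), position))
    return postings


def parse_query(query, encoding="utf-8"):
    """Split a query with the same rule as index_text and return its terms."""
    return [term for term, _ in index_text(query, encoding)]


if __name__ == "__main__":
    member_description = "Caf\u00e9 owner, d\u00e9j\u00e0 vu fan".encode("utf-8")
    print(index_text(member_description))
    print(parse_query("CAF\u00c9 fan"))
```

Because both functions go through the same splitter, changing the word rule
in one place changes it for indexing and searching alike.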
----- Original Message -----
From: "Peter Karman" <peter at peknet.com>
To: "tata 668" <tata668 at gmail.com>
Cc: <xapian-discuss at lists.xapian.org>
Sent: Saturday, February 25, 2006 2:18 PM
Subject: Re: [Xapian-discuss] indexing and queryparsing: UTF-8 and PHP
> These are good questions.
>
> tata 668 scribbled on 2/25/06 10:54 AM:
>
>> 1) Am I correct when I say that Xapian doesn't provide an indexer
>> function?
>
> The Omega project provides a couple different indexers. That's a separate
> project from the Xapian library, but they're available together, as are
> the bindings for using other languages (like PHP).
>
> Your questions about how "words" are defined are one reason I prefer
> Swish-e (http://swish-e.org) for smaller projects. Swish-e lets you define
> which characters constitute a "word" and the indexer splits text strings
> accordingly. Also, the indexer is "smart" about word context in HTML and
> XML and lets you bias some words more than others (like titles or
> headings, for example).
>
> Since this is the Xapian list and not the Swish-e list, I will say that
> Xapian offers some key features Swish-e does not, which is why I am on
> this list. :) I am currently working on the next version of Swish-e, which
> will offer the Xapian library as a backend, thus combining the best of
> both worlds: the ease and power of Swish-e's indexer with the scalability
> and ranking features of Xapian.
>
>
> --
> Peter Karman . http://peknet.com/ . peter at peknet.com