[Xapian-discuss] indexing and queryparsing: UTF-8 and PHP
Olly Betts
olly at survex.com
Mon Feb 27 15:03:41 GMT 2006
On Mon, Feb 27, 2006 at 09:04:40AM +0100, Thomas Deniau wrote:
> Before devising this solution, I wanted to do this from OmegaScript, but I
> haven't found any call that would return a list of the unstemmed forms of a
> term in the sample of the document, not in the query.
To implement that, we'd have to word-split the sample, stem each word,
compare each stem to the list of stemmed forms in the query, and if it
matches, add the unstemmed form to a list which we return.
Then to do the highlighting, you'd have to word-split the sample again,
compare each word to the list of unstemmed forms, and if it matches,
highlight it.
Compare that to using $highlight - you've added the overhead of
word-splitting the sample a second time, plus generating and checking
a potentially long list of unstemmed forms...
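
To make the comparison concrete, here is a rough Python sketch of the two
approaches. It is only an illustration under stated assumptions, not Omega's
actual code: toy_stem is a hypothetical stand-in for a real stemmer, the
<b> markup is arbitrary, and all function names are made up for this example.

```python
import re

WORD_RE = re.compile(r"\w+")


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: lowercase, strip a few suffixes."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def unstemmed_forms(sample, query_stems):
    """Word-split the sample and collect the words whose stem is in the query."""
    forms = []
    for word in WORD_RE.findall(sample):
        if toy_stem(word) in query_stems and word not in forms:
            forms.append(word)
    return forms


def highlight_two_pass(sample, query_stems):
    """Build the unstemmed list first, then word-split a second time to highlight."""
    forms = set(unstemmed_forms(sample, query_stems))  # first word split

    def mark(match):  # called once per word during the second word split
        word = match.group(0)
        return "<b>" + word + "</b>" if word in forms else word

    return WORD_RE.sub(mark, sample)


def highlight_one_pass(sample, query_stems):
    """Word-split once, stemming and highlighting each word as it is seen."""

    def mark(match):
        word = match.group(0)
        return "<b>" + word + "</b>" if toy_stem(word) in query_stems else word

    return WORD_RE.sub(mark, sample)


if __name__ == "__main__":
    sample = "Indexing indexed documents with an index"
    print(highlight_two_pass(sample, {"index"}))
    print(highlight_one_pass(sample, {"index"}))
```

Both functions produce the same marked-up sample; the two-pass version
just does an extra word split and carries the intermediate list around.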
Cheers,
Olly