[Xapian-discuss] UTF-8: what is done and what is not?
Olly Betts
olly at survex.com
Fri Nov 3 01:34:51 GMT 2006
On Thu, Nov 02, 2006 at 08:01:44PM -0500, tata 668 wrote:
> Isn't a UTF-8 queryparser useless until it uses the exact same word
> splitter as the one used for indexing the documents?
It would be more convenient if a compatible word splitter were available
in the core library, but "useless" is much too strong a summary of the
situation.
I'm intending to improve this situation before releasing 1.0. Prior to
that, I suggest cribbing from indextext.cc in Omega - that's what
omindex and scriptindex use for tokenising UTF-8 text.
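To illustrate the general idea of sharing one splitter between indexing
and querying, here is a minimal sketch. It is not the code from
indextext.cc; the function name split_words and its rules (any byte
>= 0x80 counts as part of a word, only ASCII letters are lowercased) are
assumptions made for this example.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper, NOT taken from Omega's indextext.cc.
// Splits UTF-8 text into terms.  Any byte >= 0x80 belongs to a
// multi-byte UTF-8 sequence and is treated as a word character, so
// non-ASCII words stay in one piece.  Only ASCII letters are lowercased.
std::vector<std::string> split_words(const std::string& text) {
    std::vector<std::string> terms;
    std::string current;
    for (unsigned char ch : text) {
        if (ch >= 0x80) {
            // Part of a multi-byte character: keep the byte unchanged.
            current += static_cast<char>(ch);
        } else if (std::isalnum(ch)) {
            // ASCII letter or digit: fold case so terms match at query time.
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            // Any other ASCII byte ends the current term.
            terms.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) terms.push_back(current);
    return terms;
}
```

If the same function produces the terms you add to each document at
index time and the terms you build the query from at search time, the
two sides agree on word boundaries by construction.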
Cheers,
Olly