[Xapian-discuss] UTF-8: what is done and what is not?
tata 668
tata668 at gmail.com
Fri Nov 3 01:42:02 GMT 2006
I didn't want to sound rude using the word "useless", I'm sorry.
I'm gonna look at that indextext.cc file.
Thanks a lot again! Xapian is an important part of my application!
JL
----- Original Message -----
From: "Olly Betts" <olly at survex.com>
To: "tata 668" <tata668 at gmail.com>
Cc: <xapian-discuss at lists.xapian.org>
Sent: Thursday, November 02, 2006 8:34 PM
Subject: Re: [Xapian-discuss] UTF-8: what is done and what is not?
> On Thu, Nov 02, 2006 at 08:01:44PM -0500, tata 668 wrote:
>> Isn't a UTF-8 queryparser useless until it uses the exact same word
>> splitter as the one used for indexing the documents?
>
> It would be more convenient if a compatible word splitter were available
> in the core library, but "useless" is much too strong a summary of the
> situation.
>
> I'm intending to improve this situation before releasing 1.0. Prior to
> that, I suggest cribbing from indextext.cc in Omega - that's what
> omindex and scriptindex use for tokenising utf-8 text.
>
> Cheers,
> Olly
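
The point about a compatible word splitter can be illustrated with a minimal C++ sketch. It is not Xapian's API and not the code in Omega's indextext.cc; `split_terms` and `query_matches` are hypothetical names, and the splitting rule (ASCII letters and digits plus any byte >= 0x80 count as word characters, ASCII is lowercased) is an assumption chosen only to keep the example short. What it shows is that the indexing side and the query side produce comparable terms when both call the same function.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not Xapian's API, not Omega's indextext.cc):
// splits UTF-8 text into lowercase terms. ASCII letters and digits are word
// characters; every byte >= 0x80 (part of a multi-byte UTF-8 sequence) is
// also treated as a word character, so non-ASCII words stay in one piece.
std::vector<std::string> split_terms(const std::string& text) {
    std::vector<std::string> terms;
    std::string current;
    for (unsigned char byte : text) {
        bool is_word_byte = byte >= 0x80 || std::isalnum(byte);
        if (is_word_byte) {
            // Lowercase ASCII only; multi-byte sequences are left unchanged.
            current += static_cast<char>(byte < 0x80 ? std::tolower(byte) : byte);
        } else if (!current.empty()) {
            terms.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) terms.push_back(current);
    return terms;
}

// Hypothetical check: true when every query term occurs among the terms
// produced from the document. Both sides go through split_terms(), which is
// the property the thread is about.
bool query_matches(const std::string& document, const std::string& query) {
    std::vector<std::string> indexed = split_terms(document);
    for (const std::string& term : split_terms(query)) {
        if (std::find(indexed.begin(), indexed.end(), term) == indexed.end())
            return false;
    }
    return true;
}
```

If the query side used a different rule, for example one that broke words at non-ASCII bytes, a query for an accented word would yield terms that were never indexed. Sharing one splitter between indexing and query parsing avoids that mismatch.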