[Xapian-discuss] UTF-8: what is done and what is not?

Fri Nov 3 01:42:02 GMT 2006

I didn't mean to sound rude by using the word "useless"; I'm sorry.

I'm gonna look at that indextext.cc file.

Thanks a lot again! Xapian is an important part of my application!

JL

----- Original Message ----- 
From: "Olly Betts" <olly at survex.com>
To: "tata 668" <tata668 at gmail.com>
Cc: <xapian-discuss at lists.xapian.org>
Sent: Thursday, November 02, 2006 8:34 PM
Subject: Re: [Xapian-discuss] UTF-8: what is done and what is not?

> On Thu, Nov 02, 2006 at 08:01:44PM -0500, tata 668 wrote:
>> Isn't a UTF-8 queryparser useless until it uses exactly the same word
>> splitter as the one used for indexing the documents?
> 
> It would be more convenient if a compatible word splitter were available
> in the core library, but "useless" is much too strong a summary of the
> situation.
> 
> I'm intending to improve this situation before releasing 1.0.  Prior to
> that, I suggest cribbing from indextext.cc in Omega - that's what
> omindex and scriptindex use for tokenising utf-8 text.
> 
> Cheers,
>    Olly
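The underlying point of the thread is that query words only match if they are split into terms the same way the documents were. A minimal sketch of that idea follows, assuming a single shared splitter function called from both the indexing side and the query side. The function names (`split_words`, `index_terms`, `query_terms`) are hypothetical, the splitting rule (runs of Unicode letters, marks, and digits, lowercased) is an illustrative assumption and not the rule Omega's indextext.cc uses, and no Xapian API calls are shown. Python is used only for brevity.

```python
import unicodedata


def split_words(text: str) -> list[str]:
    """Split text into lowercase word tokens.

    Hypothetical rule for illustration: a word is a maximal run of
    characters whose Unicode general category is a letter (L*),
    mark (M*), or number (N*). Everything else separates words.
    """
    words = []
    current = []
    for ch in text:
        if unicodedata.category(ch)[0] in ("L", "M", "N"):
            current.append(ch)
        elif current:
            words.append("".join(current).lower())
            current = []
    if current:
        words.append("".join(current).lower())
    return words


def index_terms(document: bytes) -> set[str]:
    # Indexing side: decode the UTF-8 document, then split it.
    return set(split_words(document.decode("utf-8")))


def query_terms(query: bytes) -> list[str]:
    # Query side: the same splitter, so every query term is one
    # that the indexer could also have produced.
    return split_words(query.decode("utf-8"))
```

For example, indexing `"Déjà vu: naïve café-goers"` yields the terms `déjà`, `vu`, `naïve`, `café`, and `goers`, and the query `"CAFÉ"` splits to `café`, which is among them. If the query side used a different rule (say, splitting only on spaces), `café-goers` would become a single query term that never appears in the index.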