[Xapian-discuss] UTF-8: what is done and what is not?
tata 668
tata668 at gmail.com
Fri Nov 3 01:01:44 GMT 2006
Thanks for the reply!
But I'm wondering the same thing as I was some months ago:
Isn't a UTF-8 query parser useless unless it uses exactly the same word
splitter as the one used to index the documents?
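A minimal C++ sketch of the concern above. Query terms can only match indexed terms when both sides produce terms with the same splitter. `split_words`, `index_terms` and `all_query_terms_indexed` are hypothetical names, not Xapian API. The splitting rule is a simplifying assumption and is not what the UTF-8 branch actually does: bytes >= 0x80 count as word characters, so multi-byte UTF-8 characters are never cut, and only ASCII is lowercased.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical splitter shared by indexing and querying (an assumption,
// not Xapian's algorithm). ASCII letters/digits are word characters and
// are lowercased; bytes >= 0x80 (parts of multi-byte UTF-8 sequences) are
// kept as-is as word characters; anything else separates words.
std::vector<std::string> split_words(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char c : text) {
        if (c >= 0x80 || std::isalnum(c)) {
            current += static_cast<char>(c >= 0x80 ? c : std::tolower(c));
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

// Index side: the set of terms generated from a document's text.
std::set<std::string> index_terms(const std::string& document) {
    std::vector<std::string> w = split_words(document);
    return std::set<std::string>(w.begin(), w.end());
}

// Query side: uses the SAME splitter, so every query term is produced
// in the same form the indexer would have produced it.
bool all_query_terms_indexed(const std::set<std::string>& index,
                             const std::string& query) {
    for (const std::string& term : split_words(query)) {
        if (index.count(term) == 0) return false;
    }
    return true;
}
```

If the query side instead used a splitter that broke words at non-ASCII bytes, "café" in a query would become the term "caf". That term was never indexed, so the query would silently fail to match.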
I'm really surprised I'm the only one with this problem...
Julien
----- Original Message -----
From: "Olly Betts" <olly at survex.com>
To: "tata 668" <tata668 at gmail.com>
Cc: <xapian-discuss at lists.xapian.org>
Sent: Thursday, November 02, 2006 7:49 PM
Subject: Re: [Xapian-discuss] UTF-8: what is done and what is not?
> On Thu, Nov 02, 2006 at 08:46:36AM -0500, tata 668 wrote:
>> I'm aware of the UTF-8 branch here:
>> http://www.oligarchy.co.uk/xapian/branches/utf8/ , but I'd like more
>> information about what it contains and whether it's enough for me.
>
> The current status is summarised here:
>
> http://wiki.xapian.org/Utf8Support
>
> I'm in the process of turning the release handle for 0.9.8 (to fix
> various minor problems reported since 0.9.7), so I'm very close to
> merging the utf-8 branch in and the rate of visible progress should pick
> up.
>
>> Currently, I've written my own word splitter to index the data, and my
>> own query parser. They are not perfect, and I would like to use built-in
>> Xapian objects instead.
>
> There's not currently a word splitter in the core library, but
> Xapian::QueryParser now works in utf-8 on the branch, so you can
> probably use that now.
>
> I've not tested utf-8 from any of the bindings yet. Some languages
> standardise on a particular internal representation, so there could
> be issues here (I don't know how PHP handles such issues). But I'd
> certainly encourage you to try it and let us know if it works or if
> there are problems.
>
> Cheers,
> Olly