[Xapian-discuss] UTF8 support plans (without stemming)
Olly Betts
olly at survex.com
Wed Jun 29 04:19:57 BST 2005
On Thu, Apr 28, 2005 at 01:37:02PM +0100, Olly Betts wrote:
> On Thu, Apr 28, 2005 at 11:08:28AM +0400, Alexandre wrote:
> > it's very hard to make it work with other languages (for example, with
> > russian) - there are lots of problems inside...
>
> The problems aren't anywhere near as great as you seem to expect, at
> least in part because unicode support has always been a goal we've kept
> in mind.
And here's a search which works in Russian (or Chinese or Elvish or ...)
implemented using Xapian:
http://rain.gmane.org/?query=%D0%B2%D0%BE%D0%B7%D0%BC%D0%BE%D0%B6%D0%BD%D0%BE
This is using a patched version of the QueryParser. Currently I'm using
glib's Unicode routines, but I wonder if we really want to add a
dependency on glib when we only use a very tiny part of it.
I already have C code for handling UTF-8. I'm going to see what else is
around for Unicode versions of "isalpha" etc.
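For illustration only, here is a minimal sketch in C of the kind of UTF-8
handling mentioned above. It is not the code referred to in this message:
the function name utf8_decode and its interface are assumptions made for
the example. It decodes a single UTF-8 sequence into a Unicode code point
and rejects truncated, overlong, surrogate, and out-of-range input. A
Unicode-aware "isalpha" would additionally need character property tables,
which this sketch does not attempt.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Xapian or glib): decode one UTF-8
 * sequence starting at s, reading at most len bytes.
 * On success, stores the code point in *cp and returns the number of
 * bytes consumed (1 to 4). Returns 0 if the input is empty, truncated,
 * or malformed. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t need, i;
    uint32_t min, value;
    unsigned char b0;

    if (len == 0) return 0;
    b0 = s[0];

    if (b0 < 0x80) {                    /* 0xxxxxxx: plain ASCII */
        *cp = b0;
        return 1;
    } else if ((b0 & 0xE0) == 0xC0) {   /* 110xxxxx: 2-byte sequence */
        need = 2; min = 0x80;    value = b0 & 0x1F;
    } else if ((b0 & 0xF0) == 0xE0) {   /* 1110xxxx: 3-byte sequence */
        need = 3; min = 0x800;   value = b0 & 0x0F;
    } else if ((b0 & 0xF8) == 0xF0) {   /* 11110xxx: 4-byte sequence */
        need = 4; min = 0x10000; value = b0 & 0x07;
    } else {
        return 0;                       /* stray continuation or invalid lead byte */
    }

    if (len < need) return 0;           /* truncated sequence */

    for (i = 1; i < need; ++i) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a 10xxxxxx continuation byte */
        value = (value << 6) | (uint32_t)(s[i] & 0x3F);
    }

    if (value < min) return 0;                        /* overlong encoding */
    if (value > 0x10FFFF) return 0;                   /* beyond the Unicode range */
    if (value >= 0xD800 && value <= 0xDFFF) return 0; /* UTF-16 surrogate */

    *cp = value;
    return need;
}
```

As a usage example, the first two bytes of the query in the URL above,
0xD0 0xB2, decode to U+0432 (the Cyrillic letter "в"), with a return value
of 2. A tokenizer would call such a function in a loop, advancing by the
returned byte count and classifying each code point as it goes.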
In the meantime, if anyone is interested in my somewhat hacked-up patch
to give a UTF-8-savvy QueryParser, let me know and I'll send you a copy.
Cheers,
Olly