[Xapian-discuss] Trouble with German language indexing/searching
Jim Lynch
jim at fayettedigital.com
Wed Feb 15 21:30:05 GMT 2006
Hi Olly,
I'd appreciate a copy of that patch. If I could, I'd reverse the
transliteration, but it looks to be one-way only.
Thanks,
Jim.
Olly Betts wrote:
>On Wed, Feb 15, 2006 at 11:51:29AM -0500, Jim Lynch wrote:
>
>
>>OK, not entirely. When I search for für using Omega, the term that gets
>>returned in the resultant xml is
>><queryterm term="fuer" show="fuer" freq="17"/>
>>
>>I'm using a simple script to generate contextual samples and obviously
>>it doesn't work. So where do I go to tell Xapian that I've got an
>>extended character set?
>>
>>
>
>Currently the QueryParser performs transliteration of accented
>characters (assuming character set iso-8859-1), and this is done
>even when stemming is disabled. In this case, "u-umlaut" is converted
>to "ue".
>
>This has been discussed before a few times, for example:
>
>http://thread.gmane.org/gmane.comp.search.xapian.general/1815
>
>I'm planning to revisit this area before 1.0. I suspect that I'll
>remove the transliteration, and any that makes sense to keep will
>be pushed into the stemmers (since it's a form of normalisation).
>
>Meanwhile, it's not hard to disable if you're happy to run a patched
>version of xapian (I thought I'd sent such a patch to the list but I
>can't find it right now).
>
>Cheers,
> Olly
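
Since "fuer" can come from either "für" or a literal "fuer", the
transliteration can't be undone on the query term itself. One possible
workaround for a sample-generating script is to apply the same
normalisation to the document text and keep a map back to the original
character offsets. The following is only a minimal Python sketch under
stated assumptions: only "ü" -> "ue" is confirmed in this thread, the
rest of the TRANSLIT table is a guess, and transliterate() and
context_samples() are hypothetical helper names, not part of Xapian or
Omega. It also matches substrings rather than whole words.

```python
# Assumed mapping: only "ü" -> "ue" is confirmed in this thread; the other
# entries are illustrative guesses, not Xapian's actual table.
TRANSLIT = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def transliterate(text):
    """Lowercase text and expand accented characters.

    Returns the normalised string plus a list that maps every normalised
    character back to the index of the original character it came from.
    """
    out = []
    offsets = []
    for i, ch in enumerate(text):
        low = ch.lower()
        for c in TRANSLIT.get(low, low):
            out.append(c)
            offsets.append(i)
    return "".join(out), offsets


def context_samples(text, term, width=20):
    """Return snippets of the original text around each match of term.

    Matching is done on the normalised forms, so "fuer" finds "für".
    """
    norm, offsets = transliterate(text)
    needle, _ = transliterate(term)
    if not needle:
        return []
    samples = []
    start = norm.find(needle)
    while start != -1:
        first = offsets[start]
        last = offsets[start + len(needle) - 1]
        samples.append(text[max(0, first - width):last + 1 + width])
        start = norm.find(needle, start + 1)
    return samples


if __name__ == "__main__":
    for sample in context_samples("Ein Geschenk für dich", "fuer", width=4):
        print(sample)
```

The script never needs to recover "für" from "fuer"; it only has to
normalise both sides the same way and then slice the untouched original
text.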