[Xapian-discuss] queryparser thinks ø is o
rm at seid-online.de
Sun Aug 28 13:49:23 BST 2005
On Mon, 2005-08-29 at 14:18 +0200, Marcus Ramberg wrote:
> On Aug 28, 2005, at 2:14 PM, R. Mattes wrote:
> > On Mon, 2005-08-29 at 11:04 +0200, Marcus Ramberg wrote:
> >> marcus at ds1:~/src/Horus-Indexer$ ./stemtest
> >> Xapian::Query(bolle:(pos=1))
> >> bølle
> >> So, I'm pretty sure it's not the stemmer. Any other ideas?
> > Lots of :-)
> > Yes, the queryparser itself modifies characters. The code that does
> > this is in 'xapian/xapian-core/queryparser/accentnormalisingitor.h'.
> > IMHO this is a rather "murky" and anglocentric part of the Xapian
> > codebase. Frankly, I just removed the offending parts of the code - but
> > a cleaner solution would be preferable. My current approach would be to
> > make the static tables in 'xapian/xapian-core/queryparser/symboltab.h'
> > configurable by language (sigh, not enough time right now).
> Hey Ralf.
> Thanks for the tips. However, disabling the action in the normalizer
> makes the queryparser tokenize on æøå instead of including them in
> the term. Where can I modify the tokenizer in the queryparser to include
> high-ascii chars (or at least the ones I need)?
Ah, sorry -- too fast typing, too little thinking.
I'm using some extensions/patches from Olly Betts that enable
unicode - either you have to wait until Olly Betts is back or
you have to nag him personally ;-}
I'm not sure about the status of his patches and I'd hate to
release code that's considered non-public. Anyway, I had to
tweak the patches to apply them to 0.9.2 (and had to change some
signatures to get them to compile ...).
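
For anyone following the thread, the two ideas discussed above (accent
folding that is configurable by language, and a tokenizer that keeps æøå
inside a term) can be sketched roughly as below. This is only an
illustration in Python, not Xapian code and not Olly's patches: the names
PROTECTED, MANUAL_FOLDS, fold_char, normalise_term and tokenise are all
made up for this example, and the tables are a tiny illustrative subset.

```python
import unicodedata

# Hypothetical per-language sets of characters that must NOT be folded.
# Names and contents are illustrative assumptions, not Xapian data.
PROTECTED = {
    "english": set(),
    "norwegian": set("æøåÆØÅ"),
}

# Letters without a Unicode decomposition, which an accent-stripping
# table would have to map by hand (illustrative subset only).
MANUAL_FOLDS = {"ø": "o", "Ø": "O", "æ": "ae", "Æ": "AE"}


def fold_char(ch, language):
    """Fold one character unless the language marks it as a real letter."""
    if ch in PROTECTED.get(language, set()):
        return ch
    if ch in MANUAL_FOLDS:
        return MANUAL_FOLDS[ch]
    # NFD splits e.g. "é" into "e" + combining acute; drop combining marks.
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def normalise_term(term, language):
    """Apply the language-aware folding to every character of a term."""
    return "".join(fold_char(ch, language) for ch in term)


def tokenise(text, language):
    """Split on non-alphanumerics, so letters such as æøå stay in the term."""
    terms, current = [], []
    for ch in text:
        if ch.isalnum():
            current.append(ch)
        elif current:
            terms.append(normalise_term("".join(current), language))
            current = []
    if current:
        terms.append(normalise_term("".join(current), language))
    return terms
```

With these tables, normalise_term("bølle", "english") gives "bolle" (the
behaviour Marcus is seeing), while normalise_term("bølle", "norwegian")
leaves the term as "bølle".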