[Xapian-discuss] queryparser thinks ø is o

R. Mattes rm at seid-online.de
Sun Aug 28 13:49:23 BST 2005


On Mon, 2005-08-29 at 14:18 +0200, Marcus Ramberg wrote:
> On Aug 28, 2005, at 2:14 PM, R. Mattes wrote:
> 
> > On Mon, 2005-08-29 at 11:04 +0200, Marcus Ramberg wrote:
> >
> >> marcus at ds1:~/src/Horus-Indexer$ ./stemtest
> >> Xapian::Query(bolle:(pos=1))
> >> bølle
> >> So, I'm pretty sure it's not the stemmer. Any other ideas?
> >
> > Lots of :-)
> > Yes, the queryparser itself modifies characters. The code that does this
> > is in 'xapian/xapian-core/queryparser/accentnormalisingitor.h'. IMHO
> > this is a rather "murky" and anglocentric part of the Xapian codebase.
> >
> > Frankly, I just removed the offending parts of the code, but a cleaner
> > solution would be preferable. My current approach would be to make
> > the static tables in 'xapian/xapian-core/queryparser/symboltab.h'
> > configurable by language (sigh, not enough time right now).
> 
> hey Ralf.
> 
> Thanks for the tips. However, disabling the action in the normalizer
> makes the queryparser tokenize on æøå instead of including them in
> the term. Where can I modify the tokenizer in the queryparser to
> include high-ASCII chars (or at least the ones I need)?
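
To make the two effects described above concrete, here is a small C++ sketch. It is not Xapian's queryparser code. It assumes Latin-1 input (one byte per character, so ø is the byte 0xF8), and the names is_ascii_wordchar, is_latin1_wordchar, fold_accent and tokenize are hypothetical. With accent folding on, "bølle" becomes "bolle". With folding off but an ASCII-only word test, the ø byte acts as a separator. A Latin-1-aware word test keeps the term whole.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch, not Xapian's actual queryparser code. Input is assumed
// to be ISO-8859-1 (Latin-1): one byte per character, so 'ø' is the byte 0xF8.

// ASCII-only word-character test: every byte >= 0x80 acts as a separator.
bool is_ascii_wordchar(unsigned char c) {
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z');
}

// Latin-1-aware test: also accepts the letters 0xC0-0xFF, skipping the
// multiplication sign (0xD7) and the division sign (0xF7).
bool is_latin1_wordchar(unsigned char c) {
    return is_ascii_wordchar(c) || (c >= 0xC0 && c != 0xD7 && c != 0xF7);
}

// A tiny, hypothetical accent-folding table: it maps a few accented
// letters to a plain ASCII letter, so "bølle" is turned into "bolle".
unsigned char fold_accent(unsigned char c) {
    switch (c) {
        case 0xE5: return 'a';  // å
        case 0xE9: return 'e';  // é
        case 0xF6: return 'o';  // ö
        case 0xF8: return 'o';  // ø
        default:   return c;
    }
}

// Split text into terms using the given word-character test,
// optionally folding accents before the test is applied.
std::vector<std::string> tokenize(const std::string& text,
                                  bool (*is_wordchar)(unsigned char),
                                  bool fold) {
    std::vector<std::string> terms;
    std::string current;
    for (char ch : text) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (fold) c = fold_accent(c);
        if (is_wordchar(c)) {
            current += static_cast<char>(c);
        } else if (!current.empty()) {
            terms.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) terms.push_back(current);
    return terms;
}
```

With the Latin-1 word "bølle" as input, tokenize(w, is_ascii_wordchar, true) gives {"bolle"}, tokenize(w, is_ascii_wordchar, false) gives {"b", "lle"}, and tokenize(w, is_latin1_wordchar, false) gives {"bølle"}. A per-language folding table, as suggested earlier in the thread, would amount to picking a different fold_accent (or none at all) depending on the language.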

Ah, sorry -- too fast typing, too little thinking.
I'm using some extensions/patches from Olly Betts that enable
Unicode; either you have to wait until Olly Betts is back or
you have to nag him personally ;-}
I'm not sure about the status of his patches and I'd hate to
release code that's considered non-public. Anyway, I had to
tweak the patches to apply them to 0.9.2 (and had to change some
signatures to get them to compile ...).
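
The Unicode patches mentioned above are not public, so the following is only a rough sketch of the general idea, assuming UTF-8 input: decode code points and decide what counts as a word character per code point instead of per byte. The functions next_code_point, is_wordchar and tokenize_utf8 are hypothetical. is_wordchar only covers ASCII letters and digits plus the Latin-1 Supplement and Latin Extended-A letters, not full Unicode character properties.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Decode one UTF-8 sequence starting at s[i] and advance i past it.
// Malformed or truncated sequences yield U+FFFD. Overlong forms and
// surrogates are not rejected; this is a sketch, not a validator.
char32_t next_code_point(const std::string& s, std::size_t& i) {
    unsigned char c = static_cast<unsigned char>(s[i]);
    if (c < 0x80) { ++i; return c; }
    std::size_t len;
    char32_t cp;
    if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; }
    else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
    else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
    else { ++i; return 0xFFFD; }  // stray continuation or invalid lead byte
    if (i + len > s.size()) { i = s.size(); return 0xFFFD; }  // truncated
    for (std::size_t k = 1; k < len; ++k) {
        unsigned char cc = static_cast<unsigned char>(s[i + k]);
        if ((cc & 0xC0) != 0x80) { ++i; return 0xFFFD; }  // bad continuation
        cp = (cp << 6) | (cc & 0x3F);
    }
    i += len;
    return cp;
}

// Rough word-character test: ASCII letters and digits, Latin-1 Supplement
// letters (U+00C0-U+00FF except U+00D7 and U+00F7) and Latin Extended-A
// (U+0100-U+017F). Real Unicode support would use character properties.
bool is_wordchar(char32_t cp) {
    if ((cp >= U'0' && cp <= U'9') || (cp >= U'A' && cp <= U'Z') ||
        (cp >= U'a' && cp <= U'z'))
        return true;
    if (cp >= 0xC0 && cp <= 0xFF) return cp != 0xD7 && cp != 0xF7;
    return cp >= 0x100 && cp <= 0x17F;
}

// Split UTF-8 text into terms, copying the original bytes of each term
// so multi-byte characters such as 'ø' (0xC3 0xB8) stay intact.
std::vector<std::string> tokenize_utf8(const std::string& text) {
    std::vector<std::string> terms;
    std::size_t i = 0, start = 0;
    bool in_term = false;
    while (i < text.size()) {
        std::size_t pos = i;  // byte offset where this code point begins
        char32_t cp = next_code_point(text, i);
        if (is_wordchar(cp)) {
            if (!in_term) { start = pos; in_term = true; }
        } else if (in_term) {
            terms.push_back(text.substr(start, pos - start));
            in_term = false;
        }
    }
    if (in_term) terms.push_back(text.substr(start));
    return terms;
}
```

For example, tokenize_utf8 on the UTF-8 text "bølle søk" yields the two terms "bølle" and "søk", with each ø kept as its two bytes 0xC3 0xB8 rather than being split off or folded to "o".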

 Cheers RalfD
  
> Marcus
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss



