[Snowball-discuss] Minor bug in utf-8 handling

Richard Boulton richard at lemurconsulting.com
Thu Feb 15 19:52:17 GMT 2007


Olly Betts wrote:
> I think I've spotted a bug in the handling of 3 byte utf-8 sequences
> while reading the code.  Both get_utf8 and get_b_utf8 fetch the third
> byte with *p when they should use p[c]:
> 
> http://oligarchy.co.uk/xapian/patches/snowball-3byte-utf8-bugfix.patch
> 
> In current stemmers, this is probably harmless, as the characters in use
> in the languages snowball has stemmers for encode as one or two byte
> utf-8 sequences.
> 
> I also improved the comment before skip_utf8.

I've applied this patch.
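
For anyone reading this in the archive without the patch to hand, here is a rough sketch of the kind of mistake described above. It is not the actual Snowball source: the function names decode_utf8_buggy and decode_utf8_fixed, the symbol typedef, and the parameter layout are all assumptions made for illustration, and only the forward direction is shown.

```c
/* Illustrative sketch only; names and layout are hypothetical and are
 * not copied from the Snowball runtime. */
typedef unsigned char symbol;  /* assumed: one UTF-8 byte per symbol */

/* Decode the character starting at p[c]; l is the end of the buffer.
 * Stores the code point in *slot and returns the number of bytes read
 * (0 if c is already at the end). */
static int decode_utf8_buggy(const symbol *p, int c, int l, int *slot) {
    int b0, b1;
    if (c >= l) return 0;
    b0 = p[c++];
    if (b0 < 0xC0 || c == l) { *slot = b0; return 1; }
    b1 = p[c++];
    if (b0 < 0xE0 || c == l) {
        *slot = (b0 & 0x1F) << 6 | (b1 & 0x3F);
        return 2;
    }
    /* Bug: *p is p[0], the first byte of the buffer, not the third
     * byte of this sequence. */
    *slot = (b0 & 0x0F) << 12 | (b1 & 0x3F) << 6 | (*p & 0x3F);
    return 3;
}

static int decode_utf8_fixed(const symbol *p, int c, int l, int *slot) {
    int b0, b1;
    if (c >= l) return 0;
    b0 = p[c++];
    if (b0 < 0xC0 || c == l) { *slot = b0; return 1; }
    b1 = p[c++];
    if (b0 < 0xE0 || c == l) {
        *slot = (b0 & 0x1F) << 6 | (b1 & 0x3F);
        return 2;
    }
    /* Fix: c now indexes the third byte, so read p[c]. */
    *slot = (b0 & 0x0F) << 12 | (b1 & 0x3F) << 6 | (p[c] & 0x3F);
    return 3;
}
```

With the four bytes 0x61 0xE2 0x82 0xAC ("a" followed by U+20AC) and c = 1, the fixed version yields 0x20AC, while the buggy one mixes in the low bits of the leading 0x61 and yields 0x20A1. One- and two-byte sequences never reach the faulty line, which fits the observation that current stemmers are probably unaffected.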


