[Snowball-discuss] Minor bug in utf-8 handling
Olly Betts
olly at survex.com
Tue Feb 13 17:01:45 GMT 2007
While reading the code, I think I've spotted a bug in the handling of
3-byte UTF-8 sequences. Both get_utf8 and get_b_utf8 fetch the third
byte with *p when they should use p[c]:
http://oligarchy.co.uk/xapian/patches/snowball-3byte-utf8-bugfix.patch
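To illustrate the difference, here is a minimal sketch of a forward decoder. It is not the Snowball runtime source: the function name, the parameter names and the "buggy" flag are made up for the example, and it assumes well-formed input of at most three bytes per character. With the flag set, the third byte is read from the start of the buffer (the *p behaviour) rather than from the current position (the p[c] behaviour).

```c
#include <stddef.h>

/* Hypothetical sketch, not the actual Snowball code.
 *
 * Decodes the UTF-8 sequence starting at buf[pos], where len is the
 * number of bytes in buf. Stores the code point in *out and returns the
 * number of bytes consumed, or 0 if pos is already at the end.
 *
 * If buggy is non-zero, the third byte of a 3-byte sequence is read from
 * buf[0] (equivalent to *p) instead of buf[pos] (equivalent to p[c]). */
static int decode_utf8_sketch(const unsigned char *buf, int pos, int len,
                              int buggy, int *out)
{
    int b0, b1, b2;

    if (pos >= len) return 0;

    b0 = buf[pos++];
    if (b0 < 0xC0 || pos == len) {          /* single byte, or truncated */
        *out = b0;
        return 1;
    }

    b1 = buf[pos++];
    if (b0 < 0xE0 || pos == len) {          /* 2-byte sequence, or truncated */
        *out = (b0 & 0x1F) << 6 | (b1 & 0x3F);
        return 2;
    }

    /* 3-byte sequence: the only place the two variants differ. */
    b2 = buggy ? buf[0] : buf[pos];
    *out = (b0 & 0x0F) << 12 | (b1 & 0x3F) << 6 | (b2 & 0x3F);
    return 3;
}
```

For the bytes 61 E2 82 AC ("a" followed by U+20AC), decoding from offset 1 gives 0x20AC with buggy set to 0, but 0x20A1 with buggy set to 1, because the low six bits come from the leading "a". One- and two-byte sequences never reach the differing line, so they decode identically either way.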
In the current stemmers this is probably harmless, since the characters
used by the languages Snowball has stemmers for all encode as one- or
two-byte UTF-8 sequences.
I also improved the comment before skip_utf8.
Cheers,
Olly