[Snowball-discuss] Haskell bindings, and issues with UTF_8

Dag Odenhall dag.odenhall at gmail.com
Sun Aug 25 13:53:51 BST 2013


Well, this is embarrassing. I have just discovered that the fault was with
me (and the earlier Haskell bindings I based my version on), not
libstemmer. I was passing the number of Unicode characters to
sb_stemmer_stem rather than the number of bytes in the encoded string. I really
should have known better, as proper handling of Unicode was my motivation
for writing new bindings in the first place!

The reason only UTF-8 appeared broken is probably that the other encodings
use a single byte for every character in the Snowball fixtures, so there the
character count and the byte count happen to be equal.
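
To illustrate the mismatch, here is a minimal C sketch. The helper names
utf8_code_points and utf8_byte_length are hypothetical and not part of
libstemmer, and the sketch assumes a valid, NUL-terminated UTF-8 input. It only
shows how the two counts diverge once a word contains a multi-byte character;
as described above, the byte count is the value the size argument of
sb_stemmer_stem needs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count Unicode code points in a NUL-terminated,
 * valid UTF-8 string by skipping continuation bytes (bit pattern 10xxxxxx). */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

/* Hypothetical helper: number of bytes in the encoded string. This is the
 * length to pass as the size, not the code point count. */
size_t utf8_byte_length(const char *s)
{
    return strlen(s);
}
```

For the UTF-8 bytes of "café" ("caf\xc3\xa9" in C), the first helper gives 4
and the second gives 5, so passing the character count would cut off the last
byte. For a pure ASCII word such as "stem" both give 4, which matches the
single-byte case where the bug stayed hidden.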

Sorry about the false alarm!

Cheers, Dag


On Mon, Dec 24, 2012 at 1:19 PM, Martin Porter <martin.f.porter at gmail.com> wrote:

> Dag,
>
> We're getting on for Christmas, but I'll talk to Richard Boulton about
> this, and other snowball issues, after the holiday season, and hope to
> come back with some answers.
>
> Thanks for your email,
>
> Martin
>