[Snowball-discuss] Using UTF-16 with libstemmer_c
Blake Madden
madden_blake at yahoo.com
Mon Apr 14 13:17:39 BST 2008
Hai Zaar,
The Oleander stemming library (http://www.oleandersolutions.com/stemming/stemming.html) works with Unicode (UTF-16) strings and is written in C++. Perhaps this is what you are looking for?
Blake
----- Original Message ----
From: Hai Zaar <haizaar at gmail.com>
To: martin.porter at grapeshot.co.uk
Cc: Dima Babitsky <dimok21 at gmail.com>; snowball-discuss at lists.tartarus.org
Sent: Sunday, April 13, 2008 3:26:56 PM
Subject: Re: [Snowball-discuss] Using UTF-16 with libstemmer_c
On Sun, Apr 13, 2008 at 10:22 PM, Martin Porter
<martin.porter at grapeshot.co.uk> wrote:
>
> Hai Zaar,
>
> No, we don't deal with UTF-16 currently. Is that a problem for you?
Yes, it is. In my application (C++) I work with strings using the ICU
library, which holds all strings in UTF-16 format. That means that in
order to stem a string, I have to convert it to UTF-8, pass it to the
stemmer, and then convert the stemmed result back to UTF-16. This
looks like significant overhead.
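
The round trip described above (UTF-16 to UTF-8, stem, then back to UTF-16) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the libstemmer_c or ICU API: the conversion helpers are hand-written and assume well-formed input, `std::u16string` stands in for an ICU string, and `stem_utf8` is a hypothetical placeholder that merely strips a trailing "s" where a real UTF-8 stemmer would be called.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Encode UTF-16 as UTF-8. Assumes well-formed input (surrogates are paired).
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        // Combine a high/low surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            std::uint32_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Decode UTF-8 into UTF-16. Assumes well-formed input.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        int extra;  // number of continuation bytes that follow the lead byte
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        ++i;
        for (int k = 0; k < extra && i < in.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}

// HYPOTHETICAL placeholder for a UTF-8 stemmer: it only strips a trailing
// "s". A real application would call the actual stemmer here.
std::string stem_utf8(const std::string& word) {
    if (word.size() > 1 && word.back() == 's')
        return word.substr(0, word.size() - 1);
    return word;
}

// The round trip: two conversions plus two temporary strings per word.
std::u16string stem_utf16(const std::u16string& word) {
    return utf8_to_utf16(stem_utf8(utf16_to_utf8(word)));
}
```

With this sketch, `stem_utf16(u"cats")` goes through one encode, one stem call, and one decode. Each conversion is linear in the word length, so the extra cost per word is two passes over the text plus two temporary allocations; whether that is significant in practice would need measuring. In a real ICU-based application the hand-written helpers would presumably be replaced by ICU's own conversions (for example `UnicodeString::toUTF8String` and `UnicodeString::fromUTF8`) and the placeholder by libstemmer_c's `sb_stemmer_stem`.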
>
> Martin
>
>
>
> On Sun, 2008-04-13 at 21:31 +0300, Hai Zaar wrote:
> > Good day! I've downloaded the latest libstemmer_c. Can it deal with
> > UTF-16 strings? The only supported encodings listed are UTF_8 and
> > ISO_8859_1.
> >
> >
>
>
--
Zaar