[Snowball-discuss] Using UTF-16 with libstemmer_c

Blake Madden madden_blake at yahoo.com
Mon Apr 14 13:17:39 BST 2008


Hai Zaar,

The Oleander stemming library (http://www.oleandersolutions.com/stemming/stemming.html) works on Unicode (UTF-16) strings and is written in C++.  Perhaps this is what you are looking for?

Blake


----- Original Message ----
From: Hai Zaar <haizaar at gmail.com>
To: martin.porter at grapeshot.co.uk
Cc: Dima Babitsky <dimok21 at gmail.com>; snowball-discuss at lists.tartarus.org
Sent: Sunday, April 13, 2008 3:26:56 PM
Subject: Re: [Snowball-discuss] Using UTF-16 with libstemmer_c

On Sun, Apr 13, 2008 at 10:22 PM, Martin Porter
<martin.porter at grapeshot.co.uk> wrote:
>
>  Hai Zaar,
>
>  No, we don't deal with UTF-16 currently. Is that a problem for you?
Yes, it is. In my C++ application I handle strings with the ICU
library, which stores all strings as UTF-16. That means that in
order to stem a string, I have to convert it to UTF-8, pass it to
the stemmer, and then convert the stemmed result back to UTF-16.
This looks like significant overhead.
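
For illustration, the round trip described above can be sketched as follows. This is a minimal sketch under stated assumptions, not code from this thread: the helper names (utf16_to_utf8, utf8_to_utf16, stem_utf16, toy_strip_s) are hypothetical, the conversions are hand-written rather than ICU calls, and the stemmer is passed in as a callback so that the example stays self-contained. In a real application the callback would presumably wrap a libstemmer_c stemmer created for the UTF_8 encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Encode a UTF-16 string as UTF-8. Unpaired surrogates become U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;                                   // consume the low surrogate
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {     // stray low surrogate
            cp = 0xFFFD;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Decode UTF-8 into UTF-16. Assumes mostly well-formed input (continuation
// bytes are not validated); bad lead bytes and truncation yield U+FFFD.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        const int extra = b < 0x80 ? 0
                        : (b >> 5) == 0x06 ? 1
                        : (b >> 4) == 0x0E ? 2
                        : (b >> 3) == 0x1E ? 3 : -1;
        if (extra < 0 || i + static_cast<std::size_t>(extra) >= in.size()) {
            out += static_cast<char16_t>(0xFFFD);
            ++i;
            continue;
        }
        char32_t cp = extra == 0 ? b : b & (0x3F >> extra);
        for (int k = 1; k <= extra; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        i += static_cast<std::size_t>(extra) + 1;
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {                                       // needs a surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}

// Hypothetical callback type standing in for a UTF-8-only stemmer.
using Utf8Stemmer = std::function<std::string(const std::string&)>;

// The round trip: UTF-16 -> UTF-8 -> stem -> UTF-16.
std::u16string stem_utf16(const std::u16string& word, const Utf8Stemmer& stem_utf8) {
    assert(static_cast<bool>(stem_utf8));
    return utf8_to_utf16(stem_utf8(utf16_to_utf8(word)));
}

// Toy stand-in for a real stemmer: strips one trailing 's'. Illustration only.
std::string toy_strip_s(const std::string& w) {
    return (w.size() > 1 && w.back() == 's') ? w.substr(0, w.size() - 1) : w;
}
```

Calling stem_utf16(word, toy_strip_s) on a std::u16string runs the whole round trip. Each call performs two conversions and allocates two temporary strings per word, which is the overhead in question; whether it matters in practice depends on how it compares with the cost of stemming itself.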

>
>  Martin
>
>
>
>  On Sun, 2008-04-13 at 21:31 +0300, Hai Zaar wrote:
>  > Good day! I've downloaded the latest libstemmer_c. Can it deal with
>  > UTF-16 strings? The only supported encodings listed are UTF_8 and
>  > ISO_8859_1.
>  >
>  >
>
>



-- 
Zaar
