[Snowball-discuss] Using UTF-16 with libstemmer_c

Hai Zaar haizaar at gmail.com
Sun Apr 13 20:26:56 BST 2008


On Sun, Apr 13, 2008 at 10:22 PM, Martin Porter
<martin.porter at grapeshot.co.uk> wrote:
>
>  Hai Zaar,
>
>  No, we don't deal with UTF-16 currently. Is that a problem to you?
Yes, it is. My application (C++) handles strings with the ICU library,
which stores all strings as UTF-16. To stem a string, I therefore have
to convert it to UTF-8, pass it to the stemmer, and then convert the
stemmed result back to UTF-16. That looks like significant overhead.
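
To make the round trip concrete, here is a minimal sketch of it in
portable C++. It is only an illustration under some assumptions: the
names utf16_to_utf8, utf8_to_utf16, stem_utf16 and Utf8Stemmer are
hypothetical and belong to neither ICU nor libstemmer, the conversions
are hand-written rather than ICU's, and the stemmer is passed in as a
callback that takes and returns UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: encode UTF-16 as UTF-8.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {     // stray low surrogate
            cp = 0xFFFD;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: decode UTF-8 into UTF-16.
// Assumes mostly well-formed input: invalid lead bytes and truncated
// sequences become U+FFFD, but continuation bytes are not validated.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        std::size_t extra = 0;
        char32_t cp = 0;
        bool ok = true;
        if (b < 0x80)                { extra = 0; cp = b; }
        else if ((b >> 5) == 0x06)   { extra = 1; cp = b & 0x1F; }
        else if ((b >> 4) == 0x0E)   { extra = 2; cp = b & 0x0F; }
        else if ((b >> 3) == 0x1E)   { extra = 3; cp = b & 0x07; }
        else                         { ok = false; }
        if (!ok || i + extra >= in.size()) {
            out += static_cast<char16_t>(0xFFFD);
            ++i;
            continue;
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        i += extra + 1;
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {                                       // emit a surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}

// Stand-in for a UTF-8 stemmer call (hypothetical callback type).
using Utf8Stemmer = std::function<std::string(const std::string&)>;

// The round trip described above: UTF-16 -> UTF-8 -> stem -> UTF-16.
std::u16string stem_utf16(const std::u16string& word, const Utf8Stemmer& stem) {
    assert(static_cast<bool>(stem));
    return utf8_to_utf16(stem(utf16_to_utf8(word)));
}
```

In a real ICU build the two conversions would more likely be
UnicodeString::toUTF8String() and UnicodeString::fromUTF8(), and the
callback would wrap sb_stemmer_stem() on a stemmer created with the
"UTF_8" encoding. Both conversions are linear in the word length, so
the extra cost per word is two short copies.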

>
>  Martin
>
>
>
>  On Sun, 2008-04-13 at 21:31 +0300, Hai Zaar wrote:
>  > Good day! I've downloaded the latest libstemmer_c. Can it deal with
>  > UTF-16 strings? - the only supported encodings listed are UTF_8 and
>  > ISO_8859_1.
>  >
>  >
>
>



-- 
Zaar


