[Snowball-discuss] Using UTF-16 with libstemmer_c
Hai Zaar
haizaar at gmail.com
Sun Apr 13 20:26:56 BST 2008
On Sun, Apr 13, 2008 at 10:22 PM, Martin Porter
<martin.porter at grapeshot.co.uk> wrote:
>
> Hai Zaar,
>
> No, we don't deal with UTF-16 currently. Is that a problem for you?
Yes, it is. My application (C++) handles strings with the ICU library,
which stores all strings as UTF-16. That means that in order to stem a
string, I have to convert it to UTF-8, pass it to the stemmer, and then
convert the stemmed result back to UTF-16. That is two extra
conversions for every stemmed string, which looks like significant
overhead.
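
To make the round trip concrete, here is a rough sketch of what each
stemming call ends up doing. It is only an illustration under some
assumptions: it needs a C++11 compiler for std::u16string, the
hand-written converters stand in for ICU's own conversion routines, and
the Utf8Stemmer callback is a hypothetical placeholder for the actual
stemmer call. None of these helper names come from ICU or libstemmer.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Encode UTF-16 as UTF-8. Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;                                   // consume the low surrogate
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {     // stray low surrogate
            cp = 0xFFFD;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Decode UTF-8 into UTF-16. Assumes well-formed input (continuation
// bytes are not validated); an invalid lead byte becomes U+FFFD.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i++]);
        char32_t cp;
        int extra;                                     // continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else                         { cp = 0xFFFD;   extra = 0; }
        for (; extra > 0 && i < in.size(); --extra, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        }
        if (cp >= 0x10000) {                           // needs a surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<char16_t>(cp);
        }
    }
    return out;
}

// Hypothetical placeholder for a stemmer that only understands UTF-8.
using Utf8Stemmer = std::function<std::string(const std::string&)>;

// The round trip: UTF-16 -> UTF-8 -> stem -> UTF-16.
// Two conversions and two temporary strings for every stemmed word.
std::u16string stem_utf16(const std::u16string& word, const Utf8Stemmer& stem) {
    return utf8_to_utf16(stem(utf16_to_utf8(word)));
}
```

In a real build I would expect the callback to wrap sb_stemmer_stem()
on a stemmer created with the UTF_8 encoding, and ICU's
UnicodeString::toUTF8String() and UnicodeString::fromUTF8() to replace
the hand-written converters. The shape of the call stays the same
either way, and so do the two extra passes over the data.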
>
> Martin
>
>
>
> On Sun, 2008-04-13 at 21:31 +0300, Hai Zaar wrote:
> > Good day! I've downloaded the latest libstemmer_c. Can it deal with
> > UTF-16 strings? - the only supported encodings listed are UTF_8 and
> > ISO_8859_1.
> >
> >
>
>
--
Zaar