[Snowball-discuss] Unicode

Vineet Gupta vineet@stratify.com
Fri, 22 Feb 2002 11:04:22 -0800


Another alternative is to use IBM's International Components for Unicode
(ICU) library, which is available for C and for Java.
http://oss.software.ibm.com/icu/
It has a wide variety of converters, along with a lot of other functionality
for internationalization and localization (for example, its IsAlpha function
is better than the iswalpha that comes with Microsoft Visual C++).

For input hex characters, it might be useful to follow the usual convention:
hex '0A0D' is one character, whereas hex '0A 0D' is two characters. In other
words, keep reading digits until you reach a non-hex digit; the digits read
up to that point constitute one character. In a hex string the only legal
characters might be 0-9, A-F and space.
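
A rough sketch of that reading rule in Python, just to illustrate the idea.
The function name parse_hex_string is made up for this example, and two
choices are assumptions on my part rather than part of the convention above:
lowercase a-f is rejected (only 0-9, A-F and space are accepted), and
repeated spaces are treated as a single separator.

```python
HEX_DIGITS = set("0123456789ABCDEF")  # assumption: uppercase digits only


def parse_hex_string(text):
    """Return one integer per run of hex digits in text.

    Hypothetical helper: a run of hex digits is one character, and a
    space ends the current run. Anything else is rejected.
    """
    characters = []
    run = ""
    for ch in text:
        if ch in HEX_DIGITS:
            run += ch  # still inside the same character
        elif ch == " ":
            if run:  # a space closes the current run, if any
                characters.append(int(run, 16))
                run = ""
        else:
            raise ValueError("illegal character %r in hex string" % ch)
    if run:  # the last run has no trailing space
        characters.append(int(run, 16))
    return characters


if __name__ == "__main__":
    print(parse_hex_string("0A0D") == [0x0A0D])         # one character
    print(parse_hex_string("0A 0D") == [0x0A, 0x0D])    # two characters
```

Whether a run should be capped at a fixed number of digits is left open
here; the sketch accepts runs of any length.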

Vineet

_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss