[Snowball-discuss] Unicode

Vineet Gupta vineet@stratify.com
Fri, 22 Feb 2002 11:04:22 -0800

Another alternative is to use IBM's International Components for Unicode
(ICU) library, which is available for C and Java.
It has a wide variety of converters, along with lots of other functionality
for internationalization and localization (for example its IsAlpha function
is better than the iswalpha that comes with Microsoft Visual C++).

For input hex characters, it might be useful to follow the usual convention:
hex '0A0D' is one character, whereas hex '0A 0D' is two characters. That is,
keep reading digits until you reach a non-hex digit; the digits read up to
that point constitute one character.  In a hex string the only legal
characters might be 0-9, A-F and
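
A rough sketch in C of the "read hex digits until a non-hex digit"
convention described above. The function name parse_hex_chars is
hypothetical, and two choices are my assumptions rather than anything
settled here: every non-hex character is treated as a separator, and
lowercase a-f is accepted as well as A-F. Very long digit runs are not
checked for overflow.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: split a hex string into character codes.
 * Each maximal run of hex digits is ONE character, so "0A0D" yields one
 * value (0x0A0D) and "0A 0D" yields two (0x0A, 0x0D).
 * Assumption: any non-hex character acts as a separator.
 * Stores at most max_out values in out; returns the number of
 * characters found in the input (which may exceed max_out). */
size_t parse_hex_chars(const char *s, unsigned long *out, size_t max_out)
{
    size_t count = 0;

    while (*s != '\0') {
        if (!isxdigit((unsigned char)*s)) {
            s++;                      /* skip separator */
            continue;
        }

        /* Accumulate digits until the first non-hex digit. */
        unsigned long value = 0;
        while (isxdigit((unsigned char)*s)) {
            int c = toupper((unsigned char)*s);
            int digit = isdigit(c) ? c - '0' : c - 'A' + 10;
            value = value * 16 + (unsigned long)digit;
            s++;
        }

        if (count < max_out)
            out[count] = value;
        count++;
    }
    return count;
}
```

Called as parse_hex_chars("0A 0D", buf, 4), this would fill buf with two
values, while "0A0D" gives a single value. A stricter version could reject
unexpected characters instead of skipping them.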


Snowball-discuss mailing list