[Snowball-discuss] UTF-8

Vineet Gupta vineet@stratify.com
Fri, 22 Feb 2002 11:08:49 -0800


	3) UTF-8 encoded 8 bit characters. I believe the only change to the
	generated C is that cursor movements of the form z->c++; and z->c--;
	need to be replaced by function calls that move over 1, 2 or 3 bytes
	to get to the next character.
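
For reference, the cursor movement described in the quoted text could look roughly like the sketch below. The names utf8_next and utf8_prev, and the plain buffer/offset parameters, are hypothetical and are not taken from the Snowball runtime; the code assumes well-formed UTF-8, where continuation bytes have the bit pattern 10xxxxxx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not the actual Snowball runtime API.
 * p  : text buffer
 * c  : cursor, as a byte offset into p
 * l  : forward limit (offset one past the last byte)
 * lb : backward limit (offset of the first byte)
 * Assumes the buffer holds well-formed UTF-8. */

/* Replacement for z->c++ : advance the cursor by one character. */
static size_t utf8_next(const unsigned char *p, size_t c, size_t l)
{
    assert(p != NULL && c <= l);
    if (c >= l) return c;                    /* already at the limit */
    c++;                                     /* step over the lead byte */
    while (c < l && (p[c] & 0xC0) == 0x80)   /* skip continuation bytes */
        c++;
    return c;
}

/* Replacement for z->c-- : move the cursor back by one character. */
static size_t utf8_prev(const unsigned char *p, size_t c, size_t lb)
{
    assert(p != NULL && c >= lb);
    if (c <= lb) return c;                   /* already at the limit */
    c--;                                     /* step onto the last byte */
    while (c > lb && (p[c] & 0xC0) == 0x80)  /* back up to the lead byte */
        c--;
    return c;
}
```

Both helpers leave the cursor unchanged when it already sits on the relevant limit, so the caller can detect the end of the text by comparing the returned offset with the one passed in.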

It is much easier to use UCS-2 internally and simply add a converter
to/from UTF-8.  This way you need to output only one style of code, with an
option to compile with or without UNICODE.  Converters to/from UTF-8 are
trivial; I can send you one if you need it.
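
To give an idea of the size of such a converter, here is a minimal sketch. The function names and signatures are hypothetical. It handles only the 1, 2 and 3 byte UTF-8 forms, which is everything UCS-2 can represent, and it does not reject overlong encodings or surrogate values, which a production converter probably should.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical converters, a sketch only.
 * Both return the number of units written, or (size_t)-1 if the input
 * is malformed, is outside UCS-2, or does not fit in the output. */

static size_t utf8_to_ucs2(const unsigned char *in, size_t n,
                           uint16_t *out, size_t cap)
{
    size_t i = 0, k = 0;
    assert(in != NULL && out != NULL);
    while (i < n) {
        unsigned char b = in[i];
        uint16_t ch;
        if (k >= cap) return (size_t)-1;
        if (b < 0x80) {                              /* 0xxxxxxx */
            ch = b;
            i += 1;
        } else if ((b & 0xE0) == 0xC0 && i + 1 < n &&
                   (in[i + 1] & 0xC0) == 0x80) {     /* 110xxxxx 10xxxxxx */
            ch = (uint16_t)(((b & 0x1F) << 6) | (in[i + 1] & 0x3F));
            i += 2;
        } else if ((b & 0xF0) == 0xE0 && i + 2 < n &&
                   (in[i + 1] & 0xC0) == 0x80 &&
                   (in[i + 2] & 0xC0) == 0x80) {     /* 1110xxxx + 2 more */
            ch = (uint16_t)(((b & 0x0F) << 12) |
                            ((in[i + 1] & 0x3F) << 6) |
                            (in[i + 2] & 0x3F));
            i += 3;
        } else {
            return (size_t)-1;   /* 4 byte form, stray or truncated bytes */
        }
        out[k++] = ch;
    }
    return k;
}

static size_t ucs2_to_utf8(const uint16_t *in, size_t n,
                           unsigned char *out, size_t cap)
{
    size_t i, k = 0;
    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        uint16_t ch = in[i];
        if (ch < 0x80) {                             /* 1 byte */
            if (k + 1 > cap) return (size_t)-1;
            out[k++] = (unsigned char)ch;
        } else if (ch < 0x800) {                     /* 2 bytes */
            if (k + 2 > cap) return (size_t)-1;
            out[k++] = (unsigned char)(0xC0 | (ch >> 6));
            out[k++] = (unsigned char)(0x80 | (ch & 0x3F));
        } else {                                     /* 3 bytes */
            if (k + 3 > cap) return (size_t)-1;
            out[k++] = (unsigned char)(0xE0 | (ch >> 12));
            out[k++] = (unsigned char)(0x80 | ((ch >> 6) & 0x3F));
            out[k++] = (unsigned char)(0x80 | (ch & 0x3F));
        }
    }
    return k;
}
```

With something like this at the input and output boundaries, the generated code can keep moving the cursor one unit at a time over an array of 16 bit characters.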

Vineet

_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss