[Snowball-discuss] 16 bit characters in Snowball
Richard Boulton
richard@tartarus.org
25 May 2002 14:37:44 +0100
On Fri, 2002-05-24 at 20:47, Andreas Jung wrote:
> Seems that the problem is still not solved.
> I re-created all stemmers with and without -w option and in
> both cases snowball produced identical sources. Any ideas why?
Yes: for programs like these, -w doesn't change the output. What it does
is allow Snowball programs to use character values in the range 0-65535
instead of 0-255. A Snowball program which can be generated successfully
without -w will not be affected by use of -w. However, a Snowball program
which uses characters outside the range 0-255 will not be generated
successfully without -w.
If you're using -w to generate Snowball output, you must also set
the typedef of "symbol" in api.h to a suitably wide type when you
compile the sources; see the comment at the start of api.h.
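
As a rough illustration of what the typedef choice means, here is a minimal
sketch. It is not the contents of api.h: WIDE_SYMBOLS, symbol_max and
fits_in_symbol are hypothetical names made up for this example, and the
exact types to use are an assumption (the comment in api.h is the
authority).

```c
/* WIDE_SYMBOLS is a hypothetical switch for this sketch, not a Snowball
   option. Define it when the sources were generated with -w. */
#ifdef WIDE_SYMBOLS
typedef unsigned short symbol;  /* at least 16 bits: holds 0-65535 */
#else
typedef unsigned char symbol;   /* at least 8 bits: holds 0-255 */
#endif

/* Largest character value that a single symbol can store. */
unsigned long symbol_max(void)
{
    return (unsigned long)(symbol)(-1);
}

/* Returns 1 if the character value fits in a symbol, 0 otherwise. */
int fits_in_symbol(unsigned long code)
{
    return code <= symbol_max();
}
```

With the narrow typedef on a typical platform, fits_in_symbol(0x0430)
returns 0, which is why the typedef has to be widened for sources
generated with -w that really do use large character values.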
Note that using -w and setting the size of symbol still doesn't
guarantee that the Snowball program is using a 16 bit character set.
See the russian/stem.sbl file for an example: by default it uses KOI8-R
(in which all the character codes fit in one byte), but if you swap
which definitions are commented out you can make it use Unicode instead.
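
To make the difference concrete, here is a small sketch using the Cyrillic
small letter "a", which is 0xC1 in KOI8-R and U+0430 in Unicode. The names
CYR_A_KOI8R, CYR_A_UNICODE and needs_wide_symbol are invented for this
example and do not come from the Snowball sources.

```c
/* The Cyrillic small letter "a" under the two encodings. */
enum {
    CYR_A_KOI8R   = 0xC1,   /* KOI8-R: fits in one byte */
    CYR_A_UNICODE = 0x0430  /* Unicode code point U+0430 */
};

/* Returns 1 if the character value needs a symbol wider than 8 bits,
   0 if it fits in the range 0-255. */
int needs_wide_symbol(unsigned long code)
{
    return code > 0xFF;
}
```

Here needs_wide_symbol(CYR_A_KOI8R) is 0 and
needs_wide_symbol(CYR_A_UNICODE) is 1, so only the Unicode variant of the
stemmer depends on -w and a wider symbol type.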
--
Richard