[Snowball-discuss] snowball and combining characters

Jason Spashett jason at spashett.com
Fri Mar 30 18:19:02 BST 2012


Hello,

I have some Snowball questions that I cannot find an answer to (without, 
perhaps, looking through the source). If someone could help answer these 
questions, it would be appreciated.

What is the situation regarding combining characters versus pre-composed 
characters in Unicode?

For example:

HEBREW LETTER ALEF WITH MAPIQ

pre-composed: U+FB30
using combiners: U+05D0 U+05BC

Does Snowball recognise these as the same character? I assume not. Is it 
also fair to say that Snowball will count them differently: 1 'slot' in 
the pre-composed case and 2 slots in the combiner case?

If this is so, then I assume that the way to proceed might be to convert 
any combiner representation into the pre-composed form before using 
Snowball?
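
To make the question concrete, here is a minimal sketch of the kind of
pre-processing step I have in mind. It uses Python's standard unicodedata
module and has nothing to do with Snowball itself; the helper name
to_canonical is made up for illustration. It takes the two code point
sequences given above (FB30 and 05D0 05BC), shows that they differ as
stored, and shows that normalisation maps both to one spelling. One caveat,
as far as I understand the Unicode rules: the Hebrew presentation forms are
composition exclusions, so standard NFC does not produce the pre-composed
form. Both spellings end up as the combining sequence, and going the other
way would need a custom mapping table.

```python
import unicodedata

PRECOMPOSED = "\uFB30"      # the single pre-composed code point from the example
COMBINING = "\u05D0\u05BC"  # base letter followed by the combining point


def to_canonical(text: str) -> str:
    """Hypothetical helper: map canonically equivalent spellings to one form."""
    return unicodedata.normalize("NFC", text)


# As stored, the two spellings are different strings of different lengths.
print(PRECOMPOSED == COMBINING)          # False
print(len(PRECOMPOSED), len(COMBINING))  # 1 2

# After normalisation both are the same sequence of code points.
print(to_canonical(PRECOMPOSED) == to_canonical(COMBINING))  # True

# NFC leaves this pair decomposed (two code points), not pre-composed.
print(len(to_canonical(PRECOMPOSED)))    # 2
```

If something like this is run over the text before stemming, a stemmer
would only ever see one spelling, and any character counts would at least
be consistent.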

N.B. I am looking at stemming Yiddish, rather than Hebrew.

Regards,

Jason.




