[Snowball-discuss] snowball and combining characters
Jason Spashett
jason at spashett.com
Fri Mar 30 18:19:02 BST 2012
Hello,
I have some Snowball questions that I cannot find an answer to (without,
perhaps, looking through the source). If someone could help answer them,
it would be appreciated.
What is the situation with regard to combining characters versus
precomposed characters in Unicode?
For example:
HEBREW LETTER BET WITH DAGESH
precomposed: U+FB31
using combining characters: U+05D1 U+05BC
Does Snowball recognise these as the same character? I assume not. Is it
also fair to say that Snowball will count them differently, as one 'slot'
in the precomposed case and two slots in the combining case?
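To make the difference concrete, here is a minimal Python sketch using the
standard unicodedata module. It only shows how code-point-based string
handling sees the two spellings; whether Snowball behaves the same way is
exactly the open question, so treat this as an illustration, not a
statement about Snowball's internals.

```python
import unicodedata

# The two spellings of HEBREW LETTER BET WITH DAGESH.
precomposed = "\uFB31"        # one code point
combining = "\u05D1\u05BC"    # BET followed by the combining DAGESH

# Plain string comparison works on code points, so the two differ.
print(precomposed == combining)           # False
print(len(precomposed), len(combining))   # 1 2

# The Unicode database records U+FB31's canonical decomposition.
print(unicodedata.decomposition(precomposed))  # 05D1 05BC
print(unicodedata.name(combining[1]))          # HEBREW POINT DAGESH OR MAPIQ
```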
If so, I assume the way to proceed would be to convert any
combining-character representation into the precomposed form before
passing text to Snowball?
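A pre-processing step along those lines could be sketched in Python as
below; normalize_for_stemming is a hypothetical helper name, not part of
Snowball. One caveat: Unicode lists the Hebrew presentation forms (such as
U+FB31) as composition exclusions, so NFC maps them to the decomposed
sequence rather than to the precomposed character. Normalising every input
the same way still makes the two spellings identical.

```python
import unicodedata

def normalize_for_stemming(text: str) -> str:
    """Hypothetical helper: apply NFC before handing text to a stemmer."""
    return unicodedata.normalize("NFC", text)

precomposed = "\uFB31"        # HEBREW LETTER BET WITH DAGESH
combining = "\u05D1\u05BC"    # BET + DAGESH

a = normalize_for_stemming(precomposed)
b = normalize_for_stemming(combining)

# Both inputs end up as the same two-code-point sequence.
print(a == b)                          # True
print([f"U+{ord(c):04X}" for c in a])  # ['U+05D1', 'U+05BC']
```

NFD would give the same result for this pair. The important part is
presumably applying one form consistently, both to the text being stemmed
and to any literal strings in the stemmer's rules.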
N.B. I am looking at stemming Yiddish rather than Hebrew.
Regards,
Jason.