[Snowball-discuss] Finnish stemmer diff file
Blake Madden
madden_blake@hotmail.com
Wed May 12 19:00:02 2004
In the Finnish stemmer's diff file (the text file that lists Finnish
words alongside their stemmed equivalents), a few entries contain an
uppercase 'Ä'. This can be somewhat confusing, given that the stemmers
are meant to work only with lowercased text. Here is one
example:
edelliseltÄ edelliseltÄ
edelliseltä edellis
This gives the impression that there is something special about 'Ä', as if
it were a special consonant, and that "edelliseltÄ" and "edelliseltä"
are entirely different words. That is not the case. In
reality, "edelliseltÄ" was left unstemmed only because it was not
lowercased first. As I said, this is just a little confusing.
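
To illustrate the point, here is a minimal Python sketch of how such entries could be spotted and lowercased before stemming. The helper names (find_uppercase_entries, normalize) and the two-word sample list are hypothetical, made up for this example; they are not part of the Snowball distribution, and the stemmer itself is not called here.

```python
def find_uppercase_entries(words):
    """Return the words that change when lowercased, i.e. contain uppercase."""
    return [w for w in words if w != w.lower()]


def normalize(word):
    """Lowercase a word; Python's str.lower() also maps 'Ä' to 'ä'."""
    return word.lower()


# Hypothetical sample taken from the two entries quoted above.
sample = ["edelliseltÄ", "edelliseltä"]

# Entries that would need lowercasing before being passed to a stemmer.
flagged = find_uppercase_entries(sample)

# After normalization both entries are the same word.
normalized = [normalize(w) for w in sample]

print(flagged)
print(normalized)
```

Once lowercased, both entries are identical, so they would be expected to produce the same stem.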
Thanks,
Blake