[Snowball-discuss] Finnish stemmer diff file

Blake Madden madden_blake@hotmail.com
Wed May 12 19:00:02 2004


In the Finnish stemmer's diff file (the text file that lists Finnish words 
alongside their stemmed equivalents), a few entries contain an uppercase 
'Ä'.  This can be somewhat confusing, given that the stemmers are meant to 
work only with lowercase text.  Here is one example:

edelliseltÄ                   edelliseltÄ
edelliseltä                   edellis

This gives the impression that there is something special about 'Ä', as if 
it were a special consonant.  It looks here as though "edelliseltÄ" and 
"edelliseltä" are entirely different words.  However, that is not the case.  
In reality, "edelliseltÄ" was not stemmed correctly because it was not 
lowercased first.  As I said, this is just a little confusing.
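
For what it's worth, a check along the following lines could flag such 
entries in a word list.  This is only a sketch in Python: the function name 
find_non_lowercase and the two-word sample list are made up for 
illustration and are not part of the Snowball distribution.

```python
def find_non_lowercase(words):
    """Return the entries that would change if they were lowercased."""
    return [w for w in words if w != w.lower()]


# Sample input taken from the two entries quoted above (illustrative only).
vocabulary = ["edelliseltÄ", "edelliseltä"]

# Entries that still contain an uppercase letter such as 'Ä'.
suspect = find_non_lowercase(vocabulary)

# Lowercasing first collapses both spellings into a single entry.
normalized = sorted({w.lower() for w in vocabulary})
```

Here suspect holds only the entry with the uppercase 'Ä', and normalized 
holds a single word, which is what the stemmer would expect as input.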

Thanks,
Blake
