[Snowball-discuss] evaluation of Snowball stemmers

Martin Porter martin.porter at grapeshot.co.uk
Fri Dec 10 22:26:20 GMT 2004


Fred,

Do you mean you got a 29%/56% average precision improvement when you
switched stemming off? Anything is possible, but this does surprise me: I
would have expected Russian, with its highly (and regularly) inflected
vocabulary, to do quite well under stemming.

If you look at the paper at 

http://clef.isti.cnr.it/2004/working_notes/WorkingNotes2004/16.pdf

(Mono- and Crosslingual Retrieval Experiments at the University of
Hildesheim - René Hackl, Thomas Mandl and Christa Womser-Hacker) the
evidence, for Finnish, points the other way ("the snowball stemmer works
very well"). Unfortunately their Russian experiments were not taken to a
conclusion, but I myself have much more confidence in the Snowball Russian
stemmer than in the Snowball Finnish stemmer.

On the other hand, I have had verbal reports (which I did not entirely
trust!) of the Finnish stemmer doing badly in some other tests.

I should point out that although the version of the stemmer you picked up
works for KOI-8, Snowball is designed to make switching to other character
codes as easy as possible. See the notes at

http://snowball.tartarus.org/codesets/guide.html
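
As a rough illustration of the character-code point (a sketch only, not
part of the Snowball distribution): if your text is held as Unicode and the
stemmer you have expects KOI8-R bytes, one workaround is to transcode
around the call. The names stem_unicode and placeholder_stemmer are
hypothetical, and the placeholder simply returns its input where a real
byte-oriented stemmer would go.

```python
from typing import Callable

KOI8 = "koi8_r"  # Python's codec name for the KOI8-R encoding


def stem_unicode(word: str, stem_koi8: Callable[[bytes], bytes]) -> str:
    """Encode a word to KOI8-R, run a byte-oriented stemmer, decode the result.

    stem_koi8 is an assumed callable taking and returning KOI8-R bytes;
    it is not a real Snowball API.
    """
    return stem_koi8(word.encode(KOI8)).decode(KOI8)


def placeholder_stemmer(data: bytes) -> bytes:
    # Hypothetical stand-in: returns the bytes unchanged.
    # Substitute a call into whatever KOI8-R stemmer you actually use.
    return data


if __name__ == "__main__":
    print(stem_unicode("книга", placeholder_stemmer))
```

Generating the stemmer for the target character code directly, as the
guide describes, avoids the extra transcoding step.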

Martin

 





