[Snowball-discuss] evaluation of Snowball stemmers

Martin Porter martin.porter at grapeshot.co.uk
Fri Dec 10 08:04:58 GMT 2004


Diana,

I have not carefully monitored the use of the stemmers in evaluation work,
although I think it is fairly extensive. (Of course the stemmers are often
used in IR experiments even when stemming itself is not the subject of
evaluation.) But see this paper: 


Stephen Tomlinson (2003) Lexical and algorithmic stemming compared for 9
European languages with Hummingbird SearchServer(TM) at CLEF 2003. In Carol
Peters, editor, Working notes for the CLEF 2003 Workshop 21-22 August,
Trondheim, Norway.

http://www.stephent.com/ir/papers/clef03.html 


Tomlinson (2003) compares the Snowball stemmers with a commercial lexical
stemming (lemmatization) system. Of the nine languages tested, six gave
differences that were not statistically significant, two did better under
the lemmatization system, and one did better under Snowball. I think I have
that right, but you can verify it by looking at the paper.

Given the simplicity and cheapness of the Snowball stemmers compared with a
full lemmatization system, I think this is a good result for Snowball.

Unfortunately I have not been able to find out much about the Hummingbird
system, either from Tomlinson's paper or elsewhere.

Martin