[Snowball-discuss] Possible bug in Porter Stemmer

Marcel Daneck marcel.daneck at miosoft.de
Tue Oct 21 13:59:31 BST 2014


Hello,

I might have found a bug in the Porter stemmer for English (http://snowball.tartarus.org/algorithms/porter/stemmer.html).
In the sample word list (http://snowball.tartarus.org/algorithms/porter/diffs.txt), the word "agreement" is left unchanged by stemming.

But step 4 says that the suffix "ent" is deleted if it lies in R2 (and the Snowball code does so).
R2 for "agreement" is "ent", so the suffix should be deleted and the resulting stem should be "agreem".
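To make the case concrete, here is a minimal Python sketch (not Snowball's own code; the helper names are hypothetical). It computes R1 and R2 with the usual definition (the region after the first non-vowel that follows a vowel), ignoring the special treatment of "y". It then applies two possible readings of step 4, restricted to the subset "ement", "ment", "ent" of the step 4 suffix list. The second reading assumes Snowball's `[substring] R2 among(...)` behaviour: the longest matching suffix is picked first and only then tested against R2, with no fallback to a shorter suffix.

```python
VOWELS = set("aeiouy")  # simplification: the special handling of "y" is ignored

# Only the step 4 suffixes relevant to "agreement"; not the full list.
STEP4_SUBSET = ("ement", "ment", "ent")


def region_start(word: str, start: int = 0) -> int:
    """Index just after the first non-vowel that follows a vowel at or after start.

    Returns len(word) if there is no such position (the region is then empty).
    """
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def step4_any_in_r2(word: str, r2_start: int) -> str:
    """Reading described in this post: delete any listed suffix that lies in R2."""
    for suffix in sorted(STEP4_SUBSET, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= r2_start:
            return word[: len(word) - len(suffix)]
    return word


def step4_longest_then_r2(word: str, r2_start: int) -> str:
    """Assumed Snowball reading: take the longest matching suffix, then test R2."""
    matches = [s for s in STEP4_SUBSET if word.endswith(s)]
    if not matches:
        return word
    suffix = max(matches, key=len)
    if len(word) - len(suffix) >= r2_start:
        return word[: len(word) - len(suffix)]
    return word  # no fallback to a shorter suffix


if __name__ == "__main__":
    word = "agreement"
    p1 = region_start(word)      # start of R1
    p2 = region_start(word, p1)  # start of R2
    print(word[p1:], word[p2:])                # reement ent
    print(step4_any_in_r2(word, p2))           # agreem
    print(step4_longest_then_r2(word, p2))     # agreement
```

Under the first reading "agreement" becomes "agreem"; under the second it is left unchanged, because "ement" is the longest match and it does not lie in R2.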

There are also rules for "ment" and "ement" that could match. But removing "ment" leaves "agree", which has m=1, not m>1 (the same holds for "ement", which leaves "agre").
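The measure m mentioned above comes from Porter's original description, where any word or stem has the form [C](VC)^m[V]. A small Python sketch for checking it (the function names are hypothetical, written for illustration) could look like this; it follows Porter's definition that a consonant is any letter other than a, e, i, o, u, and other than a "y" preceded by a consonant.

```python
def is_consonant(stem: str, i: int) -> bool:
    """Porter's consonant test for the letter at position i."""
    ch = stem[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        # "y" is a consonant at the start of the stem or after a vowel,
        # and a vowel after a consonant.
        return i == 0 or not is_consonant(stem, i - 1)
    return True


def measure(stem: str) -> int:
    """Return m for a stem written as [C](VC)^m[V]."""
    pattern = []
    for i in range(len(stem)):
        kind = "C" if is_consonant(stem, i) else "V"
        if not pattern or pattern[-1] != kind:
            pattern.append(kind)  # collapse runs of the same kind
    return "".join(pattern).count("VC")


if __name__ == "__main__":
    for stem in ("agree", "agre", "agreem"):
        print(stem, measure(stem))
```

Both "agree" and "agre" give m=1, so the (m>1) condition fails for "ment" and "ement"; "agreem", the stem left by removing "ent", gives m=2.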

It would be very nice if you could check this. Maybe it is just an old list.

Kindest regards




Marcel Daneck | IT Consultant

tel +49 40 688 7461-27 | mobil +49 170 6575 755

mail Marcel.Daneck at miosoft.de | www.miosoft.de


MIOsoft Deutschland GmbH | Großer Grasbrook 9 | 20457 Hamburg

AG Hamburg HRB 128042 | Geschäftsführer: Dr.-Ing. Ernst Siepmann



