[Snowball-discuss] Minor mistakes in the English vocabulary

Jannik Zschiesche hello at apfelbox.net
Fri Jan 20 17:15:58 GMT 2012


Hello everyone,

I am working on a PHP implementation of the stemming algorithms for English, German and Spanish (once it is finished, I will host it on GitHub for everyone to use).

While testing the implementation against the English vocabulary, I found what I think are some mistakes. Please correct me if I am wrong.
(I used Porter2 as described at http://snowball.tartarus.org/algorithms/english/stemmer.html)

The vocabulary contains the following transformations (there are a few more, but I don't want to flood you):



1. skies -> sky
R1: ""
R2: ""

Step 1a: replace "ies" with "i" -> ski
=> ski
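For reference, the R1/R2 values listed in these examples can be checked with a small Python sketch of the Porter2 region definitions (R1 is the region after the first non-vowel that follows a vowel; R2 is the same region computed inside R1), together with the Step 1a "ies" rule. This is only an illustration with hypothetical function names: it treats a, e, i, o, u and y as vowels after marking a leading or post-vowel "y" as the consonant "Y", and it ignores the special R1 prefixes and the apostrophe handling.

```python
VOWELS = set("aeiouy")


def mark_consonant_ys(word: str) -> str:
    """Porter2 pre-step: a 'y' at the start of the word or after a vowel becomes 'Y' (a consonant)."""
    chars = list(word)
    for i, ch in enumerate(chars):
        if ch == "y" and (i == 0 or chars[i - 1] in VOWELS):
            chars[i] = "Y"
    return "".join(chars)


def region_start(word: str, start: int) -> int:
    """Index just after the first non-vowel that follows a vowel at or after `start`."""
    for i in range(start + 1, len(word)):
        if word[i - 1] in VOWELS and word[i] not in VOWELS:
            return i + 1
    return len(word)


def regions(word: str) -> tuple[str, str]:
    """Return (R1, R2) of `word` as strings (special R1 prefixes ignored)."""
    word = mark_consonant_ys(word)
    r1 = region_start(word, 0)
    r2 = region_start(word, r1)
    return word[r1:], word[r2:]


def step1a_ies(word: str) -> str:
    """Step 1a, 'ies'/'ied' only: replace by 'i' if more than one letter precedes, else by 'ie'."""
    if word.endswith(("ies", "ied")):
        stem = word[:-3]
        return stem + ("i" if len(stem) > 1 else "ie")
    return word


print(regions("skies"))     # ('', '')
print(step1a_ies("skies"))  # ski
```

The same `regions` helper reproduces the R1/R2 values given for the other words below ("sky", "succeed", "tying", "ugly").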



2. sky -> sky
R1: ""
R2: ""

Step 1c: replace "y" with "i" -> ski
=> ski
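The Step 1c rule used here can be sketched as follows (hypothetical helper name; it assumes the Porter2 pre-step has already turned any consonant "y" into "Y"). Under the general rule, the final "y" of "sky" follows the non-vowel "k", which is not the first letter, so it becomes "i"; "by" and "say" are left alone.

```python
VOWELS = set("aeiouy")


def step1c(word: str) -> str:
    """Porter2 Step 1c: replace a final 'y'/'Y' by 'i' if it follows a
    non-vowel that is not the first letter of the word."""
    if len(word) > 2 and word[-1] in "yY" and word[-2] not in VOWELS:
        return word[:-1] + "i"
    return word


for w in ("sky", "cry", "by", "say"):
    print(w, "->", step1c(w))
```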



3. succeed -> succeed
R1: "ceed"
R2: ""

Step 1b: replace "eed" with "ee" -> succee
Step 5: delete the final "e" (it is in R1 and not preceded by a short syllable) -> succe
=> succe
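The two steps used for "succeed" can be sketched in Python as below. This is an illustration with hypothetical function names: it implements only the "eed" branch of Step 1b and the "e" branch of Step 5, assumes Steps 2 to 4 leave "succee" unchanged, and takes the R1/R2 start positions (3 and 7 for "succeed") as inputs. "Short syllable" follows the Porter2 definition: a vowel followed by a non-vowel other than w, x or Y and preceded by a non-vowel, or a vowel at the start of the word followed by a non-vowel.

```python
VOWELS = set("aeiouy")


def ends_with_short_syllable(s: str) -> bool:
    """True if `s` ends in a Porter2 short syllable."""
    if len(s) >= 3:
        return (s[-3] not in VOWELS and s[-2] in VOWELS
                and s[-1] not in VOWELS and s[-1] not in "wxY")
    return len(s) == 2 and s[0] in VOWELS and s[1] not in VOWELS


def step1b_eed(word: str, r1: int) -> str:
    """Step 1b, 'eed' branch only: replace 'eed' by 'ee' if the suffix lies in R1."""
    if word.endswith("eed") and len(word) - 3 >= r1:
        return word[:-1]
    return word


def step5_e(word: str, r1: int, r2: int) -> str:
    """Step 5, 'e' branch only: delete a final 'e' if it is in R2,
    or in R1 and not preceded by a short syllable."""
    if word.endswith("e"):
        pos = len(word) - 1
        if pos >= r2 or (pos >= r1 and not ends_with_short_syllable(word[:-1])):
            return word[:-1]
    return word


# For "succeed": R1 = "ceed" starts at index 3, R2 is empty (starts at index 7).
stem = step1b_eed("succeed", r1=3)  # succee
stem = step5_e(stem, r1=3, r2=7)    # succe
print(stem)
```

The short-syllable check is what keeps the "e" in a word like "hope" (R1 starts at 3, R2 at 4), since "hop" ends in a short syllable, while "succe" does not.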



4. succeeds -> succeed
R1: "ceed"
R2: ""

Step 1a: remove "s" -> succeed
-> continues exactly as for "succeed" in 3.
=> succe
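Step 1a as a whole can be sketched like this (hypothetical function name; only the suffix rules of Step 1a are covered). The suffixes are tried longest first, and a lone "s" is only removed if the part before it contains a vowel that is not immediately before the "s", which is why "succeeds" loses its "s" while "gas" keeps it.

```python
VOWELS = set("aeiouy")


def step1a(word: str) -> str:
    """Porter2 Step 1a suffix rules, trying the longest matching suffix first."""
    if word.endswith("sses"):
        return word[:-2]                      # sses -> ss
    if word.endswith(("ied", "ies")):
        # -> i if more than one letter precedes, otherwise -> ie
        return word[:-3] + ("i" if len(word) > 4 else "ie")
    if word.endswith(("us", "ss")):
        return word                           # us, ss: do nothing
    if word.endswith("s") and any(c in VOWELS for c in word[:-2]):
        return word[:-1]                      # s: delete if a vowel occurs before the letter preceding the s
    return word


for w in ("succeeds", "gaps", "gas", "kiwis", "caresses", "ties"):
    print(w, "->", step1a(w))
```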



5. tying -> tie
R1: "g"
R2: ""

Step 1b: remove "ing" -> ty
=> ty
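The "ing" branch of Step 1b, including the clean-up that follows the deletion, might look like this in Python (hypothetical names; the "ed" and "eed" branches, the special R1 prefixes and the "y" marking are omitted). For "tying" the stem "ty" contains the vowel "y", does not end in at/bl/iz or a double letter, and does not end in a short syllable, so no "e" is added.

```python
VOWELS = set("aeiouy")
DOUBLES = ("bb", "dd", "ff", "gg", "mm", "nn", "pp", "rr", "tt")


def region_start(word: str, start: int) -> int:
    """Index just after the first non-vowel that follows a vowel at or after `start`."""
    for i in range(start + 1, len(word)):
        if word[i - 1] in VOWELS and word[i] not in VOWELS:
            return i + 1
    return len(word)


def ends_with_short_syllable(s: str) -> bool:
    """True if `s` ends in a Porter2 short syllable."""
    if len(s) >= 3:
        return (s[-3] not in VOWELS and s[-2] in VOWELS
                and s[-1] not in VOWELS and s[-1] not in "wxY")
    return len(s) == 2 and s[0] in VOWELS and s[1] not in VOWELS


def step1b_ing(word: str) -> str:
    """Step 1b, 'ingly'/'ing' branch: delete the suffix if the stem contains a vowel, then tidy up."""
    r1 = region_start(word, 0)
    for suffix in ("ingly", "ing"):
        if not word.endswith(suffix):
            continue
        stem = word[: -len(suffix)]
        if not any(c in VOWELS for c in stem):
            return word                       # no vowel before the suffix: leave the word alone
        if stem.endswith(("at", "bl", "iz")):
            return stem + "e"                 # luxuriat -> luxuriate
        if stem.endswith(DOUBLES):
            return stem[:-1]                  # hopp -> hop
        if r1 >= len(stem) and ends_with_short_syllable(stem):
            return stem + "e"                 # short word (R1 empty): hop -> hope
        return stem
    return word


print(step1b_ing("tying"))  # ty
```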



6. ugly -> ugli
R1: "ly"
R2: ""

Step 1c: replace "y" with "i" -> ugli
Step 2: delete "li" (in R1, preceded by the valid li-ending "g") -> ug
=> ug
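Step 1c followed by the "li" rule of Step 2 can be sketched as below (hypothetical names; only the "li" entry of the Step 2 suffix list is implemented, so the sketch is only meaningful for words that do not end in a longer Step 2 suffix such as "bli", "alli" or "fulli"). Under the general rules, once Step 1c has turned "ugly" into "ugli", the suffix "li" lies in R1 (which starts at index 2) and follows "g", one of the valid li-endings c, d, e, g, h, k, m, n, r, t, so Step 2 removes it.

```python
VOWELS = set("aeiouy")
LI_ENDINGS = set("cdeghkmnrt")


def step1c(word: str) -> str:
    """Step 1c: final 'y'/'Y' -> 'i' after a non-vowel that is not the first letter."""
    if len(word) > 2 and word[-1] in "yY" and word[-2] not in VOWELS:
        return word[:-1] + "i"
    return word


def step2_li(word: str, r1: int) -> str:
    """Step 2, 'li' entry only: delete 'li' if it lies in R1 and follows a valid li-ending."""
    if (word.endswith("li") and len(word) >= 3
            and len(word) - 2 >= r1 and word[-3] in LI_ENDINGS):
        return word[:-2]
    return word


stem = step1c("ugly")         # ugli
stem = step2_li(stem, r1=2)   # ug
print(stem)
```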


Did I miss anything in the rules?




Kind regards
Jannik Zschiesche


PS: I hope you won't receive this message twice; I had already sent it once before I was subscribed.
