[Snowball-discuss] Minor mistakes in the english vocabulary
Jannik Zschiesche
hello at apfelbox.net
Fri Jan 20 17:15:58 GMT 2012
Hello everyone,
I am working on a PHP implementation of the stemmer algorithms for English, German and Spanish (as soon as I am done, I will host it on GitHub for everyone to use).
While testing the implementation against the English vocabulary, I found what I think are some mistakes. Please correct me if I am wrong.
(I used Porter2: http://snowball.tartarus.org/algorithms/english/stemmer.html)
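Each derivation below starts from the regions R1 and R2. Here is a minimal Python sketch of how they can be computed, assuming the definitions on the Porter2 page: vowels are a, e, i, o, u and y; an initial y, or a y right after a vowel, counts as a consonant; R1 is the region after the first non-vowel that follows a vowel; and R2 is the same region taken again inside R1. The helper names are hypothetical, and the page's special case for words starting with gener, commun or arsen is omitted.

```python
from __future__ import annotations

VOWELS = set("aeiouy")


def mark_y(word: str) -> str:
    # An initial y, or a y right after a vowel, is treated as a consonant;
    # it is written as upper-case "Y" so it no longer matches VOWELS.
    out: list[str] = []
    for i, ch in enumerate(word):
        is_consonant_y = ch == "y" and (i == 0 or out[i - 1] in VOWELS)
        out.append("Y" if is_consonant_y else ch)
    return "".join(out)


def region_start(w: str, start: int = 0) -> int:
    # Index just after the first non-vowel that follows a vowel, looking
    # only at letters from `start` onwards; len(w) means an empty region.
    for i in range(start + 1, len(w)):
        if w[i] not in VOWELS and w[i - 1] in VOWELS:
            return i + 1
    return len(w)


def regions(word: str) -> tuple[str, str]:
    # R1 and R2 as substrings of the word (special cases omitted).
    w = mark_y(word)
    r1 = region_start(w)
    r2 = region_start(w, r1)
    return word[r1:], word[r2:]
```

For example, `regions("succeed")` gives `("ceed", "")` and `regions("ugly")` gives `("ly", "")`.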
The vocabulary contains the following transformations (there are more, but I don't want to flood you). After each one I show my own derivation:
1. skies -> sky
R1: ""
R2: ""
Step 1a: replace "ies" with "i" -> ski
=> ski
2. sky -> sky
R1: ""
R2: ""
Step 1c: replace "y" with "i" -> ski
=> ski
3. succeed -> succeed
R1: "ceed"
R2: ""
Step 1b: replace "eed" with "ee" -> succee
Step 5: delete the final "e" -> succe
=> succe
4. succeeds -> succeed
R1: "ceeds"
R2: "s"
Step 1a: remove "s" -> succeed
-> continue as for "succeed"
=> succe
5. tying -> tie
R1: "g"
R2: ""
Step 1b: remove "ing" -> ty
=> ty
6. ugly -> ugli
R1: "ly"
R2: ""
Step 1c: replace "y" with "i" -> ugli
Step 2: remove "li" (in R1, preceded by the valid li-ending "g") -> ug
=> ug
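To check derivations like the ones above mechanically, here is a minimal Python sketch (not PHP, and not a complete stemmer) that applies only the Porter2 rules they use, as described on the page linked above: Steps 1a, 1b and 1c, the plain "li" rule of Step 2, and Step 5. Step 0, Steps 3 and 4, the other Step 2 suffixes and any special-case handling from the page are left out, and every function name here is hypothetical. R1 and R2 are computed once on the original word and kept as fixed positions while suffixes are removed.

```python
VOWELS = set("aeiouy")
DOUBLES = ("bb", "dd", "ff", "gg", "mm", "nn", "pp", "rr", "tt")
LI_ENDINGS = set("cdeghkmnrt")
# Longer Step 2 suffixes that also end in "li"; this sketch does not handle them.
LONGER_LI = ("bli", "alli", "fulli", "lessli", "ousli", "entli")

def mark_y(word: str) -> str:
    # An initial y, or a y after a vowel, is a consonant: write it as "Y".
    out = []
    for i, ch in enumerate(word):
        out.append("Y" if ch == "y" and (i == 0 or out[i - 1] in VOWELS) else ch)
    return "".join(out)

def region_start(w: str, start: int = 0) -> int:
    # Index just after the first non-vowel that follows a vowel, from `start` on.
    for i in range(start + 1, len(w)):
        if w[i] not in VOWELS and w[i - 1] in VOWELS:
            return i + 1
    return len(w)

def ends_short_syllable(w: str) -> bool:
    # (a) non-vowel, vowel, then a non-vowel other than w, x or Y; or
    # (b) a vowel at the start of the word followed by a non-vowel.
    if (len(w) >= 3 and w[-3] not in VOWELS and w[-2] in VOWELS
            and w[-1] not in VOWELS and w[-1] not in "wxY"):
        return True
    return len(w) == 2 and w[0] in VOWELS and w[1] not in VOWELS

def step1a(w: str) -> str:
    if w.endswith("sses"):
        return w[:-2]
    if w.endswith(("ied", "ies")):
        # "i" if preceded by more than one letter, otherwise "ie".
        return w[:-2] if len(w) > 4 else w[:-1]
    if w.endswith(("us", "ss")):
        return w
    # Delete "s" if a vowel occurs before the letter just in front of it.
    if w.endswith("s") and any(ch in VOWELS for ch in w[:-2]):
        return w[:-1]
    return w

def step1b(w: str, r1: int) -> str:
    for suf in ("eedly", "ingly", "edly", "eed", "ing", "ed"):  # longest first
        if not w.endswith(suf):
            continue
        stem = w[: -len(suf)]
        if suf in ("eedly", "eed"):
            # Replace by "ee" only if the suffix lies in R1.
            return stem + "ee" if len(stem) >= r1 else w
        if not any(ch in VOWELS for ch in stem):
            return w
        if stem.endswith(("at", "bl", "iz")):
            return stem + "e"
        if stem.endswith(DOUBLES):
            return stem[:-1]
        if r1 >= len(stem) and ends_short_syllable(stem):  # a "short" word
            return stem + "e"
        return stem
    return w

def step1c(w: str) -> str:
    # y/Y -> i if preceded by a non-vowel that is not the first letter.
    if len(w) > 2 and w[-1] in "yY" and w[-2] not in VOWELS:
        return w[:-1] + "i"
    return w

def step2_li(w: str, r1: int) -> str:
    # Only the plain "li" rule: delete it if in R1 and after a valid li-ending.
    if w.endswith("li") and not w.endswith(LONGER_LI):
        if len(w) - 2 >= r1 and len(w) > 2 and w[-3] in LI_ENDINGS:
            return w[:-2]
    return w

def step5(w: str, r1: int, r2: int) -> str:
    last = len(w) - 1  # index of the final letter
    if w.endswith("e"):
        if last >= r2 or (last >= r1 and not ends_short_syllable(w[:-1])):
            return w[:-1]
    if w.endswith("ll") and last >= r2:
        return w[:-1]
    return w

def stem_subset(word: str) -> str:
    if len(word) <= 2:
        return word
    w = mark_y(word)
    # R1 and R2 are fixed positions computed once on the original word.
    r1 = region_start(w)
    r2 = region_start(w, r1)
    w = step1a(w)
    w = step1b(w, r1)
    w = step1c(w)
    w = step2_li(w, r1)
    w = step5(w, r1, r2)
    return w.replace("Y", "y")
```

On the six words above it returns ski, ski, succe, succe, ty and ug.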
Did I miss anything in the rules?
Kind regards
Jannik Zschiesche
PS: I hope you won't receive this message twice; I already sent it once before I had subscribed.