[Snowball-discuss] Minor mistakes in the English vocabulary

Robert Hafner tedivm at tedivm.com
Sat Jan 21 10:30:00 GMT 2012

I have a working PHP port of the English stemmer here, if you'd like:

http://mortar.googlecode.com/svn/trunk/modules/Graffiti/classes/Stemmers/English.class.php

On Jan 20, 2012, at 9:15 AM, Jannik Zschiesche wrote:

> Hello everyone,
> 
> I am working on a PHP implementation of the stemming algorithms for English, German and Spanish (as soon as I am done, I will host it on GitHub for everyone to use).
> 
> While testing the implementation against the English vocabulary, I found what I think are some mistakes. Please correct me if I am wrong.
> (I used Porter2: http://snowball.tartarus.org/algorithms/english/stemmer.html)
> 
> The vocabulary contains the following transformations (and a few more, but I don't want to flood you):
> 
> 
> 
> 1. skies -> sky
> R1: ""
> R2: ""
> 
> Step 1a: replace "ies" with "i" -> ski
> => ski
> 
> 
> 
> 2. sky -> sky
> R1: ""
> R2: ""
> 
> Step 1c: replace "y" with "i" -> ski
> => ski
> 
> 
> 
> 3. succeed -> succeed
> R1: "ceed"
> R2: ""
> 
> Step 1b: replace "eed" with "ee" -> succee
> Step 5: replace "ee" with "e" -> succe
> => succe
> 
> 
> 
> 4. succeeds -> succeed
> R1: "ceed"
> R2: ""
> 
> Step 1a: remove "s" -> succeed
> -> continues as for "succeed" above => succe
> 
> 
> 
> 5. tying -> tie
> R1: "g"
> R2: ""
> 
> Step 1b: remove "ing" -> ty
> => ty
> 
> 
> 
> 6. ugly -> ugli
> R1: "ly"
> R2: ""
> 
> Step 1c: replace "y" with "i" -> ugli
> => ugli
> 
> 
> Did I miss anything in the rules?
> 
> 
> 
> 
> Kind Regards
> Jannik Zschiesche
> 
> 
> PS: I hope you won't receive this message twice; I already sent it once before subscribing to the list.
> 
> _______________________________________________
> Snowball-discuss mailing list
> Snowball-discuss at lists.tartarus.org
> http://lists.tartarus.org/mailman/listinfo/snowball-discuss
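
For reference, the R1/R2 values in the traces above follow the Porter2 definition: with a, e, i, o, u and y counted as vowels, R1 is the region after the first non-vowel that follows a vowel (empty if there is no such non-vowel), and R2 is the same region computed again inside R1. Below is a minimal Python sketch of that definition. It assumes lowercase input without apostrophes and skips both the marking of consonant y as Y and the special R1 rule for words beginning with gener or commun; neither affects the words traced above. The names `region_start` and `regions` are made up for this example.

```python
VOWELS = frozenset("aeiouy")


def region_start(word: str, start: int = 0) -> int:
    """Index where the region after the first non-vowel that follows a
    vowel begins, searching from `start`; len(word) if there is none."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def regions(word: str) -> tuple[str, str]:
    """Return (R1, R2) for a lowercase word without apostrophes.

    Assumption: the y -> Y marking and the gener-/commun- prefix rule
    are skipped here.
    """
    r1 = region_start(word)
    r2 = region_start(word, r1)  # R2 is the same search, run inside R1
    return word[r1:], word[r2:]


if __name__ == "__main__":
    for w in ["skies", "sky", "succeed", "tying", "ugly"]:
        r1, r2 = regions(w)
        print(f"{w}: R1={r1!r} R2={r2!r}")
```

Running it prints, for example, `succeed: R1='ceed' R2=''`, which matches the values given in the quoted message.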

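The suffix rules applied by hand in the traces (Steps 1a, 1b, 1c and 5) can also be replayed mechanically. This is a minimal Python sketch of those four steps as they are worded in the Porter2 description. It assumes lowercase input without apostrophes and skips the y -> Y marking. Step 0, Steps 2 to 4 and the exceptional forms are deliberately left out, so the hypothetical `trace` function is not a complete stemmer; it only reproduces the hand traces. All function names are illustrative.

```python
VOWELS = frozenset("aeiouy")
DOUBLES = ("bb", "dd", "ff", "gg", "mm", "nn", "pp", "rr", "tt")


def region_start(word: str, start: int = 0) -> int:
    """Start of the region after the first non-vowel following a vowel
    (searching from `start`); len(word) if there is none."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def ends_short_syllable(w: str) -> bool:
    """Non-vowel, vowel, non-vowel (not w, x or Y) at the end, or a
    two-letter word made of a vowel followed by a non-vowel."""
    if len(w) == 2:
        return w[0] in VOWELS and w[1] not in VOWELS
    return (len(w) >= 3 and w[-3] not in VOWELS and w[-2] in VOWELS
            and w[-1] not in VOWELS and w[-1] not in "wxY")


def step1a(w: str) -> str:
    if w.endswith("sses"):
        return w[:-2]                                # sses -> ss
    if w.endswith(("ied", "ies")):
        # -> i after more than one letter, otherwise -> ie
        return w[:-2] if len(w) > 4 else w[:-1]
    if w.endswith(("us", "ss")):
        return w                                     # left alone
    if w.endswith("s") and any(c in VOWELS for c in w[:-2]):
        return w[:-1]                                # vowel not just before s
    return w


def step1b(w: str, p1: int) -> str:
    for suf in ("eedly", "eed"):
        if w.endswith(suf):
            # Replace by "ee" only when the suffix lies inside R1.
            return w[:-len(suf)] + "ee" if len(w) - len(suf) >= p1 else w
    for suf in ("ingly", "edly", "ing", "ed"):
        if w.endswith(suf):
            stem = w[:-len(suf)]
            if not any(c in VOWELS for c in stem):
                return w
            if stem.endswith(("at", "bl", "iz")):
                return stem + "e"
            if stem.endswith(DOUBLES):
                return stem[:-1]
            if ends_short_syllable(stem) and p1 >= len(stem):  # short word
                return stem + "e"
            return stem
    return w


def step1c(w: str) -> str:
    # y -> i when preceded by a non-vowel that is not the first letter.
    if len(w) > 2 and w[-1] in "yY" and w[-2] not in VOWELS:
        return w[:-1] + "i"
    return w


def step5(w: str, p1: int, p2: int) -> str:
    last = len(w) - 1
    if w.endswith("e"):
        if last >= p2 or (last >= p1 and not ends_short_syllable(w[:-1])):
            return w[:-1]
    elif w.endswith("ll") and last >= p2:
        return w[:-1]
    return w


def trace(word: str) -> str:
    """Replay Steps 1a, 1b, 1c and 5 only (not a full stemmer)."""
    p1 = region_start(word)
    p2 = region_start(word, p1)
    w = step1b(step1a(word), p1)
    return step5(step1c(w), p1, p2)


if __name__ == "__main__":
    for w in ["skies", "sky", "succeed", "succeeds", "tying", "ugly"]:
        print(w, "->", trace(w))
```

The printed results (ski, ski, succe, succe, ty, ugli) agree with the hand traces in the quoted message.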

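The Porter2 description linked in the quoted message also defines exceptional forms, checked before any of the numbered steps, and a short list of words left unchanged once Step 1a has run (succeed is one of them). All six words in the quoted list are covered by one of these two lists. Below is a minimal Python sketch of where the two checks sit; `stem_with_exceptions` is an illustrative name, `stem_rest` is a hypothetical callback standing in for Steps 1b to 5, and Step 0 and the y -> Y marking are omitted.

```python
from typing import Callable

# Exceptional forms from the Porter2 description, checked before any step;
# the last seven entries are invariant and map to themselves.
EXCEPTION1 = {
    "skis": "ski", "skies": "sky", "dying": "die", "lying": "lie",
    "tying": "tie", "idly": "idl", "gently": "gentl", "ugly": "ugli",
    "early": "earli", "only": "onli", "singly": "singl",
    "sky": "sky", "news": "news", "howe": "howe", "atlas": "atlas",
    "cosmos": "cosmos", "bias": "bias", "andes": "andes",
}

# Words left unchanged once Step 1a has run.
EXCEPTION2 = frozenset({"inning", "outing", "canning", "herring", "earring",
                        "proceed", "exceed", "succeed"})

VOWELS = frozenset("aeiouy")


def step1a(w: str) -> str:
    """Porter2 Step 1a on a lowercase word without apostrophes."""
    if w.endswith("sses"):
        return w[:-2]
    if w.endswith(("ied", "ies")):
        return w[:-2] if len(w) > 4 else w[:-1]
    if w.endswith(("us", "ss")):
        return w
    if w.endswith("s") and any(c in VOWELS for c in w[:-2]):
        return w[:-1]
    return w


def stem_with_exceptions(word: str, stem_rest: Callable[[str], str]) -> str:
    """Apply the two exception checks around Step 1a.

    `stem_rest` is a hypothetical placeholder for Steps 1b to 5.
    """
    if word in EXCEPTION1:
        return EXCEPTION1[word]
    if len(word) <= 2:          # words of two letters or less are left as is
        return word
    w = step1a(word)
    if w in EXCEPTION2:
        return w
    return stem_rest(w)


if __name__ == "__main__":
    unchanged = lambda w: w     # stand-in for Steps 1b to 5
    for word in ["skies", "sky", "succeed", "succeeds", "tying", "ugly"]:
        print(word, "->", stem_with_exceptions(word, unchanged))
```

In this sketch, all six quoted vocabulary entries are returned without ever reaching `stem_rest`.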
