[Snowball-discuss] Minor mistakes in the english vocabulary
Robert Hafner
tedivm at tedivm.com
Sat Jan 21 10:30:00 GMT 2012
I have a working PHP port of the English stemmer here, if you'd like:
http://mortar.googlecode.com/svn/trunk/modules/Graffiti/classes/Stemmers/English.class.php
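
On the six cases themselves: as far as I can tell the R1/R2 values in your mail are right, and what is missing is the list of exceptional forms in the Porter2 description. That list is checked before the numbered steps run, and a second, shorter list is checked right after step 1a. Below is a rough Python sketch of that lookup (Python rather than PHP only to keep it short). The tables are written from my reading of the algorithm page, so please check them against it. The helper names (r1_r2, step_1a_plain_s, exception_result) are made up for illustration, and the region helper is simplified: it ignores the y-as-consonant marking and the gener/commun/arsen prefixes.

```python
# Sketch only, not the reference implementation. Tables should be verified
# against the Porter2 description linked in the quoted message.

VOWELS = set("aeiouy")

# Whole-word exceptions, looked up before any numbered step runs.
SPECIAL_FORMS = {
    "skis": "ski", "skies": "sky",
    "dying": "die", "lying": "lie", "tying": "tie",
    "idly": "idl", "gently": "gentl", "ugly": "ugli",
    "early": "earli", "only": "onli", "singly": "singl",
}
INVARIANT = {"sky", "news", "howe", "atlas", "cosmos", "bias", "andes"}

# Words left untouched if they are what remains after step 1a.
INVARIANT_AFTER_1A = {
    "inning", "outing", "canning", "herring", "earring",
    "proceed", "exceed", "succeed",
}


def region_after_first_vc(text):
    """Return the part of text after the first non-vowel that follows a vowel."""
    for i in range(1, len(text)):
        if text[i] not in VOWELS and text[i - 1] in VOWELS:
            return text[i + 1:]
    return ""


def r1_r2(word):
    """Simplified R1/R2 (assumption: no Y marking, no special prefixes)."""
    r1 = region_after_first_vc(word)
    return r1, region_after_first_vc(r1)


def step_1a_plain_s(word):
    """Hypothetical helper covering only the bare "s" rule of step 1a:
    drop a final s if a vowel occurs earlier than the letter before the s."""
    if word.endswith("s") and not word.endswith(("ss", "us")):
        if any(c in VOWELS for c in word[:-2]):
            return word[:-1]
    return word


def exception_result(word):
    """Return the stem if an exception list decides it, otherwise None."""
    if word in SPECIAL_FORMS:
        return SPECIAL_FORMS[word]
    if word in INVARIANT:
        return word
    after_1a = step_1a_plain_s(word)
    if after_1a in INVARIANT_AFTER_1A:
        return after_1a
    return None  # fall through to the normal steps 1b..5
```

With these tables, skies, sky, tying and ugly are settled by the first list and never reach steps 1a to 1c, while succeed and succeeds are caught by the second list before step 1b can touch them. That would account for all six vocabulary entries.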
On Jan 20, 2012, at 9:15 AM, Jannik Zschiesche wrote:
> Hello everyone,
>
> I am working on a PHP implementation of the stemmer algorithms for English, German and Spanish (as soon as I am done, I will host it on GitHub for everyone to use).
>
> While testing the implementation against the English vocabulary, I found what I think are some mistakes. Please correct me if I am wrong.
> (I used Porter2: http://snowball.tartarus.org/algorithms/english/stemmer.html)
>
> In the vocabulary, there are the following transformations (and some more, but I don't want to flood you):
>
>
>
> 1. skies -> sky
> R1: ""
> R2: ""
>
> Step 1a: replace "ies" with "i" -> ski
> => ski
>
>
>
> 2. sky -> sky
> R1: ""
> R2: ""
>
> Step 1c: replace "y" with "i" -> ski
> => ski
>
>
>
> 3. succeed -> succeed
> R1: "ceed"
> R2: ""
>
> Step 1b: replace "eed" with "ee" -> succee
> Step 5: replace "ee" with "e" -> succe
> => succe
>
>
>
> 4. succeeds -> succeed
> R1: "ceed"
> R2: ""
>
> Step 1a: remove "s" -> succeed
> -> see "succeed"
>
>
>
> 5. tying -> tie
> R1: "g"
> R2: ""
>
> Step 1b: remove "ing" -> ty
> => ty
>
>
>
> 6. ugly -> ugli
> R1: "ly"
> R2: ""
>
> Step 1c: replace "y" with "i" -> ugli
> => ugli
>
>
> Did I miss anything within the rules?
>
>
>
>
> Kind Regards
> Jannik Zschiesche
>
>
> PS: I hope you won't receive this message twice, because I already sent it without being a subscriber.
>