[Snowball-discuss] probable bug in English stemmer

Martin Porter martin at porterloo.wanadoo.co.uk
Sun Feb 6 15:19:44 GMT 2011


Dear Andrew,

It is very nice to hear from you again after so many years! Yes, the
snowball-discuss group goes on as ever, though now that I am 66 I am slowing
down somewhat. Richard Boulton of course is still very young.

I think this is just a difference between the old Porter stemmer and the
Porter2/English stemmer.

The Porter stemmer stems exceptionalism to exception, and exception to
except. The Porter2 stemmer stems both exceptionalism and exception to except.

In detail, Porter2 stems

exceptionalism to exceptional (step 2)
exceptional to exception (step 3)
exception to except (step 4)

See http://snowball.tartarus.org/algorithms/english/stemmer.html for the
full description of the steps.
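
As a rough illustration of the three steps above, here is a minimal Python
sketch. It is not the Snowball source: it implements only the three rules
involved (alism -> al, tional -> tion, and deletion of ion after s or t),
computes R1 and R2 once on the original word, and ignores the special
handling of y and the exceptional prefixes that the full algorithm has. The
names region_after_vc, r1_r2 and trace are made up for this example.

```python
VOWELS = set("aeiouy")  # simplification: y is always treated as a vowel here


def region_after_vc(word, start=0):
    """Index just past the first non-vowel that follows a vowel,
    searching from `start`; len(word) if there is no such position."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word):
    """Start offsets of R1 and R2, computed once on the original word."""
    r1 = region_after_vc(word)
    r2 = region_after_vc(word, r1)
    return r1, r2


def trace(word):
    """Apply the three rules in order and return every intermediate form."""
    r1, r2 = r1_r2(word)
    forms = [word]

    # Step 2 (one rule only): alism -> al, if the suffix lies in R1.
    if word.endswith("alism") and len(word) - 5 >= r1:
        word = word[:-3]
        forms.append(word)

    # Step 3 (one rule only): tional -> tion, if the suffix lies in R1.
    if word.endswith("tional") and len(word) - 6 >= r1:
        word = word[:-2]
        forms.append(word)

    # Step 4 (one rule only): delete ion if it lies in R2
    # and is preceded by s or t.
    if (word.endswith("ion") and len(word) - 3 >= r2
            and word[-4:-3] in ("s", "t")):
        word = word[:-3]
        forms.append(word)

    return forms


if __name__ == "__main__":
    print(" -> ".join(trace("exceptionalism")))
```

For "exceptionalism" R1 is "ceptionalism" and R2 is "tionalism", so each of
the three suffixes lies in the region its rule requires. Each step removes a
different suffix, which is why both -alism and -ion end up being stripped.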

Perhaps exception should not stem to except (so "exceptional" might be an
exceptional word!) but that is a separate matter.

The same applies to the other words you mention. I hope I've got this right:
like you, I am now a bit rusty on the English stemmer.

Martin

 
At 02:15 AM 2/5/2011 +0300, Andrew Aksyonoff wrote:
>
>Hello all,
>
>hope this mailing list is still alive and kicking after 10 years :)
>
>I've been bringing my rusty English Porter stemmer implementation
>up to date with the current state of Snowball and noticed this
>discrepancy between the description and libstemmer C library
>behaviour. 
>
>These words, all following the same -(t|s)ion -(ality|alism) pattern:
>
>   disproportionality
>   unconventionality
>   irrationality
>   exceptionalism
>   sensationalism
>
>lose both (!) suffixes. For instance "exceptionalism" reduces to
>"except" with the current version of the C library.
>
>According to the algorithm description (and the posted Snowball source),
>-alism should reduce to -al in Step 2, and then Step 4 should remove
>*either* the -al or the -(t|s)ion suffix, but not both.
>
>Is that an actual bug, or am I just misinterpreting Step 4?
>
>Thanks.
>
>-- 
>Best regards,
> Andrew                          mailto:shodan at shodan.ru
