[Snowball-discuss] probable bug in English stemmer

Andrew Aksyonoff shodan at shodan.ru
Fri Feb 4 23:15:56 GMT 2011


Hello all,

hope this mailing list is still alive and kicking after 10 years :)

I've been bringing my rusty English Porter stemmer implementation
up to date with the current state of Snowball and noticed a
discrepancy between the algorithm description and the behaviour of
the libstemmer C library.

These words, all following the same -(t|s)ion -(ality|alism) pattern:

   disproportionality
   unconventionality
   irrationality
   exceptionalism
   sensationalism

have both (!) suffixes removed. For instance, "exceptionalism" reduces
to "except" with the current version of the C library.

According to the algorithm description (and the posted Snowball
source), -alism should reduce to -al in Step 2, and then Step 4 should
remove *either* the -al or the -(t|s)ion suffix, but not both.
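
To make that expectation concrete, here is a rough sketch of this
reading of Steps 2 and 4 only. It is not the Snowball source: the
function names and the trimmed suffix tables are illustrative
assumptions, the R1/R2 region checks are left out, and Steps 0, 1, 3
and 5 are not modelled at all.

```python
# Illustrative subset of the suffix tables, not the full Snowball lists.
# Words in -ality reach Step 2 as -aliti (Step 1c turns the final y into i).
STEP2_RULES = {"alism": "al", "aliti": "al"}
STEP4_SUFFIXES = ["al", "ion"]  # "ion" is only removed after s or t


def step2(word):
    """Replace a Step 2 suffix with its shorter form (region checks omitted)."""
    for suffix, replacement in STEP2_RULES.items():
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word


def step4(word):
    """Find the longest matching Step 4 suffix and act on that one only."""
    for suffix in sorted(STEP4_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if suffix == "ion" and not stem.endswith(("s", "t")):
                return word  # condition on "ion" failed: leave the word alone
            return stem      # exactly one suffix removed, then stop
    return word


for w in ("exceptionalism", "sensationalism"):
    print(w, "->", step4(step2(w)))
```

Under this reading "exceptionalism" comes out as "exception", whereas
the C library gives "except".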

Is that an actual bug, or am I just misinterpreting Step 4?

Thanks.

-- 
Best regards,
 Andrew                          mailto:shodan at shodan.ru



