[Snowball-discuss] probable bug in English stemmer
Andrew Aksyonoff
shodan at shodan.ru
Fri Feb 4 23:15:56 GMT 2011
Hello all,
Hope this mailing list is still alive and kicking after 10 years :)
I've been bringing my rusty English Porter stemmer implementation
up to date with the current state of Snowball and noticed a
discrepancy between the algorithm description and the behaviour of
the libstemmer C library.
These words, all following the same -(t|s)ion -(ality|alism) pattern:
disproportionality
unconventionality
irrationality
exceptionalism
sensationalism
have both (!) suffixes stripped. For instance, "exceptionalism" reduces
to "except" with the current version of the C library.
According to the algorithm description (and the posted Snowball source),
-alism should reduce to -al in Step 2, and then Step 4 should remove
*either* the -al or the -(t|s)ion suffix, but not both.
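A minimal sketch of the reading described above, in Python, to make the expected result concrete. It is not the libstemmer code, and the function names are hypothetical. Only the rules named in this message are modelled: Step 2 maps -alism/-aliti to -al, and Step 4 deletes a single suffix (-al, or -ion when preceded by s or t). A simplified y -> i rewrite stands in for Step 1c so that the -ality words reach Step 2 as -aliti. Everything else, including Steps 3 and 5 and the R1/R2 region checks, is deliberately left out.

```python
# Hypothetical sketch of the reading above; NOT the libstemmer implementation.
# Only the Step 2 and Step 4 rules named in the message are modelled;
# R1/R2 region checks and all other steps are omitted.

STEP2 = {"alism": "al", "aliti": "al"}


def step2(word: str) -> str:
    # Replace the longest matching Step 2 suffix, if any.
    for suffix in sorted(STEP2, key=len, reverse=True):
        if word.endswith(suffix):
            return word[: -len(suffix)] + STEP2[suffix]
    return word


def step4(word: str) -> str:
    # Remove at most ONE suffix: -al, or -ion when preceded by s or t.
    if word.endswith("al"):
        return word[:-2]
    if word.endswith("ion") and word[-4:-3] in ("s", "t"):
        return word[:-3]
    return word


def expected_stem(word: str) -> str:
    # Simplified stand-in for Step 1c: a final y becomes i,
    # so "-ality" reaches Step 2 as "-aliti".
    if word.endswith("y"):
        word = word[:-1] + "i"
    return step4(step2(word))


for w in ["disproportionality", "unconventionality", "irrationality",
          "exceptionalism", "sensationalism"]:
    print(w, "->", expected_stem(w))
```

Under this reading, "exceptionalism" stops at "exception", one suffix short of the "except" produced by the C library.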
Is that an actual bug, or am I just misinterpreting Step 4?
Thanks.
--
Best regards,
Andrew mailto:shodan at shodan.ru