[Snowball-discuss] Porter2 algorithm question
Martin Porter
martin.f.porter at gmail.com
Fri May 31 08:08:33 BST 2019
See the first of the "common errors" listed in the paragraph of that name
on
https://tartarus.org/martin/PorterStemmer/
Only one rule from each of these lists of endings is applied (the one
with the longest matching suffix), irrespective of whether it results
in a suffix removal or not. The order of the suffixes in these lists is
effectively random, although there is often some plan to them. For
example, the list
ational tional enci anci izer ...
is in alphabetical order of the last letter but one (-a-, -a-, -c-, -c-,
-e-, ...) to emphasize the suggestion, "the test for the string S1 can
be made fast by doing a program switch on the penultimate letter of
the word being tested". This phrase appears in
http://snowball.tartarus.org/algorithms/porter/stemmer.html
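As a rough illustration of both points, here is a minimal Python sketch of Step 2 of the original Porter algorithm (not Porter2). It assumes lowercase ASCII input and uses the Step 2 list from the original algorithm description, where every rule carries the condition m > 0. The names STEP2_RULES, BY_PENULTIMATE, is_consonant, measure and step2 are hypothetical, chosen for this sketch. Rules are grouped by penultimate letter (the "program switch"), and within a group the first hit, which is the longest matching suffix, is the only rule considered: if its condition fails, the word comes back unchanged.

```python
# Illustrative sketch of Step 2 of the original Porter stemmer.
# Assumes lowercase ASCII input; all names here are hypothetical.

# (suffix, replacement) pairs; every rule carries the condition m > 0.
STEP2_RULES = [
    ("ational", "ate"), ("tional", "tion"), ("enci", "ence"),
    ("anci", "ance"), ("izer", "ize"), ("abli", "able"),
    ("alli", "al"), ("entli", "ent"), ("eli", "e"),
    ("ousli", "ous"), ("ization", "ize"), ("ation", "ate"),
    ("ator", "ate"), ("alism", "al"), ("iveness", "ive"),
    ("fulness", "ful"), ("ousness", "ous"), ("aliti", "al"),
    ("iviti", "ive"), ("biliti", "ble"),
]

# The "program switch": group rules by the penultimate letter of the suffix,
# longest suffix first, so the first hit in a group is the longest match.
BY_PENULTIMATE = {}
for suffix, replacement in STEP2_RULES:
    BY_PENULTIMATE.setdefault(suffix[-2], []).append((suffix, replacement))
for group in BY_PENULTIMATE.values():
    group.sort(key=lambda rule: len(rule[0]), reverse=True)


def is_consonant(word, i):
    """a, e, i, o, u are vowels; y is a vowel when it follows a
    consonant, otherwise a consonant; every other letter is a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """m in [C](VC)^m[V], counted as vowel-to-consonant transitions."""
    m = 0
    prev_is_vowel = False
    for i in range(len(stem)):
        is_vowel = not is_consonant(stem, i)
        if prev_is_vowel and not is_vowel:
            m += 1
        prev_is_vowel = is_vowel
    return m


def step2(word):
    if len(word) < 2:
        return word
    # Only suffixes sharing the word's penultimate letter can match.
    for suffix, replacement in BY_PENULTIMATE.get(word[-2], []):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            # Exactly one rule applies: if its condition fails, the word is
            # returned unchanged and no shorter suffix is tried.
            return stem + replacement if measure(stem) > 0 else word
    return word


if __name__ == "__main__":
    for w in ["relational", "conditional", "rational", "organization"]:
        print(w, "->", step2(w))
```

With this sketch, relational gives relate and conditional gives condition, while rational is left unchanged: -ational is the longest match, its stem "r" has m = 0, so the rule fails and -tional is never tried.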
-- Martin