[Snowball-discuss] Porter2 algorithm question

Martin Porter martin.f.porter at gmail.com
Fri May 31 08:08:33 BST 2019


See the first of the errors listed in the paragraph headed "common
errors" in

https://tartarus.org/martin/PorterStemmer/

Only one rule is applied from each of these lists of endings,
irrespective of whether it results in a suffix removal or not. The
order of the suffixes in these lists is effectively random, although
there is often some plan behind it. For example, the list

ational tional enci anci izer ....

is in alphabetical order of the last letter but one (-a-, -a-, -c-,
-c-, -e-, ...), to emphasize the suggestion that "the test for the
string S1 can be made fast by doing a program switch on the penultimate
letter of the word being tested". See the appearance of this phrase in

http://snowball.tartarus.org/algorithms/porter/stemmer.html
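A small Python sketch may make both points concrete: one rule per list, and a switch on the penultimate letter. It is a hypothetical illustration, not the Snowball code. The names `step2`, `measure` and `STEP2_RULES` are invented here, only a subset of the step 2 rules is included, and the rule is chosen by the longest matching suffix, following the convention of the original algorithm description.

```python
# Hypothetical sketch of one Porter step (a subset of step 2), showing:
#  * dispatch on the penultimate letter of the word, and
#  * "one rule per list": only the longest matching suffix is considered,
#    and if its condition fails, no other rule in the list is tried.


def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: not a/e/i/o/u, and 'y' only when not after a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        # 'y' counts as a consonant at the start of a word or after a vowel.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Porter's m: the number of VC sequences in the form [C](VC)^m[V]."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        if is_consonant(stem, i):
            if prev_vowel:
                m += 1  # a vowel run followed by a consonant closes one VC
            prev_vowel = False
        else:
            prev_vowel = True
    return m


# Assumed subset of step 2 rules (suffix -> replacement), bucketed by the
# penultimate letter of the suffix. Each rule applies only if m(stem) > 0.
STEP2_RULES: dict[str, list[tuple[str, str]]] = {
    "a": [("ational", "ate"), ("tional", "tion")],
    "c": [("enci", "ence"), ("anci", "ance")],
    "e": [("izer", "ize")],
    "o": [("ization", "ize"), ("ation", "ate"), ("ator", "ate")],
}


def step2(word: str) -> str:
    if len(word) < 2:
        return word
    # The "program switch": only suffixes sharing the word's penultimate
    # letter can possibly match, so only that bucket is examined.
    candidates = STEP2_RULES.get(word[-2], [])
    matches = [(s, r) for s, r in candidates if word.endswith(s)]
    if not matches:
        return word
    # Exactly one rule is selected: the one with the longest matching suffix.
    suffix, replacement = max(matches, key=lambda rule: len(rule[0]))
    stem = word[: -len(suffix)]
    # If the condition fails, the step ends here; shorter suffixes are NOT tried.
    return stem + replacement if measure(stem) > 0 else word


if __name__ == "__main__":
    for w in ["relational", "conditional", "rational", "valenci", "digitizer"]:
        print(w, "->", step2(w))
```

With these rules "relational" becomes "relate", while "rational" is left alone because its stem "r" has m = 0. A made-up word such as "sization" shows why the "one rule" reading matters: the longest suffix -ization leaves the stem "s" with m = 0, so the word is unchanged, whereas an implementation that fell through to -ation would wrongly produce "sizate".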

-- Martin
