[Snowball-discuss] Porter2 algorithm question

Martin Holmes mholmes at uvic.ca
Fri May 31 17:25:43 BST 2019


Thanks Martin! That's really helpful. I got confused because I started
from the Snowball (Porter2) stemmer without realizing that this logic is
laid out in the description of the original stemmer, and is, I guess, to
some extent assumed in the description of Porter2 that I was using:

<http://snowball.tartarus.org/algorithms/english/stemmer.html>

Cheers,
Martin

On 2019-05-31 12:08 a.m., Martin Porter wrote:
> See the first of the "common errors" listed under the heading of that
> name in
> 
> https://tartarus.org/martin/PorterStemmer/
> 
> Only one rule is applied for these lists of endings, irrespective of
> whether it results in a suffix removal or not. The order of the
> suffixes in these lists is effectively random, although there is often
> some plan to it. For example, the list
> 
> ational tional enci anci izer ....
> 
> is by alphabetic order of the last letter but one, -a-, -a-, -c-, -c-,
> -e-, ... to emphasize the suggestion, "the test for the string S1 can
> be made fast by doing a program switch on the penultimate letter of
> the word being tested". See the appearance of this phrase in
> 
> http://snowball.tartarus.org/algorithms/porter/stemmer.html
> 
> -- Martin
> 
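
As a rough illustration of the two points quoted above (only one rule per
list is ever selected, and the list can be dispatched on the penultimate
letter), here is a minimal Python sketch. It is not the reference
implementation. The rule table is only an excerpt of the Step 2 list from
the original Porter description, the names is_consonant, measure,
STEP2_BY_PENULTIMATE and apply_step2 are invented for this example, and
it assumes a lower-case ASCII word that has already been through the
earlier steps.

```python
def is_consonant(word, i):
    """Porter's definition: not a/e/i/o/u, and not a 'y' preceded by a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Return m, the number of vowel-to-consonant transitions in [C](VC)^m[V]."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


# Excerpt of the Step 2 rules, keyed by the penultimate letter of the suffix.
# Within a bucket, a longer suffix comes before any shorter suffix that is
# its tail, so the first match is also the longest match.
STEP2_BY_PENULTIMATE = {
    "a": [("ational", "ate"), ("tional", "tion")],
    "c": [("enci", "ence"), ("anci", "ance")],
    "e": [("izer", "ize")],
    "o": [("ization", "ize"), ("ation", "ate"), ("ator", "ate")],
}


def apply_step2(word):
    """Select at most one rule; stop even if its (m > 0) condition fails."""
    if len(word) < 2:
        return word
    for suffix, replacement in STEP2_BY_PENULTIMATE.get(word[-2], []):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if measure(stem) > 0:
                return stem + replacement
            # The rule was selected but its condition failed:
            # do NOT go on to try the shorter suffixes.
            return word
    return word


for w in ["relational", "conditional", "valenci", "digitizer", "rational"]:
    print(w, "->", apply_step2(w))
```

With these rules, "relational" becomes "relate" and "conditional" becomes
"condition", while "rational" is left alone because the stem "r" has m = 0.
The made-up token "trization" shows why stopping matters: "ization" is
selected and fails on the stem "tr", so the word is unchanged, whereas a
loop that tried every rule in turn would go on to match "ation" and
produce "trizate".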

-- 
------------------------------------------
Martin Holmes
UVic Humanities Computing and Media Centre


