[Snowball-discuss] Another Porter2 question: "able" sample seems wrong

Martin Holmes mholmes at uvic.ca
Tue Jun 4 18:01:50 BST 2019


Hi there,

I'm implementing Porter2 according to the description here:

<https://snowballstem.org/algorithms/english/stemmer.html>

and I've hit a case in the sample test data where my implementation
produces different output. The test word is "able": my implementation
leaves it unchanged as "able", while the test data shows "abl".

I believe the difference comes in Step 5, with this instruction:

e  :  delete if in R2, or in R1 and not preceded by a short syllable

In the case of this word, R1 is "le", so the "e" is in R1. But it is 
preceded by a short syllable; "ab" matches definition (b):

"Define a short syllable in a word as either (a) a vowel followed by a 
non-vowel other than w, x or Y and preceded by a non-vowel, or * (b) a 
vowel at the beginning of the word followed by a non-vowel."

So it seems to me that the "e" should not be deleted.

Am I missing something here? Should I be reading "preceded by" to mean 
"_immediately_ preceded by", meaning that the intervening "l" should 
change the condition, triggering the deletion of "e"?
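
To make the two readings concrete, here is a minimal Python sketch of the Step 5 "e" rule under the "_immediately_ preceded" reading, where the short syllable has to end right before the "e". The function names (region_start, ends_in_short_syllable, step5_e) are my own and hypothetical, not taken from the Snowball sources. The sketch assumes a lowercase word that has already been through the earlier steps, and it ignores the y -> Y marking and the special-case R1 prefixes, so it is an illustration of this one rule rather than a full stemmer.

```python
VOWELS = "aeiouy"  # Porter2 vowels; "Y" (marked consonant y) is not handled here


def region_start(word, start=0):
    """Index just after the first non-vowel that follows a vowel,
    searching from `start`; len(word) if there is no such position.
    region_start(word) gives R1, region_start(word, r1) gives R2."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def ends_in_short_syllable(s):
    """True if `s` ends in a short syllable, per definitions (a) and (b)."""
    # (a) non-vowel, vowel, non-vowel other than w, x or Y, at the end of s
    if (len(s) >= 3 and s[-3] not in VOWELS and s[-2] in VOWELS
            and s[-1] not in VOWELS + "wxY"):
        return True
    # (b) s is exactly a word-initial vowel followed by a non-vowel
    return len(s) == 2 and s[0] in VOWELS and s[1] not in VOWELS


def step5_e(word):
    """Delete a final "e" if it is in R2, or in R1 and the text right
    before it does not end in a short syllable (assumed reading)."""
    if not word.endswith("e"):
        return word
    r1 = region_start(word)
    r2 = region_start(word, r1)
    e_pos = len(word) - 1
    stem = word[:-1]
    if e_pos >= r2:
        return stem
    if e_pos >= r1 and not ends_in_short_syllable(stem):
        return stem
    return word


print(step5_e("able"))  # abl
print(step5_e("hope"))  # hope
```

Under this reading the text before the "e" is "abl", which does not end in a short syllable ("ab" does, but the "l" intervenes), so the "e" is deleted and the result is "abl", matching the sample data. Under the looser reading, where any earlier short syllable counts, "able" would be left unchanged.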

All help appreciated,
Martin



