[Snowball-discuss] Another Porter2 question: "able" sample seems wrong
Martin Holmes
mholmes at uvic.ca
Tue Jun 4 18:01:50 BST 2019
Hi there,
I'm implementing Porter2 according to the description here:
<https://snowballstem.org/algorithms/english/stemmer.html>
and I've hit an instance in the sample test data where my process
generates different output. The test word is "able", and my process
generates "able" (unchanged), while the test data shows "abl".
I believe the difference comes in Step 5, with this instruction:
e : delete if in R2, or in R1 and not preceded by a short syllable
In the case of this word, R1 is "le", so the "e" is in R1. But it is
preceded by a short syllable; "ab" matches definition (b):
"Define a short syllable in a word as either (a) a vowel followed by a
non-vowel other than w, x or Y and preceded by a non-vowel, or * (b) a
vowel at the beginning of the word followed by a non-vowel."
So it seems to me that the "e" should not be deleted.
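
For reference, the R1 value quoted above can be reproduced with a minimal Python sketch. It assumes the region definitions as I read them on the linked page (R1 starts after the first non-vowel following a vowel; R2 applies the same rule inside R1), treats a, e, i, o, u, y as vowels, and leaves out the special-case prefixes and the y-to-Y marking that the full algorithm handles. The function names are hypothetical and not part of Snowball.

```python
VOWELS = set("aeiouy")  # assumption: lowercase input, no y-to-Y marking


def region_start(word, start=0):
    """Index just past the first non-vowel that follows a vowel,
    searching from `start`; len(word) if there is no such non-vowel."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word):
    """Return the (R1, R2) substrings of `word` under the assumptions above."""
    r1 = region_start(word)
    r2 = region_start(word, r1)  # same rule, applied within R1
    return word[r1:], word[r2:]


print(r1_r2("able"))
```

Under these assumptions R1 for "able" is "le" and R2 is empty, so the "in R2" branch of the Step 5 rule cannot apply and everything depends on the short-syllable condition.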
Am I missing something here? Should I be reading "preceded by" to mean
"_immediately_ preceded by", meaning that the intervening "l" should
change the condition, triggering the deletion of "e"?
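
The two readings can be compared with a small Python sketch. It is only an illustration under stated assumptions: it uses the short-syllable definition exactly as quoted above, hard-codes the region boundaries for "able" (R1 = "le" as stated above, R2 assumed empty), and uses hypothetical helper names that are not taken from any Snowball source.

```python
VOWELS = set("aeiouy")  # assumption: lowercase input; "Y" is a marked consonant


def ends_in_short_syllable(s):
    """True if `s` ends in a short syllable per the quoted (a) and (b)."""
    if len(s) == 2:
        # (b) a vowel at the beginning of the word followed by a non-vowel
        return s[0] in VOWELS and s[1] not in VOWELS
    if len(s) >= 3:
        # (a) non-vowel, vowel, non-vowel other than w, x or Y
        c1, v, c2 = s[-3], s[-2], s[-1]
        return (c1 not in VOWELS and v in VOWELS
                and c2 not in VOWELS and c2 not in "wxY")
    return False


def has_short_syllable_anywhere(s):
    """Loose reading: some prefix of `s` ends in a short syllable."""
    return any(ends_in_short_syllable(s[:i]) for i in range(2, len(s) + 1))


def step5_e(word, r1_start, r2_start, immediate):
    """Apply only the 'e' rule of Step 5 under one of the two readings."""
    if not word.endswith("e"):
        return word
    stem, e_pos = word[:-1], len(word) - 1
    check = ends_in_short_syllable if immediate else has_short_syllable_anywhere
    in_r2 = e_pos >= r2_start
    in_r1 = e_pos >= r1_start
    if in_r2 or (in_r1 and not check(stem)):
        return stem
    return word


# Region boundaries for "able": R1 begins at index 2, R2 is empty (index 4).
print(step5_e("able", 2, 4, immediate=True))
print(step5_e("able", 2, 4, immediate=False))
```

In this sketch the "immediately preceded" reading deletes the "e" and yields "abl", which matches the sample test data, while the looser reading leaves "able" unchanged, which matches my output.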
All help appreciated,
Martin