[Snowball-discuss] Another Porter2 question: "able" sample seems wrong

Martin Holmes mholmes at uvic.ca
Sun Jun 30 18:07:42 BST 2019


Hi Martin,

I've hit another issue with exactly this point in the algorithm. The 
word "knife" (to take one example) survives unchanged as far as Step 5. 
Then we have the instruction:

e  :  delete if in R2, or in R1 and not preceded by a short syllable

R1 for knife is position 5 (e), and R2 is position 6 (empty).
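
To check positions like these by hand, here is a minimal Python sketch of the R1/R2 definitions from the Porter2 description. It assumes the vowels are a, e, i, o, u, y and that any consonantal y has already been marked as Y. It ignores the special R1 rule for words beginning with gener, commun or arsen. The helper names region_start and r1_r2 are hypothetical. Indices are 0-based, so add 1 to get the 1-based positions used above.

```python
# Porter2 vowels. A consonantal y is assumed to be marked as 'Y' already,
# so uppercase 'Y' counts as a non-vowel here.
VOWELS = set("aeiouy")


def region_start(word: str, start: int = 0) -> int:
    """Return the 0-based index just after the first non-vowel that follows
    a vowel, looking only at characters at or after `start`.
    Returns len(word), i.e. an empty region, if there is no such non-vowel."""
    for i in range(start + 1, len(word)):
        if word[i - 1] in VOWELS and word[i] not in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word: str) -> tuple:
    """Return the 0-based start indices of R1 and R2.
    Sketch only: the special R1 prefixes (gener, commun, arsen) are not handled."""
    r1 = region_start(word)
    r2 = region_start(word, r1)  # R2 is the same search, restricted to R1
    return r1, r2


if __name__ == "__main__":
    for w in ("knife", "beautiful"):
        r1, r2 = r1_r2(w)
        print(w, "R1 =", repr(w[r1:]), "R2 =", repr(w[r2:]))
```

For "knife" this gives R1 = 'e' (index 4, position 5) and R2 = '' (index 5, position 6), which agrees with the positions above.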

So:

  - the final e is not in R2; but

  - the final e is in R1, so the second condition applies:

If e is preceded by a short syllable, then it should be deleted.

What precedes the e is "knif". This seems to me to match the definition 
of a short syllable:

"Define a short syllable in a word as either (a) a vowel followed by a 
non-vowel other than w, x or Y and preceded by a non-vowel, or (b) a 
vowel at the beginning of the word followed by a non-vowel."

Definition a) seems to apply: a vowel (i) followed by a non-vowel other 
than w, x or Y (f) and preceded by a non-vowel (n).
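
The quoted definition can be written as a small predicate. This is a sketch, not the Snowball source. The function name ends_in_short_syllable is hypothetical. It tests whether a string ends in a short syllable, which is what "preceded by a short syllable" asks of the text before the e. It assumes consonantal y is already marked as Y.

```python
VOWELS = set("aeiouy")  # 'Y' (consonantal y) is deliberately not a vowel


def ends_in_short_syllable(s: str) -> bool:
    """True if `s` ends in a short syllable as defined in the quoted text."""
    # (a) a non-vowel, then a vowel, then a non-vowel other than w, x or Y
    if len(s) >= 3:
        before, vowel, after = s[-3], s[-2], s[-1]
        if (before not in VOWELS and vowel in VOWELS
                and after not in VOWELS and after not in "wxY"):
            return True
    # (b) a vowel at the beginning of the word followed by a non-vowel,
    #     i.e. the whole of `s` is vowel + non-vowel
    if len(s) == 2 and s[0] in VOWELS and s[1] not in VOWELS:
        return True
    return False


if __name__ == "__main__":
    for prefix in ("knif", "abl", "ap", "snow"):
        print(prefix, ends_in_short_syllable(prefix))
```

With this reading, "knif" ends in a short syllable ("nif"), "abl" does not, "ap" qualifies under (b), and "snow" does not because w is excluded.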

Therefore the e should be deleted. But in the test data it isn't.

What am I missing?

All help appreciated,
Martin


On 2019-06-04 12:36 p.m., Martin Porter wrote:
> Yes, I think your last point is effectively the right way of looking
> at it. 'e' is in R1, but 'e' itself is preceded by (immediately
> preceded by) 'abl', which is not a short syllable.
> 
> The idea of all this is to distinguish (for example) ap and ape, so
> ap and aps stem to ap, and ape, apes, aping, aped stem to ape. It is a
> distinction for the "short syllable" case.
> 
> (Incidentally, I am somewhat quoting from memory here, not having
> access to my Linux machine, but I think I've got this right.)
> 
> _______________________________________________
> Snowball-discuss mailing list
> Snowball-discuss at lists.tartarus.org
> https://lists.tartarus.org/mailman/listinfo/snowball-discuss
> 
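
To check the Step 5 rule quoted earlier ("e: delete if in R2, or in R1 and not preceded by a short syllable") against the words in this thread, here is a self-contained Python sketch that applies only that one rule. It is not the full Porter2 stemmer or the Snowball implementation. The helper names are hypothetical, the special R1 prefixes are ignored, and consonantal y is assumed to be already marked as Y. As written, the rule deletes the e in R1 only when the e is not preceded by a short syllable.

```python
VOWELS = set("aeiouy")  # 'Y' (consonantal y) counts as a non-vowel


def region_start(word, start=0):
    # Index just after the first non-vowel following a vowel, at or after `start`.
    for i in range(start + 1, len(word)):
        if word[i - 1] in VOWELS and word[i] not in VOWELS:
            return i + 1
    return len(word)


def ends_in_short_syllable(s):
    # (a) non-vowel + vowel + non-vowel other than w, x or Y
    if (len(s) >= 3 and s[-3] not in VOWELS and s[-2] in VOWELS
            and s[-1] not in VOWELS and s[-1] not in "wxY"):
        return True
    # (b) the whole string is a word-initial vowel + non-vowel
    return len(s) == 2 and s[0] in VOWELS and s[1] not in VOWELS


def step5_final_e(word):
    """Apply only the Step 5 'e' rule; otherwise return the word unchanged."""
    if not word.endswith("e"):
        return word
    r1 = region_start(word)
    r2 = region_start(word, r1)
    pos = len(word) - 1            # index of the final e
    before = word[:pos]            # text immediately preceding the e
    if pos >= r2:                  # e is in R2: delete it
        return before
    if pos >= r1 and not ends_in_short_syllable(before):
        return before              # in R1 and NOT preceded by a short syllable
    return word                    # otherwise keep the e


if __name__ == "__main__":
    for w in ("knife", "able", "ape"):
        print(w, "->", step5_final_e(w))
```

Run as a script, this prints knife -> knife, able -> abl and ape -> ape. The e of "knife" is kept because "knif" ends in the short syllable "nif". The e of "able" is removed because "abl" does not end in one.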



