[Snowball-discuss] Problems with step 5 in the Porter2 algorithm
Håvard Lindset
lindset@webpixels.net
Sat Oct 18 13:23:02 2003
Hi all,
This isn't really a question about Snowball, but a question about Step 5 in
the Porter2 algorithm. (I'm writing a stemmer in PHP)
"e: delete if in R2, or in R1 and not preceded by a short syllable"
Should I check only for a short syllable DIRECTLY in front of the final e, or
must there be no short syllables ANYWHERE in the word before the final e?
If anyone could clarify when to remove the e, it would be much appreciated
:) Right now I'm finding that I'm removing either too many e's or too few
e's.
I'm using Perl Compatible Regular Expressions for most of the stemmer stuff,
so if any of you have a PCRE pattern that does what I want, I'd love to
see it :)
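
For illustration, here is a minimal sketch of one reading of the rule, written
in Python but using only regex syntax that PCRE also accepts. It assumes that
"preceded by a short syllable" means the short syllable ends DIRECTLY before
the final e, which is how I understand the Snowball source to test it. The
names strip_final_e and region_start are made up for this sketch. It also
assumes lowercase input with consonant y already marked as Y, and it ignores
the special-case prefixes that Porter2 uses when computing R1.

```python
import re

# A short syllable at the very end of the string (assumed definition):
#   (a) non-vowel, vowel, non-vowel other than w, x or Y, or
#   (b) a vowel at the start of the word followed by a non-vowel.
SHORT_SYLLABLE_AT_END = re.compile(
    r"(?:[^aeiouy][aeiouy][^aeiouywxY]|^[aeiouy][^aeiouy])$"
)

# A vowel followed by a non-vowel; the region starts right after this pair.
VOWEL_NONVOWEL = re.compile(r"[aeiouy][^aeiouy]")


def region_start(word, start=0):
    """Return the index where R1 (start=0) or R2 (start=R1) begins."""
    match = VOWEL_NONVOWEL.search(word, start)
    return match.end() if match else len(word)


def strip_final_e(word):
    """Hypothetical helper: apply only the 'e' rule of Step 5."""
    if not word.endswith("e"):
        return word
    r1 = region_start(word)
    r2 = region_start(word, r1)
    e_pos = len(word) - 1
    stem = word[:-1]
    if e_pos >= r2:
        # The e lies in R2: always delete.
        return stem
    if e_pos >= r1 and not SHORT_SYLLABLE_AT_END.search(stem):
        # The e lies in R1 and the text right before it is not a short syllable.
        return stem
    return word


if __name__ == "__main__":
    for w in ("hope", "bottle", "cease", "probate", "tree"):
        print(w, "->", strip_final_e(w))
```

Under this reading "hope" keeps its e, because "hop" is a short syllable sitting
right before it. "bottle" becomes "bottl" even though "bot" is a short syllable
earlier in the word, because "ttl" directly before the e is not one.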
Thanks!
Best regards,
Håvard Lindset