[Snowball-discuss] Dutch stemmers - "heden" and "rheden"
ranapratap.syamala at thomson.com
Mon Mar 31 17:15:10 BST 2008
Hi,
I was looking at the Dutch stemmer and came across a couple of terms
from the sample vocabulary provided on the website
(http://snowball.tartarus.org/algorithms/dutch/diffs.txt) that stem
to themselves: "heden" and "rheden".
But when I looked at the rules, it seems that Step 1(b) should apply
and the words should be stemmed to "hed" and "rhed" respectively.
h e d e n
|<---->| R1 (satisfies the R1 adjustment, as in the German stemmer,
that the region before R1 should contain at least 3 letters)
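As a rough sketch of how R1 is being worked out here (R1 is the region after the first non-vowel that follows a vowel, moved right if needed so that at least 3 letters precede it), something like the following Python could be used. The function name `r1_start` and the vowel set are assumptions made for illustration, and the stemmer's prelude (accent removal, special treatment of "y" and "i") is not modelled.

```python
# Hypothetical helper, not taken from the Snowball sources.
VOWELS = set("aeiouyè")  # assumed Dutch vowel set; prelude handling is ignored


def r1_start(word: str, min_prefix: int = 3) -> int:
    """Return the index where R1 begins, or len(word) if R1 is empty."""
    for i in range(1, len(word)):
        # R1 begins right after the first non-vowel that follows a vowel.
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            # Adjustment: at least `min_prefix` letters must precede R1.
            return min(max(i + 1, min_prefix), len(word))
    return len(word)


for w in ("heden", "rheden"):
    start = r1_start(w)
    print(w, start, w[start:])
# heden 3 en
# rheden 4 en
```

Under these assumptions, R1 is "en" for both words, so the "en" suffix does lie inside R1.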
According to Step 1(b):

    (b) en   ene
        delete if in R1 and preceded by a valid en-ending, and then
        undouble the ending

    (Define a valid en-ending as a non-vowel, and not gem.)
According to this rule, the "en" suffix should be deleted from the term,
since it lies within R1 and is preceded by a valid en-ending, so the
word should stem to "hed".
Similarly,
r h e d e n
|<----->| R1 (satisfies the R1 adjustment, as in the German stemmer,
that the region before R1 should contain at least 3 letters)
so it should be stemmed to "rhed".
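To make this reading of rule (b) concrete, here is a minimal Python sketch that applies only that one rule, in isolation. It is not the Dutch stemmer: the other Step 1 suffixes, the prelude and the later steps are deliberately left out, and the names `valid_en_ending`, `undouble` and `step_1b_only` are hypothetical. The R1 start index is passed in by hand (3 for "heden", 4 for "rheden", matching the regions marked above).

```python
# Hypothetical sketch of rule (b) alone; every other rule is omitted.
VOWELS = set("aeiouyè")  # assumed Dutch vowel set


def valid_en_ending(stem: str) -> bool:
    """True if the stem ends in a non-vowel and does not end in 'gem'."""
    return bool(stem) and stem[-1] not in VOWELS and not stem.endswith("gem")


def undouble(stem: str) -> str:
    """Drop the last letter of a final kk, dd or tt."""
    return stem[:-1] if stem.endswith(("kk", "dd", "tt")) else stem


def step_1b_only(word: str, r1_start: int) -> str:
    """Delete 'en' or 'ene' if in R1 and preceded by a valid en-ending."""
    for suffix in ("ene", "en"):
        if word.endswith(suffix):
            cut = len(word) - len(suffix)
            # The suffix is "in R1" when it starts at or after the R1 start.
            if cut >= r1_start and valid_en_ending(word[:cut]):
                return undouble(word[:cut])
            return word
    return word


print(step_1b_only("heden", 3))   # hed
print(step_1b_only("rheden", 4))  # rhed
```

Taken by itself, rule (b) gives "hed" and "rhed", which is the result I expected from the description.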
I am wondering whether there is something I am missing, or whether I am
misinterpreting the rule.
Thanks
Rana