[Snowball-discuss] Dutch stemmers- "heden' and "rheden"

ranapratap.syamala at thomson.com
Mon Mar 31 17:15:10 BST 2008


Hi,
 
I was looking at the Dutch stemmer and came across two terms in the
sample vocabulary provided on the website
(http://snowball.tartarus.org/algorithms/dutch/diffs.txt) that stem to
themselves:
 
"heden" and "rheden".
 
But when I looked at the rules, it seems that Step 1(b) should apply
and the words should be stemmed to "hed" and "rhed" respectively.
 
h    e    d    e    n
               |<-->|  R1
 
(This satisfies the R1 adjustment, as in the German stemmer, that the
region before R1 should contain at least 3 letters.)
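 
To make the R1 boundary concrete, here is a minimal Python sketch of how I
am computing it. It assumes R1 starts after the first non-vowel that follows
a vowel and is then moved so that at least 3 letters precede it. The names
VOWELS and r1_start are my own, not part of Snowball, and the sketch skips
the stemmer's preprocessing of accented letters and of y and i.

```python
# Hypothetical helper, not Snowball's own code: locate the start of R1.
VOWELS = set("aeiouyè")  # vowel set assumed for Dutch


def r1_start(word: str, min_prefix: int = 3) -> int:
    """Return the index where R1 begins (len(word) if R1 is empty)."""
    start = len(word)  # default: R1 is the null region at the end
    for i in range(1, len(word)):
        # first non-vowel that directly follows a vowel
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            start = i + 1
            break
    # adjustment: the region before R1 must hold at least min_prefix letters
    return min(max(start, min_prefix), len(word))


for w in ("heden", "rheden"):
    i = r1_start(w)
    print(w, "->", w[:i], "|", w[i:])
# heden -> hed | en
# rheden -> rhed | en
```

Under these assumptions, R1 is "en" for both words.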
 
According to Step1(b), 
  
(b) en   ene 
delete if in R1 and preceded by a valid en-ending, and then undouble the
ending 
 
(A valid en-ending is defined as a non-vowel, and not gem.)
 
According to this rule, the "en" suffix should be deleted from the term,
since it lies within R1 and is preceded by a valid en-ending, so the
term should stem to "hed".
 
Similarly, "rheden" should be stemmed to "rhed":
 
r    h    e    d    e    n
                    |<-->|  R1
 
(Again, this satisfies the R1 adjustment, as in the German stemmer, that
the region before R1 should contain at least 3 letters.)
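 
The reading of Step 1(b) described above can be sketched in Python as
follows. This covers only that single rule as I understand it: it takes the
R1 start index as an argument, and it does not model how Step 1 chooses
among its other suffixes or any of the later steps, so it is not a full
Dutch stemmer. The names step_1b and undouble are my own, and the undouble
rule (drop the last letter after kk, dd or tt) is taken from the algorithm
description.

```python
# Hypothetical sketch of Step 1(b) in isolation, not Snowball's own code.
VOWELS = set("aeiouyè")  # vowel set assumed for Dutch


def undouble(stem: str) -> str:
    """Remove the last letter if the stem ends in kk, dd or tt."""
    if stem.endswith(("kk", "dd", "tt")):
        return stem[:-1]
    return stem


def step_1b(word: str, r1_start: int) -> str:
    """Delete 'ene' or 'en' if in R1 and preceded by a valid en-ending."""
    for suffix in ("ene", "en"):  # try the longer suffix first
        cut = len(word) - len(suffix)
        if word.endswith(suffix) and cut >= r1_start:
            stem = word[:cut]
            # valid en-ending: a non-vowel, and not "gem"
            valid = (
                bool(stem)
                and stem[-1] not in VOWELS
                and not stem.endswith("gem")
            )
            return undouble(stem) if valid else word
    return word


print(step_1b("heden", 3))   # hed
print(step_1b("rheden", 4))  # rhed
```

Applied on its own with the R1 boundaries shown in the diagrams, this rule
gives "hed" and "rhed", which is why the listed output surprised me.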
 
I am wondering whether there is something I am missing, or whether I am
misinterpreting the rule.
 
Thanks
Rana
 
 
 