[Snowball-discuss] Dutch stemmers - "heden" and "rheden"
Olly Betts
olly at survex.com
Tue Apr 1 13:10:17 BST 2008
On Mon, Mar 31, 2008 at 11:15:10AM -0500, ranapratap.syamala at thomson.com wrote:
> But when I looked at the rules, it seems like Step1(b) should be
> enforced and the words should be stemmed to "hed" and "rhed"
> respectively.
>
> h e d e n
> |<---->| R1 (satisfies the R1 adjustment for German
> stemmer that the region before R1 should contain atleast 3 letters)
>
> According to Step1(b),
>
> (b) en ene
> delete if in R1 and preceded by a valid en-ending, and then undouble the
> ending
I think the subtlety you're missing is that a false condition in the
action for a suffix doesn't invalidate a suffix as being the longest.
Step 1:
    Search for the longest among the following suffixes, and perform
    the action indicated

    (a) heden
        replace with heid if in R1

    (b) en   ene
        delete if in R1 and preceded by a valid en-ending,
        and then undouble the ending

    (c) s   se
        delete if in R1 and preceded by a valid s-ending
The longest among the suffixes is "heden", so we perform the action
"replace with heid if in R1". This action has no effect on the word
since "heden" isn't in R1. And that's it for step 1.
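
To make the order of operations concrete, here is a rough Python sketch of step 1 only. It is not the Snowball implementation: the function names (r1_start, step1 and the helpers) are made up for illustration, the prelude that marks some "y" and "i" letters as consonants is skipped, and the definitions of a valid en-ending, a valid s-ending and undoubling are taken from my reading of the Dutch stemmer description. The point it shows is that the suffix is chosen first, purely by length, and only then is the condition on that one suffix tested.

```python
VOWELS = set("aeiouyè")

# Step 1 suffixes, longest first, so the first match is the longest match.
SUFFIXES = ["heden", "ene", "en", "se", "s"]


def r1_start(word):
    """Index where R1 begins: just after the first non-vowel that follows
    a vowel, moved right if needed so that at least 3 letters precede it."""
    for i in range(1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return min(len(word), max(i + 1, 3))
    return len(word)


def valid_en_ending(stem):
    # Assumed definition: ends in a non-vowel, and does not end in "gem".
    return bool(stem) and stem[-1] not in VOWELS and not stem.endswith("gem")


def valid_s_ending(stem):
    # Assumed definition: ends in a non-vowel other than "j".
    return bool(stem) and stem[-1] not in VOWELS and stem[-1] != "j"


def undouble(stem):
    # Assumed definition: drop the last letter if the word ends kk, dd or tt.
    return stem[:-1] if stem.endswith(("kk", "dd", "tt")) else stem


def step1(word):
    # 1. Pick the longest matching suffix. No conditions are looked at here.
    suffix = next((s for s in SUFFIXES if word.endswith(s)), None)
    if suffix is None:
        return word

    cut = len(word) - len(suffix)
    stem = word[:cut]
    in_r1 = cut >= r1_start(word)

    # 2. Apply the action for that suffix only. If its condition is false,
    #    the word is left alone; we do NOT fall back to a shorter suffix.
    if suffix == "heden":
        return stem + "heid" if in_r1 else word
    if suffix in ("ene", "en"):
        return undouble(stem) if in_r1 and valid_en_ending(stem) else word
    return stem if in_r1 and valid_s_ending(stem) else word


for w in ("heden", "rheden", "mogelijkheden"):
    print(w, "->", step1(w))
```

For "heden" and "rheden", R1 is just the final "en", so "heden" is selected as the longest suffix, its "in R1" test fails, and the word comes out of step 1 unchanged. For a longer word such as "mogelijkheden" the same suffix does lie in R1 and is replaced with "heid".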
Cheers,
Olly