[Snowball-discuss] Dutch stemmers - "heden" and "rheden"

Olly Betts olly at survex.com
Tue Apr 1 13:10:17 BST 2008


On Mon, Mar 31, 2008 at 11:15:10AM -0500, ranapratap.syamala at thomson.com wrote:
> But when I looked at the rules, it seems like Step1(b) should be
> enforced and the words should be stemmed to "hed" and "rhed"
> respectively.
>  
> h    e    d    e    n
>                  |<---->|  R1 (satisfies the R1 adjustment for German
> stemmer that the region before R1 should contain at least 3 letters)
>  
> According to Step1(b), 
>   
> (b) en   ene 
> delete if in R1 and preceded by a valid en-ending, and then undouble the
> ending 

I think the subtlety you're missing is that a false condition in the
action for a suffix doesn't stop that suffix from being selected as the
longest match.

    Step 1:

        Search for the longest among the following suffixes, and perform
        the action indicated

            (a) heden
                replace with heid if in R1

            (b) en   ene
                delete if in R1 and preceded by a valid en-ending,
                and then undouble the ending

            (c) s   se
                delete if in R1 and preceded by a valid s-ending

The longest among the suffixes is "heden", so we perform the action
"replace with heid if in R1".  This action has no effect on the word
since "heden" isn't in R1.  And that's it for step 1.
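
To make that concrete, here is a rough Python sketch of step 1 only.  It
is not the Snowball source: the function names are made up for
illustration, it skips the prelude (accent removal and the special
handling of "y" and "i"), and it assumes the usual definitions of R1,
valid en-ending, valid s-ending and undoubling from the algorithm
description.  The fall_through flag is purely hypothetical; it models the
reading where a failed condition lets a shorter suffix be tried instead.

```python
VOWELS = set("aeiouyè")

# Step 1 suffixes, each tagged with the rule (a, b or c) it belongs to.
SUFFIXES = [("heden", "a"), ("ene", "b"), ("en", "b"), ("se", "c"), ("s", "c")]


def r1_start(word):
    """Index where R1 begins: just after the first non-vowel that follows
    a vowel, moved right if needed so at least 3 letters precede it."""
    for i in range(1, len(word)):
        if word[i - 1] in VOWELS and word[i] not in VOWELS:
            return min(max(i + 1, 3), len(word))
    return len(word)  # no such position: R1 is empty


def undouble(stem):
    """Turn a trailing kk, dd or tt into a single letter."""
    return stem[:-1] if stem.endswith(("kk", "dd", "tt")) else stem


def valid_en_ending(stem):
    """Assumed definition: ends in a non-vowel and does not end in 'gem'."""
    return bool(stem) and stem[-1] not in VOWELS and not stem.endswith("gem")


def valid_s_ending(stem):
    """Assumed definition: ends in a non-vowel other than 'j'."""
    return bool(stem) and stem[-1] not in VOWELS and stem[-1] != "j"


def step1(word, fall_through=False):
    r1 = r1_start(word)
    # Try the candidate suffixes longest first.
    for suffix, rule in sorted(SUFFIXES, key=lambda s: -len(s[0])):
        if not word.endswith(suffix):
            continue
        start = len(word) - len(suffix)
        stem = word[:start]
        in_r1 = start >= r1
        if rule == "a" and in_r1:
            return stem + "heid"
        if rule == "b" and in_r1 and valid_en_ending(stem):
            return undouble(stem)
        if rule == "c" and in_r1 and valid_s_ending(stem):
            return stem
        # The longest suffix has been chosen and its condition was false.
        # Under the actual semantics, step 1 ends here with no change.
        if not fall_through:
            return word
    return word


print(step1("heden"))                     # heden
print(step1("heden", fall_through=True))  # hed  (the reading you expected)
print(step1("mogelijkheden"))             # mogelijkheid
```

With fall_through=True the sketch gives "hed" and "rhed", which is the
result you were expecting; with the default it leaves both words alone,
which matches what the stemmer does in step 1.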

Cheers,
    Olly


