[Snowball-discuss] The Danish stemmer

Michael Kellberg michael at kellberg.com
Mon Jul 13 22:03:32 BST 2009


Hi

I'm trying to implement the Danish stemmer in C# for a project I'm working
on. I've got it working and it stems 99% of the words correctly (according to
http://snowball.tartarus.org/algorithms/danish/diffs.txt). My question
concerns the last 1%, since I can't seem to find any bugs in my
implementation.

Of the first 1000 words, I get these errors:
Word/My stemmer/diffs.txt
adelig/ad/ade
ageren/ag/ager
ahers/ah/aher
alene/al/alen
alens/al/alen
alerne/al/alern
alernes/al/alern
andt/and/andt

If you look at the first word, "adelig", I can't see how that becomes "ade"
with this stemmer, since step 3 says:

    If the word ends igst, remove the final st.
    Search for the longest among the following suffixes in R1, and
    perform the action indicated.

    (a) ig   lig   elig   els
        delete, and then repeat step 2

So that would remove "elig", and the stem would become "ad" and not "ade".
Is there something I've overlooked, or is it an error in the algorithm or
in diffs.txt?
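
One thing that may be worth checking is how R1 is computed. Below is a
minimal sketch (in Python rather than C#, just to keep it short) of the
step 3(a) suffix search only. It assumes the R1 definition from the Snowball
description of the Danish stemmer, where R1 is adjusted so that the region
before it contains at least 3 letters, and it assumes that only suffixes
lying entirely inside R1 are candidates. The names r1_start, strip_step3a
and min_prefix are made up for this illustration; the "igst" rule, the
"løst" rule and the "repeat step 2" action are left out.

```python
VOWELS = set("aeiouyæåø")
STEP3A_SUFFIXES = ("ig", "lig", "elig", "els")


def r1_start(word, min_prefix=3):
    """Return the index at which R1 begins.

    R1 starts after the first non-vowel that follows a vowel (or is empty
    if there is none). Assumption: the start is then moved right so that
    at least `min_prefix` letters precede R1.
    """
    start = len(word)  # empty R1 by default
    for i in range(1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            start = i + 1
            break
    return min(len(word), max(start, min_prefix))


def strip_step3a(word, min_prefix=3):
    """Delete the longest step 3(a) suffix that lies entirely in R1."""
    start = r1_start(word, min_prefix)
    candidates = [s for s in STEP3A_SUFFIXES
                  if word.endswith(s) and len(word) - len(s) >= start]
    if not candidates:
        return word
    longest = max(candidates, key=len)
    return word[: len(word) - len(longest)]


if __name__ == "__main__":
    word = "adelig"
    print(word[r1_start(word):])              # lig
    print(strip_step3a(word))                 # ade
    print(strip_step3a(word, min_prefix=0))   # ad
```

With the 3-letter adjustment, R1 of "adelig" is "lig", so "elig" is not a
candidate and the result is "ade", as in diffs.txt. Without the adjustment
(min_prefix=0), R1 is "elig" and the result is "ad", which matches what my
stemmer produces.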


Regards

Michael Kellberg