[Snowball-discuss] The Danish stemmer
Michael Kellberg
michael at kellberg.com
Mon Jul 13 22:03:32 BST 2009
Hi
I'm trying to implement the Danish stemmer in C# for a project I'm working
on. I've got it working, and it gets 99% of the words right (checked against
http://snowball.tartarus.org/algorithms/danish/diffs.txt). My question
concerns the last 1%, since I can't seem to find any bugs in my
implementation.
Of the first 1000 words, I get these errors:
Word/My stemmer/diffs.txt
adelig/ad/ade
ageren/ag/ager
ahers/ah/aher
alene/al/alen
alens/al/alen
alerne/al/alern
alernes/al/alern
andt/and/andt
If you look at the first word, "adelig", I can't see how it becomes "ade"
with this stemmer, since step 3 says:

  If the word ends igst, remove the final st.
  Search for the longest among the following suffixes in R1, and perform
  the action indicated.
  (a) ig lig elig els: delete, and then repeat step 2

So that would remove "elig", and the stem would become "ad" and not "ade".
Is there something I've overlooked, or is it an error in the algorithm or
in diffs.txt?
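To make the reading above concrete, here is a minimal Python sketch of step 3(a) (not the C# code in question). The names r1_start, step2 and step3a and the min_prefix parameter are hypothetical, introduced only for illustration. The sketch assumes R1 is the region after the first non-vowel following a vowel, treats a e i o u y æ å ø as the vowels, and omits step 3(b). The min_prefix parameter is there because the Snowball description of the Danish stemmer also says that R1 is adjusted so that the region before it contains at least 3 letters, and the result for "adelig" appears to depend on that.

```python
VOWELS = set("aeiouyæåø")

# Step 3(a) suffixes, ordered longest first so the first hit is the longest.
STEP3_SUFFIXES = ("elig", "lig", "els", "ig")


def r1_start(word, min_prefix=0):
    """Return the index at which R1 begins.

    R1 is taken as the region after the first non-vowel that follows a
    vowel, or the empty region at the end of the word if there is none.
    min_prefix (an assumed knob) keeps at least that many letters before R1.
    """
    start = len(word)
    for i in range(1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            start = i + 1
            break
    return min(max(start, min_prefix), len(word))


def step2(word, r1):
    """If the word ends gd, dt, gt or kt inside R1, delete the last letter."""
    if word.endswith(("gd", "dt", "gt", "kt")) and len(word) - 2 >= r1:
        return word[:-1]
    return word


def step3a(word, min_prefix=0):
    """Apply the igst rule and step 3(a); step 3(b) is omitted."""
    r1 = r1_start(word, min_prefix)
    if word.endswith("igst"):
        word = word[:-2]  # remove the final "st"
    for suffix in STEP3_SUFFIXES:
        # The suffix must lie entirely inside R1 to count as a match.
        if word.endswith(suffix) and len(word) - len(suffix) >= r1:
            return step2(word[: -len(suffix)], r1)
    return word


if __name__ == "__main__":
    print(step3a("adelig"))                # R1 starts at index 2
    print(step3a("adelig", min_prefix=3))  # R1 starts at index 3
```

With min_prefix=0, R1 of "adelig" is "elig", so "elig" is deleted and the result is "ad", which matches my stemmer. With min_prefix=3, R1 is "lig", so only "lig" can match and the result is "ade", as in diffs.txt.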
Regards
Michael Kellberg