[Snowball-discuss] Norwegian stemmer question
Blake Madden
madindayton at outlook.com
Sat Jul 26 14:50:27 BST 2025
Hello,
I was trying to understand the recent changes to the Norwegian stemmer, in particular step 1 for "ers". The rule states:
(b) ers
find the longest suffix preceding ers, and perform the action indicated.
(i) amm ast ind kap kk lt nk omm pp v øst
do nothing
(ii) giv hav skap
delete ers suffix
What confuses me is that "balders" gets stemmed to "bald", according to the output files. Why is the "ers" removed in this case? It isn't preceded by "giv", "hav", or "skap", so it shouldn't be deleted. And nothing in step 2 or 3 looks at "ers", so it shouldn't be getting removed there.
I'm also confused by the instruction to "do nothing" for "amm ast ind kap kk lt nk omm pp v øst". Why are these explicitly mentioned? If "ers" isn't preceded by "giv", "hav", or "skap", then nothing should happen anyway, right? When "ers" is preceded by "bald", I would expect nothing to be deleted.
I tried looking at the Snowball code and I noticed this:
'giv' 'hav' 'skap' ''
(delete)
There is a blank '' included in the list of values in front of the suffix that would trigger a delete. What does that imply? It's not explained in the docs.
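To make my reading concrete, here is a minimal Python sketch of rule (b). It rests on an assumption I could not confirm from the docs: that the blank '' acts as a catch-all matching any preceding text, and that the longest matching entry wins. The function name strip_ers and the two list names are hypothetical, the R1 region check is ignored, and this is not the actual Snowball implementation.

```python
# Hypothetical sketch of rule (b); not the actual Snowball implementation.
# Entries from list (i): "do nothing" when one of these precedes "ers".
DO_NOTHING = ["amm", "ast", "ind", "kap", "kk", "lt", "nk", "omm", "pp", "v", "øst"]
# Entries from the Snowball code: delete "ers" when one of these precedes it.
# ASSUMPTION: the blank '' matches any preceding text (a catch-all default).
DELETE = ["giv", "hav", "skap", ""]


def strip_ers(word: str) -> str:
    """Apply rule (b) to a word, ignoring the R1 region check."""
    if not word.endswith("ers"):
        return word
    head = word[:-3]  # the text preceding "ers"
    candidates = [(s, False) for s in DO_NOTHING] + [(s, True) for s in DELETE]
    # Keep only entries that end the preceding text, then take the longest.
    # The blank entry always matches, so there is always a candidate.
    _, delete = max(
        ((s, d) for s, d in candidates if head.endswith(s)),
        key=lambda pair: len(pair[0]),
    )
    return head if delete else word


print(strip_ers("balders"))  # only '' matches, so "ers" is deleted
print(strip_ers("givers"))   # "giv" is longer than "v", so "ers" is deleted
```

Under that assumption the "do nothing" list would make sense: it lists exceptions to a default delete, and "giv", "hav", and "skap" need to be listed only because they would otherwise be caught by the shorter "v" and "kap" entries. That would also be consistent with "balders" becoming "bald".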
Thank you for any clarification,
Blake