[Snowball-discuss] Norwegian stemmer question

Olly Betts olly at survex.com
Wed Aug 20 20:46:11 BST 2025


On Sun, Aug 17, 2025 at 07:24:30AM +0100, Martin Porter wrote:
> Have you received a reply to your query? I'm afraid it has been in my spam
> folder since July 26, and I have only just noticed it.

I also failed to spot the email (I've been away and not entirely keeping
on top of my email) but Blake also opened an issue and this got resolved
via that:

https://github.com/snowballstem/snowball/issues/249

> Like you I'm puzzled, but not in quite the same way. The rule for -ers goes
> 
> (i) giv hav skap: delete ers suffix
> (ii) amm ast ind kap kk lt nk omm pp v øst: do nothing
> (iii) if none of these suffixes are present: delete ers suffix

You're looking at the description after I fixed it (thanks to Blake's
ticket).  Previously the (iii) case was effectively missing from the
description.

> (If I've understood correctly) the tests in (i) seem to be irrelevant,
> since ers will get deleted anyway by (iii). 'none' in (iii) must refer
> to the list in (ii). balders -> bald by (iii).  The snowball code
> reflects this exactly but 'giv' 'hav' 'skap' seem again to be
> irrelevant.

The key point is that the suffixes in (i) serve as exceptions to suffixes
in (ii): 'giv' and 'hav' are exceptions to 'v', and 'skap' is an exception
to 'kap'.  If (i) were removed, all the words it matches would instead
match (ii), not (iii).

E.g. in general we want to leave words ending in -vers alone, but
"arbeidsgivers" should have -ers removed so that it conflates with
related words.
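
As a rough illustration of how the three cases interact, here is a minimal
Python sketch of just the -ers decision.  It is not the Snowball source: the
name strip_ers is made up for this example, the endings are copied from the
description quoted above, and it assumes the caller has already established
that -ers lies within R1.  It relies on the longest matching ending winning,
as it does in a Snowball "among".

```python
# Sketch of the -ers rule only, not the full Norwegian stemmer.
# Assumption: the caller has already checked that "ers" lies within R1.

# Case (ii): endings before -ers for which we do nothing.
KEEP = ("amm", "ast", "ind", "kap", "kk", "lt", "nk", "omm", "pp", "v", "øst")
# Case (i): exceptions to (ii), for which -ers is still deleted.
STRIP = ("giv", "hav", "skap")


def strip_ers(word):
    """Return word with -ers removed where the rule allows it (hypothetical helper)."""
    if not word.endswith("ers"):
        return word
    stem = word[:-3]
    # The longest matching ending wins, so "giv" beats "v" and "skap" beats "kap".
    candidates = [s for s in KEEP + STRIP if stem.endswith(s)]
    best = max(candidates, key=len, default="")
    if best in KEEP:
        return word  # case (ii): leave the word alone
    return stem  # case (i), or case (iii) when nothing matched
```

With this sketch "arbeidsgivers" gives "arbeidsgiv" because "giv" is a longer
match than "v", "balders" gives "bald" via case (iii), and a word such as
"univers" (my own choice of -vers example) is left unchanged.  Dropping the
STRIP entries would make "arbeidsgivers" fall into case (ii) and stay as it is.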

I thought I'd restructured the rules to be clear enough, but obviously
not, so I'll add a note to clarify this point.

> All this is a development from my original Norwegian stemmer at
> http://snowball.tartarus.org/algorithms/norwegian/stemmer.html , presumably
> to get a more delicate test for removal of -ers.

Yes, two cases were improved.  One was very short words ending in -ers
where -ers wasn't in R1 but -s was, so only the -s got removed.  The other
was longer words where removing -ers was too aggressive.  The discussion is
here (though we tried a few things before settling on the current solution):

https://github.com/snowballstem/snowball/issues/175
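
To make the short-word case concrete, here is a small Python sketch of where
R1 starts, using the usual Snowball definition (the region after the first
non-vowel following a vowel) plus the Scandinavian adjustment that at least 3
letters must precede it.  The helper names and the vowel set are assumptions
made for this example (the current stemmer may treat further accented letters
as vowels), and "vers" is just an illustrative word, not one taken from the
issue.

```python
# Sketch of the R1 computation; names and vowel set are assumptions.
VOWELS = set("aeiouyæåø")


def r1_start(word):
    """Index at which R1 begins (hypothetical helper)."""
    for i in range(1, len(word)):
        # First non-vowel that directly follows a vowel.
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            # R1 starts after it, but at least 3 letters must precede R1.
            return min(len(word), max(i + 1, 3))
    return len(word)  # no such non-vowel: R1 is empty


def in_r1(word, suffix):
    """True if word ends with suffix and the suffix lies wholly inside R1."""
    return word.endswith(suffix) and len(word) - len(suffix) >= r1_start(word)
```

For "vers" R1 starts at index 3, so in_r1("vers", "ers") is False while
in_r1("vers", "s") is True, which is the situation described above.  For a
longer word such as "arbeidsgivers" the whole -ers suffix is inside R1.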

Cheers,
    Olly


