[Snowball-discuss] The Norwegian stemmer algorithm

Ask Solem Hoel ask@gan.no
Wed, 28 Nov 2001 22:29:44 +0100

Hi Martin,
	thank you for the quick response.
Because of your @softhome.net e-mail address, the spam filter I use
(http://junkfilter.zer0.org) moved it straight to my junk-mail mailbox,
so I didn't see it until now :(

On Tue, Nov 27, 2001 at 08:14:14AM -0700, Martin Porter sent:

> Thanks, Ask, that is most interesting. I think it would be useful eventually
> to have a collection of links to resources from the Snowball site. Could [...]
> put your version in?

So far the Norwegian version is here:
This one works perfectly with the Norwegian diffs.txt from

But as Oleg said, we need to agree on a namespace and an interface
for Perl ports of the Snowball stemmers.

My co-worker here is also porting it to Java.

> I'm sorry about that. 3.1 is part of an old numbering scheme which I thought
> I'd eliminated. I'll fix it. Go to the porter stemmer for the definition of
> R1 and R2, although I guess you must know what the definition is.


> Mmmm - I think no-one is reading the Snowball manual :-) . It sets p1 to [...]
> if is less than 3. So p1 is (a) after the first non-vowel following a vowel
> or (b) after the 3rd letter, whichever position is further right. Basically
> 2 letters is too little for a residual stem in German, and I think Norwegian [...]

Ok, that sorted it out.
Now I've also printed and studied the Snowball manual :)
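
To check my understanding of the p1 rule quoted above, here is a rough
sketch in Python. It is not taken from the Snowball sources: the name
mark_p1, the vowel set, and the handling of words shorter than 3 letters
(p1 is simply capped at the word length) are my own assumptions.

```python
# Assumption: vowel set for Norwegian; adjust if the stemmer defines another.
VOWELS = set("aeiouyæåø")


def mark_p1(word: str) -> int:
    """Return p1, the index where region R1 starts (hypothetical helper)."""
    p1 = len(word)  # default: R1 is empty (no non-vowel follows a vowel)
    for i in range(1, len(word)):
        # Rule (a): position just after the first non-vowel following a vowel.
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            p1 = i + 1
            break
    # Rule (b): at least 3 letters before R1, whichever is further right.
    # Assumption: for words shorter than 3 letters, cap p1 at the word length.
    return min(max(p1, 3), len(word))


if __name__ == "__main__":
    for w in ("arbeid", "stasjon", "skole"):
        print(w, mark_p1(w), repr(w[mark_p1(w):]))
```

For "arbeid" rule (a) alone would put p1 after "ar", so rule (b) moves it
to after the 3rd letter; for "stasjon" rule (a) already gives a position
further right, so it is kept.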

> Any observations on the stemmer would be useful - I know little about
> Norwegian. Is a stemmer for Nynorsk of any importance?

We need nynorsk for the project we're working on right now, so somehow
we must come up with the algorithm.

But as far as I can tell, this algorithm already handles a lot of nynorsk,
because -ar, -ande, -ast, -ane, -eleg, -eig and -leg are not "bokmål" but

> Incidentally, how did you come across snowball? It is widely known as yet [...]

Well, I'm working on an XML-content indexer
(http://www.unixmonks.net/xanonton+xiri) and needed a
stemmer, and my coworker pointed me to snowball.sourceforge.net.

/ Ask Solem Hoel        | GAN Media             \
: +47 48054613          | +47 22707439          :
\ www.unixmonks.net     | www.gan.no/media      /
