[Snowball-discuss] The Norwegian stemmer algorithm

Ask Solem Hoel ask@gan.no
Wed, 28 Nov 2001 22:29:44 +0100


Hi Martin,
	thank you for the quick response.
Because of your @softhome.net e-mail address, the spam filter I use
(http://junkfilter.zer0.org) moved it straight to my junk mail folder,
so I didn't see it until now :(

On Tue, Nov 27, 2001 at 08:14:14AM -0700, Martin Porter sent:

> Thanks, Ask, that is most interesting. I think it would be useful eventually
> to have a collection of links to resources from the Snowball site. Could we
> put your version in?

Sure!
So far the Norwegian version is here:
http://www.unixmonks.net/~ask/Stemmer-Norwegian-0.5.tar.gz
This one reproduces the Norwegian diffs.txt from
snowball.sourceforge.net exactly.
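
Checking a port against diffs.txt comes down to running the stemmer over every word and comparing the result with the expected stem. Below is a minimal Python sketch of such a check. It assumes each non-blank line holds a word and its expected stem separated by whitespace (an assumption about the file layout); check_against_diffs and toy_stem are hypothetical names, and stem stands in for whatever function a given port exposes.

```python
from typing import Callable, Iterable, List, Tuple


def check_against_diffs(
    stem: Callable[[str], str], lines: Iterable[str]
) -> List[Tuple[str, str, str]]:
    """Return (word, expected, got) for each entry the stemmer gets wrong.

    Assumes each line is "<word> <expected stem>" separated by whitespace.
    """
    mismatches = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        word, expected = fields
        got = stem(word)
        if got != expected:
            mismatches.append((word, expected, got))
    return mismatches


# Toy stand-in stemmer (hypothetical): strips a single final "e".
def toy_stem(word: str) -> str:
    return word[:-1] if word.endswith("e") else word


sample = ["bilde bild", "hus hus", "", "kake kak"]
print(check_against_diffs(toy_stem, sample))  # []
```

With a real port, the open diffs.txt file object can be passed as lines (opened with whatever encoding the file actually uses); a port that matches perfectly returns an empty list.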

But as Oleg said, we need to agree on a namespace and an interface
for Perl ports of the Snowball stemmers.

My co-worker here is also porting it to Java.

> I'm sorry about that. 3.1 is part of an old numbering scheme which I thought
> I'd eliminated. I'll fix it. Go to the Porter stemmer for the definition of
> R1 and R2, although I guess you must know what the definition is.

Thanks!

>
> Mmmm - I think no-one is reading the Snowball manual :-) . It sets p1 to 3
> if it is less than 3. So p1 is (a) after the first non-vowel following a vowel,
> or (b) after the 3rd letter, whichever position is further right. Basically,
> 2 letters is too little for a residual stem in German, and I think Norwegian.

OK, that sorted it out.
Now I've also printed and studied the Snowball manual :)
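
Martin's rule for p1 can be made concrete with a small sketch. This is a minimal Python illustration, not code from Snowball or from the Perl port: it computes p1 (R1 is everything from p1 onward) as the position after the first non-vowel that follows a vowel, moved right to 3 if it would otherwise be earlier, and leaves R1 empty for words shorter than 3 letters. It assumes lower-case input and the vowel set aeiouyæåø; the name r1_start is hypothetical.

```python
# Hypothetical helper: computes p1, the index where region R1 begins,
# following the rule described above. The vowel set is an assumption.
NORWEGIAN_VOWELS = frozenset("aeiouyæåø")


def r1_start(word: str, vowels: frozenset = NORWEGIAN_VOWELS, min_p1: int = 3) -> int:
    """Return p1 such that R1 == word[p1:] (empty when p1 == len(word))."""
    if len(word) < min_p1:
        # Too short to have any R1 at all.
        return len(word)
    for i in range(1, len(word)):
        # First non-vowel that directly follows a vowel.
        if word[i] not in vowels and word[i - 1] in vowels:
            # R1 starts after that non-vowel, but never before position 3.
            return max(i + 1, min_p1)
    # No vowel followed by a non-vowel: R1 is empty.
    return len(word)


for w in ["bildene", "oppe", "strand"]:
    print(w, "->", repr(w[r1_start(w):]))
# bildene -> 'dene'
# oppe -> 'e'
# strand -> 'd'
```

For oppe the first vowel/non-vowel pair ends at position 2, so the minimum of 3 applies and R1 is just e; this is case (b) above, which keeps a residual stem of at least 3 letters.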

> Any observations on the stemmer would be useful - I know little about
> Norwegian. Is a stemmer for Nynorsk of any importance?

We need Nynorsk for the project we're working on right now, so somehow
we must come up with the algorithm.

But as far as I can tell, this algorithm already covers a lot of Nynorsk,
because -ar, -ande, -ast, -ane, -eleg, -eig and -leg are not bokmål but
Nynorsk endings.

> Incidentally, how did you come across Snowball? It is widely known as yet.

Well, I'm working on an XML content indexer
(http://www.unixmonks.net/xanonton+xiri) and needed a
stemmer, and my co-worker pointed me to snowball.sourceforge.net.

-- 
/ Ask Solem Hoel        | GAN Media             \
: +47 48054613          | +47 22707439          :
\ www.unixmonks.net     | www.gan.no/media      /

_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss
