[Snowball-discuss] Undoubling in Dutch stemmer

Edwin de Jonge ejne at rnd.vb.cbs.nl
Mon Dec 13 13:07:24 GMT 2004


Hi Martin,

I'll try to report the ratio of misses to hits on the sample vocabulary
for each of my proposed changes.
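
As a rough sketch of how such a tally could be made (Python, purely
illustrative): the function names, the toy "strip -en" stemmers and the
hand-made list of expected stems are all my own assumptions, not part of
Snowball or of the K-P stemmer. A "hit" is a word where the new rule
changes the output and lands on the expected stem; a "miss" is a word
where the old output was already the expected stem and the rule breaks it.

```python
from typing import Callable, Dict, Tuple

Stemmer = Callable[[str], str]


def rule_hits_and_misses(
    gold: Dict[str, str], baseline: Stemmer, candidate: Stemmer
) -> Tuple[int, int]:
    """Count hits and misses of a rule over a judged vocabulary.

    gold      -- hypothetical hand-made mapping: word -> expected stem
    baseline  -- stemmer without the rule under test
    candidate -- the same stemmer with the rule added
    """
    hits = misses = 0
    for word, expected in gold.items():
        before, after = baseline(word), candidate(word)
        if before == after:
            continue  # the rule did not change this word
        if after == expected:
            hits += 1  # the rule fixed the stem
        elif before == expected:
            misses += 1  # the rule broke a stem that was right
        # otherwise both outputs are wrong: counted as neither
    return hits, misses


def miss_ratio(hits: int, misses: int) -> float:
    """Misses per hit; infinite when the rule never helps."""
    return misses / hits if hits else float("inf")


# Toy stemmers, assumptions for illustration only.
def strip_en(word: str) -> str:
    """Baseline: remove a trailing 'en' and nothing else."""
    return word[:-2] if word.endswith("en") else word


def strip_en_undouble(word: str) -> str:
    """Candidate: as above, then undouble a final double consonant."""
    stem = strip_en(word)
    if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
        return stem[:-1]
    return stem
```

Calling rule_hits_and_misses(gold, strip_en, strip_en_undouble) on a
judged word list gives the two counts, and miss_ratio turns them into the
figure to compare per rule.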

I've looked at your transcription of the Kraaij Pohlmann stemmer, and in
my opinion it does a good job of stemming the sample vocabulary; that is,
I find its output better than that of the Dutch Snowball stemmer. You
are right that the K-P version is more obscure than the Dutch Snowball
stemmer.

I've read the K-P Snowball file and most of the operations make sense to
me, but I think this K-P Snowball version could be rewritten to make it
much clearer. What I could do is annotate the rules given in the K-P
stemmer and send the result to you, if you are interested in such
annotations, of course.

Regards,

Edwin

> -----Original Message-----
> From: Martin Porter [mailto:martin.porter at grapeshot.co.uk]
> Sent: Sunday, 12 December 2004 14:04
> To: Edwin de Jonge; Blake Madden; snowball-discuss at lists.tartarus.org
> Subject: RE: [Snowball-discuss] Undoubling in Dutch stemmer
>
>
> Edwin, Blake,
>
> Sorry to have been a while in replying.
>
> If you get into this, you really should look at the Kraaij
> Pohlmann stemmer, which attempts the vowel lengthening you
> mention. I have translated it into Snowball, and you can find
> the result at
>
> http://www.snowball.tartarus.org/kp/stemmer.html
>
> (This is not linked to from elsewhere in the Snowball site, I
> believe.)
>
> There is also a link from this page to the UPLIFT project
> page, where their program can be downloaded.
>
> The difficulty with the K-P stemmer is understanding the
> linguistic intentions behind the rules.
>
> If you develop the rules you mention, you must of course
> check their behaviour against a sample Dutch vocabulary, and
> assess, rule by rule, whether it is improving or degrading
> the stemming process. More exactly: any rule has its hits and
> misses. You compare the ratio of misses to hits and reject
> the rule if the ratio is uncomfortably large. When developing
> the Snowball stemmer and comparing it with the K-P stemmer, I
> recall trying to avoid these less successful rules.
>
> I would like to look into this myself again, but don't quite
> have the time at present.
>
> Tell us how you get on,
>
> Martin