[Snowball-discuss] Dutch stemmer: undouble "nn", "mm", "ff"?

Arjen van der Meijden arjen@glas.its.tudelft.nl
Thu Jan 1 19:39:02 2004


Martin Porter wrote:

> Edwin,
> 
> Thanks for that idea, which I'll try out. There are a number of outstanding
> suggestions to work through, and I must set some time aside to look at them
> early this year.
> 
> A new idea of mine: I think apostrophe ought to form part of the alphabet of
> Dutch, and indeed of English. I haven't really had time to put that in though.

Would that stem Dutch words like these:
cd'tje -> cd
tv'tje -> tv
a4'tje -> a4
baby'tje -> baby
(diminutives of abbreviations are formed with 'tje, as are diminutives of 
words ending in a consonant followed by a 'y')
pcb's -> pcb
foto's -> foto
taxi's -> taxi
(plural forms of abbreviations and of words ending in a, i, o, u or y take 
an 's ending)
Wanda's vis -> Wanda, vis
Kees' auto -> Kees, auto
Henks fiets -> Henk, fiets
(possessives take 's, unless the word already ends in an s (or an 
s-sound), in which case only an apostrophe is added. If the word ends in 
a consonant, then just an s is added)
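
To make the apostrophe cases above concrete, here is a minimal sketch in Python of a pre-processing step that strips 'tje, 's and a bare trailing apostrophe. This is only an illustration under my own assumptions, not how the Snowball Dutch stemmer works: the names strip_apostrophe_suffix and strip_phrase are hypothetical, and only the straight ASCII apostrophe is considered.

```python
# Hypothetical helper, not part of the Snowball Dutch stemmer.
# Longest suffix first, so "'tje" is tried before "'s" and a bare "'".
APOSTROPHE_SUFFIXES = ("'tje", "'s", "'")


def strip_apostrophe_suffix(word):
    """Remove one trailing 'tje, 's or ' from a single token."""
    for suffix in APOSTROPHE_SUFFIXES:
        # Require a non-empty remainder, so a lone "'s" is left alone.
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def strip_phrase(phrase):
    """Split on whitespace and strip each token separately."""
    return [strip_apostrophe_suffix(token) for token in phrase.split()]


if __name__ == "__main__":
    for example in ["cd'tje", "baby'tje", "foto's", "Wanda's vis", "Kees' auto"]:
        print(example, "->", strip_phrase(example))
```

Note that this sketch leaves "Henks" untouched: without an apostrophe, a possessive s cannot be told apart from any other final s by such a simple rule, so that case would have to be left to the stemmer itself.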

Or are these already handled?
Anyway, Dutch seems terribly hard to stem well. At least it does to me. 
There are a lot of rules, and almost every rule has a few exceptions. :)

Best regards,

Arjen van der Meijden