[Snowball-discuss] German suffix stripping not complete
Martin Porter
martin.porter at grapeshot.co.uk
Mon Jan 30 10:05:46 GMT 2006
Karl,
I'm sure the reason it was not done is that the group is so small. What you
would certainly need to do is to check for the ending -los, and not remove
the -s in that case. If you take the sample vocabulary provided with German,
you then get the following residual list:
ambros amos autos bartholomaios büros chaos credos fotos
haemorrheos heros hos infos jethros jos lebensmittelembargos
migros moos mythos pharaos platos salomos studios theophrastos
wahlbüros wos
25 words in all. -s could be removed with benefit or without harm from all,
or almost all, of these words. There is some overlap here with your own word
list.
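
Purely as an illustration, the check described above (remove a final -s
after o, except in words ending in -los) might be sketched in Python as
follows. This is not code from the Snowball German stemmer: the function
name strip_s_after_o is made up for this example, and the sketch ignores
everything else a real stemmer does (regions, other suffixes, and so on).

```python
def strip_s_after_o(word: str) -> str:
    """Hypothetical helper: drop a final -s that follows 'o',
    but leave words ending in -los untouched."""
    if word.endswith("os") and not word.endswith("los"):
        return word[:-1]  # remove only the trailing 's'
    return word


# A few words taken from this thread.
for w in ["autos", "büros", "chaos", "haarlos", "los"]:
    print(w, "->", strip_s_after_o(w))
```

With this rule "autos" and "büros" lose their -s, while "haarlos" and "los"
are left alone. Whether a result such as "chao" for "chaos" is acceptable
is the "without harm" judgement mentioned above.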
Thank you for pointing this out. I will review the German algorithm at some
point in the future, and possibly incorporate your suggestion.
Martin
>Hello list,
>
>I'm wondering if there is a good reason for the German stemmer not to
>suffix strip the s in words ending in 'os'.
>Autos, kinos, echos, büros, silos, pianos, etc.
>
>Here are some words you can consider.
>
>Albatros, apropos, chaos, epos, kosmos, gros, rigoros, grandios, los, haarlos.
>
>All the ones I can think of would be pretty much OK with the suffix stripped.
>