[Snowball-discuss] german_stemmer -isse

Martin Porter martin at porterloo.wanadoo.co.uk
Mon Oct 26 10:46:16 GMT 2009


Wolfgang,

The problem is of course with nouns ending -is in German. wildnis has (I
take it) plural wildnisse, which the algorithm stems to wildniss, and so on.
Have you any idea how many words there might be in this category? I've
looked through the snowball German vocab, and very many words ending -is are
foreign words (thesis, paris ...) where (I believe?) the -isse plural rule
will not apply. Kürbis is not so very common a word, though suitably
seasonal with October 31 approaching!

Usually the -sse ending is correctly stemmed to -ss, but looking through the
vocabulary, it is clear that -isse would often be better stemmed to -is. Do
you have a view on that?

http://snowball.tartarus.org/algorithms/german/diffs.txt
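
As a stopgap, the -isse to -is idea can be tried outside the stemmer by
post-processing its output. The following is only a minimal sketch, not
part of Snowball or libstemmer: the function name fold_iss_stem is
hypothetical, and it assumes the stemmer has already produced stems such as
kurbiss or wildniss. It applies the rule unconditionally, so a short stem
like biss would also be shortened, which is exactly the open question about
how many words are affected.

```python
def fold_iss_stem(stem: str) -> str:
    """Hypothetical post-processing step: map a stem ending in -iss to -is.

    This is an illustration of the rule discussed in the thread, not the
    Snowball German algorithm itself.
    """
    if stem.endswith("iss"):
        # Drop the final "s" so that "kurbiss" and "kurbis" coincide.
        return stem[:-1]
    # Other -ss stems (for example "fluss") are left unchanged.
    return stem


if __name__ == "__main__":
    for stem in ["kurbiss", "wildniss", "kurbis", "fluss"]:
        print(stem, "->", fold_iss_stem(stem))
```

For this to help a search engine, the same folding would have to be applied
both when indexing and when processing queries.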

I'm a bit rusty on German, as you see. I'm rather preoccupied with other
work at the moment, but could find time to look at this problem with a
German in a couple of weeks. Meanwhile, at your end, you'll have to endure
the mis-stemming for the time being.

Martin



At 02:50 PM 10/22/2009 +0200, Wolfgang Klinger wrote:
>
>
>*hiya!*
>
>I use the sphinx search engine and have problems with
>libstemmer_de.
>
>I have text that includes the German word "Kürbis".
>The plural of "Kürbis" is "Kürbisse". Now if I search for "Kürbisse"
>I would expect results for "Kürbis" too (that's why I use libstemmer).
>
>Obviously libstemmer_de creates "kurbiss" as stemmed form instead
>of "kurbis" and therefore I get no results.
>Is that a known problem? How can I solve it?
>
>
>tia, kind regards
>Wolfgang
>





