[Snowball-discuss] German Stemmer

Tobias N. Sasse tobi at byte23.de
Tue Nov 3 15:28:38 GMT 2009


Hi Guys,

I am a German computer science student, currently doing research on
text analytics systems. I need stemmers for all kinds of languages
(a good start would be English, German, French, Spanish, ...).

I had a quick look at the German version on your site and sadly
noticed that it produces tons of errors. For instance, the following
output

  "katze" -> "katz"
  "kätzchen" -> "katzch"
  "kätzchens" -> "katzch"

is wrong: there is no German word "katzch"; it should be "katze" (the
actual stem). "katz" is also wrong, since the trailing "e" is missing...
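For reference, here is a minimal Python sketch that reproduces the outputs above. It assumes the rules of the classic Snowball German stemmer but implements only part of them (the R1 region, steps 1 and 2, and umlaut removal). Step 3, the R2 region, and the u/y/ß preprocessing are left out, so this is a simplified illustration and not the Snowball implementation. All function names are made up for illustration. The sketch shows that the stemmer strips suffixes purely by rule and never checks whether the result is a real German word, which is how a stem like "katzch" comes about.

```python
# Simplified sketch of part of the classic Snowball German stemmer.
# Only the R1 region, step 1, step 2 and umlaut removal are implemented;
# step 3, the R2 region and the u/y/ß preprocessing are omitted.
# All function names are hypothetical, chosen for illustration.

VOWELS = set("aeiouyäöü")
S_ENDING = set("bdfghklmnrt")  # letters allowed before a deletable "s"
ST_ENDING = S_ENDING - {"r"}   # letters allowed before a deletable "st"


def r1_start(word):
    """Start of R1: after the first non-vowel that follows a vowel,
    moved right if needed so that at least 3 letters precede it."""
    start = len(word)
    for i in range(1, len(word)):
        if word[i - 1] in VOWELS and word[i] not in VOWELS:
            start = i + 1
            break
    return max(start, 3)


def longest_suffix(word, suffixes):
    """Return the longest entry of suffixes that word ends with, or None."""
    matches = [s for s in suffixes if word.endswith(s)]
    return max(matches, key=len) if matches else None


def step1(word, r1):
    # Inflectional endings: only the longest match is considered,
    # and it is removed only if it lies entirely inside R1.
    suffix = longest_suffix(word, ("em", "ern", "er", "e", "en", "es", "s"))
    if suffix is None or len(word) - len(suffix) < r1:
        return word
    if suffix == "s":
        # "s" is removed only after a valid s-ending letter.
        return word[:-1] if len(word) >= 2 and word[-2] in S_ENDING else word
    word = word[: -len(suffix)]
    if suffix in ("e", "en", "es") and word.endswith("niss"):
        word = word[:-1]  # e.g. "ergebnisse" -> "ergebniss" -> "ergebnis"
    return word


def step2(word, r1):
    suffix = longest_suffix(word, ("en", "er", "est", "st"))
    if suffix is None or len(word) - len(suffix) < r1:
        return word
    if suffix == "st":
        stem = word[:-2]
        # "st" needs a valid st-ending letter with at least 3 letters before it.
        return stem if len(stem) >= 4 and stem[-1] in ST_ENDING else word
    return word[: -len(suffix)]


def remove_umlauts(word):
    return word.translate(str.maketrans("äöü", "aou"))


def stem(word):
    word = word.lower()
    r1 = r1_start(word)  # R1 is computed once, before any suffix is removed
    return remove_umlauts(step2(step1(word, r1), r1))


for w in ("katze", "kätzchen", "kätzchens"):
    print(w, "->", stem(w))
```

Running it prints katze -> katz, kätzchen -> katzch and kätzchens -> katzch, matching the results quoted above. For "kätzchens", step 1 removes the final "s" (it follows "n"), step 2 removes "en", and the umlaut is dropped at the end.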

So my question is: do you know of an improved version or an alternative
algorithm? What about the other languages, and how good are they? I am
not a linguist, so I can't judge their quality myself...

Thanks for your info
Tobi

---
Tobias N. Sasse

tobi at byte23.de
http://tobi.byte23.de



