[Snowball-discuss] German Stemmer

Tobias N. Sasse tobi at byte23.de
Tue Nov 3 17:12:10 GMT 2009


Hi Richard,

thanks for your input. I am not very familiar with these algorithms.
From my understanding (please correct me if I am wrong), a stemming
algorithm reduces words to a common stem, which is not necessarily a
correct word in the language itself. That is OK for my use case, as
long as not too many words with different meanings map to the same
stem.
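
To check that I have the idea right, here is a toy sketch in Python. It
is not the Snowball German algorithm; the suffix list, the minimum stem
length and the name toy_stem are all made up for illustration. It only
shows how two inflected forms can end up on a shared stem that is not
itself a word.

```python
# Toy suffix stripper: NOT the Snowball German algorithm, just an
# illustration of mapping inflected forms to a shared stem.
SUFFIXES = ("ern", "em", "en", "er", "es", "e", "n", "s")  # illustrative only
MIN_STEM_LEN = 3  # assumed guard so short words are not stripped to nothing


def toy_stem(word):
    """Strip the first matching suffix if the remaining stem is long enough."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: len(word) - len(suffix)]
        if word.endswith(suffix) and len(stem) >= MIN_STEM_LEN:
            return stem
    return word


for w in ("Katze", "Katzen", "Hund"):
    print(w, "->", toy_stem(w))
```

Here "Katze" and "Katzen" both come out as "katz", which is not a German
word, while "Hund" is left alone apart from lowercasing.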

Richard Boulton wrote:
> These are not errors.  The stemming algorithm is not meant to return
> correct words - all it is intended to do is produce the same result
> for words with a closely related meaning, and a different result for
> words with a different meaning.

That sounds fine to me. I am curious, though: how many nouns are
reduced? I don't want "carport" to be reduced to "car", as this could
be a problem in my scenario. I know this is a difficult task, as it
requires a lot of knowledge of the particular language and its
grammar...
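
One way I could measure this on my own vocabulary is to group words by
their stem and look at the groups with more than one member. The sketch
below assumes the stemmer is available as a plain function; crude_stem
is a deliberately over-aggressive stand-in I made up to trigger the
"carport" case, not the behaviour of any real stemmer.

```python
from collections import defaultdict


def conflations(words, stem):
    """Group words by stem; keep only stems shared by more than one word."""
    groups = defaultdict(set)
    for word in words:
        groups[stem(word)].add(word)
    return {s: sorted(ws) for s, ws in groups.items() if len(ws) > 1}


def crude_stem(word):
    # Hypothetical, deliberately over-aggressive stemmer: keep 3 letters.
    return word[:3]


vocabulary = ["car", "cars", "carport", "house"]
print(conflations(vocabulary, crude_stem))
```

Replacing crude_stem with the real stemmer's function would list the
words that actually collide, which I could then review by hand.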

Further, I'd like to know if there is data I can use for research:
I am looking for stopword lists, synonym tables, etc. I have been
looking around for a while now but have never found anything useful...
Most stopword lists contain only a few dozen words :-/

Thanks for your replies!

---
Tobias N. Sasse

tobi at byte23.de
http://tobi.byte23.de



