[Snowball-discuss] German Stemmer
Tobias N. Sasse
tobi at byte23.de
Tue Nov 3 17:12:10 GMT 2009
Hi Richard,
thanks for your input. I am not very familiar with these algorithms.
From my understanding (please correct me if I am wrong), a stemming
algorithm reduces words to a common stem, which is not necessarily a
correct word in the language itself. That is OK for my use case, as
long as not too many words with different meanings map to the same
stem.
Richard Boulton wrote:
> These are not errors. The stemming algorithm is not meant to return
> correct words - all it is intended to do is produce the same result
> for words with a closely related meaning, and a different result for
> words with a different meaning.
That sounds fine to me. I am curious: how far are nouns reduced? I
don't want "carport" to be reduced to "car", as this could be a problem
in my scenario. I know this is a difficult task, as it requires a lot
of knowledge of the particular language and its grammar...
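The two points discussed above (a stem need not be a real word, and whether a compound such as "carport" gets cut down to "car") can be illustrated with a toy suffix stripper. This is a minimal sketch only: it is NOT the Snowball German algorithm, and the suffix list, the minimum stem length and the names `toy_stem` and `conflate` are hypothetical choices made for illustration.

```python
# Toy suffix-stripping stemmer, for illustration only.
# ASSUMPTION: this is NOT the Snowball German stemmer; the suffix list and
# MIN_STEM below are invented to show the general idea.
TOY_SUFFIXES = ("ungen", "ung", "en", "er", "es", "e", "s", "n")  # longest first
MIN_STEM = 3  # assumed guard so short words are not stripped down to nothing


def toy_stem(word: str) -> str:
    """Lower-case the word and strip the first (longest) matching suffix,
    provided at least MIN_STEM characters remain."""
    w = word.lower()
    for suffix in TOY_SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w


def conflate(words):
    """Group words by their stem; words sharing a stem are treated as equivalent."""
    groups = {}
    for w in words:
        groups.setdefault(toy_stem(w), []).append(w)
    return groups


if __name__ == "__main__":
    # "katz" is not a German word, but "Katze" and "Katzen" still match.
    print(conflate(["Katze", "Katzen", "Carport", "Carports"]))
    # → {'katz': ['Katze', 'Katzen'], 'carport': ['Carport', 'Carports']}
```

Because it only removes endings, this toy stemmer never splits a compound, so "Carport" stays distinct from "car". The same rules also map "Zeitung" (newspaper) and "Zeit" (time) to "zeit", which is exactly the kind of collision between words with different meanings that a real stemmer tries to keep rare.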
Further, I'd like to know whether there is data I can use for research:
I am looking for stopword lists, synonym tables, etc. I have been
looking around for a while now but never found anything useful...
Most stopword lists contain only a few dozen words :-/
Thanks for your replies!
---
Tobias N. Sasse
tobi at byte23.de
http://tobi.byte23.de