[Snowball-discuss] German Stemmer
Tobias N. Sasse
tobi at byte23.de
Wed Nov 4 13:36:27 GMT 2009
Hi Richard,
thanks for your answers!
> The stemming is fairly conservative, usually.
Ok, that's a nice thing (for me) :-)
> Generally, I don't use stopwords for the work I do with search
> engines, and precalculated lists of stopwords are often of little use:
> you tend to need custom ones to match your dataset. However, the
> snowball ones may be of some help to you, anyway.
Well, I'd like to work with relations between words, and I think that
stopwords do not add any meaning to my knowledge base, and thus produce
a lot of data I don't need. On the other hand, I don't want to
create a stopword list by hand for each input, as I want to minimize
the dependencies on the input dataset... A generic one with words like
"a, an, the, and ..." would be nice though. Your link seems to be a
good start, and further experiments will show whether stopword
lists are necessary or not.
Why don't you find stopword removal useful in your scenario?
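To make the idea concrete, here is a minimal sketch of the kind of generic
filtering I have in mind, in Python. The four-word list is just the example
from above, not a real stopword list, and the function names are made up
for illustration; pairing neighbouring words is only a crude stand-in for
"relations of words".

```python
# Hypothetical, tiny generic stopword list; a real list (for example the
# Snowball one) would be much longer and language-specific.
STOPWORDS = {"a", "an", "the", "and"}


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Return the tokens that are not stopwords (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stopwords]


def adjacent_pairs(tokens):
    """Pair each token with its successor as a crude word relation."""
    return list(zip(tokens, tokens[1:]))


tokens = "The cat and the dog chased a ball".split()
content = remove_stopwords(tokens)
print(content)                  # ['cat', 'dog', 'chased', 'ball']
print(adjacent_pairs(content))  # [('cat', 'dog'), ('dog', 'chased'), ('chased', 'ball')]
```

Without the filter, the same sentence would yield pairs like ("and", "the"),
which is exactly the data I would like to avoid storing.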
> Snowball doesn't have synonym tables: I'd suggest looking up wordnet
> for them.
Cool, WordNet seems to be an interesting project - I will have a
closer look at that. The more time I spend on text analysis
systems, the more I find I need some literature on languages and
linguistics :-)
Best wishes,
Tobi
---
Tobias N. Sasse
tobi at byte23.de
http://tobi.byte23.de