[Snowball-discuss] German Stemmer

Tobias N. Sasse tobi at byte23.de
Wed Nov 4 13:36:27 GMT 2009


Hi Richard,

thanks for your answers!

> The stemming is fairly conservative, usually.

Ok, that's a nice thing (for me) :-)

> Generally, I don't use stopwords for the work I do with search
> engines, and precalculated lists of stopwords are often of little use:
> you tend to need custom ones to match your dataset.  However, the
> snowball ones may be of some help to you, anyway.

Well, I'd like to work with relations between words, and I think that
stopwords add no meaning to my knowledge base and just produce a lot
of data I don't need. On the other hand, I don't want to create a
stopword list by hand for each input, as I want to minimize the
dependencies on the input dataset... A common list with words like
"a, an, the, and ..." would be nice, though. Your link seems to be a
good start, and further experiments will show whether stopword lists
are necessary or not.
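A generic list like that can be applied with a simple set lookup after
tokenizing. Below is a minimal Python sketch, assuming a tiny
hypothetical list (GENERIC_STOPWORDS) and a crude letters-only
tokenizer; in practice you would load one of the Snowball stopword
lists instead of the handful of words shown here.

```python
import re

# Hypothetical, hand-picked generic list for illustration only;
# this is NOT the Snowball stopword list.
GENERIC_STOPWORDS = frozenset({"a", "an", "the", "and", "or", "of", "to", "in"})


def tokenize(text: str) -> list[str]:
    # Crude tokenizer (an assumption): lowercase, then take runs of
    # letters, including German umlauts and sharp s.
    return re.findall(r"[a-zäöüß]+", text.lower())


def remove_stopwords(tokens, stopwords=GENERIC_STOPWORDS):
    # Keep only the tokens that are not in the stopword set.
    return [t for t in tokens if t not in stopwords]


if __name__ == "__main__":
    print(remove_stopwords(tokenize("The stemmer and the stopword list")))
```

Because the filter step only takes a set, swapping in a larger
per-language list later does not change the rest of the pipeline.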

Why don't you find stopword removal useful in your scenario?

> Snowball doesn't have synonym tables: I'd suggest looking up wordnet  
> for them.

Cool, WordNet seems to be an interesting project; I will have a
closer look at it. The more time I spend on text-analysis systems,
the more I find I need some literature on languages and
linguistics :-)

Best wishes,
Tobi

---
Tobias N. Sasse

tobi at byte23.de
http://tobi.byte23.de