[Snowball-discuss] Personal pronoun "his" in Snowball EnglishStemmer

Steve Legrand steveleg at hotmail.com
Sat May 21 14:59:12 BST 2005


Is there a module in the Snowball stemmer that would let me exclude certain 
words from the stemming process? I am using the Java version, and the 
word "his" gets indexed as "hi", while "him" and "he" are indexed unchanged. 
I know the algorithm has to balance various things, and that 
stemmed words do not always make sense outside the indexing process. 
This, however, prevents me from retrieving phrases such as "his palm" 
directly; instead I have to use the phrase "hi palm" for retrieval. In 
future, I will probably have a larger group of ordinary English words that I 
need to keep in their original form in the index. For this reason I would 
like to know whether there is a module in Snowball I could tweak to exclude 
certain words from the stemming process, or whether it would be better to 
filter the words before passing them to the stemmer. Either way, I need to 
code this in.
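
To make the second option concrete, here is a minimal sketch of filtering 
words before they reach the stemmer. The class name ProtectedStemmer and its 
constructor are hypothetical and not part of Snowball; the stemmer is passed 
in as a plain function so the sketch does not depend on any particular 
Snowball release.

```java
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

/**
 * Hypothetical wrapper (not part of Snowball): returns protected words
 * unchanged and hands everything else to the wrapped stemmer.
 */
class ProtectedStemmer {
    private final Set<String> protectedWords;    // lower-case words to keep as-is
    private final UnaryOperator<String> stemmer; // any word -> stem function

    ProtectedStemmer(Set<String> protectedWords, UnaryOperator<String> stemmer) {
        this.protectedWords = protectedWords;
        this.stemmer = stemmer;
    }

    /** Lower-cases the word, then stems it unless it is on the protected list. */
    String stem(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        if (protectedWords.contains(lower)) {
            return lower; // e.g. "his" stays "his" instead of becoming "hi"
        }
        return stemmer.apply(lower);
    }
}
```

With the Java stemmer, the function passed in would presumably wrap the 
usual setCurrent(word), stem(), getCurrent() sequence on an English stemmer 
instance. The same wrapper has to be applied to query terms as well as to 
indexed terms, so that "his palm" matches on both sides.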

Cheerio,
Steve Legrand
