[Snowball-discuss] handling english plural words in dutch stemmer

Srinivasan Ramaswamy ursvasan at gmail.com
Wed Jan 11 01:54:27 GMT 2012


Hi All,

I use the Dutch Snowball stemmer. It does well for Dutch words, but sometimes I
have to handle some English words too, for example tvs, cameras, ipods, etc. I
noticed that these words don't get stemmed:

tvs =>(after stemming) tvs
cameras =>(after stemming) cameras

http://snowball.tartarus.org/texts/germanic.html

On that page I read that the Dutch stemmer is intended only for native words of
contemporary Dutch.

But now that some English words are very common across the world, I thought
the Snowball stemmer might already have a workaround. If it doesn't have one,
can someone suggest a workaround that would scale?

To give you some context: I work on a product search engine. I index the words
offline and then search for them later.
I have thought about these options:
- create a dictionary of such English words and use the English stemmer for
them, but that wouldn't scale
- detect the language and use the appropriate stemmer. Since my keywords are
short, detecting the language might be a big challenge
- if the word didn't change after Dutch stemming, use the English stemmer. This
might have severe unintended consequences.
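
To make the third idea concrete, here is a minimal sketch in Python. The
stemmers are passed in as plain functions, so nothing here depends on a
particular Snowball binding. toy_dutch and toy_english are made-up stand-ins
(not the real Snowball algorithms), and the "protected" word list is a
hypothetical guard I added for illustration.

```python
from typing import Callable, Iterable

Stemmer = Callable[[str], str]


def make_fallback_stemmer(primary: Stemmer,
                          fallback: Stemmer,
                          protected: Iterable[str] = ()) -> Stemmer:
    """Return a stemmer that tries `primary` first and uses `fallback`
    only when `primary` leaves the word unchanged.

    `protected` is a hypothetical list of words that must never be
    handed to the fallback stemmer.
    """
    protected_set = {w.lower() for w in protected}

    def stem(word: str) -> str:
        w = word.lower()
        if w in protected_set:
            return w
        stemmed = primary(w)
        if stemmed != w:
            # The primary (Dutch) stemmer did something, so trust it.
            return stemmed
        # Unchanged: assume it may be a foreign word and try the fallback.
        return fallback(w)

    return stem


# Toy stand-ins so the sketch runs on its own. In real use these would be
# thin wrappers around the actual Dutch and English Snowball stemmers.
def toy_dutch(word: str) -> str:
    # Assumption: strips only a trailing "en" from longer words.
    if len(word) > 4 and word.endswith("en"):
        return word[:-2]
    return word


def toy_english(word: str) -> str:
    # Assumption: strips only a single trailing plural "s".
    if len(word) > 2 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


stem = make_fallback_stemmer(toy_dutch, toy_english, protected={"huis"})

for w in ["tvs", "cameras", "boeken", "huis"]:
    print(w, "=>", stem(w))
```

Without the protected list, the Dutch word huis would be cut to hui by the toy
English rule, which is the kind of unintended consequence I am worried about.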

Any thoughts or suggestions would be highly appreciated.

Thanks
Srini



