[Snowball-discuss] stemmer algorithm exclusions

Ilya Obshadko xfyre at xfyre.com
Thu Jul 19 21:04:04 BST 2012


Hello,

I need to modify the Russian grammar for Snowball so that I can handle
certain words that shouldn't be stemmed. For example, the distributed Snowball
grammar file has the following definition for nouns:

    define noun as (
        [substring] among (
            '{a}' '{e}{v}' '{o}{v}' '{i}{e}' '{'}{e}' '{e}'
            '{i}{ia}{m}{i}' '{ia}{m}{i}' '{a}{m}{i}' '{e}{i}' '{i}{i}'
            '{i}' '{i}{e}{i`}' '{e}{i`}' '{o}{i`}' '{i}{i`}' '{i`}'
            '{i}{ia}{m}' '{ia}{m}' '{i}{e}{m}' '{e}{m}' '{a}{m}' '{o}{m}'
            '{o}' '{u}' '{a}{kh}' '{i}{ia}{kh}' '{ia}{kh}' '{y}' '{'}'
            '{i}{iu}' '{'}{iu}' '{iu}' '{i}{ia}' '{'}{ia}' '{ia}'
            '{e}{ts}' '{ts}{a}' '{ts}{y}' '{ts}{u}' '{ts}{e}' '{ts}{o}{m}'
            '{ts}{o}{v}' '{ts}{a}{m}'
            '{e}{n}{i}' '{e}{n}{e}{m}' '{e}{n}{a}' '{e}{n}' '{e}{n}{a}{m}'
            '{e}{n}{a}{m}{i}' '{e}{n}{a}{kh}'
                (delete)
        )
    )

I need to handle the word '{a}{d}{a}{m}' so that it is left as is, rather
than going through this rule and being stemmed to '{a}{d}'.
There are probably more exclusions that should be handled like this one.
Could anyone suggest how to do that?
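
One possible direction, as an untested sketch only: the English grammar
distributed with Snowball guards its stem routine with an exception list
(exception1), and the same pattern might carry over to the Russian grammar.
The routine name "exception" below is hypothetical; it would also have to be
added to the routines ( ... ) declaration and defined outside the
backwardmode ( ... ) section, since it matches the whole word forwards.

    // Hypothetical routine: succeeds only if the whole word is in the list.
    define exception as (
        [substring] atlimit among (
            '{a}{d}{a}{m}'
            // further invariant words would be listed here
        )
    )

    define stem as (
        // If the word is an exception, stop here and leave it unchanged.
        exception or (
            // existing body of stem from the distributed grammar, unchanged
        )
    )

Here "atlimit" requires the matched string to end at the end of the word, so
a longer word that merely starts with '{a}{d}{a}{m}' would still be stemmed
as usual.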

Thanks in advance!

-- 
Ilya Obshadko