[Snowball-discuss] stemmer algorithm exclusions
Ilya Obshadko
xfyre at xfyre.com
Thu Jul 19 21:04:04 BST 2012
Hello,
I need to modify the Russian grammar for Snowball so that it can handle
certain words that shouldn't be stemmed. For example, the distributed Snowball
grammar file has the following definition for nouns:
    define noun as (
        [substring] among (
            '{a}' '{e}{v}' '{o}{v}' '{i}{e}' '{'}{e}' '{e}'
            '{i}{ia}{m}{i}' '{ia}{m}{i}' '{a}{m}{i}' '{e}{i}' '{i}{i}'
            '{i}' '{i}{e}{i`}' '{e}{i`}' '{o}{i`}' '{i}{i`}' '{i`}'
            '{i}{ia}{m}' '{ia}{m}' '{i}{e}{m}' '{e}{m}' '{a}{m}' '{o}{m}'
            '{o}' '{u}' '{a}{kh}' '{i}{ia}{kh}' '{ia}{kh}' '{y}' '{'}'
            '{i}{iu}' '{'}{iu}' '{iu}' '{i}{ia}' '{'}{ia}' '{ia}'
            '{e}{ts}' '{ts}{a}' '{ts}{y}' '{ts}{u}' '{ts}{e}' '{ts}{o}{m}'
            '{ts}{o}{v}' '{ts}{a}{m}'
            '{e}{n}{i}' '{e}{n}{e}{m}' '{e}{n}{a}' '{e}{n}' '{e}{n}{a}{m}'
            '{e}{n}{a}{m}{i}' '{e}{n}{a}{kh}'
                (delete)
        )
    )
I need to handle the word '{a}{d}{a}{m}' so it's left as is, without going
through this rule and being stemmed to '{a}{d}'.
There are probably more exclusions that should be handled like this one.
Could anyone suggest how to do that?
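One possible direction, offered as an untested sketch: the English stemmer distributed with Snowball has a routine named exception1 that lists whole words to be treated specially or left alone, and the same pattern could presumably be applied to the Russian grammar. The routine name exclusion below is hypothetical. The fragment assumes that the main routine is called stem and that the stringdefs {a}, {d} and {m} are the ones already defined in the Russian grammar file. It is meant to be merged into the existing file and is not a complete program.

```
// Hypothetical routine: 'exclusion' must also be added to the existing
// routines ( ... ) declaration, and it must be defined in forward mode,
// that is, outside the backwardmode ( ... ) section that holds 'noun'.
define exclusion as (
    // Find the longest listed string starting at the cursor, then
    // require that the cursor has reached the end of the word, so that
    // only exact whole-word matches succeed.
    [substring] atlimit among (
        '{a}{d}{a}{m}'
        // further exclusions can be listed here
    )
)

define stem as (
    exclusion or (
        // the original body of stem goes here, unchanged
    )
)
```

With this arrangement an exact match lets stem succeed without modifying the word, so '{a}{d}{a}{m}' would be returned as is. A longer word that merely begins with a listed string fails the atlimit test, and the or then restores the cursor and runs the normal rules.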
Thanks in advance!
--
Ilya Obshadko