[Snowball-discuss] RE: Snowball-discuss digest, Vol 1 #5 - 1 msg
Svetlana Pereyaslavets
smp29@cs.waikato.ac.nz
Mon Sep 9 09:49:01 2002
Dear Martin,
I am not a linguist, but a native Russian speaker. May I try to give some
explanation on this suffix.
Free, but hopefully helpful :-)
It is very common in Russian adjectives and adverbs that follow this
construction:
*****basic*construction********
prefix-root - "other optional" suffix - n (Oleg's question) - <adjective
ending> (= yi/iy/oy ... through all genders and declensions)
*******************************
The rule has the following options:
1. prefix-root - n - <adjective ending >
1.1. The root itself ends in -n-
In this case we will encounter -nn- after stripping the adjective ending,
and we SHOULD REMOVE one -n- (that is the suffix).
Such words usually don't have prefixes (so they can easily be checked against
a dictionary).
Example : kon-n-yi (adjective from "kon'"=horse)
1.2. The root ends in any other letter
In this case we SHOULD REMOVE the -n- (that is the suffix).
Example: ruch-n-oy (adjective from "ruka"= hand).
2. prefix-root - "other optional" suffix - n - <adjective ending>
2.1. other optional suffix = - an- or - yan -
- a- or -ya- SHOULD BE REMOVED TOGETHER with the suffix -n-.
Example: "sherst-yan-oy" (=woolen).
THREE exceptions to this rule fall under case 2.2:
"stekl-yan -n - <adjective ending>" (adj from glass)
"olov-yan -n - <adjective ending>" (adj from tin)
"derev-yan -n - <adjective ending>" (adj from wood)
2.2. other optional suffix = -on - or -en-
REMOVE -n- and then the -en- or -on- in front of it.
Example: "osob - en- n- <adjective ending>" (=special)
ONE exception to this rule falls under case 2.1:
"ran -en- <adjective ending>" (= injured)
2.3. HARD CASE (RUSSIAN LEXICAL DIVERSITY IS INVOLVED) - I can't suggest a
solution right now, as I need time to think how to detect that without
knowledge of the natural language:
other optional suffix = -in
2.3.1. If the following substitution is valid, refer to 1.1. or 2.1.
(i.e. -n- SHOULD BE REMOVED; the -in- suffix before it MAY and probably SHOULD
be removed, depending on the level of detail required)
             - a (noun)
            /
root - in -|
            \
             -n- <adjective ending>
Example: "star-in-a"-"star-in-n -yi" (= old)
2.3.2 If the substitution above is not valid refer to 1.2. or 2.2. with
the same reservation.
Example: "mysh-in - <adjective ending> " (adjective from "mysh'"= mouse)
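
A minimal sketch of the substitution test in 2.3.1 and 2.3.2, in Python. The noun lexicon NOUNS and the function name in_suffix_case are hypothetical, and the input is assumed to be transliterated as in the examples above. The sketch needs a dictionary of nouns, which is exactly the language knowledge this case seems to require.

```python
# Hypothetical, tiny noun lexicon (transliterated). A real system would need
# a proper dictionary; this one only covers the two examples in the message.
NOUNS = {"starina", "mysh'"}


def in_suffix_case(stem_with_in, nouns=NOUNS):
    """Decide between cases 2.3.1 and 2.3.2 for a stem ending in -in-.

    stem_with_in is the word up to and including -in-, e.g. "starin".
    If replacing everything after -in- by the noun ending -a gives a
    known noun, the substitution is valid (2.3.1); otherwise it is not (2.3.2).
    """
    if not stem_with_in.endswith("in"):
        raise ValueError("expected a stem ending in -in-")
    return "2.3.1" if stem_with_in + "a" in nouns else "2.3.2"
```

With this lexicon, "starin" is classified as 2.3.1 because "starina" is a noun, while "myshin" is classified as 2.3.2 because "myshina" is not in the lexicon.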
3. PARTICIPLE II may look the same as an adjective for an end-stripping
stemmer.
In Participles II, the scheme is :
word - {-on, -en, -an, -yan} - n - <adjective ending>
Where "word" is VERY LIKELY to consist of "prefix-root" (i.e. there is a
high probability that participle II would have a prefix).
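
Cases 1.1, 1.2, 2.1 and 2.2 above could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: the words are transliterated as in the examples, ADJ_ENDINGS is a small made-up subset of adjective endings (not the ending list of the Snowball Russian stemmer), N_ROOTS is a hypothetical dictionary of roots ending in -n, and all function names are invented for the sketch. Case 2.3 and the participle problem in 3 are not handled.

```python
# Illustrative subset of transliterated adjective endings, longest first.
# Assumption: this is NOT the ending list of the Snowball Russian stemmer.
ADJ_ENDINGS = ("ogo", "omu", "aya", "yi", "iy", "oy", "oe")

# Hypothetical dictionary of roots that themselves end in -n (case 1.1).
N_ROOTS = {"kon"}

# Exceptions named in the message, given as stems after ending removal.
YAN_NN_EXCEPTIONS = {"steklyann", "olovyann", "derevyann"}  # handled like 2.2
EN_EXCEPTIONS = {"ranen"}                                   # handled like 2.1


def strip_ending(word):
    """Return the word without its adjective ending, or None if none matches."""
    for ending in ADJ_ENDINGS:
        if word.endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)]
    return None


def strip_n_suffix(word):
    """Apply cases 1.1, 1.2, 2.1 and 2.2 to a transliterated adjective."""
    stem = strip_ending(word)
    if stem is None:
        return word          # not recognised as an adjective, leave untouched
    if stem.endswith("nn") and stem[:-1] in N_ROOTS:
        return stem[:-1]     # 1.1: drop the suffix -n-, keep the root's own n
    if stem in YAN_NN_EXCEPTIONS:
        return stem[:-4]     # exceptions to 2.1: drop -n- and then -yan-
    if stem.endswith(("enn", "onn")):
        return stem[:-3]     # 2.2: drop -n- and then -en- / -on-
    if stem in EN_EXCEPTIONS:
        return stem[:-2]     # exception to 2.2: single n, drop -en-
    if stem.endswith("yan"):
        return stem[:-3]     # 2.1: drop -ya- together with -n-
    if stem.endswith("an"):
        return stem[:-2]     # 2.1: drop -a- together with -n-
    if stem.endswith("n"):
        return stem[:-1]     # 1.2: drop the suffix -n-
    return stem
```

Under these assumptions, strip_n_suffix("velosipednogo") and strip_n_suffix("velosipednyi") both give "velosiped", which is the behaviour Oleg asked for. Note that "konn" would also match the -onn- pattern of case 2.2, so the dictionary check for 1.1 has to come first.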
This may look too complicated; please email me if anything needs clarifying.
Or please allow me some time to return to this topic and come up with a
digestible algorithm. Actually, I was planning to test the Russian stemmer as
part of my student research in December this year.
Kind regards
Svetlana
-----Original Message-----
From: snowball-discuss-admin@lists.tartarus.org
[mailto:snowball-discuss-admin@lists.tartarus.org]On Behalf Of
snowball-discuss-request@lists.tartarus.org
Sent: Monday, September 09, 2002 5:45 PM
To: snowball-discuss@lists.tartarus.org
Subject: Snowball-discuss digest, Vol 1 #5 - 1 msg
Send Snowball-discuss mailing list submissions to
snowball-discuss@lists.tartarus.org
To subscribe or unsubscribe via the World Wide Web, visit
http://lists.tartarus.org/mailman/listinfo/snowball-discuss
Today's Topics:
1. Re: russian stemmer (Martin Porter)
--__--__--
Message: 1
To: Oleg Bartunov <oleg@sai.msu.su>
From: martin_porter@softhome.net (Martin Porter)
Cc: snowball-discuss@lists.tartarus.org
Date: Sun, 08 Sep 2002 23:12:26 -0600
Subject: [Snowball-discuss] Re: russian stemmer
Oleg,
I've had a look at -n-ogo, -n-yi etc endings through the Russian vocabulary,
and feel that I would need to take linguistic advice before I could make any
progress with -n- removal.
As you may recall, I did the Russian stemmer with a linguist, Pat Miles, who
lives some 60 miles away, and is not really a computer user. Also, Pat
charges for his work, which is a further inconvenience to me! I'd rather try
to get free linguistic help now through the open source community. Is there
anyone you know in Russia who might experiment a bit further with the
Snowball stemmer to see if they could make improvements here?
Martin
>the current Russian stemmer doesn't seem to treat adjective endings like:
>'nogo', 'nomu', 'nyi' ..., so
>velosipednogo (bicycle) -> velosipedn~ogo
>velosipednyi -> velosipedn~yi
>while it would be better to have
>velosipednogo -> velosiped~nogo
>velosipednyi -> velosiped~nyi
>
>I'm not a linguist, so I don't know how to properly distinguish
>'nogo' from 'ogo' etc. Probably there are some grammar rules.
--__--__--