[Snowball-discuss] RE: Snowball-discuss digest, Vol 1 #5 - 1 msg

Svetlana Pereyaslavets smp29@cs.waikato.ac.nz
Mon Sep 9 09:49:01 2002


Dear Martin,
I am not a linguist, but I am a native Russian speaker. May I try to give some
explanation of this suffix?
Free, but hopefully helpful :-)
It is very common in Russian adjectives and adverbs, in constructions of the
following form:

*****basic*construction********
  prefix-root - "other optional" suffix - n (Oleg's question) - <adjective
ending> ( = yi/iy/oy.... through all genders and declensions)
*******************************
The rule has the following options:
1.  prefix-root  - n - <adjective ending >

	1.1. The root itself ends in -n-
	In this case we will encounter -nn- after stripping the adjective ending,
and we SHOULD REMOVE one -n- (that is the suffix).
	Such words usually don't have prefixes (so they can easily be checked
against a dictionary).
	Example: kon-n-yi (adjective from "kon'" = horse)

	1.2. The root ends in any other letter

	we SHOULD REMOVE the -n- (that is the suffix).
	Example: ruch-n-oy (adjective from "ruka"= hand).

2.  prefix-root - "other optional" suffix - n - <adjective ending>


	2.1. other optional suffix = -an- or -yan-
 	-a- or -ya- SHOULD BE REMOVED TOGETHER with the suffix -n-.

	Example: "sherst-yan-oy" (=woolen).

	THREE exceptions to this rule fall under case 2.2:
	"stekl-yan-n-<adjective ending>" (adj from glass)
	"olov-yan-n-<adjective ending>" (adj from tin)
	"derev-yan-n-<adjective ending>" (adj from wood)

	2.2. other optional suffix = -on- or -en-

	REMOVE -n- and then the -en- or -on- before it.

	Example: "osob-en-n-<adjective ending>" (= special)

	ONE exception to this rule falls under case 2.1:
	"ran-en-<adjective ending>" (= injured)

	2.3. HARD CASE (RUSSIAN LEXICAL DIVERSITY IS INVOLVED) - I can't suggest a
solution right now, as I need time to think how to detect this without
knowledge of the natural language:

 	other optional suffix = -in-

		2.3.1. If the following substitution is valid, refer to 1.1. or 2.1.
(i.e. -n- SHOULD BE REMOVED; the -in- suffix before it MAY and probably SHOULD
be removed, depending on the level of detail required)

		            - a (noun)
		           /
		root - in -|
		           \
		            - n - <adjective ending>


		Example: "star-in-a" - "star-in-n-yi" (= old)

		2.3.2. If the substitution above is not valid, refer to 1.2. or 2.2., with
the same reservation.

		Example: "mysh-in - <adjective ending> " (adjective from "mysh'"= mouse)

3. PARTICIPLE II may look the same as an adjective to an end-stripping
stemmer.
In participles II, the scheme is:

 word - {-on, -en, -an, -yan} - n - <adjective ending>

Where "word" is VERY LIKELY to consist of "prefix-root" (i.e. there is a
high probability that participle II would have a prefix).
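
For illustration only, cases 1.1 to 2.2 above could be sketched in Python on
transliterated words, as below. This is a rough sketch, not part of the
Snowball Russian stemmer. The names ADJ_ENDINGS, KNOWN_N_ROOTS and
remove_n_suffix are hypothetical, the list of adjective endings is a small
assumed subset, and the one-word root dictionary stands in for the dictionary
comparison mentioned under 1.1. Case 2.3 and participles (item 3) are not
handled.

```python
# Assumption: a small illustrative subset of transliterated adjective endings.
ADJ_ENDINGS = ("ogo", "omu", "yi", "iy", "oy")

# Assumption: a tiny stand-in for the dictionary of roots ending in -n (1.1).
KNOWN_N_ROOTS = {"kon"}

# The three -yan-n- exceptions to 2.1 named in the message.
YANN_EXCEPTION_STEMS = ("stekl", "olov", "derev")


def strip_adjective_ending(word):
    """Return the word without its adjective ending, or None if none matches."""
    for ending in sorted(ADJ_ENDINGS, key=len, reverse=True):
        if word.endswith(ending) and len(word) > len(ending):
            return word[:-len(ending)]
    return None


def remove_n_suffix(word):
    """Strip the adjective ending, then the -n- suffix, per cases 1.1 to 2.2."""
    base = strip_adjective_ending(word)
    if base is None:
        return word          # no adjective ending found: leave untouched
    if not base.endswith("n"):
        return base          # no -n- suffix to remove

    # 1.1: the root itself ends in -n-, so -nn- appears; drop one -n-.
    if base.endswith("nn") and base[:-1] in KNOWN_N_ROOTS:
        return base[:-1]

    # Exceptions to 2.1: stekl-yan-n-, olov-yan-n-, derev-yan-n-.
    for stem in YANN_EXCEPTION_STEMS:
        if base.endswith(stem + "yann"):
            return base[:-len("yann")]

    # 2.2: remove -n- together with the -en- or -on- before it.
    if base.endswith(("enn", "onn")):
        return base[:-3]

    # Exception to 2.2: ran-en- has a single n.
    if base == "ranen":
        return "ran"

    # 2.1: remove -yan- or -an- as a whole (check the longer one first).
    for suffix in ("yan", "an"):
        if base.endswith(suffix):
            return base[:-len(suffix)]

    # 1.2: the root ends in any other letter; remove the single -n-.
    return base[:-1]


for w in ("konnyi", "ruchnoy", "sherstyanoy", "steklyannyi", "osobennyi"):
    print(w, "->", remove_n_suffix(w))
```

The order of the checks matters: without the dictionary test for 1.1 coming
first, "konn" would match the -onn- pattern of case 2.2 and be cut down to
"k".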




This may look too complicated; please email me if you need anything clarified.
Or, please allow me some time to return to this topic and come up with a
digestible algorithm. Actually, I was planning to test the Russian stemmer
as part of my student research in December this year.

Kind regards

Svetlana






-----Original Message-----
From: snowball-discuss-admin@lists.tartarus.org
[mailto:snowball-discuss-admin@lists.tartarus.org]On Behalf Of
snowball-discuss-request@lists.tartarus.org
Sent: Monday, September 09, 2002 5:45 PM
To: snowball-discuss@lists.tartarus.org
Subject: Snowball-discuss digest, Vol 1 #5 - 1 msg


Send Snowball-discuss mailing list submissions to
	snowball-discuss@lists.tartarus.org

To subscribe or unsubscribe via the World Wide Web, visit
	http://lists.tartarus.org/mailman/listinfo/snowball-discuss
or, via email, send a message with subject or body 'help' to
	snowball-discuss-request@lists.tartarus.org

You can reach the person managing the list at
	snowball-discuss-admin@lists.tartarus.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Snowball-discuss digest..."


Today's Topics:

   1. Re: russian stemmer (Martin Porter)

--__--__--

Message: 1
To: Oleg Bartunov <oleg@sai.msu.su>
From: martin_porter@softhome.net (Martin Porter)
Cc: snowball-discuss@lists.tartarus.org
Date: Sun, 08 Sep 2002 23:12:26 -0600
Subject: [Snowball-discuss] Re: russian stemmer


Oleg,

I've had a look at -n-ogo, -n-yi etc endings through the Russian vocabulary,
and feel that I would need to take linguistic advice before I could make any
progress with -n- removal.

As you may recall, I did the Russian stemmer with a linguist, Pat Miles, who
lives some 60 miles away, and is not really a computer user. Also, Pat
charges for his work, which is a further inconvenience to me! I'd rather try
to get free linguistic help now through the open source community. Is there
anyone you know in Russia who might experiment a bit further with the
Snowball stemmer to see if they could make improvements here?

Martin

>The current Russian stemmer doesn't seem to treat adjective endings like
>'nogo', 'nomu', 'nyi' ...., so
>velosipednogo (bicycle) -> velosipedn~ogo
>velosipednyi -> velosipedn~yi
> while it would be better to have
>velosipednogo -> velosiped~nogo
>velosipednyi ->  velosiped~nyi
>
>I'm not a linguist, so I don't know how to properly distinguish
>'nogo' from 'ogo' etc. Probably there are some grammar rules.
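
To make the "stem~ending" notation in the quoted examples concrete, here is a
deliberately naive Python sketch that produces the desired split for the three
endings quoted. The names N_ENDINGS and mark_split are hypothetical, and this
is not how the Snowball stemmer works. It treats every -n- before these
endings as a suffix, which is exactly the simplification that needs linguistic
rules to get right.

```python
# The three endings quoted in the message, in transliteration.
N_ENDINGS = ("nogo", "nomu", "nyi")


def mark_split(word):
    """Insert '~' between the stem and a trailing -n- ending, if one matches."""
    for ending in N_ENDINGS:
        if word.endswith(ending) and len(word) > len(ending):
            return word[:-len(ending)] + "~" + ending
    return word  # no listed ending: leave the word unmarked


print(mark_split("velosipednogo"))  # velosiped~nogo
print(mark_split("velosipednyi"))   # velosiped~nyi
```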



