[Snowball-discuss] Snowball Russian algorithm - question

Martin Porter martin.f.porter at gmail.com
Tue Nov 29 18:35:13 GMT 2022


Oleksandr,

(You must forgive me: I do not know the Russian language. The Russian
stemmer was a collaboration of myself with Patrick Miles, a
professional Russian translator.)

The critical point is the note in the algorithm description,

"a tempting way of running the stemmer is to set a minimum stem length
of zero, and thereby reduce to null all words which are made up
entirely of suffix parts. We have been a little more cautious, and
have insisted that a minimum stem contains one vowel."

and then, "RV is the region after the first vowel" and then "all tests
take place in the RV part of the word".

In всплыла, the letters в, с, п, л are consonants and ы is the first
vowel, so RV contains just "ла". Within RV we can therefore find the
ending ла, but not ыла.

The algorithm will not reduce the word to вспл (surface), because that
stem contains no vowel.
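
The RV rule can be illustrated with a minimal Python sketch. This is not
the Snowball implementation: the names rv_region and fits_in_rv are
hypothetical, and the vowel list is assumed to be the one given in the
Russian algorithm description (а е и о у ы э ю я). It only checks whether
an ending lies wholly inside RV, not the further conditions the real
stemmer applies before removing it.

```python
# Vowels assumed from the Snowball Russian algorithm description.
VOWELS = set("аеиоуыэюя")


def rv_region(word):
    """Return the part of the word after its first vowel ('' if none)."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return word[i + 1:]
    return ""


def fits_in_rv(word, ending):
    """True if the word ends with `ending` and the ending lies inside RV."""
    return rv_region(word).endswith(ending)


word = "всплыла"
print(rv_region(word))          # ла
print(fits_in_rv(word, "ла"))   # True
print(fits_in_rv(word, "ыла"))  # False: ы is the first vowel, so it is outside RV
```

Because RV starts after the first vowel, whatever precedes RV always
contains a vowel, which is how the "minimum stem contains one vowel"
condition is enforced.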

Perhaps the algorithm is in error here, but that is the reason for the result.

Martin
------------------

On 11/29/22, Oleksandr Bratashov <abratashov at gmail.com> wrote:
> Hello Martin, thank you very much for your contribution to Snowball!
>
> Currently, I'm exploring how the Russian algorithm works to implement the
> same for the Ukrainian language and had one question that is described
> here:
> https://github.com/snowballstem/snowball/issues/173
>
> Olly Betts told me that you've implemented it (I'm curious if you know the
> Russian language!), so could you help me to figure out how the steps of
> this algorithm run?
>
> Or whom can I ask about it?
>
> Thanks!
>
> Sasha,
> Lviv, Ukraine
>


