[Snowball-discuss] Ukrainian stemmer?

Martin Porter martin.porter at grapeshot.co.uk
Tue Jan 29 11:32:19 GMT 2008



Carl,

I am sorry not to have replied sooner, but work has led me away from
Snowball in the past few months.

In my experience, creating a stemming algorithm takes a relatively short
amount of working time, but that time is always stretched over a long
period. This is because you need to gather the necessary resources:
linguistic skills, a vocabulary corpus to work on, a dictionary, a
reliable up-to-date grammar, and so on. You do not necessarily need a
linguist (and sometimes a language expert can create confusion by not
really understanding the purpose of the stemmer), but of course you will
need someone who is prepared to study the language's morphology and
grammar to some extent, and to apply that knowledge.

Work has been done on Ukrainian. There is a stemmer by "Alex Kobyakov",
and published work on Bulgarian refers to work by Kovalenko on
Ukrainian. You can find references to their work with a Google search.
Coding up an algorithmically described stemmer in Snowball is dead easy,
so it is a good plan to see whether you can get access to work already
done.

Snowball-discuss had an earlier enquiry about Ukrainian from a
certain sector119 at mail.ru, but that was as long ago as 2003.
Nevertheless, he/she knew Ukrainian and did express interest.

Martin




On Fri, 2008-01-04 at 10:38 -0500, Carl Erickson wrote:
> I've been asked by a client to determine the feasibility of  
> supporting Ukrainian in a project using the Ruby search engine  
> library Ferret. Ferret uses Snowball. I think it's likely that I can  
> convince our client to release the Ukrainian Snowball work under the  
> BSD license.
> 
> It would be really helpful to know, even roughly, how much effort is  
> required to create a basic Ukrainian stemmer. Is a native speaker  
> paired with an experienced programmer enough, or do I need a Ukrainian  
> linguist? Am I looking at days or weeks or months of effort?
> 
> thanks,
> Carl
> ---
> Carl Erickson
