[Snowball-discuss] Polish stemmer?

Dawid Weiss dawid.weiss at cs.put.poznan.pl
Wed Aug 29 16:50:30 BST 2007


Ok, maybe that was a bit of an overstatement -- I don't think Polish is _much_ 
more complex than Russian (I don't know about Finnish). It's just my gut 
feeling that rule-based stemmers don't work too well for Polish (there are quite 
a lot of combinations at the morphology level). Now, having said that, the 
Morfologik stemmer I mentioned is built using inflected-form-generation rules 
(from base forms), so it should be possible to reuse this knowledge somehow if 
one wanted to create a Snowball stemmer. If you're willing to undertake such an 
effort, Agnieszka, don't let anyone discourage you (and in particular don't let 
me discourage you).

I would actually be very curious about the level of quality such a stemmer (with 
manually constructed rules) can achieve. I know for a fact that a number of 
people would benefit from it.

Dawid


Martin Porter wrote:
> On Wed, 2007-08-29 at 08:16 +0200, Dawid Weiss wrote:
>> Hi Agnieszka,
>>
>> (I am not a snowball developer, but...) It won't be easy to handle the
>> complexity of Polish in a set of Snowball rules. 
> 
> Dawid,
> 
> Do you have any strong evidence for that? I would not have thought
> Polish was more complex than Finnish, or Russian for example.
> 
> Martin
> 


