[Snowball-discuss] Polish stemmer?
Dawid Weiss
dawid.weiss at cs.put.poznan.pl
Wed Aug 29 16:50:30 BST 2007
Ok, maybe that was a bit of an overstatement. I don't think Polish is _much_
more complex than Russian (I don't know about Finnish). It's just my gut
feeling that rule-based stemmers don't work too well for Polish (there are a great
many combinations at the morphology level). Having said that, the Morfologik
stemmer I mentioned is built using inflected-form generation rules (applied to base
forms), so it should be possible to reuse this knowledge somehow if one wanted
to create a Snowball stemmer. If you're willing to undertake such an effort,
Agnieszka, don't let anyone discourage you (and in particular, don't let me
discourage you).
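The idea of reusing generation rules can be sketched in a few lines of Python. The rule table below is made up for illustration and is not Morfologik's actual data or file format. Each hypothetical rule says which ending to remove from a base form and which ending to append; read backwards, the same rule becomes a suffix-stripping rule of the kind a Snowball stemmer applies.

```python
# Hypothetical generation rules, for illustration only (NOT Morfologik's data).
# Each rule is (ending_to_remove, ending_to_append), applied to a base form.
GENERATION_RULES = [
    ("", "a"),     # kot -> kota
    ("", "em"),    # kot -> kotem
    ("a", "y"),    # woda -> wody
    ("a", "ę"),    # woda -> wodę
    ("ka", "ce"),  # ręka -> ręce (k/c alternation in the stem)
]


def generate(base, rule):
    """Apply one generation rule to a base form; return None if it does not fit."""
    remove, append = rule
    if not base.endswith(remove):
        return None
    # Slice by length so an empty `remove` keeps the whole base form.
    return base[: len(base) - len(remove)] + append


def candidate_bases(word, rules):
    """Run each generation rule backwards: strip its appended ending, restore the removed one."""
    candidates = set()
    for remove, append in rules:
        # Require a non-empty remainder so the whole word is never stripped.
        if word.endswith(append) and len(word) > len(append):
            candidates.add(word[: len(word) - len(append)] + remove)
    return sorted(candidates)


if __name__ == "__main__":
    for w in ["kotem", "wody", "ręce", "woda"]:
        print(w, "->", candidate_bases(w, GENERATION_RULES))
```

Inverting the rules recovers "kot" from "kotem" and "ręka" from "ręce", but it also turns the base form "woda" into the bogus "wod". Rules alone overgenerate like this, which is one reason a dictionary-backed stemmer and a hand-written rule set can behave quite differently on Polish.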
I would actually be very curious to see what level of quality such a stemmer
(with manually constructed rules) can achieve. I know for a fact that a number of
people would benefit from it.
Dawid
Martin Porter wrote:
> On Wed, 2007-08-29 at 08:16 +0200, Dawid Weiss wrote:
>> Hi Agnieszka,
>>
>> (I am not a snowball developer, but...) It won't be easy to handle the
>> complexity of Polish in a set of Snowball rules.
>
> Dawid,
>
> Do you have any strong evidence for that? I would not have thought
> Polish was more complex than Finnish, or Russian for example.
>
> Martin
>