[Snowball-discuss] Re: Italian Stemmer with C#

Martin Porter martin.porter at grapeshot.co.uk
Thu Sep 1 17:15:21 BST 2005


Federico,

The differences you note are because alzare is a verb with a very short stem
- alz - and my Italian stemmer demands a longer stem length before it takes
anything off. So the difference must be in determining R1 and R2.
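
For reference, here is a minimal sketch of how the regions could be computed,
assuming the R1, R2 and RV definitions given on the Snowball site for Italian.
The function names are hypothetical, and the prelude step (marking u after q
and i/u between vowels as consonants) is skipped, so treat it as an
illustration rather than the stemmer's actual code.

```python
# Sketch only: region boundaries for the Italian stemmer, assuming the
# definitions on the Snowball site. The prelude is NOT applied here.
VOWELS = set("aeiouàèìòù")


def _after_vowel_nonvowel(word, start):
    """Index just past the first non-vowel that follows a vowel,
    searching from `start`; len(word) if there is none."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r1_start(word):
    # R1: region after the first non-vowel following a vowel.
    return _after_vowel_nonvowel(word, 0)


def r2_start(word):
    # R2: the same rule applied inside R1.
    return _after_vowel_nonvowel(word, r1_start(word))


def rv_start(word):
    n = len(word)
    if n < 2:
        return n
    if word[1] not in VOWELS:
        # Second letter is a consonant: RV starts after the next vowel.
        for i in range(2, n):
            if word[i] in VOWELS:
                return i + 1
        return n
    if word[0] in VOWELS:
        # First two letters are vowels: RV starts after the next consonant.
        for i in range(2, n):
            if word[i] not in VOWELS:
                return i + 1
        return n
    # Consonant followed by vowel: RV starts after the third letter.
    return 3 if n >= 3 else n


if __name__ == "__main__":
    for w in ("alzandogli", "alzarla", "alzarsi", "alzargli"):
        print(w, w[r1_start(w):], w[r2_start(w):], w[rv_start(w):])
```

For alzandogli this gives R1 = zandogli, R2 = dogli and RV = ndogli. The
"ando" in front of the pronoun starts one letter before RV, so if the rule
requires it to lie inside RV, as I read the algorithm description, the pronoun
is left in place.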

Short verb stems are a problem for the stemmers in the Romance languages:
rier in French, orare in Italian, etc.

If you believe you are getting better overall results with a different
measure of R1 and R2, let me know the rules you are using! 

Martin

------------------

>Dear Mr. Porter,
>  I found the "Snowball" page while searching the
>internet for available stemming software.
>Starting from a German version of a program
>at http://www.codeproject.com/csharp/destemming.asp,
>I created an Italian version using the rules described
>on your page for the Italian language.
>
>After some tests comparing my code with your Snowball
>results for the Italian language, I noticed some
>differences. Could there be a small mistake in the
>code (in my program or in Snowball)?
>
>For example:
>  alzandogli: Snowball gives alzandogl, mine gives alzand
>  alzarla: Snowball gives alzarl, mine gives alzar
>  alzarsi: Snowball gives alzars, mine gives alzar
>  alzargli: Snowball gives alzargl, mine gives alzar
>
>And many others...
>Which version is the correct one?
>
>Thanks for your reply and Best regards,
>  Federico Pieri



