[Snowball-discuss] Problems with the italian stemmer algorithm

Martin Porter martin.porter at grapeshot.co.uk
Thu Sep 9 08:14:15 BST 2004


Peter,

What is happening is that the longer suffix -atrice is being found, but not
removed, because it does not lie entirely within R2 (the residual stem would
be too short). This terminates step 1, which only performs an action on the
single longest suffix found, without the -ice ending being considered.
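
As a rough illustration of that behaviour, here is a minimal Python sketch.
It is not the Snowball source: the function names are hypothetical, only a
handful of the step 1 "delete if in R2" suffixes are listed, and the
prelude, RV and all later steps of the Italian stemmer are left out. It
contrasts the "longest suffix only" rule with the reading in which shorter
suffixes are tried as well.

```python
# Illustrative sketch only; names and the reduced suffix list are assumptions.
VOWELS = set("aeiouàèìòù")

# A small subset of the step 1 suffixes that are deleted if in R2.
DELETE_IF_IN_R2 = ["atrice", "atrici", "ice", "ici", "ico", "ica"]


def region_after(word, start):
    """Index just past the first non-vowel that follows a vowel,
    searching from `start`; len(word) if there is no such position."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word):
    """Start offsets of R1 and R2 (R2 is the same rule applied within R1)."""
    r1 = region_after(word, 0)
    r2 = region_after(word, r1)
    return r1, r2


def step1_longest_only(word):
    """Act on the single longest matching suffix, or do nothing."""
    _, r2 = r1_r2(word)
    matches = [s for s in DELETE_IF_IN_R2 if word.endswith(s)]
    if not matches:
        return word
    longest = max(matches, key=len)
    cut = len(word) - len(longest)
    # The suffix is deleted only if it lies entirely within R2.
    return word[:cut] if cut >= r2 else word


def step1_try_each(word):
    """Alternative reading: fall back to shorter suffixes until one is in R2."""
    _, r2 = r1_r2(word)
    for suffix in sorted(DELETE_IF_IN_R2, key=len, reverse=True):
        cut = len(word) - len(suffix)
        if word.endswith(suffix) and cut >= r2:
            return word[:cut]
    return word


if __name__ == "__main__":
    word = "mediatrice"
    r1, r2 = r1_r2(word)
    print(word[r1:], word[r2:])      # iatrice rice
    print(step1_longest_only(word))  # mediatrice
    print(step1_try_each(word))      # mediatr
```

With the longest-only rule the word leaves step 1 unchanged, so the final
"mediatric" would have to come from a later step that strips the trailing
vowel, which this sketch does not model.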

The algorithm could perhaps be improved for the case you report, and I will
look into that, but in the meantime this is the explanation of the problem
you had.

Martin



>Hello, while implementing the Italian stemmer algorithm described in
>http://www.snowball.tartarus.org/italian/stemmer.html I found some minor
>differences between the output of the Snowball implementation and mine.
>All these mismatches involve the "ici", "ico" and "ice" suffixes.
>For example, the word "mediatrice" has the R2 region "rice".
>
>So during step 1 the "ice" suffix matches and is deleted, and the stem of
>"mediatrice" is "mediatr" (my implementation).
>
>But in the output file of the Snowball implementation the word "mediatrice"
>is conflated to "mediatric", so I'm currently a little confused. Maybe I
>compute the region R2 wrong, or must there be a complete match between the
>region R2 and the suffix in order to delete it?




