[Snowball-discuss] Problems with the italian stemmer algorithm

pprett at sbox.tugraz.at
Wed Sep 8 15:13:55 BST 2004


Hello, while implementing the Italian stemmer algorithm described at
http://www.snowball.tartarus.org/italian/stemmer.html I found some minor
differences between the output of the Snowball implementation and mine.
All of these mismatches involve the "ici", "ico" and "ice" suffixes.
For example, the word "mediatrice" has the R2 region "rice".
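
For reference, the R2 value above can be reproduced with a minimal sketch of the R1/R2 rule (R1 is the region after the first non-vowel following a vowel, R2 is the same rule applied inside R1). The function names are hypothetical, and the sketch assumes the Italian vowel set a e i o u à è ì ò ù and skips the preprocessing the full algorithm does before marking regions, which does not affect this word.

```python
# Vowel set assumed from the Italian stemmer description.
VOWELS = set("aeiouàèìòù")

def region_start(word, start=0):
    """Index just past the first non-vowel that follows a vowel,
    searching from `start`; len(word) if there is no such position."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)

def r1_r2(word):
    """Return the R1 and R2 regions of `word` as strings."""
    r1 = region_start(word)
    r2 = region_start(word, r1)  # same rule, applied inside R1
    return word[r1:], word[r2:]

print(r1_r2("mediatrice"))  # ('iatrice', 'rice')
```

So R2 = "rice" looks consistent with the definition, at least under this reading of it.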

So during step 1 the "ice" suffix matches inside R2 and is deleted, and my
implementation stems "mediatrice" to "mediatr".

But in the output file of the Snowball implementation the word "mediatrice" is
conflated to "mediatric", so I'm currently a little confused. Maybe I compute
the region R2 incorrectly, or must there be a complete match between the region
R2 and the suffix in order to delete it?
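
To make the question concrete, here is a small sketch comparing two possible readings of step 1. It is not the Snowball implementation: the function names are hypothetical, only two suffixes are modelled ("ice" and the longer "atrice", which is assumed here to belong to the same step 1 group), and the R2 start index is passed in by hand.

```python
# Only two step 1 suffixes are modelled; an assumption for illustration.
SUFFIXES = ["atrice", "ice"]

def delete_any_suffix_in_r2(word, r2_start):
    """Reading A: delete the longest suffix that itself lies inside R2."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= r2_start:
            return word[: -len(suffix)]
    return word

def delete_longest_suffix_if_in_r2(word, r2_start):
    """Reading B: pick the longest matching suffix first, then test R2 once."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix):
            if len(word) - len(suffix) >= r2_start:
                return word[: -len(suffix)]
            return word  # longest match lies outside R2: nothing is deleted
    return word

R2_START = 6  # "mediatrice"[6:] == "rice"
print(delete_any_suffix_in_r2("mediatrice", R2_START))         # mediatr
print(delete_longest_suffix_if_in_r2("mediatrice", R2_START))  # mediatrice
```

Reading A gives my result, "mediatr". Under reading B the word would leave step 1 unchanged, and "mediatric" would then have to come from a later step that removes the final "e". Which of the two the Snowball implementation follows is exactly what I am unsure about.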

thx & regards

peter



