[Snowball-discuss] very minor improvement to the Italian algorithm

Paolino Paperino aovestdipaperino at winisp.net
Sat Jul 26 16:13:37 BST 2008


Hi all,
While debugging my C# port of the Italian algorithm, I found that a minor change gives better results for some words.
The change is in Step 1:
I moved the suffixes 'atrice' and 'atrici' from the first set to the second set (azione, azioni, atore, atori), because 'atrice/i' is the Italian feminine form of 'atore/i'. I then changed the region in which the suffixes of this second set must be found: R1 instead of R2.
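To make the proposed rule concrete, here is a minimal Python sketch that applies only this one suffix group of Step 1, in both the original and the proposed form. Assumptions: R1 and R2 follow the usual Snowball definition (the region after the first non-vowel following a vowel, with R2 found the same way inside R1); the Italian prelude (marking u/i between vowels, qu), the other Step 1 groups and the later steps are left out; all names are hypothetical. Applying R1 to the "preceded by ic" check as well is an inference from the applicazione → 'appl' result, not something stated explicitly.

```python
from typing import Tuple

# Italian vowels as used by the Snowball Italian stemmer.
VOWELS = set("aeiouàèìòù")

# Second suffix group of Step 1, before and after the proposed change.
ORIGINAL_GROUP = ("azione", "azioni", "atore", "atori")
PROPOSED_GROUP = ORIGINAL_GROUP + ("atrice", "atrici")


def region_start(word: str, start: int = 0) -> int:
    """Index just after the first non-vowel that follows a vowel at or after `start`.

    Returns len(word) when no such position exists (empty region).
    """
    for i in range(start + 1, len(word)):
        if word[i - 1] in VOWELS and word[i] not in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word: str) -> Tuple[int, int]:
    """Start indices of R1 and R2; R2 applies the same rule inside R1."""
    r1 = region_start(word, 0)
    return r1, region_start(word, r1)


def step1_agent_suffixes(word: str, proposed: bool = True) -> str:
    """Hypothetical helper: apply only the azione/atore group of Step 1.

    proposed=True : group includes atrice/atrici, region is R1.
    proposed=False: original group (atrice/atrici belong to the first
                    group there and are not handled here), region is R2.
    """
    r1, r2 = r1_r2(word)
    group = PROPOSED_GROUP if proposed else ORIGINAL_GROUP
    region = r1 if proposed else r2

    matches = [s for s in group if word.endswith(s)]
    if not matches:
        return word
    suffix = max(matches, key=len)      # longest matching suffix
    cut = len(word) - len(suffix)
    if cut < region:                    # suffix not entirely inside the region
        return word
    word = word[:cut]
    # If the removed suffix was preceded by "ic", delete that too (same region).
    if word.endswith("ic") and len(word) - 2 >= region:
        word = word[:-2]
    return word


if __name__ == "__main__":
    for w in ("amatore", "giocatrici", "applicazione"):
        print(w, step1_agent_suffixes(w, proposed=False), step1_agent_suffixes(w))
```

With these definitions, "amatore" is left untouched by the original rule ('atore' starts inside R1 but before R2) and becomes 'am' under the proposed one, while "applicazione" gives 'applic' and 'appl' respectively, matching the one regression reported in this post. The final vowel that the full algorithm removes in a later step is not handled here.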

Using the provided sample set, the results look better on all the diffs except one: applicazione should stem to 'applic', not 'appl'.
Here is a sample ('out' is the output with my change, 'exp' is the expected stem from the sample set):
in: aizzatori out aizz  exp: aizzator
in: amatore out am  exp: amator
in: amatori out am  exp: amator
in: applicazione out appl  exp: applic
in: applicazioni out appl  exp: applic
in: armatore out arm  exp: armator
in: astrazione out astr  exp: astrazion
in: attuazione out attu  exp: attuazion
in: aviazione out avi  exp: aviazion
in: bruciatore out bruc  exp: bruciator
in: cacciatore out cacc  exp: cacciator
in: calciatore out calc  exp: calciator
in: calciatori out calc  exp: calciator
[...]
in: formazione out form  exp: formazion
in: formazioni out form  exp: formazion
in: giocatore out gioc  exp: giocator
in: giocatori out gioc  exp: giocator
in: giocatrici out gioc  exp: giocatric
in: gradazioni out grad  exp: gradazion
in: guastatori out guast  exp: guastator
[...]
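A listing in this in/out/exp layout can be produced by a small harness that runs a stemmer over the sample vocabulary and keeps only the words whose stem differs from the expected one. The sketch below assumes the vocabulary and the expected stems are already loaded as two parallel lists (reading the sample files is left out); diff_report and toy_stem are hypothetical names, and toy_stem is only a stand-in, not the Italian stemmer.

```python
from typing import Callable, Iterable, List


def diff_report(words: Iterable[str], expected: Iterable[str],
                stem: Callable[[str], str]) -> List[str]:
    """Return one line per word whose stem differs from the expected stem."""
    word_list, expected_list = list(words), list(expected)
    if len(word_list) != len(expected_list):
        raise ValueError("input and expected lists must have the same length")
    report = []
    for word, exp in zip(word_list, expected_list):
        out = stem(word)
        if out != exp:
            # Same layout as the sample listing in this post.
            report.append(f"in: {word} out {out}  exp: {exp}")
    return report


if __name__ == "__main__":
    # Hypothetical stand-in stemmer, for illustration only: strips a final "atore".
    def toy_stem(word: str) -> str:
        return word[:-len("atore")] if word.endswith("atore") else word

    for line in diff_report(["amatore", "casa"], ["amator", "casa"], toy_stem):
        print(line)
```

Running it prints `in: amatore out am  exp: amator`; "casa" is skipped because its stem matches the expected one.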

I hope this helps.
-Paolino