[Snowball-discuss] very minor improvement to the Italian algorithm
Paolino Paperino
aovestdipaperino at winisp.net
Sat Jul 26 16:13:37 BST 2008
Hi all,
While debugging my C# port of the Italian algorithm, I found that a minor change gives better results for some words.
The change is in Step 1:
I moved the suffixes atrice and atrici from the first suffix set to the second one (azione, azioni, atore, atori), because in Italian 'atrice/i' is the feminine form of 'atore/i'. I then changed the region in which Step 1 searches for the suffixes of that second set: R1 instead of R2.
Using the sample set provided, the results look better for all the words that differ except one: applicazione should stem to 'applic', not 'appl'.
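To make the change concrete, here is a minimal Python sketch of just this one Step 1 rule group, with the region it is tested against made configurable. It assumes the usual Snowball definitions of R1 and R2 and skips the prelude, Step 0, the other Step 1 suffix sets and Steps 2 and 3, so its output is only what this rule leaves behind. It also assumes the follow-up 'ic' check moves to R1 together with the suffixes, which is what reproduces 'appl' for applicazione. With region="r2" it only approximates the published rule, which strictly lists atrice/atrici in the first set, without the 'ic' follow-up. The names `regions`, `step1_ator_group` and `ATOR_GROUP` are hypothetical, not taken from the Snowball sources or any existing port.

```python
# Minimal sketch: one Step 1 rule group of the Snowball Italian stemmer,
# with the region it is tested against made configurable.
# Assumptions: no prelude, no Step 0, no other Step 1 sets, no Steps 2-3.

VOWELS = set("aeiouàèìòù")

# Second Step 1 suffix set, extended with 'atrice'/'atrici' as proposed.
# None of these is a suffix of another, so at most one can match a word.
ATOR_GROUP = ("azione", "azioni", "atore", "atori", "atrice", "atrici")


def regions(word: str) -> tuple[int, int]:
    """Return the start indices of R1 and R2 (len(word) if a region is empty)."""

    def after_first_vowel_consonant(start: int) -> int:
        # Index just past the first non-vowel preceded by a vowel at index >= start.
        for i in range(start + 1, len(word)):
            if word[i - 1] in VOWELS and word[i] not in VOWELS:
                return i + 1
        return len(word)

    r1 = after_first_vowel_consonant(0)
    r2 = after_first_vowel_consonant(r1)  # same search, restricted to R1
    return r1, r2


def step1_ator_group(word: str, region: str = "r1") -> str:
    """Delete an ATOR_GROUP suffix, then a preceding 'ic', if each lies in the region.

    region="r2" tests against R2 (the published region); region="r1" is the
    proposed change. Words without a matching suffix are returned unchanged.
    """
    r1, r2 = regions(word)
    start = r1 if region == "r1" else r2
    for suffix in ATOR_GROUP:
        if not word.endswith(suffix):
            continue
        if len(word) - len(suffix) < start:
            return word  # suffix matched but starts before the region: no change
        word = word[: -len(suffix)]
        # Follow-up test: drop a preceding 'ic' if it also lies in the region.
        if word.endswith("ic") and len(word) - 2 >= start:
            word = word[:-2]
        return word
    return word


if __name__ == "__main__":
    for w in ("amatore", "formazione", "giocatrici", "applicazione"):
        # Prints: word, result with R2, result with R1
        print(w, step1_ator_group(w, "r2"), step1_ator_group(w, "r1"))
    # amatore amatore am
    # formazione formazione form
    # giocatrici giocatrici gioc
    # applicazione applic appl
```

For applicazione, R1 starts at "plicazione" and R2 at "azione", so 'azione' is removable under either region, but the 'ic' left in front of it lies in R1 and not in R2; that is the one regression. Because the later steps are skipped, a word such as calciatore comes out of this sketch as 'calci'; in the full stemmer the final-vowel removal of Step 3 then gives 'calc'.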
Here is a sample (in: input word, out: output with my change, exp: expected output from the sample set):
in: aizzatori out: aizz exp: aizzator
in: amatore out: am exp: amator
in: amatori out: am exp: amator
in: applicazione out: appl exp: applic
in: applicazioni out: appl exp: applic
in: armatore out: arm exp: armator
in: astrazione out: astr exp: astrazion
in: attuazione out: attu exp: attuazion
in: aviazione out: avi exp: aviazion
in: bruciatore out: bruc exp: bruciator
in: cacciatore out: cacc exp: cacciator
in: calciatore out: calc exp: calciator
in: calciatori out: calc exp: calciator
[...]
in: formazione out: form exp: formazion
in: formazioni out: form exp: formazion
in: giocatore out: gioc exp: giocator
in: giocatori out: gioc exp: giocator
in: giocatrici out: gioc exp: giocatric
in: gradazioni out: grad exp: gradazion
in: guastatori out: guast exp: guastator
[...]
I hope this helps.
-Paolino
More information about the Snowball-discuss mailing list