[Snowball-discuss] do R1, R2 and RV need to be updated after deleting something?

alfonso.moscato at merqurio.it
Thu Aug 26 11:27:23 BST 2021


Hello to all.

I am implementing the stemming algorithm for Italian (https://snowballstem.org/algorithms/italian/stemmer.html), and I have a question:

I have a word, say “praticabilità”

R1 is “icabilità”

R2 is “abilità”

RV is “ticabilità”

(or at least I hope so 😊)
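The three regions above can be checked with a small Python sketch. It follows the region definitions as I read them on the page linked above, and it skips the prelude (accent normalisation, and the marking of u after q and of i and u between vowels), which does not change this word. The names `region_after`, `rv_start` and `regions` are mine, not part of Snowball, and each region is returned as a start offset into the word rather than as a copied string.

```python
VOWELS = set("aeiouàèìòù")

def region_after(word, start):
    """Offset just past the first non-vowel that follows a vowel, scanning from start.

    Returns len(word), i.e. an empty region, when no such position exists.
    """
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)

def rv_start(word):
    """Start offset of RV (hypothetical helper, Italian definition)."""
    n = len(word)
    if n < 2:
        return n
    if word[1] not in VOWELS:
        # Second letter is a consonant: RV starts after the next vowel.
        for i in range(2, n):
            if word[i] in VOWELS:
                return i + 1
        return n
    if word[0] in VOWELS:
        # First two letters are vowels: RV starts after the next consonant.
        for i in range(2, n):
            if word[i] not in VOWELS:
                return i + 1
        return n
    # Consonant followed by vowel: RV starts after the third letter.
    return 3 if n >= 3 else n

def regions(word):
    """Return the start offsets (r1, r2, rv) for word."""
    r1 = region_after(word, 0)
    r2 = region_after(word, r1)  # R2 is the same rule applied inside R1
    return r1, r2, rv_start(word)

word = "praticabilità"
r1, r2, rv = regions(word)
print(word[r1:], word[r2:], word[rv:])  # icabilità abilità ticabilità
```

For this word the offsets come out as 4, 6 and 3, which matches the three strings listed above.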

In step 1 there is the rule:

ità

delete if in R2

if preceded by abil, ic or iv, delete if in R2

And in step 3 there is the rule:

Delete a final a, e, i, o, à, è, ì or ò if it is in RV, and a preceding i if it is in RV 
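The two rules quoted above can be sketched in Python as follows. This is only a sketch of these two rules, not of the whole of step 1 or step 3, and it assumes that R2 and RV are passed in as integer start offsets computed once on the original word. The function names `in_region`, `step1_ita` and `step3a` are hypothetical.

```python
def in_region(word, suffix, start):
    """True if word ends with suffix and the suffix begins at or after offset start."""
    return word.endswith(suffix) and len(word) - len(suffix) >= start

def step1_ita(word, r2):
    """The 'ità' rule: delete if in R2, then delete a preceding abil/ic/iv if in R2."""
    if in_region(word, "ità", r2):
        word = word[:-3]
        for prev in ("abil", "ic", "iv"):
            if in_region(word, prev, r2):
                word = word[:-len(prev)]
                break
    return word

def step3a(word, rv):
    """Delete a final a, e, i, o, à, è, ì or ò if in RV, and a preceding i if in RV."""
    if word and word[-1] in "aeioàèìò" and len(word) - 1 >= rv:
        word = word[:-1]
        if in_region(word, "i", rv):
            word = word[:-1]
    return word

# Offsets for "praticabilità": R2 starts at 6 ("abilità"), RV at 3 ("ticabilità").
stem = step1_ita("praticabilità", 6)
print(stem)             # pratic
print(step3a(stem, 3))  # pratic
```

Because every test is made against the current word and a fixed offset, step 3 sees a final “c”, which is not in the list of vowels, and deletes nothing.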

In step 1 I delete “abilità” and the word becomes “pratic”.

I leave RV untouched, so it is still “ticabilità”.

In step 3 I search for “à” in RV and find it as the last character.

So I think I have to delete one character, and I wrongly delete the “c”.

What is the correct behaviour? Do I need to remove the deleted text from R1, R2 and RV as well?
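One possible reading, given here as a sketch and not as a statement of how the reference implementation behaves: if a region is stored as a start offset into the word instead of as a copied string, there is nothing to update after a deletion at the end of the word. The short Python example below contrasts the two; `rv_snapshot` and `live_rv` are illustrative names only.

```python
word = "praticabilità"
rv = 3  # start offset of RV, computed once before step 1

# Approach described above: copy the region text out of the word.
rv_snapshot = word[rv:]          # "ticabilità"

# Step 1 removes "abilità" from the end of the word.
word = word[:-len("abilità")]    # "pratic"

# The copy is now stale: it still ends in "à", which is no longer in the word.
stale_last = rv_snapshot[-1]     # "à"

# With an offset, the region is always read from the current word.
live_rv = word[rv:]              # "tic"
live_last = live_rv[-1]          # "c", not a vowel, so step 3 deletes nothing

print(stale_last, live_last)     # à c
```

Under this assumption the offsets for R1, R2 and RV stay valid, because the deletions happen at the end of the word, after the point where each region starts.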

Thanks in advance for your help.

Alfonso

Alfonso Moscato
CIO & COO
Merqurio Holding
Corso Umberto I, 23 - 80138 Napoli
Tel.+39 0815524300 
Fax.+39 0814201136 
Linea Verde: +39 800014863 
 
Diretto. +39 081 96.336.22 
Mobile. +39 348 36.79.384
