[Snowball-discuss] Spanish Stemmer question
Ignacio Perez
ignacio.perez at gmail.com
Mon Oct 22 19:52:58 BST 2007
My name is Ignacio Perez and I'm both a linguistics student at Buenos Aires
University (Argentina) and a Java programmer. I'm currently working with the
Spanish stemmer and I have found a couple of problems with the following rule:
    'idad'
    'idades'
    (
        R2 delete
        try (
            [substring] among(
                'abil'
                'ic'
                'iv' (R2 delete)
            )
        )
    )
The first (and easiest to solve) problem I find here is that we should add
'ibil' next to 'abil', because the suffixes '-ibilidad' and '-abilidad' are
the same suffix applied to different verb conjugations (the first for "-er"
and "-ir" verbs, the second for "-ar" verbs). This would solve problems for
words like "inteligibilidad" or "sensibilidad".
The second problem I ran into is that 'abil' (and, if added, also 'ibil') is
only removed when it is in R2. This leaves out words like "posibilidad" (very
common) or "amabilidad". I thought the rule might instead go something like
this: remove 'abil' when it is in R2, or when it is in R1 except when
preceded by "inh".
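To make the proposal concrete, here is a minimal sketch in Python. It assumes
the usual Snowball definition of R1 and R2 (the region after the first
non-vowel that follows a vowel, applied once for R1 and again inside R1 for
R2) and the Spanish vowel set a e i o u á é í ó ú ü. The function names and
the `relaxed` flag are hypothetical, the word "inhabilidad" is only my
illustration of the "inh" exception, and the sketch covers just this one rule
(no 'ic' or 'iv', no other stemmer steps), so it is an illustration rather
than a patch.

```python
VOWELS = set("aeiouáéíóúü")


def region_start(word, start=0):
    """Index just after the first non-vowel that follows a vowel,
    scanning from `start`; len(word) if there is no such position."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word):
    """Start offsets of R1 and R2 (R2 is computed inside R1)."""
    r1 = region_start(word)
    r2 = region_start(word, r1)
    return r1, r2


def strip_idad(word, relaxed=False):
    """Sketch of the 'idad'/'idades' rule with 'ibil' added next to 'abil'.

    relaxed=False: 'abil'/'ibil' are removed only when they lie in R2.
    relaxed=True:  they are also removed when they lie in R1, unless
                   they are preceded by 'inh' (the proposed exception).
    """
    r1, r2 = r1_r2(word)

    # Outer suffix: must lie entirely in R2, longest match first.
    for suffix in ("idades", "idad"):
        if word.endswith(suffix) and len(word) - len(suffix) >= r2:
            stem = word[: -len(suffix)]
            break
    else:
        return word

    # Inner suffix: 'abil' or 'ibil' left at the end of the stem.
    for inner in ("abil", "ibil"):
        if stem.endswith(inner):
            pos = len(stem) - len(inner)
            in_r2 = pos >= r2
            in_r1 = pos >= r1
            preceded_by_inh = stem[:pos].endswith("inh")
            if in_r2 or (relaxed and in_r1 and not preceded_by_inh):
                return stem[:pos]
    return stem


print(r1_r2("posibilidad"))                     # (3, 5)
print(strip_idad("posibilidad"))                # posibil
print(strip_idad("posibilidad", relaxed=True))  # pos
print(strip_idad("inhabilidad", relaxed=True))  # inhabil
print(strip_idad("inteligibilidad"))            # intelig
```

In "posibilidad" R2 starts at "ilidad", so 'ibil' falls in R1 but not in R2;
that is why it is only removed under the relaxed condition.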
Kind regards.
Ignacio Perez