[Snowball-discuss] Spanish Stemmer question

Ignacio Perez ignacio.perez at gmail.com
Mon Oct 22 19:52:58 BST 2007


My name is Ignacio Perez and I'm both a linguistics student at Buenos Aires
University (Argentina) and a Java programmer. I'm currently working with the
Spanish stemmer and have found a couple of problems with the following rule:

            'idad'
            'idades'
            (
                R2 delete
                try (
                    [substring] among(
                        'abil'
                        'ic'
                        'iv'   (R2 delete)
                    )
                )
            )


The first (and easiest to solve) problem is that we should add 'ibil' next
to 'abil', because '-ibilidad' and '-abilidad' are the same suffix applied
to different verb conjugations (the first for "-er" and "-ir" verbs and the
second for "-ar" verbs). This would help with words like
"inteligibilidad" or "sensibilidad".

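To check which words an added 'ibil' entry would reach, it helps to see where
R1 and R2 begin. The Python sketch below is only an illustration, not part of
the Snowball sources. It assumes the usual R1/R2 definitions and the Spanish
vowel set (a e i o u á é í ó ú ü); the helper names region_start and
inner_suffix_region are made up for this example.

```python
VOWELS = set("aeiouáéíóúü")  # assumed vowel set for Spanish


def region_start(word, start=0):
    """Index just after the first non-vowel that follows a vowel,
    searching from `start`; len(word) if there is no such position."""
    for i in range(start + 1, len(word)):
        if word[i - 1] in VOWELS and word[i] not in VOWELS:
            return i + 1
    return len(word)


def inner_suffix_region(word, inner):
    """Say where `inner` starts when it directly precedes 'idad'/'idades'.

    Returns "R2", "R1", "outside R1", or None if the word does not end
    in inner + 'idad' or inner + 'idades'.
    """
    p1 = region_start(word)       # start of R1
    p2 = region_start(word, p1)   # start of R2 (same rule applied inside R1)
    for ending in ("idades", "idad"):
        if word.endswith(inner + ending):
            at = len(word) - len(inner + ending)
            if at >= p2:
                return "R2"
            if at >= p1:
                return "R1"
            return "outside R1"
    return None


for w in ("inteligibilidad", "sensibilidad"):
    print(w, inner_suffix_region(w, "ibil"))
# inteligibilidad R2
# sensibilidad R1
```

By this calculation 'ibil' starts inside R2 for "inteligibilidad" but only
inside R1 for "sensibilidad", so the latter also runs into the R2 restriction.
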
The second problem I ran into is that 'abil' (and, if added, also 'ibil') is
only removed when in R2. This leaves out words like "posibilidad" (very
common) or "amabilidad". I thought the rule might instead be something like
this: remove 'abil' when in R2, or when in R1 except when preceded by "inh".
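
The proposed behaviour can be sketched in Python to try it on a few words.
This is a rough emulation of only the 'idad'/'idades' branch quoted above,
not the Snowball implementation. The function strip_idad and its parameters
`inner` and `relax` are hypothetical names, and the usual R1/R2 definitions
and the Spanish vowel set are assumed.

```python
VOWELS = set("aeiouáéíóúü")  # assumed vowel set for Spanish


def _after_vc(word, start):
    """Index just after the first vowel + non-vowel pair found from `start`."""
    for i in range(start + 1, len(word)):
        if word[i - 1] in VOWELS and word[i] not in VOWELS:
            return i + 1
    return len(word)


def strip_idad(word, inner=("abil", "ic", "iv"), relax=()):
    """Emulate the 'idad'/'idades' branch only.

    inner: suffixes removed after 'idad' when they start in R2.
    relax: members of `inner` that may also be removed when they start
           in R1, unless they are preceded by 'inh' (the proposal).
    """
    p1 = _after_vc(word, 0)    # start of R1
    p2 = _after_vc(word, p1)   # start of R2
    for suffix in ("idades", "idad"):   # longest first
        if word.endswith(suffix):
            break
    else:
        return word                     # branch does not apply
    cut = len(word) - len(suffix)
    if cut < p2:                        # 'R2 delete' fails
        return word
    stem = word[:cut]
    # try ( [substring] among(...) ): the longest matching entry wins
    for s in sorted(inner, key=len, reverse=True):
        if stem.endswith(s):
            at = len(stem) - len(s)
            in_r2 = at >= p2
            in_r1_ok = (s in relax and at >= p1
                        and not stem[:at].endswith("inh"))
            if in_r2 or in_r1_ok:
                stem = stem[:at]
            break
    return stem


WITH_IBIL = ("abil", "ibil", "ic", "iv")
RELAXED = ("abil", "ibil")
print(strip_idad("posibilidad"))                              # posibil
print(strip_idad("posibilidad", WITH_IBIL, relax=RELAXED))    # pos
print(strip_idad("inhabilidad", WITH_IBIL, relax=RELAXED))    # inhabil
```

With the default arguments the sketch mirrors the rule as quoted; passing
`relax` switches on the R1 condition together with the "inh" exception.
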

Kind regards.

Ignacio Perez