[Snowball-discuss] Spanish Stemmer question

Martin Porter martin.porter at grapeshot.co.uk
Wed Oct 24 09:41:59 BST 2007


Ignacio,

I will look into this at some point.

The stemmers were developed by working with large sample vocabularies,
and in doing so you often find that a rule that seems plausible has to
be withdrawn because it causes errors in too many cases. In any event, a
new rule must be tested against a large vocabulary, and the words
affected, for better or worse, checked and assessed.
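
As a rough illustration of that kind of check, here is a minimal Python
sketch that runs an old and a new version of a stemmer over a vocabulary
and lists every word whose stem changes, so each change can be assessed
by hand. The names diff_stems, toy_old and toy_new are invented for this
example, and the two toy stemmers are deliberately crude stand-ins, not
the real Spanish stemmer.

```python
def diff_stems(vocab, old_stem, new_stem):
    """Return (word, old, new) for every word whose stem differs."""
    changes = []
    for word in vocab:
        before, after = old_stem(word), new_stem(word)
        if before != after:
            changes.append((word, before, after))
    return changes


# Hypothetical stand-ins for "stemmer before" and "stemmer after".
def toy_old(word):
    # Strip a final 'idad' and nothing else.
    return word[:-4] if word.endswith("idad") else word


def toy_new(word):
    # Same as toy_old, then also strip a final 'ibil' (no region check).
    stem = toy_old(word)
    return stem[:-4] if stem.endswith("ibil") else stem


if __name__ == "__main__":
    vocab = ["posibilidad", "ciudad", "actividad"]
    for word, before, after in diff_stems(vocab, toy_old, toy_new):
        print(f"{word}: {before} -> {after}")
    # prints: posibilidad: posibil -> pos
```

With a real vocabulary the list of changed words is what gets reviewed:
each entry is either an improvement or a new error.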

I'll do this with your two suggestions when I come back to working on
Snowball. My guess is that your first rule could certainly be added in;
the second may lead to errors as well as improvements.
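
To make the R1/R2 distinction behind the second suggestion concrete,
here is a small Python sketch, assuming the usual Snowball definitions
(R1 is the region after the first non-vowel following a vowel; R2 is the
same thing applied within R1) and the Spanish vowel set. The function
names are invented for illustration, and "inhabilidad" is an assumed
example of the 'inh' case, not a word taken from the original message.

```python
VOWELS = set("aeiouáéíóúü")


def region_start(word, start=0):
    """Index just past the first non-vowel that follows a vowel,
    searching from `start`; len(word) (the null region) if none."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word):
    """Return the start offsets of R1 and R2."""
    r1 = region_start(word)
    r2 = region_start(word, r1)
    return r1, r2


def affix_region(word, affix, ending="idad"):
    """Say where `affix`, sitting directly before `ending`, begins:
    'R2', 'R1' (in R1 but not R2), 'neither', or 'absent'."""
    if not word.endswith(affix + ending):
        return "absent"
    pos = len(word) - len(ending) - len(affix)
    r1, r2 = r1_r2(word)
    if pos >= r2:
        return "R2"
    if pos >= r1:
        return "R1"
    return "neither"


if __name__ == "__main__":
    print(affix_region("inteligibilidad", "ibil"))  # R2
    print(affix_region("posibilidad", "ibil"))      # R1
    print(affix_region("inhabilidad", "abil"))      # R1
```

In this sketch 'ibil' in "posibilidad" begins in R1 but not in R2, so a
rule restricted to R2 leaves it alone, while relaxing the rule to R1
would also reach "inhabilidad" unless it is excluded.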

Are you intending to include these extras in your version of the
stemmer? If so, keep us informed.

Thanks for your interest. 

Martin



On Mon, 2007-10-22 at 15:52 -0300, Ignacio Perez wrote:
> My name is Ignacio Perez and I'm both a linguistics student at Buenos
> Aires University (Argentina) and a Java programmer. I'm currently
> working with the Spanish stemmer and I found a couple of problems with
> the following rule: 
>             'idad'
>             'idades'
>             (
>                 R2 delete
>                 try (
>                     [substring] among(
>                         'abil'
>                         'ic'
>                         'iv'   (R2 delete)
>                     )
>                 )
>             )
> 
> The first (and easiest to solve) problem I find here is that we should
> add 'ibil' next to 'abil', because '-ibilidad' and '-abilidad' are the
> same suffix for different verb conjugations (the first for "-er" and
> "-ir" verbs and the second for "-ar" verbs). This would solve
> problems for words like "inteligibilidad" or "sensibilidad". 
> 
> The second problem I ran into is that 'abil' (and, if added, also
> 'ibil') is only removed when in R2. This leaves out words like
> "posibilidad" (very common) or "amabilidad". I thought the rule
> might be something like this: "remove 'abil' when in R2, or when in
> R1 except when preceded by 'inh'". 
> 
> Kind regards.
> 
> Ignacio Perez