[Snowball-discuss] Problem with spanish stemmer

Ignacio Perez ignacio.perez at gmail.com
Mon Oct 29 23:29:00 GMT 2007


I'm working with the Spanish stemmer and I'm having a problem with the
verb suffixes. The input I'm stemming is not orthographically perfect,
so I cannot rely on the accents for stemming. I thought, then, that I could
remove all accents from my input and from the stemmer (for most verb
suffixes this is not a problem, since "iéramos", "íamos",
"ábamos", "áramos", etc. are surely suffixes even when they're written as
"ieramos", "iamos", "abamos", "aramos"; there is no ambiguity). Surprisingly
(for me), the stemmer did not behave as I expected, and words like "tomaramos"
were split as "tomar-amos" rather than "tom-aramos".
Evidently I don't understand the stemmer's behaviour, and the accents
matter more to it than I assumed.
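
To make the input-side step concrete, here is a minimal sketch of the accent
removal in Python. It is not part of Snowball; the helper name strip_accents
is hypothetical, and it assumes that only the acute accent should be dropped,
so that "ñ" and "ü" are left alone.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent, as in the decomposed form of "á"


def strip_accents(text: str) -> str:
    """Remove acute accents only (assumption: "ñ" and "ü" must survive)."""
    # Decompose so that "á" becomes "a" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the acute accent; the tilde and diaeresis marks are untouched.
    stripped = decomposed.replace(ACUTE, "")
    # Recompose so that "n" + combining tilde is a single "ñ" again.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    for word in ["tomáramos", "comíamos", "cantábamos", "año"]:
        print(word, "->", strip_accents(word))
```

This only normalises the input. The suffix lists inside the stemmer would need
the same treatment for the two sides to agree.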

So, how can I make the stemmer accent-insensitive?

Thanks a lot

Ignacio