[Snowball-discuss] Problem with spanish stemmer
Ignacio Perez
ignacio.perez at gmail.com
Wed Oct 31 15:46:46 GMT 2007
Thanks a lot to both of you. My problem was that I was attempting to modify
the generated code instead of recompiling the stemmer.
Sorry, and thanks again
Ignacio
On 10/31/07, Martin Porter <martin.porter at grapeshot.co.uk> wrote:
>
>
> Ignacio,
>
> What you have outlined should work, and I would have to look at your
> approach in some detail to see where the problem lies. (Something,
> incidentally, that I do not currently have the time to do!)
>
> I have just done a simple test in which the line of suffixes,
>
> '{a'}ramos' 'i{e'}ramos' 'i{e'}semos' '{a'}semos'
>
> is additionally preceded by the line
>
> 'aramos' 'ieramos' 'iesemos' 'asemos'
>
> and this works fine, your word "tomaramos" splitting as "tom-aramos".
>
> So I can suggest that as an approach: supplement the algorithm with
> extra endings, corresponding to the accented forms but with the accent
> removed. I suggest you build it up bit by bit, and test it out as each
> new ending, or set of endings, is included.
>
> This problem has arisen before. See for example the email of Andrew
> Green 19 May 2007.
>
> Martin
>
> On Mon, 2007-10-29 at 20:29 -0300, Ignacio Perez wrote:
> > I'm working with the spanish stemmer and I'm having sort of a problem
> > with the verb suffixes. The input I'm stemming is not orthographically
> > perfect and I cannot rely on the accents for stemming. I thought,
> > then, I could remove all accents from my input and from the stemmer
> > (for most verb suffixes this is not a problem, since
> > "iéramos", "íamos", "ábamos", "áramos", etc. are surely suffixes even
> > when they're written as "ieramos", "iamos", "abamos", "aramos";
> > there is no ambiguity). Surprisingly (for me) the stemmer did not
> > behave as I expected and words like "tomaramos" were split
> > "tomar-amos".
> > Evidently I'm not understanding the behaviour of the stemmer, and these
> > accents matter more to it than I assumed. The suffix line in question is:
> >
> > '{a'}ramos' 'i{e'}ramos' 'i{e'}semos' '{a'}semos'
> >
> > So, how can I use the stemmer making it not accent-sensitive?
> >
> > Thanks a lot
> >
> > Ignacio
>
>
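
For readers of the archive, the behaviour discussed in this thread can be illustrated with a small Python sketch. It is not the Snowball Spanish stemmer: it ignores the R1/R2/RV region checks and everything else the real algorithm does, and only models "remove the longest matching ending". The ending list is limited to the four suffixes quoted above plus "amos", and all function names are hypothetical.

```python
# Toy model of longest-suffix matching; NOT the real Snowball algorithm.
# Endings are the four quoted in the thread, plus the shorter "amos".
ACCENTED_ENDINGS = ["áramos", "iéramos", "iésemos", "ásemos", "amos"]

# Map only acute-accented vowels to plain vowels (ñ and ü are left alone).
ACUTE_TO_PLAIN = str.maketrans("áéíóúÁÉÍÓÚ", "aeiouAEIOU")


def strip_acute(text):
    """Return text with acute accents removed from vowels."""
    return text.translate(ACUTE_TO_PLAIN)


def with_unaccented_variants(endings):
    """Keep the original endings and add their accent-free forms."""
    result = list(endings)
    for ending in endings:
        plain = strip_acute(ending)
        if plain not in result:
            result.append(plain)
    return result


def split_longest_suffix(word, endings):
    """Split word into (stem, ending) using the longest matching ending."""
    matches = [e for e in endings if word.endswith(e) and len(word) > len(e)]
    if not matches:
        return word, ""
    best = max(matches, key=len)
    return word[: -len(best)], best


# With only the accented endings, unaccented input falls through to "amos".
print(split_longest_suffix("tomaramos", ACCENTED_ENDINGS))

# After supplementing the list, the longer "aramos" ending wins.
extended = with_unaccented_variants(ACCENTED_ENDINGS)
print(split_longest_suffix("tomaramos", extended))
```

In this toy model the first call yields the "tomar-amos" split reported in the original question, and the second yields "tom-aramos", which matches the result of the test described in the reply. In the real stemmer the equivalent change is made in the Snowball source, which is then recompiled, rather than in the generated code.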