[Snowball-discuss] Basque language

Jimmy O'Regan joregan at gmail.com
Thu Jul 22 10:35:51 BST 2010


On 22 July 2010 10:04, Martin Porter <martin at porterloo.wanadoo.co.uk> wrote:
>
> Mikel,
>
> Hi! Apologies if you waited some time for a reply after your first email to
> us. A problem at our end. (Remember to post your replies also to
> snowball-discuss at lists.tartarus.org.)
>
> I don't think we should worry too much about preparing sbl, diff, voc, doc
> files etc. The easiest thing, I think, is to add your stemmers in a raw form
> (see the Armenian stemmer on the snowball site, added yesterday), and we
> expand your contribution later if you wish to do so.
>
> You say,
>
>
>>Hi Martin,
>>Here is the first valid version of the Catalan Stemmer. This
>>means we actually have a valid euskera and catalan version.
>
>>Which do you think is the next step? etc
>
> I do not quite understand. Do you mean you have two stemmers, one for
> Catalan, one for Euskera? Euskera, I discover, is the Basque word for
> "Basque". Anyway, Basque = Euskera. Is that right? Which name would be best
> on the snowball site?

Maybe mention both? Basque is the English word, but the language code
(eu) derives from the native name.

> Is your last attachment a stemmer for Catalan, not
> Basque/Euskera? I take it a Catalan stemmer would be close to the snowball
> Spanish stemmer, while a Basque stemmer would be rather different.
>

It would be closer to the Portuguese one, because enclitic pronouns
are attached with hyphens and apostrophes: dímelo (es) = digues-m'ho
(ca).
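
To make the difference concrete, here is a rough Python sketch of how a
pre-processing step might separate Catalan enclitics before stemming. It is
only an illustration: split_enclitics is a hypothetical helper, not anything
in Snowball or in the submitted stemmer, and it assumes that every hyphen or
apostrophe in a token marks a clitic boundary (which would also split
proclitic forms such as l'home).

```python
import re

# Hypothetical helper, not part of Snowball: separate a verb form from
# pronouns joined to it by hyphens or apostrophes.
# Assumption: every hyphen/apostrophe in the token is a clitic boundary.
CLITIC_SEPARATORS = re.compile(r"[-'\u2019]")

def split_enclitics(token):
    """Return (host, clitics) by splitting on hyphens and apostrophes."""
    parts = [p for p in CLITIC_SEPARATORS.split(token) if p]
    if not parts:
        return token, []
    return parts[0], parts[1:]

print(split_enclitics("digues-m'ho"))  # ('digues', ['m', 'ho'])
print(split_enclitics("dímelo"))       # ('dímelo', [])
```

The Spanish form comes back whole because its pronouns are fused onto the
verb with no separator, so they would have to be removed by suffix rules
instead.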

> (All my language/linguistics books are boxed up at the moment, so I can't
> check easily without your help.)
>


-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.


