[Snowball-discuss] less than zero (enriching Italian stemmer)

adriano allora adriano.allora at gmail.com
Fri Apr 29 15:19:35 BST 2011


Hi Martin!

Wow, you've just opened a new world to me! It's my first C script and, you
see, I may be stupid, but I'm never a coward when I see something
completely... °___°

No, OK, seriously: I downloaded a gzip archive named snowball_web_and_code,
which contains all the source code for Snowball.
I opened it and saw several very interesting things (for instance, adding
some stopwords, but there is no need to do everything now: there is time for
further improvements), so thank you for this.
But I beg your pardon: now I'm not sure what I have to do.
1) Can I simply change the files stem_ISO_8859.sbl and stem_MS_DOS_LATIN.sbl
in the directory named algorithms/italian? If not, where is the source
file I have to change in order to add morphemes to the Italian algorithm?
2) After changing the algorithm, what exactly do I have to do? It seems
reasonable to assume that compiling it (gcc -O -o Snowball compiler/*.c) will
not produce the Python module, and the guide to wrappers didn't help me.
Hmmm... is there a howto for these cases?

Thank you for your patience. I imagine my questions seem silly to a person
who knows all the stuff I don't, but for this kind of software it is probably
necessary that programmers mix with grammarians.

Thanks a lot for everything,

adriano

2011/4/29 Martin Porter <martin at porterloo.wanadoo.co.uk>

>
> Ciao, Adriano!
>
> This problem has come up before. The generated tables in the C/Java code
> have a structure that is determined by a clever algorithm that does fast
> lookups of the endings. If you add in extra endings the tables need to be
> completely changed. The only way to extend the tables therefore is to alter
> the snowball source, download and compile the snowball compiler, and generate
> new C/Java code.
>
> Martin
>
>
>
> > ....Well, I'd like to do something not so complicated: simply add some
> > morphemes to the Italian stemmer ....
>
>
>
>
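
P.S. To check that I understood why the generated tables cannot simply be
patched by hand, here is a toy sketch in Python. It is only my own
illustration, assuming the fast lookup relies on the endings being stored in
sorted order; it is not Snowball's actual table layout, and the names
build_table and longest_ending are hypothetical.

```python
from bisect import bisect_left


def build_table(endings):
    """Reverse each ending and sort, so a binary search can be used."""
    return sorted(e[::-1] for e in endings)


def longest_ending(word, table):
    """Return the longest ending in `table` that `word` ends with, or None."""
    rev = word[::-1]
    # Probe candidate lengths from longest to shortest;
    # each probe is one binary search over the table.
    for n in range(len(rev), 0, -1):
        key = rev[:n]
        i = bisect_left(table, key)
        if i < len(table) and table[i] == key:
            return key[::-1]
    return None


endings = ["are", "ere", "ire", "mente"]
table = build_table(endings)

# Hand-patching: the new ending is appended without re-sorting the table.
patched = table + ["azione"[::-1]]

# Regenerating: the whole table is rebuilt from the extended list of endings.
rebuilt = build_table(endings + ["azione"])

print(longest_ending("velocemente", table))
print(longest_ending("stazione", patched))
print(longest_ending("stazione", rebuilt))
```

In this toy version the hand-patched table silently misses the new ending,
because the binary search assumes an order that no longer holds, while the
rebuilt table finds it. I suppose this is the same reason the real tables
have to be regenerated by the snowball compiler.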