[Snowball-discuss] less than zero (enriching Italian stemmer)

Martin Porter martin at porterloo.wanadoo.co.uk
Mon May 2 10:13:52 BST 2011


Adriano,

Yes, you alter the .sbl file, compile the snowball compiler, and translate
the .sbl file into C or Java. If you're a software beginner I suggest you
get local help if you can. This will be easier than trying to take you
through the steps from the snowball-discuss board.
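
As a rough sketch of those steps, the Python helper below only assembles the two command lines involved: building the compiler, then translating an edited .sbl file. The gcc line is the one quoted further down in this thread. The `-o` and `-java` options are assumed from the compiler's usage message and should be checked against your own copy. The function names, the `out_base` argument, and the example paths are hypothetical.

```python
import glob
import subprocess


def build_commands(sbl_file, out_base, java=False, compiler_dir="compiler"):
    """Return (build, translate) command lists without running anything."""
    # Step 1: compile the snowball compiler from its C sources.
    sources = sorted(glob.glob(f"{compiler_dir}/*.c"))
    build = ["gcc", "-O", "-o", "Snowball"] + sources
    # Step 2: translate the edited .sbl file into C (default) or Java.
    # Assumption: "-o" names the output base, "-java" selects Java output.
    translate = ["./Snowball", sbl_file, "-o", out_base]
    if java:
        translate.append("-java")
    return build, translate


def regenerate(sbl_file, out_base, java=False):
    """Run both steps in order, stopping at the first failure."""
    for cmd in build_commands(sbl_file, out_base, java):
        subprocess.run(cmd, check=True)
```

Calling `regenerate("path/to/edited.sbl", "stem_italian")` from the top of the unpacked source tree would then attempt both steps. Wiring the generated C into a Python module is a separate job that this sketch does not cover.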

I'm not quite sure where Python comes into all this ...

Martin

At 10:19 AM 4/29/2011 -0400, adriano allora wrote:
>Hi Martin!
>
>Wow, you've just opened a new world to me! It's my first C script and, you
>see, I may be stupid, but I'm never a coward when I see something
>completely... °___°
>
>No. ok, seriously: I downloaded a gzip archive named snowball_web_and_code
>which contains all the source code for snowball.
>I opened it and saw several very interesting things (for instance adding
>some stopwords, but there's no need to do everything now: there is time for
>further improvements), so: thank you for this.
>But I beg your pardon: now I'm not sure about what I have to do.
>1) Can I simply change the files stem_ISO_8859.sbl and stem_MS_DOS_LATIN.sbl
>in the directory named algorithms/italian? If not, where is the source
>file I have to change in order to add morphemes to the Italian algorithm?
>2) After changing the algorithm, what exactly do I have to do? It's reasonable
>to assume that compiling it (gcc -O -o Snowball compiler/*.c) will not
>result in the Python module, and the guide to wrappers didn't help me.
>Hmm... is there a howto for these cases?
>
>Thank you for your patience. I figure my questions seem silly to a person
>who knows all the stuff I don't, but for this kind of software it's
>probably necessary that programmers mix with grammarians.
>
>Thanks a lot for everything
>
>adriano
>
>2011/4/29 Martin Porter <martin at porterloo.wanadoo.co.uk>
>
>>
>> Ciao, Adriano!
>>
>> This problem has come up before. The generated tables in the C/Java code
>> have a structure that is determined by a clever algorithm that does fast
>> lookups of the endings. If you add in extra endings the tables need to be
>> completely changed. The only way to extend the tables therefore is to alter
>> the snowball source, download and compile the snowball compiler, and generate
>> new C/Java code.
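
To make the point about the tables concrete, here is a small Python sketch. It is not Snowball's actual table layout, only an illustration that assumes endings are stored reversed in sorted order and found by binary search. The function names and the sample endings are hypothetical. Under that assumption, an ending added by hand at the wrong position is never found, which is why the table has to be regenerated as a whole.

```python
from bisect import bisect_left


def build_table(endings):
    """Store each ending reversed, in sorted order, so lookups can bisect."""
    return sorted(e[::-1] for e in set(endings))


def longest_ending(word, table):
    """Return the longest ending in the table that the word ends with."""
    rev = word[::-1]
    # Try the longest candidate first, shrinking one letter at a time.
    for length in range(len(rev), 0, -1):
        key = rev[:length]
        i = bisect_left(table, key)
        if i < len(table) and table[i] == key:
            return key[::-1]
    return ""  # no known ending matched


table = build_table(["are", "ere", "ire"])
# Appending by hand breaks the sort order that the binary search relies on.
patched = table + ["ica"[::-1]]
# Rebuilding from the full list of endings restores it.
rebuilt = build_table(["are", "ere", "ire", "ica"])
```

With these values, `longest_ending("politica", patched)` comes back empty while `longest_ending("politica", rebuilt)` finds "ica", the same kind of breakage that editing the generated C/Java tables directly would risk.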
>>
>> Martin
>>
>>
>>
>> >....Well, I'd like to do something not so complicated: simply add some
>> >morphemes to the Italian stemmer ....
>>
