[Snowball-discuss] less than zero (enriching Italian stemmer)

adriano allora adriano.allora at gmail.com
Fri Apr 29 05:35:18 BST 2011


Hi to all,

First of all, sorry: I'm not a programmer. I read the manual page, but I
didn't understand how to do what I want. My English isn't very good either,
so I hope I can explain what I need.
Well, I'd like to do something not too complicated: simply add some suffixes
to the Italian stemmer, which at the moment doesn't stem superlative
adjectives correctly. For example, bello and bellissimo are just two
different forms of the same word:

bello (handsome, beautiful) -> bell
bellissimo (very handsome, very beautiful) -> bellissim

should be

bello (handsome, beautiful) -> bell
bellissimo (very handsome, very beautiful) -> bell
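
To pin down the intended behaviour, here is a toy C sketch. The helpers
`ends_with` and `strip_superlative` are hypothetical, not part of Snowball or
libstemmer, and they ignore the Italian algorithm's RV/R1/R2 region rules:
they just drop one final vowel and then an `issim` or `errim` suffix, keeping
at least three letters (an arbitrary assumption).

```c
#include <string.h>

/* Hypothetical helper: 1 if word[0..len) ends with suffix, else 0. */
int ends_with(const char *word, size_t len, const char *suffix)
{
    size_t n = strlen(suffix);
    return len >= n && memcmp(word + (len - n), suffix, n) == 0;
}

/* Hypothetical toy stemmer for superlatives (NOT the Snowball algorithm).
 * Works in place on a writable, lower-case, NUL-terminated ASCII buffer.
 * The minimum stem length of 3 is an assumption for illustration only. */
void strip_superlative(char *word)
{
    static const char *const suffixes[] = { "issim", "errim" };
    size_t len = strlen(word);

    /* Step 1: drop one final vowel, e.g. bellissimo -> bellissim. */
    if (len > 0 && strchr("aeio", word[len - 1]) != NULL)
        len--;

    /* Step 2: drop a superlative suffix, e.g. bellissim -> bell. */
    for (size_t i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        size_t n = strlen(suffixes[i]);
        if (ends_with(word, len, suffixes[i]) && len - n >= 3) {
            len -= n;
            break;
        }
    }
    word[len] = '\0';
}
```

Called on a buffer such as `char w[] = "bellissimo";`, it leaves `bell` in
`w`; `bello` and `bellissima` also become `bell`.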

So I opened the files stem_ISO_8859_1_italian.c and stem_UTF8_italian.c and
modified this block:

static const symbol s_4_0[2] = { 'i', 'c' };
static const symbol s_4_1[4] = { 'a', 'b', 'i', 'l' };
static const symbol s_4_2[2] = { 'o', 's' };
static const symbol s_4_3[2] = { 'i', 'v' };

static const struct among a_4[4] =
{
/*  0 */ { 2, s_4_0, -1, -1, 0},
/*  1 */ { 4, s_4_1, -1, -1, 0},
/*  2 */ { 2, s_4_2, -1, -1, 0},
/*  3 */ { 2, s_4_3, -1, 1, 0}
};
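
In the generated code, each `among` entry starts with the number of symbols to
compare and a pointer to those symbols; the remaining fields (a substring link,
a result value and an optional routine) control what happens after a match.
The sketch below is a simplified, linear stand-in for libstemmer's
`find_among_b`, which actually binary-searches a sorted table; the names
`struct among_sketch` and `find_among_b_sketch` are hypothetical. It shows why
the length field must equal the string's real length: only that many trailing
symbols are compared.

```c
#include <string.h>

/* Hypothetical, simplified stand-in for libstemmer's struct among:
 * only the two fields this sketch needs. */
struct among_sketch {
    int s_size;      /* number of symbols to compare */
    const char *s;   /* the suffix symbols */
};

/* Hypothetical linear backward lookup. Returns the index of the longest
 * entry whose first s_size symbols equal the last s_size symbols of
 * word[0..len), or -1 if no entry matches. */
int find_among_b_sketch(const char *word, int len,
                        const struct among_sketch *v, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        int k = v[i].s_size;
        if (k <= len &&
            memcmp(word + (len - k), v[i].s, (size_t)k) == 0 &&
            (best < 0 || k > v[best].s_size))
            best = i;
    }
    return best;
}
```

With an entry `{2, "issim"}`, only `is` is compared against the last two
letters of `bellissim` (`im`), so nothing matches; with `{5, "issim"}` the
entry matches.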

this way:

static const symbol s_4_0[2] = { 'i', 'c' };
static const symbol s_4_1[4] = { 'a', 'b', 'i', 'l' };
static const symbol s_4_2[2] = { 'o', 's' };
static const symbol s_4_3[2] = { 'i', 'v' };
static const symbol s_4_4[5] = { 'i', 's', 's', 'i', 'm' };
static const symbol s_4_5[5] = { 'e', 'r', 'r', 'i', 'm' };

static const struct among a_4[6] =
{
/*  0 */ { 2, s_4_0, -1, -1, 0},
/*  1 */ { 4, s_4_1, -1, -1, 0},
/*  2 */ { 2, s_4_2, -1, -1, 0},
/*  3 */ { 2, s_4_3, -1, 1, 0},
/*  4 */ { 5, s_4_4, -1, -1, 0},
/*  5 */ { 5, s_4_5, -1, -1, 0}
};
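
A quick way to sanity-check a hand-edited table is sketched below, with
hypothetical helper names and plain NUL-terminated strings instead of
libstemmer's `symbol` arrays. It assumes two properties of generated backward
tables: each length field equals its string's length, and entries are sorted
by comparing strings from their last character backwards. The original four
entries (`ic`, `abil`, `os`, `iv`) follow that order, and a binary search
relies on it.

```c
#include <string.h>

/* Hypothetical table entry: declared length plus the suffix string. */
struct among_check {
    int s_size;
    const char *s;
};

/* Compare a and b starting from their last characters (<0, 0 or >0).
 * A string that runs out first sorts first. */
int compare_reversed(const char *a, const char *b)
{
    size_t i = strlen(a), j = strlen(b);
    while (i > 0 && j > 0) {
        unsigned char ca = (unsigned char)a[--i];
        unsigned char cb = (unsigned char)b[--j];
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    return (i > 0) - (j > 0);
}

/* Returns the index of the first entry whose length field is wrong or
 * that is out of reversed-string order, or -1 if the table looks valid. */
int check_among_b(const struct among_check *v, int n)
{
    for (int i = 0; i < n; i++) {
        if ((size_t)v[i].s_size != strlen(v[i].s))
            return i;
        if (i > 0 && compare_reversed(v[i - 1].s, v[i].s) >= 0)
            return i;
    }
    return -1;
}
```

A length field of 2 for `issim` makes it flag entry 4; with the correct length
of 5 it still flags entry 4, because a sorted table would read `ic`, `abil`,
`errim`, `issim`, `os`, `iv`. A well-formed table may still not be enough: the
C files are generated by the Snowball compiler from the `.sbl` source, the
table size is typically also passed as a literal to the lookup call, and, as
far as the generated code suggests, `a_4` is only consulted after the
`-amente` suffix has been removed. Editing the `.sbl` source and regenerating
is likely the more reliable route.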

I hoped that just adding two new suffixes would be enough. I rebuilt and
reinstalled the PyStemmer module, but it doesn't work.
Can someone help me? Where did I go wrong?
Please don't get too technical: you're writing to someone who can just about
open a shell and type a few commands.

thank you!

alladr