[Snowball-discuss] less than zero (enriching Italian stemmer)
adriano allora
adriano.allora at gmail.com
Fri Apr 29 05:35:18 BST 2011
Hi all,
first of all, sorry: I'm not a programmer. I read the manual page but I
didn't understand how to do what I want to do. My English isn't very good
either, so I hope I can explain what I need.
Well, I'd like to do something not too complicated: simply add some morphemes
to the Italian stemmer. At the moment it doesn't stem superlative adjectives
correctly. For example, bello and bellissimo are just two different forms
of the same word. Currently the stemmer gives:
bello (handsome, beautiful) -> bell
bellissimo (very handsome, very beautiful) -> bellissim
but it should give:
bello (handsome, beautiful) -> bell
bellissimo (very handsome, very beautiful) -> bell
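The mapping asked for above can be sketched in a few lines of C. This is only a toy illustration of the desired result, not the Snowball algorithm or its generated code: `ends_with` and `toy_stem` are hypothetical names, and the rule (drop one final vowel, then drop a trailing "issim") is an assumption chosen to reproduce the two examples.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Snowball: returns 1 if the first n
   characters of `word` end with `suffix`, 0 otherwise. */
static int ends_with(const char *word, size_t n, const char *suffix)
{
    size_t m = strlen(suffix);
    return n >= m && memcmp(word + n - m, suffix, m) == 0;
}

/* Toy stemmer (an assumption for illustration only): strips one final
   vowel, then a trailing "issim". Writes the NUL-terminated result into
   `out` (capacity `cap`, at least 1) and returns its length. */
static size_t toy_stem(const char *word, char *out, size_t cap)
{
    size_t n;

    assert(word != NULL && out != NULL && cap > 0);

    n = strlen(word);
    if (n >= cap)
        n = cap - 1;                 /* truncate over-long input */
    memcpy(out, word, n);

    /* Drop a final a/e/i/o, e.g. "bello" -> "bell". */
    if (n > 0 && strchr("aeio", out[n - 1]) != NULL)
        n--;

    /* Drop the superlative morpheme, e.g. "bellissim" -> "bell". */
    if (ends_with(out, n, "issim"))
        n -= 5;

    out[n] = '\0';
    return n;
}
```

Calling `toy_stem("bello", buf, sizeof buf)` and `toy_stem("bellissimo", buf, sizeof buf)` both leave "bell" in `buf`, which is the behaviour wanted from the real stemmer.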
So I opened the files stem_ISO_8859_1_italian.c and stem_UTF8_italian.c and
modified this block:
static const symbol s_4_0[2] = { 'i', 'c' };
static const symbol s_4_1[4] = { 'a', 'b', 'i', 'l' };
static const symbol s_4_2[2] = { 'o', 's' };
static const symbol s_4_3[2] = { 'i', 'v' };
static const struct among a_4[4] =
{
/* 0 */ { 2, s_4_0, -1, -1, 0},
/* 1 */ { 4, s_4_1, -1, -1, 0},
/* 2 */ { 2, s_4_2, -1, -1, 0},
/* 3 */ { 2, s_4_3, -1, 1, 0}
};
like this:
static const symbol s_4_0[2] = { 'i', 'c' };
static const symbol s_4_1[4] = { 'a', 'b', 'i', 'l' };
static const symbol s_4_2[2] = { 'o', 's' };
static const symbol s_4_3[2] = { 'i', 'v' };
static const symbol s_4_4[5] = { 'i', 's', 's', 'i', 'm' };
static const symbol s_4_5[5] = { 'e', 'r', 'r', 'i', 'm' };
static const struct among a_4[6] =
{
/* 0 */ { 2, s_4_0, -1, -1, 0},
/* 1 */ { 4, s_4_1, -1, -1, 0},
/* 2 */ { 2, s_4_2, -1, -1, 0},
/* 3 */ { 2, s_4_3, -1, 1, 0},
/* 4 */ { 2, s_4_4, -1, -1, 0},
/* 5 */ { 2, s_4_5, -1, -1, 0}
};
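One thing that can be checked mechanically about the modified table: in the original entries the first field equals the length of the symbol array it refers to (2 for 'ic', 4 for 'abil'), while the two new entries declare 2 for five-symbol arrays. Below is a minimal sketch of such a check. It assumes a simplified stand-in struct rather than the real `struct among`, and it assumes `symbol` is an unsigned char; `struct entry`, `ENTRY` and `first_size_mismatch` are hypothetical names that do not exist in the generated files.

```c
#include <stddef.h>

/* Assumption: symbols are single bytes, as in the ISO-8859-1 file. */
typedef unsigned char symbol;

/* Simplified, hypothetical stand-in for a table entry: the declared
   length, the symbol array, and the array's real size for comparison. */
struct entry {
    int s_size;          /* length written by hand in the table */
    const symbol *s;     /* the suffix itself */
    int actual;          /* sizeof the array, filled in by ENTRY */
};

static const symbol s_4_3[2] = { 'i', 'v' };
static const symbol s_4_4[5] = { 'i', 's', 's', 'i', 'm' };

/* Records the real array size next to the hand-written one. */
#define ENTRY(len, arr) { (len), (arr), (int) sizeof (arr) }

static const struct entry table[] = {
    ENTRY(2, s_4_3),     /* declared 2, array holds 2 symbols */
    ENTRY(2, s_4_4),     /* declared 2, array holds 5 symbols */
};

/* Returns the index of the first entry whose declared length differs
   from the size of its array, or -1 if every entry is consistent. */
static int first_size_mismatch(const struct entry *t, int count)
{
    for (int i = 0; i < count; i++) {
        if (t[i].s_size != t[i].actual)
            return i;
    }
    return -1;
}
```

Here `first_size_mismatch(table, 2)` reports index 1, the 'issim' entry. This only covers the length field. Whether the generated tables must also stay in a particular order, or whether other parts of the generated file refer to the table size, is not something this message establishes.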
I hoped that just adding two new morphemes would be enough. I rebuilt and
reinstalled the pystemmer module, but it doesn't work.
Can someone help me? Where did I make my mistake?
Please don't be too technical: you're writing to a guy who can just open
a shell and type a few commands.
Thank you!
alladr