[Snowball-discuss] Error in the vocabulary for Italian stemmer?
Martin Porter
martin at porterloo.wanadoo.co.uk
Wed Jun 16 10:09:15 BST 2010
>For people who want to do it the same way it would be good, if you could
>make it a bit clearer in the descriptions that one should not search for the
>longest suffix that can be deleted, as this might be a source for
>misunderstandings.
I think the descriptions, if carefully read, are clear on this point. The
important lesson here, though, is that the Snowball system achieves what the
original Porter stemmer description did not: it gives exact definitions of
the algorithms, since errors in recoding are detectable and correctable.

Incidentally, that particular error (searching for the longest suffix that
can be deleted, rather than finding the longest suffix and then seeing if it
is deletable) was built into an early encoding of the Porter stemmer which
was widely used for many years. It lies behind the note in the description of
Snowball at http://snowball.tartarus.org/texts/introduction.html:
"A good test is to type in agreement. It should stem to agreement the same
word. If it stems to agreem there is an error."
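
To make the difference concrete, here is a minimal Python sketch of the two readings. It is not the Snowball or Porter source. It assumes a much-reduced suffix list (only "ement", "ment" and "ent") and the m > 1 condition from the Porter stemmer's step 4, and the function names are invented for illustration.

```python
VOWELS = set("aeiou")

def is_consonant(word, i):
    """Porter's definition: 'y' counts as a vowel when it follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True

def measure(stem):
    """Porter's measure m: the number of vowel-consonant sequences in the stem."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m

# Illustrative subset of step 4 suffixes, tried longest first (an assumption).
SUFFIXES = sorted(("ement", "ment", "ent"), key=len, reverse=True)

def strip_longest_match(word):
    """Intended reading: take the longest matching suffix, then test only that one."""
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            stem = word[:-len(suffix)]
            return stem if measure(stem) > 1 else word
    return word

def strip_longest_deletable(word):
    """Faulty reading: keep trying shorter suffixes until one can be deleted."""
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            stem = word[:-len(suffix)]
            if measure(stem) > 1:
                return stem
    return word

print(strip_longest_match("agreement"))      # agreement
print(strip_longest_deletable("agreement"))  # agreem
```

With "agreement", the longest matching suffix is "ement", which leaves the stem "agre" with m = 1, so nothing is removed. The faulty version falls through to "ent", finds that "agreem" has m = 2, and deletes it.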