[Snowball-discuss] Swedish stems need patching
Martin Porter
martin.porter at grapeshot.co.uk
Tue Jan 29 16:54:34 GMT 2008
Janko,
Either -an, -ans were overlooked when the stemmer was written, or
including -an, -ans was found to be generally damaging because of the
additional mis-stemming it produced (Karl Wettin's example list...).
At this distance from the original work I cannot say which is true.
Presumably you have included -an, -ans in your version? If so, could you
report back later on whether you encounter mis-stemming problems? We can
continue from there.
(Something you have to do in introducing a new rule is to compare the
sample vocabulary stemmed with the extra rule and stemmed without the
extra rule. You draw up 3 columns,
    word      stem1     stem2     *
    ...       ...       ...
and put '*' at the end when stem1 and stem2 differ. Then you go through
by hand looking for the * indicator and monitoring the improvements and
degradations. As a native speaker of Swedish, you would of course be
much better placed than me to judge the result.)
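
The comparison described above could be sketched in Python roughly as
follows. This is only an illustration under stated assumptions: the two
stemmers are hypothetical toy suffix-strippers, not the Snowball Swedish
algorithm, and the suffix lists, the minimum stem length of 3 and the
sample words are invented for the example. In practice you would pass in
the real stemmer built without and with the extra rule.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[str, str, str, str]  # (word, stem1, stem2, marker)


def strip_suffixes(word: str, suffixes: Tuple[str, ...]) -> str:
    """Toy stemmer (hypothetical): strip the longest matching suffix,
    provided at least 3 letters remain. NOT the Snowball algorithm."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def compare_stems(
    words: Iterable[str],
    stem1: Callable[[str], str],
    stem2: Callable[[str], str],
) -> List[Row]:
    """Build the word / stem1 / stem2 / * table; '*' marks a difference."""
    rows = []
    for word in words:
        s1, s2 = stem1(word), stem2(word)
        rows.append((word, s1, s2, "*" if s1 != s2 else ""))
    return rows


# Assumed suffix lists, for illustration only.
BASE_SUFFIXES = ("a", "en", "or")
EXTRA_SUFFIXES = BASE_SUFFIXES + ("an", "ans")


def stem_without(word: str) -> str:
    return strip_suffixes(word, BASE_SUFFIXES)


def stem_with(word: str) -> str:
    return strip_suffixes(word, EXTRA_SUFFIXES)


# Invented sample vocabulary; a real run would use the full word list.
WORDS = ["flicka", "flickan", "flickans", "flickor", "organ"]

if __name__ == "__main__":
    for word, s1, s2, mark in compare_stems(WORDS, stem_without, stem_with):
        print(f"{word:<12}{s1:<12}{s2:<12}{mark}")
```

The rows marked '*' are the ones to review by hand. Whether each change
is an improvement or a degradation is a judgement the script cannot
make.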
I hope this helps,
Martin
On Thu, 2008-01-17 at 18:34 +0100, Janko Luin wrote:
> I have recently implemented an acts_as_ferret based search engine on a
> Swedish site, and ran into the Swedish stemmer head-on. It's mostly
> very good, but misses two common noun forms: '-an' and '-ans'.
>