[Snowball-discuss] Swedish stems need patching
Karl Wettin
karl.wettin at gmail.com
Sat Jan 19 19:17:16 GMT 2008
On 17 Jan 2008, at 18:34, Janko Luin wrote:
> I have recently implemented an acts_as_ferret based search engine on
> a Swedish site, and ran into the Swedish stemmer head-on. It's
> mostly very good, but misses two common noun forms: '-an' and
> '-ans'. Compare with the example list:
>
> klocka => klock
> klockan => klockan
> klockans => klockan
>
> These should all be "klock".
abundans
acceptans
adekvans
afrikaans
allesammans
allians
alltsammans
ambulans
None of these is a noun form suffixed with "ans"; the "ans" belongs to the word itself. There are a lot more.
You might want to run against SAOL or something to make sure your new
rules really create unique stems.
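
A rough sketch of that kind of check, in Python. The rule in `naive_strip` is a made-up stand-in for a patched stemmer, not the Snowball Swedish algorithm, and the in-memory word list is just the words from this thread; in practice you would load a full word list such as SAOL. It groups words by the stem the rule produces, so you can see which words end up sharing a stem and which ones get their "ans" cut off.

```python
from collections import defaultdict


def naive_strip(word):
    """Hypothetical rule (an assumption, not Snowball): strip a trailing
    'ans', 'an' or 'a' as long as at least three letters remain."""
    for suffix in ("ans", "an", "a"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(words, stem):
    """Map each produced stem to the list of words that reduce to it."""
    groups = defaultdict(list)
    for w in words:
        groups[stem(w)].append(w)
    return groups


# Words taken from this thread; a real check would use a full dictionary.
words = [
    "klocka", "klockan", "klockans",
    "abundans", "acceptans", "adekvans", "afrikaans",
    "allesammans", "allians", "alltsammans", "ambulans",
]

groups = group_by_stem(words, naive_strip)
for stem, members in sorted(groups.items()):
    print(stem, members)
```

With this rule the three "klocka" forms do collapse to "klock", but "ambulans" becomes "ambul" and "allians" becomes "alli". Run against a full word list, any stem shared by unrelated words would show up as a group with more than one member.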
--
karl