[Snowball-discuss] Swedish stems need patching

Karl Wettin karl.wettin at gmail.com
Sat Jan 19 19:17:16 GMT 2008


On 17 Jan 2008, at 18:34, Janko Luin wrote:

> I have recently implemented an acts_as_ferret based search engine on  
> a Swedish site, and ran into the Swedish stemmer head-on. It's  
> mostly very good, but misses two common noun forms: '-an' and
> '-ans'. Compare with the example list:
>
> klocka => klock
> klockan => klockan
> klockans => klockan
>
> These should all be "klock".

abundans
acceptans
adekvans
afrikaans
allesammans
allians
alltsammans
ambulans

These all end in "ans", but none of them is that noun form. There are
many more. You might want to run your new rules against SAOL or a
similar word list to make sure they really create unique stems.
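
A check like that could be sketched roughly as below. This is only an
illustration, not the Snowball Swedish algorithm: strip_an_ans is a
hypothetical stand-in for the proposed extra rule, the minimum-length
guard is an assumption, and the sample list is just the words from this
thread. In practice you would load a full word list and review both the
collisions and the affected words by hand.

```python
from collections import defaultdict

# Hypothetical stand-in for the proposed rule: strip a trailing "ans" or "an".
# This is NOT the Snowball Swedish stemmer; the length guard is an assumption.
def strip_an_ans(word):
    for suffix in ("ans", "an"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def collisions(words, stem=strip_an_ans):
    """Group words by stem and keep only stems shared by several words."""
    groups = defaultdict(set)
    for w in words:
        groups[stem(w)].add(w)
    return {s: sorted(ws) for s, ws in groups.items() if len(ws) > 1}

def affected(words, stem=strip_an_ans):
    """List the words the rule changes, so they can be reviewed by hand."""
    return sorted(w for w in words if stem(w) != w)

# Sample taken from this thread; replace with a real word list.
sample = [
    "klocka", "klockan", "klockans",
    "abundans", "acceptans", "adekvans", "afrikaans",
    "allesammans", "allians", "alltsammans", "ambulans",
]

if __name__ == "__main__":
    print(collisions(sample))
    print(affected(sample))
```

On this sample the rule merges klockan and klockans as intended, but it
also truncates every word in the list above (ambulans becomes "ambul",
for instance), which is the kind of side effect a full word list would
expose.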


-- 
karl


