[Snowball-discuss] Swedish stems need patching

Janko Luin janko at deltaprojects.se
Thu Jan 17 17:34:11 GMT 2008


I have recently implemented an acts_as_ferret-based search engine on a
Swedish site, and ran into the Swedish stemmer head-on. It's mostly
very good, but it misses two common noun endings: '-an' and '-ans',
the definite singular and its genitive. Compare with the example list:

klocka => klock
klockan => klockan
klockans => klockan

These should all be "klock".

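The fix is to add 'an' and 'ans' to the list of suffixes that are simply
deleted, in both the ISO-8859-1 and the MS-DOS Latin I versions of the
stemmer. In diff form: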

Index: stem_ISO_8859_1.sbl
===================================================================
--- stem_ISO_8859_1.sbl	(revision 500)
+++ stem_ISO_8859_1.sbl	(working copy)
@@ -40,7 +40,7 @@
             'a' 'arna' 'erna' 'heterna' 'orna' 'ad' 'e' 'ade' 'ande' 'arne'
             'are' 'aste' 'en' 'anden' 'aren' 'heten' 'ern' 'ar' 'er' 'heter'
             'or' 'as' 'arnas' 'ernas' 'ornas' 'es' 'ades' 'andes' 'ens' 'arens'
-            'hetens' 'erns' 'at' 'andet' 'het' 'ast'
+            'hetens' 'erns' 'at' 'andet' 'het' 'ast' 'an' 'ans'
                 (delete)
             's'
                 (s_ending delete)
Index: stem_MS_DOS_Latin_I.sbl
===================================================================
--- stem_MS_DOS_Latin_I.sbl	(revision 500)
+++ stem_MS_DOS_Latin_I.sbl	(working copy)
@@ -40,7 +40,7 @@
             'a' 'arna' 'erna' 'heterna' 'orna' 'ad' 'e' 'ade' 'ande' 'arne'
             'are' 'aste' 'en' 'anden' 'aren' 'heten' 'ern' 'ar' 'er' 'heter'
             'or' 'as' 'arnas' 'ernas' 'ornas' 'es' 'ades' 'andes' 'ens' 'arens'
-            'hetens' 'erns' 'at' 'andet' 'het' 'ast'
+            'hetens' 'erns' 'at' 'andet' 'het' 'ast' 'an' 'ans'
                 (delete)
             's'
                 (s_ending delete)
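To see why the three forms come out as they do, and what the patch changes, here is a rough Python model of the first (suffix-deletion) step of the Swedish stemmer, using the suffix list from the diff context above. It is a sketch under my own reading of the algorithm, not the Snowball implementation: the function names are made up, and the later consonant-pair and other-suffix steps are left out because they don't affect these examples. The assumed rules are: a suffix is only removed if it lies wholly inside R1 (the part of the word after the first vowel followed by a non-vowel, starting no earlier than the fourth letter), the longest matching suffix wins, and a bare 's' is removed only after certain letters.

```python
# Rough model of step 1 (suffix deletion) of the Swedish stemmer.
# Hypothetical helper names; later stemming steps are deliberately omitted.

VOWELS = set("aeiouyäåö")
# Letters after which a final 's' may be removed (the s_ending grouping).
S_ENDING = set("bcdfghjklmnoprtvy")

# Suffixes deleted outright, as listed in the diff context (revision 500).
DELETE_SUFFIXES = (
    "a", "arna", "erna", "heterna", "orna", "ad", "e", "ade", "ande", "arne",
    "are", "aste", "en", "anden", "aren", "heten", "ern", "ar", "er", "heter",
    "or", "as", "arnas", "ernas", "ornas", "es", "ades", "andes", "ens", "arens",
    "hetens", "erns", "at", "andet", "het", "ast",
)
# The same list with the two endings the patch adds.
PATCHED_SUFFIXES = DELETE_SUFFIXES + ("an", "ans")


def r1_start(word: str) -> int:
    """Index where R1 begins: just after the first non-vowel that follows
    a vowel, but never before index 3. Words shorter than 3 get an empty R1."""
    if len(word) < 3:
        return len(word)
    for i in range(1, len(word)):
        if word[i - 1] in VOWELS and word[i] not in VOWELS:
            return max(i + 1, 3)
    return len(word)


def main_suffix(word: str, delete_suffixes=DELETE_SUFFIXES) -> str:
    """Remove the longest listed suffix lying entirely inside R1."""
    region = word[r1_start(word):]
    candidates = [s for s in (*delete_suffixes, "s") if region.endswith(s)]
    if not candidates:
        return word
    best = max(candidates, key=len)
    if best == "s":
        # A bare 's' is removed only when the letter before it is in S_ENDING.
        return word[:-1] if word[-2] in S_ENDING else word
    return word[: -len(best)]


for w in ("klocka", "klockan", "klockans"):
    print(w, main_suffix(w), main_suffix(w, PATCHED_SUFFIXES))
# klocka klock klock
# klockan klockan klock
# klockans klockan klock
```

In this model, "klockan" survives unchanged because no listed suffix ends in "-an", and "klockans" only loses its final 's' (allowed after 'n'). With the two extra endings, the longest match becomes "an" or "ans" and all three forms reduce to "klock". Running the same function over a larger Swedish word list would be a cheap way to spot words that the new endings might overstem.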


Kind regards,
Janko Luin

___________________________________________________________________________
The Delta Projects, Janko Luin, developer, janko at deltaprojects.se
phone: +46 (0)8-667 76 90, mobile: +46 (0)739-78 29 27
