[Snowball-discuss] German stemmer: hemmung -> hemmung, enzymhemmung -> enzymhemm

F Wolff friedel at translate.org.za
Tue Apr 27 19:50:30 BST 2010


On Tue, 2010-04-27 at 16:27 +0200, Martin Porter wrote:
> Richard,
> 
> The stemming anomalies you note don't matter so much for general IR work,
> but do for the work that you are doing. It seems to me that you need a
> German word splitter, so that enzymhemmung is split to enzym+hemmung etc.
> Lemmatization systems do this. You can find sources by typing "german
> lemmatization" into Google. In the past I've been involved with two
> companies that do this work, Inxight and Teragram. Since working with them,
> both have been taken over by larger companies. Their work was proprietary,
> with a licence fee arrangement for their use.
> 
> Are there open source solutions here? I do not know. If you, or anyone else,
> can share better information than I have, it would be useful.
> 
> Martin

I am partly guessing that this will work, but the Hunspell spell checker
(used in OpenOffice.org, Mozilla products and elsewhere) can do
morphological analysis, which should include support for German
compounding. It might provide a good start.
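
To illustrate the splitting idea discussed above (enzymhemmung ->
enzym + hemmung), here is a minimal sketch in Python. It is not
Hunspell's API or any existing tool's method: the toy LEXICON and the
split_compound function are hypothetical names made up for
illustration. A real splitter would need a full German word list and
handling for linking elements such as the Fugen-s.

```python
# Toy lexicon (an assumption for illustration only). A real system would
# load a full German word list, e.g. one derived from a spell-checker
# dictionary.
LEXICON = {"enzym", "hemmung", "stoff", "wechsel"}


def split_compound(word, lexicon, min_len=3):
    """Split `word` into known lexicon words, or return None.

    Greedy recursive search: try the longest known head first, then
    split the remainder the same way. Linking elements (such as the
    German Fugen-s) are NOT handled in this sketch.
    """
    word = word.lower()
    if word in lexicon:
        return [word]
    # Candidate split points, longest head first; both parts must be
    # at least `min_len` characters long.
    for i in range(len(word) - min_len, min_len - 1, -1):
        head, tail = word[:i], word[i:]
        if head in lexicon:
            rest = split_compound(tail, lexicon, min_len)
            if rest is not None:
                return [head] + rest
    return None


if __name__ == "__main__":
    for w in ("enzymhemmung", "hemmung", "stoffwechsel"):
        print(w, "->", split_compound(w, LEXICON))
```

Each part returned by the splitter could then be passed to the stemmer
separately, so that hemmung and the tail of enzymhemmung are stemmed
the same way.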

Keep well
Friedel


--
Recently on my blog:
http://translate.org.za/blogs/friedel/en/content/how-should-we-do-high-contrast-application-icons