[Snowball-discuss] german stemmer / Kürbisse

Grant Ingersoll gsingers at apache.org
Wed Oct 28 01:04:02 GMT 2009


Hi Wolfgang,

I can't speak for Sphinx, but I think in general with stemming you  
will always run into these kinds of situations where words that you  
think should stem a particular way don't.  The approach we take in  
Lucene/Solr is to either have a protected word list or another  
TokenFilter (in our chain) that handles the exceptions that we deem  
important.  YMMV with other search engines.
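
To make the exception-handling idea concrete, here is a minimal sketch in
Python rather than Lucene's Java API. Everything in it is an assumption for
illustration: toy_stem is a stand-in that merely imitates the behaviour
reported below (it is not the Snowball German algorithm), and
with_exceptions, its "protected" set and its "overrides" map are
hypothetical names, not part of Lucene, Solr, Sphinx or libstemmer.

```python
from typing import Callable, Dict, FrozenSet, Optional


def toy_stem(token: str) -> str:
    """Stand-in stemmer (NOT Snowball): lowercase, fold u-umlaut, drop a final 'e'."""
    word = token.lower().replace("ü", "u")
    if word.endswith("e"):
        word = word[:-1]
    return word


def with_exceptions(
    stem: Callable[[str], str],
    protected: FrozenSet[str] = frozenset(),
    overrides: Optional[Dict[str, str]] = None,
) -> Callable[[str], str]:
    """Wrap a stemmer so that listed words bypass it.

    protected: lowercased words returned unstemmed.
    overrides: lowercased word -> the stem to use instead of the stemmer's output.
    """
    override_map = dict(overrides or {})

    def wrapped(token: str) -> str:
        key = token.lower()
        if key in override_map:
            return override_map[key]   # hand-picked stem wins
        if key in protected:
            return key                 # leave the word as it is
        return stem(token)             # everything else is stemmed normally

    return wrapped


# Hypothetical exception data for the case discussed in this thread.
stem = with_exceptions(
    toy_stem,
    protected=frozenset({"ende"}),
    overrides={"kürbisse": "kurbis"},
)

singular = stem("Kürbis")
plural = stem("Kürbisse")
```

With the override in place, the singular and the plural map to the same
term. The same wrapper has to be applied both when indexing and when
processing queries, otherwise the indexed terms and the query terms will
not match.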

-Grant

On Oct 22, 2009, at 8:50 AM, Wolfgang Klinger wrote:

>
> *hiya!*
>
> I use the Sphinx search engine and have problems with
> libstemmer_de.
>
> I have text that includes the German word "Kürbis".
> The plural of "Kürbis" is "Kürbisse". Now if I search for "Kürbisse"
> I would expect results for "Kürbis" too (that's why I use libstemmer).
>
> Obviously libstemmer_de creates "kurbiss" as stemmed form instead
> of "kurbis" and therefore I get no results.
> Is that a known problem? How can I solve it?
>
>
> tia, kind regards
> Wolfgang
>
>




