<html><head></head><body><div style="font-family: Verdana;font-size: 12.0px;"><div>Hello,<br/>

<br/>

I have a problem with the german2 stemmer in Elasticsearch when it processes hyphenated compound words.</div>


<div> </div>


<div>As an example, take the two words "Export-Schnittstelle" and "Schnittstelle". For these the stemmer produces "Export-Schnittstell" and "Schnittstell" respectively. That is great, because with the right tokenization I can now search for "Schnittstelle" (which the stemmer in my search analyzer turns into "Schnittstell") and it will match the second part of "Export-Schnittstelle", i.e. "Export-Schnittstell". <br/>

<br/>

I would expect this to work the same way for all hyphenated compound words, but unfortunately that's not the case. Take two other words, "PA-Schiene" and "Schiene": here the stemmer produces two completely different stems, "PA-Schi" and "Schien".  </div>
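<div>One way to look at this is how the Snowball German algorithm decides where suffixes may be removed: only inside the region R1, which starts after the first non-vowel that follows a vowel (and never before the fourth letter). In "pa-schiene" the hyphen after "pa" counts as that non-vowel, so R1 starts at "schiene" and both the final "e" (step 1) and then "en" (step 2) lie inside it. In "schiene" alone, R1 is only the final "e", so "en" survives. The Python sketch below reproduces just steps 1 and 2 of that algorithm, assuming the token is lowercased and kept whole (hyphen included) by the tokenizer. The prelude (ß, u/y marking, german2's ae/oe/ue mapping), step 3 and umlaut removal are left out, and the function names are hypothetical; it is a simplified illustration, not Lucene's implementation.</div>

```python
"""Simplified sketch of steps 1 and 2 of the Snowball German stemmer.

Assumptions: the token is already lowercased and kept whole by the tokenizer;
the prelude, step 3 and umlaut removal are deliberately omitted.
"""

VOWELS = set("aeiouyäöü")
S_ENDINGS = set("bdfghklmnrt")   # letters that may precede a removable "s"
ST_ENDINGS = set("bdfghklmnt")   # letters that may precede a removable "st"


def r1_start(word: str) -> int:
    """Index where R1 begins: after the first non-vowel that follows a vowel,
    moved forward if needed so that at least 3 letters come before it."""
    for i in range(1, len(word)):
        if word[i - 1] in VOWELS and word[i] not in VOWELS:
            return max(i + 1, 3)
    return len(word)


def longest_suffix(word: str, suffixes: tuple[str, ...]) -> str:
    """Return the longest suffix from `suffixes` that `word` ends with, or ''."""
    matches = [s for s in suffixes if word.endswith(s)]
    return max(matches, key=len, default="")


def simplified_german_stem(word: str) -> str:
    p1 = r1_start(word)  # R1 is fixed on the original word, as in Snowball

    # Step 1: em ern er | e en es | s (s only after a valid s-ending); must lie in R1
    suffix = longest_suffix(word, ("em", "ern", "er", "e", "en", "es", "s"))
    if suffix and len(word) - len(suffix) >= p1:
        if suffix == "s":
            if word[-2] in S_ENDINGS:
                word = word[:-1]
        else:
            word = word[: -len(suffix)]
            # after removing e/en/es, a trailing "niss" loses its final s
            if suffix in ("e", "en", "es") and word.endswith("niss"):
                word = word[:-1]

    # Step 2: en er est | st (st only after a valid st-ending with 3+ letters before it)
    suffix = longest_suffix(word, ("en", "er", "est", "st"))
    if suffix and len(word) - len(suffix) >= p1:
        if suffix == "st":
            if len(word) >= 6 and word[-3] in ST_ENDINGS:
                word = word[:-2]
        else:
            word = word[: -len(suffix)]

    return word


if __name__ == "__main__":
    for token in ("schnittstelle", "export-schnittstelle", "schiene", "pa-schiene"):
        # show the R1 region next to the resulting stem
        print(token, repr(token[r1_start(token):]), simplified_german_stem(token))
```

<div>For the four lowercased example tokens it returns "schnittstell", "export-schnittstell", "schien" and "pa-schi", the same stems described above, which suggests the short "pa-" prefix is what moves R1 and lets step 2 strip "en".</div>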


<div> </div>


<div>Can someone explain to me why this happens and whether there is a way to fix it? Maybe by using a different stemmer, such as light_german or minimal_german?<br/>
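<div>To compare those alternatives, one could define one analyzer per stemmer variant in the index settings and run both words through the _analyze API. The Python snippet below only builds the JSON request bodies. The index, filter and analyzer names are hypothetical, and the whitespace tokenizer is an assumption, chosen because it keeps hyphenated words such as "PA-Schiene" as single tokens.</div>

```python
import json

# Hypothetical names: my_index, de_light, de_minimal and their "_stemmer" filters.
STEMMERS = {"de_light": "light_german", "de_minimal": "minimal_german"}


def index_settings() -> dict:
    """Build index settings with one custom analyzer per German stemmer variant."""
    filters = {
        f"{name}_stemmer": {"type": "stemmer", "language": language}
        for name, language in STEMMERS.items()
    }
    analyzers = {
        name: {
            "type": "custom",
            # whitespace keeps "PA-Schiene" as one token (assumed to match the original setup)
            "tokenizer": "whitespace",
            "filter": ["lowercase", f"{name}_stemmer"],
        }
        for name in STEMMERS
    }
    return {"settings": {"analysis": {"filter": filters, "analyzer": analyzers}}}


def analyze_body(analyzer: str, text: str) -> dict:
    """Request body for GET /my_index/_analyze."""
    return {"analyzer": analyzer, "text": text}


if __name__ == "__main__":
    print("PUT /my_index")
    print(json.dumps(index_settings(), indent=2))
    for name in STEMMERS:
        print("GET /my_index/_analyze")
        print(json.dumps(analyze_body(name, "PA-Schiene Schiene"), indent=2))
```

<div>Comparing the tokens returned for each analyzer shows directly whether "PA-Schiene" and "Schiene" end up with a shared stem.</div>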

<br/>

I'll also post this on the Elasticsearch discussion board, since I'm not sure this is the right place for the question, but I'd be grateful for any insight on the topic.<br/>

<br/>

Thanks in advance.<br/>

<br/>

Best regards<br/>

Simon</div></div></body></html>