[Snowball-discuss] Contributing to a Yiddish stemmer
Martin Porter
martin at porterloo.wanadoo.co.uk
Tue Apr 5 19:06:23 BST 2011
Will,
Hi. Your request is a bit unusual ... let's see what I could suggest.
If you're not a programmer, I would not advise trying to write programs.
What you might do is to formulate a set of rules for normalising the
vocabulary of Yiddish, and present it on the internet as a "challenge" for
others to code up. The rules could be set out like one of the stemmer
definitions in the snowball site,
http://snowball.tartarus.org/algorithms/german/stemmer.html
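To give a rough idea of what "coding up" such a rule set might involve, here is a minimal Python sketch of ordered suffix stripping. The suffix list, the minimum stem length, and the function name are placeholders invented for illustration. They are not real Yiddish rules, they use Latin transliteration rather than Hebrew script, and this is not how Snowball itself expresses a stemmer.

```python
# Hypothetical suffix rules, tried in order (longer suffixes first).
# Placeholders only: these are NOT a real description of Yiddish morphology.
SUFFIX_RULES = ["ern", "en", "er", "es", "e", "n", "s"]

# Assumed threshold: never strip a suffix if fewer than this many letters remain.
MIN_STEM_LENGTH = 3


def strip_suffix(word, rules=SUFFIX_RULES, min_stem=MIN_STEM_LENGTH):
    """Remove the first matching suffix, provided enough of the word remains."""
    word = word.lower()
    for suffix in rules:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word  # no rule applied


if __name__ == "__main__":
    for w in ["kinder", "tog", "bikher"]:
        print(w, "->", strip_suffix(w))
```

A written rule set of the kind suggested above would replace the placeholder list and threshold, and would most likely need extra conditions that a flat list like this cannot express.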
I think you should also try & contact others with an interest in retrieval
of texts in Yiddish. Searching in Google is perhaps the best way forward here.
I did not realise the stemming algorithms might be useful in translation.
I'm so involved in IR I tend to think of them as just an adjunct to IR work.
I take it that willhelton.com is your eponymous website. I may mail again
after thinking it over further, meanwhile (if you don't mind) I'll post this
to snowball-discuss, which sometimes generates extra useful ideas, and to
Pat Miles, who helped create the German and Russian stemmers at snowball.
Martin
At 02:58 PM 4/5/2011 +0100, Will Helton wrote:
>
>Dear Dr Porter,
>
>I am a freelance translator working mainly from German to English. I do,
>however, have the need from time to time to do translations from Yiddish
>to English.
>
>For my translation projects I use the open source CAT tool OmegaT, which
>draws on the Snowball/Lucene stemmers and tokenizers. Currently, there
>is not a stemmer/tokenizer for Yiddish, however, and I'd like to help in
>creating one if I can.
>
>I'm afraid I have no programming knowledge, so am not sure how I could
>help, but thought I would contact you and see.
>
>In the simplest of terms (perhaps a dangerous statement to make),
>Yiddish works from a very slightly modified German grammar, so hopefully
>that would help simplify things at least a bit.
>
>If you can advise how I can help contribute to this, I would be glad to try.
>
>Many thanks for your time.
>
>Regards,
>
>Will Helton