[Snowball-discuss] russian stemmer in java

Антон Потапов
Tue Nov 19 19:40:01 2002


Hello,

First of all, I'd like to tell you that I was simply happy 
to find such an astonishing set of stemmers and am very grateful. 
Your work is priceless and brilliant. 

I have a question about the Russian stemmer in Java. The problem is that I cannot get it to stem Russian words. The Java stemmer produces a text file with one word per line, but each word is written out unchanged, exactly as it was read. I suspect an encoding problem.

To open the file I use:

BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(args[1]), "KOI8_R"));

where args[1] is the path to a text file in Russian.

But when I read the file:

 int character;
 while ((character = reader.read()) != -1) {
   char ch = (char) character;
   input.append(Character.toLowerCase(ch));
   System.out.println(input.toString());
 }

the output is NOT in KOI8-R :(
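
One possible explanation, offered as a sketch rather than a confirmed diagnosis: the InputStreamReader decodes the KOI8-R bytes into Java's internal Unicode characters, and System.out then re-encodes them with the platform default charset, which need not be KOI8-R. The example below makes both directions explicit. The class name Koi8RoundTrip and the helpers readLowerCase and toKoi8 are hypothetical names chosen for illustration, and the sketch assumes the JVM ships the KOI8-R charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical demo class; not part of the Snowball distribution.
public class Koi8RoundTrip {

    // Assumption: the running JVM provides the KOI8-R charset.
    static final Charset KOI8_R = Charset.forName("KOI8-R");

    // Decode a KOI8-R byte stream and lower-case it, one character at a time,
    // mirroring the read loop in the post.
    static String readLowerCase(InputStream in) {
        StringBuilder input = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, KOI8_R)) {
            int character;
            while ((character = reader.read()) != -1) {
                input.append(Character.toLowerCase((char) character));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return input.toString();
    }

    // Encode text back to KOI8-R bytes explicitly instead of relying on
    // the platform default charset.
    static byte[] toKoi8(String text) {
        return text.getBytes(KOI8_R);
    }

    public static void main(String[] args) throws IOException {
        // Read the file named on the command line, or fall back to a sample word.
        byte[] raw = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : toKoi8("\u041f\u0440\u0438\u0432\u0435\u0442");
        String lower = readLowerCase(new ByteArrayInputStream(raw));

        // Wrap System.out so that the output bytes are KOI8-R as well.
        PrintStream out = new PrintStream(System.out, true, "KOI8-R");
        out.println(lower);
    }
}
```

Whether the printed text then looks right still depends on the terminal or viewer interpreting the bytes as KOI8-R.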


Please advise.

Thanks!

Regards,
Anton Potapov