[Snowball-discuss] Java stemmers

Richard Boulton richard@tartarus.org
23 Jan 2002 20:24:17 +0000


Two further notes about the Java stemmer output:

i)  Some stemmers (e.g. the French stemmer) currently fail to
compile because the Java output from Snowball contains constructs such
as:
    switch (foo) {
       case a:
          return false;
          break;
       ...
    }

The compiler will complain in this situation because the "break;"
statement can never be reached.  This occurs whenever a fail statement
is used.  To solve it properly, I'm going to have to add some code to
check whether code is unreachable and prevent it being output if so.
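
To illustrate the shape the output needs to take, here is a minimal
sketch. It is not actual Snowball output: the class name, the method
step() and the variable among_var are all made up for the example. The
point is only that a case ending in "return" must not be followed by a
"break", while a case that completes normally still needs one.

```java
public class UnreachableBreakDemo {
    // Hypothetical stand-in for a generated switch; all names are illustrative.
    static boolean step(int among_var) {
        switch (among_var) {
            case 0:
                // A "fail": return directly. Adding "break;" after this line
                // would make javac reject the file with "unreachable statement".
                return false;
            case 1:
                // A branch that completes normally still leaves via break.
                break;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(step(0));
        System.out.println(step(1));
    }
}
```

With the "break" omitted after the "return", this compiles; step(0)
gives false and step(1) gives true.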

ii) I've tried modifying the TestApp to see the cause of the
performance problems: it looks initially as if Martin's right.  Time to
stem English calling stem() once on each word is 5.4 seconds; calling
stem() 10 times for each word takes 6.6 seconds.  The 9 extra calls per
word cost 1.2 seconds in total, so about 0.13 seconds are taken to do
the stemming, and the rest is setup and IO time.
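
The arithmetic behind that estimate can be written out as a small
sketch. It assumes the run time is a fixed cost (setup and IO) plus a
cost proportional to the number of stem() calls per word; the class and
method names are invented for the example, and only the two timings
come from the measurements above.

```java
public class StemTimingEstimate {
    // Cost of one full stemming pass over the word list, from two timed runs
    // that differ only in how many times stem() is called per word.
    static double perPassSeconds(double secondsA, int passesA,
                                 double secondsB, int passesB) {
        return (secondsB - secondsA) / (passesB - passesA);
    }

    // Whatever is left of a run once its stemming passes are subtracted:
    // setup and IO, under the linear-cost assumption.
    static double fixedSeconds(double seconds, int passes, double perPass) {
        return seconds - passes * perPass;
    }

    public static void main(String[] args) {
        double perPass = perPassSeconds(5.4, 1, 6.6, 10);
        double fixed = fixedSeconds(5.4, 1, perPass);
        System.out.println("seconds per stemming pass: " + perPass);
        System.out.println("seconds of setup and IO:   " + fixed);
    }
}
```

With the figures above this gives roughly 0.13 seconds per pass and
roughly 5.27 seconds of fixed cost.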

I tried modifying the IO so that it is buffered, and checked with
strace that the IO really is being buffered (it is), but haven't been
able to improve matters at all.
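
For reference, buffering in Java generally looks like the sketch below.
This is not the TestApp code: the class name, stemAll() and the
stemmer parameter are placeholders invented for the example, and it
uses present-day library features (java.util.function). It only shows
the usual pattern of wrapping the input and output in BufferedReader
and BufferedWriter and flushing once at the end.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.function.UnaryOperator;

public class BufferedStemLoop {
    // Reads one word per line, applies the (placeholder) stemmer, and writes
    // one result per line. Returns the number of words processed.
    static int stemAll(Reader in, Writer out, UnaryOperator<String> stemmer) {
        try {
            BufferedReader reader = new BufferedReader(in);
            BufferedWriter writer = new BufferedWriter(out);
            int count = 0;
            String word;
            while ((word = reader.readLine()) != null) {
                writer.write(stemmer.apply(word));
                writer.write('\n');
                count++;
            }
            // Flush once at the end rather than after every word.
            writer.flush();
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        // toLowerCase stands in for a real stemmer here.
        int n = stemAll(new StringReader("Running\nJumps\n"), out, String::toLowerCase);
        System.out.print(out);
        System.out.println(n + " words");
    }
}
```

In a real test program the Reader and Writer would wrap the input and
output files instead of strings.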

It'll probably be a little while before I have time to look at this in
detail: I shall ask around to see if anyone has any good ideas why the
IO is so slow...

-- 
Richard
