[Snowball-discuss] Java stemmers

Martin Porter martin_porter@softhome.net
Mon, 21 Jan 2002 07:23:39 -0700


Commit whenever you like, it sounds good to me.

There are a few queries: would it not be best to sort out the Unicode issue
if we are supporting Java?

Is the Java support just another code generator module that gets linked in
with the rest of Snowball? Is it in ANSI C?

I wonder how you got around the use of gotos in my code generator ...

I hit a speed issue with the Java version of the Porter stemmer that had the
same order-of-magnitude difference from the C version that you report. I
found that all the time was being lost in I/O. You can easily test that by
calling the stemmer twice per word and comparing run times: the extra time
shows how much is spent in the central stemming bit, and if the total barely
changes, the time is going on I/O. It would be interesting if you had the
same problem.
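
The twice-per-word test could be sketched roughly as follows. This is only an
illustration: the class name StemTiming, the methods timedRun and
stemmingShare, and the toy stemmer are all hypothetical stand-ins, not part
of Snowball or of the Java Porter stemmer. It assumes one word per line on
input and one stem per line on output.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.function.UnaryOperator;

public class StemTiming {

    // Hypothetical stand-in for a real stemmer: strips one trailing "s".
    static String toyStem(String word) {
        return word.endsWith("s") ? word.substring(0, word.length() - 1) : word;
    }

    // Reads one word per line, calls the stemmer `calls` times on each word,
    // writes the result, and returns the elapsed time in nanoseconds.
    static long timedRun(BufferedReader in, Writer out,
                         UnaryOperator<String> stemmer, int calls)
            throws IOException {
        long start = System.nanoTime();
        String word;
        while ((word = in.readLine()) != null) {
            String stem = word;
            for (int i = 0; i < calls; i++) {
                stem = stemmer.apply(word);
            }
            out.write(stem);
            out.write('\n');
        }
        out.flush();
        return System.nanoTime() - start;
    }

    // The second run differs from the first only by one extra stemmer call
    // per word, so the difference estimates the stemming time. The result is
    // that estimate as a fraction of the single-call run.
    static double stemmingShare(long onceNanos, long twiceNanos) {
        return (double) (twiceNanos - onceNanos) / onceNanos;
    }

    public static void main(String[] args) throws IOException {
        // Tiny in-memory sample; a real test would use a large word file.
        String input = "cats\nponies\nstemmers\n";
        long once = timedRun(new BufferedReader(new StringReader(input)),
                             new StringWriter(), StemTiming::toyStem, 1);
        long twice = timedRun(new BufferedReader(new StringReader(input)),
                              new StringWriter(), StemTiming::toyStem, 2);
        System.out.println("once:  " + once + " ns");
        System.out.println("twice: " + twice + " ns");
        System.out.println("estimated stemming share: "
                           + stemmingShare(once, twice));
    }
}
```

A share near zero would suggest the time is going somewhere other than the
stemmer, such as I/O. On a sample this small the numbers are dominated by
JVM warm-up, so a meaningful run needs a large vocabulary file and several
repetitions.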

I found that no amount of fiddling with libraries improved things, and that
was in fact one of the major things that rather put me off Java.

Martin


