[Snowball-discuss] Java stemmers

Richard Boulton richard@tartarus.org
27 Jan 2002 16:51:18 +0000


Just to note: I had a quick fiddle with the Java stemmer support
application, and seem to have solved (most of) the slowness problems.

I modified the application so that it explicitly reads in blocks of 8k
(rather than using a BufferedInputStream, which I would have thought
would buffer internally, but didn't seem to).  This improves the run
time to 0.8 seconds for stemming english/voc.txt, and 0.2 seconds for
each repetition of the stem step.  This is much closer to what would be
hoped for; a great deal of the remaining time could be setup time, but
this performance should be good enough for now.
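
For illustration, a minimal sketch of reading input in explicit 8k blocks.
This is not the patch itself (see the link below for that): the class name
BlockRead, the helper readAll, the UTF-8 decoding and the sample words are
all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class BlockRead {
    // Hypothetical helper: drain a stream through a fixed 8k buffer
    // instead of asking the stream for one character at a time.
    static String readAll(InputStream in) throws IOException {
        // UTF-8 is an assumption; the real application may decode differently.
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        char[] block = new char[8192];
        StringBuilder text = new StringBuilder();
        int n;
        // read() returns the number of chars placed in the buffer, or -1 at end of stream.
        while ((n = reader.read(block, 0, block.length)) != -1) {
            text.append(block, 0, n);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Made-up stand-in for a vocabulary file, one word per line.
        byte[] sample = "running\njumped\nstemming\n".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(sample));
        for (String word : text.split("\n")) {
            System.out.println(word);
        }
    }
}
```

The point of the design is that each call into the stream moves up to 8192
characters, so the per-call overhead is paid once per block rather than
once per character.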

See:
http://cvs.sf.net/cgi-bin/viewcvs.cgi/snowball/website/net/sf/snowball/TestApp.java.diff?r1=1.2&r2=1.3
for the patch I applied.

Still to be done is to implement flow analysis so that the Java compiler
won't complain about unreachable code: this should be reasonably simple,
but if anyone is desperate for a particular stemmer in the meantime,
they can always edit the generated code to remove unreachable
statements.
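
As a rough illustration of the kind of hand edit meant here: javac rejects
any statement that directly follows a return (or break, continue, throw)
with "unreachable statement".  The routine below is a hypothetical shape
for generated code, not real Snowball output; the names UnreachableDemo
and stemStep are invented for the example.

```java
public class UnreachableDemo {
    // Hypothetical generated routine: strips a trailing "s" from longer words.
    static boolean stemStep(StringBuilder word) {
        if (word.length() > 3 && word.charAt(word.length() - 1) == 's') {
            word.setLength(word.length() - 1);
            return true;
            // A generator without flow analysis might emit a further
            // statement at this point, for example `word.setLength(0);`.
            // javac refuses to compile that ("unreachable statement"),
            // so the manual fix is simply to delete the extra statement.
        }
        return false;
    }

    public static void main(String[] args) {
        StringBuilder w = new StringBuilder("stemmers");
        stemStep(w);
        System.out.println(w);
    }
}
```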

-- 
Richard

_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss