[Snowball-discuss] memory use in java version

Mike Wertheim mike.wertheim at gmail.com
Fri Nov 16 06:38:56 GMT 2007


I'm working on a Java web application that will be doing stemming frequently.

I need to decide how many SnowballProgram Java objects to keep in
memory.  Currently, I have 14 of these objects in memory -- one for
each language.  When the code needs to stem a word, it gets the
SnowballProgram object for the desired language, synchronizes on that
object, calls the object's "stem" method, and then leaves the
synchronized block.
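
A minimal sketch of the setup described above, for readers who want to see the locking pattern concretely. It assumes the stemmer is used through a setCurrent / stem / getCurrent sequence, as in the Snowball Java stemmers. ToyStemmer is a hypothetical stand-in that only strips a trailing "s" so the example runs without the Snowball jar, and SharedStemmers is a hypothetical name. The point is that the stemmer holds the word in mutable instance state, so the whole three-call sequence has to sit inside one synchronized block.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Snowball stemmer (NOT a real Snowball algorithm):
// it keeps the word being processed in a mutable field, so one instance
// must never be used by two threads at the same time.
class ToyStemmer {
    private String current = "";

    void setCurrent(String word) { current = word; }

    // Toy rule: strip one trailing "s" from words longer than one character.
    boolean stem() {
        if (current.length() > 1 && current.endsWith("s")) {
            current = current.substring(0, current.length() - 1);
        }
        return true;
    }

    String getCurrent() { return current; }
}

// One shared stemmer per language; each call locks that language's instance.
class SharedStemmers {
    // Filled once in the constructor and only read afterwards.
    private final Map<String, ToyStemmer> byLanguage = new HashMap<>();

    SharedStemmers(String... languages) {
        for (String lang : languages) {
            byLanguage.put(lang, new ToyStemmer());
        }
    }

    String stem(String language, String word) {
        ToyStemmer s = byLanguage.get(language);
        if (s == null) {
            throw new IllegalArgumentException("no stemmer for " + language);
        }
        // setCurrent/stem/getCurrent must run as one unit; otherwise another
        // thread could replace the word between these calls.
        synchronized (s) {
            s.setCurrent(word);
            s.stem();
            return s.getCurrent();
        }
    }
}
```

With this design, all requests for one language queue on a single lock, which is the bottleneck being worried about here.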

I'm concerned that this synchronization may become a performance
bottleneck, so I'm considering having a larger number of
SnowballProgram objects in memory.
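
One common way to have "more objects" without any locking is to give each thread its own stemmer per language via ThreadLocal. This is a sketch under assumptions, not a recommendation from the Snowball project: ToyStemmer is again a hypothetical stand-in, PerThreadStemmers is a hypothetical name, and in real code the ThreadLocal's initializer would construct the language's Snowball stemmer instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Snowball stemmer: mutable and not thread-safe.
class ToyStemmer {
    private String current = "";
    void setCurrent(String word) { current = word; }
    boolean stem() {
        if (current.length() > 1 && current.endsWith("s")) {
            current = current.substring(0, current.length() - 1);
        }
        return true;
    }
    String getCurrent() { return current; }
}

// One stemmer per (thread, language): no instance is ever shared, so no lock.
class PerThreadStemmers {
    private final Map<String, ThreadLocal<ToyStemmer>> byLanguage = new HashMap<>();

    PerThreadStemmers(String... languages) {
        for (String lang : languages) {
            // Each thread gets its own instance the first time it calls get().
            byLanguage.put(lang, ThreadLocal.withInitial(ToyStemmer::new));
        }
    }

    String stem(String language, String word) {
        ThreadLocal<ToyStemmer> local = byLanguage.get(language);
        if (local == null) {
            throw new IllegalArgumentException("no stemmer for " + language);
        }
        ToyStemmer s = local.get(); // this thread's private instance
        s.setCurrent(word);
        s.stem();
        return s.getCurrent();
    }
}
```

The trade-off is memory: the number of live stemmers grows to roughly (request threads) x (languages). In an application server, ThreadLocal values also stay attached to pooled container threads, which can keep the web app's classloader alive after a redeploy. A bounded per-language pool (for example a BlockingQueue of pre-built stemmers) caps the instance count at the cost of some waiting.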

I'd like to find out how much memory a SnowballProgram object retains
between invocations of the "stem" method.  If that's too general a
question, then I'd like to know how much memory an englishStemmer
holds after it has stemmed 1000 different words.  (Does
SnowballProgram keep an internal cache of already-stemmed words?)
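
Whatever the answer about SnowballProgram's internals turns out to be, an application can bound cache memory itself by putting its own cache in front of the stemmer. A minimal sketch, assuming a thread-safe stem function is passed in; StemCache and maxEntries are hypothetical names, and this is not a feature SnowballProgram is claimed to have. It uses LinkedHashMap in access order with removeEldestEntry to get least-recently-used eviction.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Application-level word -> stem cache holding at most maxEntries entries.
class StemCache {
    private final UnaryOperator<String> stemmer; // assumed thread-safe
    private final Map<String, String> lru;

    StemCache(UnaryOperator<String> stemmer, int maxEntries) {
        this.stemmer = stemmer;
        // accessOrder = true: iteration runs least-recently-used first, so
        // removeEldestEntry drops the LRU entry once the bound is exceeded.
        this.lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    String stem(String word) {
        // An access-ordered LinkedHashMap changes on get(), so lock every access.
        synchronized (lru) {
            String cached = lru.get(word);
            if (cached != null) {
                return cached;
            }
        }
        // Stem outside the lock; two threads may occasionally stem the same word.
        String result = stemmer.apply(word);
        synchronized (lru) {
            lru.put(word, result);
        }
        return result;
    }

    int size() {
        synchronized (lru) {
            return lru.size();
        }
    }
}
```

Memory then stays proportional to maxEntries rather than to the number of distinct words seen.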

If anyone else has worked on a similar Java web app, I'd appreciate
hearing about what choices you made regarding memory usage, threads,
caching, etc.


Thanks!
Mike


