[Snowball-discuss] Re: Possible memory leak in Snowball's Java stemmer

Richard Boulton richard@tartarus.org
Thu May 27 18:36:02 2004


Wolfram,

Sorry for the slow reply - for some reason your email didn't make it to 
my machine, so I only found out about it today when Martin pointed it 
out to me.

I haven't done any Java work on Snowball for quite a while, but your 
analysis makes sense to me.  One thing I'm not sure about is whether 
this is a general problem that you're experiencing, or just an issue 
with your version of the JVM (or rather, its associated class library).  
I can imagine that other implementations of Java might handle 
StringBuffer allocation differently.  (Or perhaps the behaviour is 
mandated by the Java specification?)

I'm fairly happy to include your changes, but slightly worried that, on 
a version of Java which doesn't exhibit the resource usage problems 
you're seeing when creating Strings that share a StringBuffer's storage, 
your changes would force an unnecessary string copy.
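To illustrate why the buffer's storage matters: a StringBuffer never shrinks its backing array when its contents are replaced, so after one long word every later (short) stem sits in an oversized array.  Whether toString() hands that array to the returned String or copies it is up to the class library, which is exactly the open question above.  A minimal sketch of the capacity behaviour; the class and method names are hypothetical and not taken from the Snowball sources:

```java
import java.util.Arrays;

// Hypothetical model of a stemmer that reuses one StringBuffer for every word.
public class BufferCapacityDemo {
    // One buffer shared by all calls, as a reusing stemmer might hold it.
    static final StringBuffer current = new StringBuffer();

    static void setCurrent(String value) {
        // Replace the contents in place; setLength(0) never shrinks the array.
        current.setLength(0);
        current.append(value);
    }

    public static void main(String[] args) {
        char[] chars = new char[1000];
        Arrays.fill(chars, 'x');

        setCurrent(new String(chars)); // one unusually long word...
        setCurrent("haus");            // ...followed by a short one

        System.out.println("length   = " + current.length());
        System.out.println("capacity = " + current.capacity());

        // Copying the text gives a String sized to its characters, which is
        // the idea behind the new String(...) workaround discussed below.
        String stem = new String(current.toString());
        System.out.println("stem     = " + stem);
    }
}
```

The length drops back to 4 while the capacity stays at least 1000; if a library's toString() shared that array, each stored stem could keep the whole array alive.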

I wonder whether creating a new StringBuffer in setCurrent(), rather than 
modifying the existing one, would fix the problem.  I fear that this 
would create lots of temporary objects, which could be less efficient 
because it gives the garbage collector more work to do.
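The trade-off between the two setCurrent() strategies can be sketched side by side.  Both holder classes below are hypothetical stand-ins, not the Snowball implementation; the only library fact relied on is that StringBuffer(String) starts with a capacity of 16 plus the length of its argument.

```java
import java.util.Arrays;

// Two hypothetical ways a stemmer could implement setCurrent().
public class SetCurrentStrategies {

    /** Reuses one buffer: no allocation per word, but capacity only grows. */
    static final class ReusingHolder {
        private final StringBuffer current = new StringBuffer();

        void setCurrent(String value) {
            current.setLength(0);
            current.append(value);
        }

        String getCurrent() { return current.toString(); }

        int capacity() { return current.capacity(); }
    }

    /** Allocates a fresh buffer per word: sized to the word, but creates garbage. */
    static final class FreshHolder {
        private StringBuffer current = new StringBuffer();

        void setCurrent(String value) {
            // StringBuffer(String) starts with capacity value.length() + 16.
            current = new StringBuffer(value);
        }

        String getCurrent() { return current.toString(); }

        int capacity() { return current.capacity(); }
    }

    public static void main(String[] args) {
        char[] chars = new char[1000];
        Arrays.fill(chars, 'x');
        String longWord = new String(chars);

        ReusingHolder reusing = new ReusingHolder();
        FreshHolder fresh = new FreshHolder();
        for (String word : new String[] {longWord, "haus"}) {
            reusing.setCurrent(word);
            fresh.setCurrent(word);
        }
        System.out.println("reusing: " + reusing.getCurrent() + ", capacity " + reusing.capacity());
        System.out.println("fresh:   " + fresh.getCurrent() + ", capacity " + fresh.capacity());
    }
}
```

The fresh-buffer version avoids carrying an oversized array from word to word, at the price of one short-lived StringBuffer per call; which cost matters more would have to be measured on the JVMs in question.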


>>Either the user or your library can do something like this:
>>   String myStem = new String(germanStemmer.getCurrent());

This has the advantage of not forcing a string copy for applications 
where only a few stems are being calculated.  However, it's an ugly 
workaround for a problem in Java, IMHO.

>>I really would like to hear from your team, if you could reproduce my 
>>problem and find the solution helpful.

I haven't got time to try to reproduce the problem right now.  It would 
be very helpful if you could send a minimal Java program which exhibits 
the problem for you.  (I imagine that calling stem() repeatedly on a 
given word and storing each result in a Vector would be a suitable 
approach.)  I could then verify that it's not just your Java setup that 
exhibits the problem.
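A minimal program along those lines might look like the sketch below.  To keep it self-contained it uses a hypothetical FakeStemmer instead of the real Snowball class: it only mirrors the setCurrent()/stem()/getCurrent() calls mentioned in this thread, and its "stemming" (dropping a trailing "en") is a placeholder.  For a real test the stand-in would be replaced by germanStemmer; running once with the "copy" argument and once without, and comparing the approximate heap figures, would show whether the stored stems pin extra memory on a given JVM.

```java
import java.util.Vector;

public class StemLeakRepro {

    /**
     * Hypothetical stand-in for a Snowball stemmer: keeps the word in one
     * reused StringBuffer. Its "stemming" just strips a trailing "en".
     */
    static final class FakeStemmer {
        private final StringBuffer current = new StringBuffer();

        void setCurrent(String value) {
            current.setLength(0);
            current.append(value);
        }

        boolean stem() {
            int n = current.length();
            if (n > 2 && current.charAt(n - 2) == 'e' && current.charAt(n - 1) == 'n') {
                current.setLength(n - 2);
            }
            return true;
        }

        String getCurrent() { return current.toString(); }
    }

    /** Stems the same word `count` times and keeps every result in a Vector. */
    static Vector<String> stemRepeatedly(String word, int count, boolean copy) {
        FakeStemmer stemmer = new FakeStemmer();
        Vector<String> results = new Vector<>();
        for (int i = 0; i < count; i++) {
            stemmer.setCurrent(word);
            stemmer.stem();
            String stem = stemmer.getCurrent();
            // Optional workaround from the thread: force a copy so the stored
            // String cannot share the stemmer's backing array.
            results.add(copy ? new String(stem) : stem);
        }
        return results;
    }

    public static void main(String[] args) {
        boolean copy = args.length > 0 && args[0].equals("copy");
        Vector<String> stems = stemRepeatedly("laufen", 100000, copy);

        Runtime rt = Runtime.getRuntime();
        System.gc(); // only a hint to the JVM, so the figure is approximate
        long usedKb = (rt.totalMemory() - rt.freeMemory()) / 1024;
        System.out.println(stems.size() + " stems, first = " + stems.get(0)
                + ", approx. heap used = " + usedKb + " KB");
    }
}
```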

>>Or did I overlook some other (memory saving) means of getting the 
>>desired stem?

Not that I can think of.

Comments would be welcome from any Java experts on the list.

-- 
Richard