[Snowball-discuss] Re: Snowball/Java

Richard Boulton richard@tartarus.org
06 Mar 2002 15:19:17 +0000


On Mon, 2002-03-04 at 15:56, Chris Cleveland wrote:
> Did you see my note on the Snowball list a week or so ago? I was
> wondering how to get the Java stemmers, and couldn't find them in CVS.

Sorry: I've been busy and have only just got a piece of time to sort
this out.

The stemmers aren't in CVS because they're generated by running the
snowball program.  However, I've now set things up so that they will be
automatically generated and placed into an archive on the website.

They are now available from http://snowball.sf.net/snowball_java.tgz

A couple of notes about these: the stemmers have not yet been tested
extensively. The English stemmer generates correct output for the test
vocabulary, except for one issue with character sets. I haven't tested
the other stemmers at all.

I am in the process of writing documentation for the stemmers; in the
meantime, I think they should be fairly self-explanatory.

In the archive, there are a couple of support classes (Among.java and
SnowballProgram.java) and a directory containing a class for each
stemming algorithm. There is also a test program, "TestApp", which takes
two parameters: a stemming algorithm name (with an initial capital
letter, e.g. "English") and a filename, and produces a stemmed version
of the file.
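To picture the kind of driver TestApp is, here is a minimal sketch in modern Java (not the archive's JDK 1.3 code) that looks up a stemmer by its capitalised algorithm name and stems each word of a file. The WordStemmer interface, the PluralStemmer toy (which only strips a trailing "s" and is not a Snowball algorithm) and the name-to-stemmer registry are hypothetical stand-ins invented for illustration; they are not the classes in snowball_java.tgz, whose method names may differ.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical interface standing in for a generated stemmer class;
// the real generated classes may expose a different API.
interface WordStemmer {
    String stemWord(String word);
}

// Toy stand-in, NOT a Snowball algorithm: strips one trailing "s" from
// words longer than three letters, leaving words ending in "ss" alone.
class PluralStemmer implements WordStemmer {
    public String stemWord(String word) {
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }
}

class TestAppSketch {
    // Hypothetical registry keyed by the capitalised algorithm name.
    private static final Map<String, Supplier<WordStemmer>> STEMMERS = new HashMap<>();
    static {
        STEMMERS.put("Plural", PluralStemmer::new);
    }

    // Lower-cases and stems each whitespace-separated word, one per line.
    static String stemText(String algorithm, String text) {
        Supplier<WordStemmer> factory = STEMMERS.get(algorithm);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm);
        }
        WordStemmer stemmer = factory.get();
        StringBuilder out = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // empty input splits into a single empty string
            }
            out.append(stemmer.stemWord(word.toLowerCase(Locale.ROOT))).append('\n');
        }
        return out.toString();
    }

    // Usage: java TestAppSketch <Algorithm> <file>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: TestAppSketch <Algorithm> <file>");
            System.exit(1);
        }
        String text = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
        System.out.print(stemText(args[0], text));
    }
}
```

A registry is used here rather than reflection so the sketch stays self-contained; a real driver could equally build the class name from the algorithm name and load it with Class.forName.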

Also, note that I've written these stemmers against JDK 1.3; I believe,
however, that it shouldn't be too hard to port them to older JDKs.

Please, do let me know how you get on.

-- 
Richard

_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss