[Snowball-discuss] Updated Python interface and new Jython interface to Snowball

Olivier Bornet Olivier.Bornet at idiap.ch
Fri Feb 25 13:56:49 GMT 2005


Hi again,

I'm working on an IR project coded mainly in Python. In this project, I
was doing stemming with the Python class PorterStemmer from "The Porter
Stemming Algorithm" web site[1]. As we want to support languages other
than English, I'm now switching to the Snowball stemming system.

Our project is based on a Python library which is used from either
Python or Java programs. Thanks to PyStemmer[2], the switch from the
PorterStemmer class to Snowball went without problems for the Python
programs.

The major problem I had was integrating the Snowball stemming system
into the Java programs, because the stemming is not done in the Java
code but in the Python library used by Java (via Jython[3]). Jython is
very useful for letting Java code use Python libraries. Unfortunately,
in this case the Python code can't use C extensions, which is how
PyStemmer is implemented.

So, I have created a Python interface to the Java code generated by
Snowball. This enables our Python library to use the Snowball stemming
system from either native Python code (via PyStemmer) or from Java code
(with Jython).

To summarize, we have two ways of using Snowball in our project:

  a. Python native program -> our Python library -> Snowball as C
     extension to Python
  b. Java program -> our Python library -> Snowball as Java extension to
     Python
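
To illustrate the two paths above, here is a minimal sketch of how a
Python library could pick a backend at run time. It is not the actual
interface code. The function names (load_pystemmer, load_snowball_java,
get_stem_function) are hypothetical, and the Java package name
org.tartarus.snowball.ext together with the <language>Stemmer class
naming is an assumption about where the generated classes live.

```python
def load_pystemmer(language):
    """Path (a): Snowball as a C extension, through PyStemmer."""
    import Stemmer  # PyStemmer module; ImportError if it is not installed
    return Stemmer.Stemmer(language).stemWord


def load_snowball_java(language):
    """Path (b): Java classes generated by Snowball, reached via Jython.

    Assumption: the generated classes are on the classpath in the
    package below and are named "<language>Stemmer".
    """
    package_name = "org.tartarus.snowball.ext"
    class_name = language + "Stemmer"
    package = __import__(package_name, globals(), locals(), [class_name])
    java_stemmer = getattr(package, class_name)()

    def stem_word(word):
        # Generated Snowball stemmers work on a "current" string.
        java_stemmer.setCurrent(word)
        java_stemmer.stem()
        return java_stemmer.getCurrent()

    return stem_word


def get_stem_function(language, loaders=(load_pystemmer, load_snowball_java)):
    """Return a stem(word) callable from the first backend that loads."""
    for loader in loaders:
        try:
            return loader(language)
        except (ImportError, AttributeError):
            continue  # backend not available here, try the next one
    raise RuntimeError("no Snowball backend available for %r" % language)
```

With something like this, the rest of the library only calls
get_stem_function("english") and does not need to know whether it is
running under native Python or under Jython.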

So, in short: I now have a Python interface adapted to the current
Snowball CVS (snowball/snowball directory) and a new Jython interface to
the same Snowball CVS. If there is interest, I will be happy to share
these interfaces with Snowball. I'm happy to either commit these changes
to CVS, send them to this mailing list, or put them on a dedicated web
site.

Thanks in advance for your feedback, and thanks for Snowball.

        Olivier

[1] http://www.tartarus.org/~martin/PorterStemmer/
[2] http://sourceforge.net/projects/pystemmer/
[3] http://www.jython.org/

-- 
   . __    . ___  __.  | Olivier Bornet         Olivier.Bornet at idiap.ch
  / /  `  / /  / /  /  | IDIAP             http://www.idiap.ch/~bornet/
 / /   / / /--/ /--'   | CP 592        http://www.idiap.ch/~bornet/pgp/
/ /__.' / /  / /       | CH-1920 Martigny           PGP-key: 0xC53D9218