[Snowball-discuss] Pystemmer

Uwe Schmitt uschmitt at mineway.de
Thu Sep 18 15:28:07 BST 2008


Hi,

I upgraded the Python wrapper today and would like to report
on the changes:

1) I changed setup.py to use Cython instead of Pyrex.
   Cython replaces Pyrex and results in a 40% speed improvement
   according to the included benchmark.py script.


2) I took the libstemmer_c from
   http://snowball.tartarus.org/dist/libstemmer_c.tgz
   (I tried libstemmer_c from the svn trunk, but ran into
    some problems)

3) To get setup.py to work, I had to remove
   one line from PyStemmer's MANIFEST file.
   Otherwise I got duplicate symbols during linking.
   As a consequence, the Python module only supports
   UTF-8 encoding.
   I prefer this to using different ISO-8859-x encodings
   depending on the language being stemmed.
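
For callers, UTF-8-only support means that text stored in one of the
ISO-8859-x encodings has to be re-encoded before it is handed to the
stemmer. A minimal sketch of that step, using only the Python standard
library; the helper name to_utf8 and the sample word are my own
illustration and are not part of PyStemmer:

```python
def to_utf8(raw, source_encoding):
    """Re-encode a byte string from source_encoding to UTF-8."""
    # Decode to text first, then encode again as UTF-8.
    return raw.decode(source_encoding).encode("utf-8")


# Hypothetical input: a German word stored as ISO-8859-1 bytes.
latin1_word = b"Saarbr\xfccken"
utf8_word = to_utf8(latin1_word, "iso-8859-1")

# The single byte 0xFC (u-umlaut in ISO-8859-1) becomes the
# two-byte UTF-8 sequence 0xC3 0xBC.
print(utf8_word)
```

The re-encoded bytes are what one would then pass on to the wrapper.
This keeps the encoding decision in the calling code, independent of
the language being stemmed.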

I could provide the result as a tarball if anybody
is interested.

Greetings, Uwe

-- 
Dr. rer. nat. Uwe Schmitt
F&E Mathematik
 
mineway GmbH
Science Park 2
D-66123 Saarbrücken
 
Telefon: +49 (0)681 8390 5334
Telefax: +49 (0)681 830 4376
 
uschmitt at mineway.de
www.mineway.de
 
Geschäftsführung: Dr.-Ing. Mathias Bauer
Amtsgericht Saarbrücken HRB 12339




