[Snowball-discuss] Pure-Python implementation of the Snowball stemmers

Florian Brucker mail at florianbrucker.de
Tue Aug 5 20:06:06 BST 2014


Hello,

I recently required a pure-Python implementation of the Snowball
stemmers (my target platform doesn't support Python C extensions and
hence I cannot use pystemmer). Since I couldn't find an existing
pure-Python version I wrote one.

Instead of directly translating the stemming algorithms to Python I
wrote a Snowball-to-Python compiler which translates Snowball programs
into (pure) Python modules. It's called "sbl2py" and can be found on PyPI:

    https://pypi.python.org/pypi/sbl2py

Using sbl2py I then translated the existing Snowball stemmers into
Python modules and wrapped them in an API that matches that of
pystemmer. The resulting "purestemmer" package can also be found on PyPI:

    https://pypi.python.org/pypi/purestemmer

The module has been tested extensively and gives the same results as
pystemmer for all words in the Snowball test collection.
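
For illustration, an agreement check of this kind could be sketched as
follows. This is not the actual test code; the helper name
find_mismatches and the two toy stemmers are hypothetical stand-ins.
In a real run the callables would be pystemmer's
Stemmer.Stemmer('english').stemWord and the corresponding purestemmer
method, and the words would come from the Snowball test collection.

```python
def find_mismatches(reference_stem, candidate_stem, words):
    """Return (word, reference_output, candidate_output) per disagreement."""
    mismatches = []
    for word in words:
        expected = reference_stem(word)
        actual = candidate_stem(word)
        if expected != actual:
            mismatches.append((word, expected, actual))
    return mismatches


# Toy stand-ins so the sketch runs without any stemmer installed.
# They are NOT Snowball stemmers.
def toy_reference(word):
    # Drop a single trailing "s".
    return word[:-1] if word.endswith("s") else word


def toy_candidate(word):
    # Drop every trailing "s", so it disagrees on words like "glass".
    return word.rstrip("s")


if __name__ == "__main__":
    for word, expected, actual in find_mismatches(
        toy_reference, toy_candidate, ["cats", "dog", "glass"]
    ):
        print(f"{word}: reference={expected!r} candidate={actual!r}")
```

An empty result list means the two implementations agree on every word
supplied.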

The performance of purestemmer is currently not great: on average it's
about 100x slower than pystemmer. Nevertheless, it might come in handy
for some people. There's also probably a lot of room for improvement,
since I didn't actively optimize the Python code that sbl2py generates.
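
A slowdown factor like the one above can be estimated with the
standard timeit module. The sketch below is only an assumption about
how such a measurement might look, not the method used for the 100x
figure; slowdown_factor and the toy callables are hypothetical names,
and real measurements would pass the pystemmer and purestemmer
stemming functions instead.

```python
import timeit


def slowdown_factor(baseline_stem, candidate_stem, words, number=20, repeat=3):
    """Return how many times slower candidate_stem is than baseline_stem."""

    def make_run(stem):
        # Stem the whole word list once per timed call.
        return lambda: [stem(word) for word in words]

    # Take the best of several repeats to reduce noise from other processes.
    baseline_time = min(
        timeit.repeat(make_run(baseline_stem), number=number, repeat=repeat)
    )
    candidate_time = min(
        timeit.repeat(make_run(candidate_stem), number=number, repeat=repeat)
    )
    if baseline_time == 0:
        raise ValueError("baseline too fast to time; increase number or words")
    return candidate_time / baseline_time


# Toy stand-ins (NOT stemmers): a built-in call versus a Python-level loop.
def toy_fast(word):
    return word.lower()


def toy_slow(word):
    return "".join(ch.lower() for ch in word)


if __name__ == "__main__":
    factor = slowdown_factor(toy_fast, toy_slow, ["Stemming"] * 1000)
    print(f"candidate is about {factor:.1f}x slower than baseline")
```

The ratio depends heavily on the word list and the machine, so it is
best treated as a rough indication only.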

If you're interested, the code for both sbl2py and purestemmer is on GitHub:

    https://github.com/torfuspolymorphus/sbl2py
    https://github.com/torfuspolymorphus/purestemmer


Best regards,
Florian


