[Snowball-discuss] Pure-Python implementation of the Snowball stemmers
mail at florianbrucker.de
Tue Aug 5 20:06:06 BST 2014
I recently needed a pure-Python implementation of the Snowball
stemmers (my target platform doesn't support Python C extensions, so
I cannot use pystemmer). Since I couldn't find an existing
pure-Python version, I wrote one.
Instead of directly translating the stemming algorithms to Python, I
wrote a Snowball-to-Python compiler which translates Snowball programs
into (pure) Python modules. It's called "sbl2py" and can be found on PyPI:
Using sbl2py I then translated the existing Snowball stemmers into
Python modules and wrapped them in an API that matches that of
pystemmer. The resulting "purestemmer" package can also be found on PyPI:
The module has been tested extensively and gives the same results as
pystemmer for all words in the Snowball test collection.
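For anyone who wants to repeat that kind of check, here is a minimal
sketch of an equivalence test. It assumes both stemmers are available as
plain callables that map a word to its stem. The helper name
find_mismatches and the two toy stemmers are hypothetical stand-ins so
the example runs without any third-party package; they are not part of
pystemmer or purestemmer.

```python
def find_mismatches(reference_stem, candidate_stem, words):
    """Return (word, reference, candidate) for every word whose stems differ."""
    mismatches = []
    for word in words:
        expected = reference_stem(word)
        actual = candidate_stem(word)
        if expected != actual:
            mismatches.append((word, expected, actual))
    return mismatches


# Hypothetical toy stemmers, used only to make the sketch self-contained.
def toy_reference(word):
    # Strip a single trailing "s".
    return word[:-1] if word.endswith("s") else word


def toy_candidate(word):
    # Strip all trailing "s" characters, so it disagrees on words ending in "ss".
    return word.rstrip("s")


sample_words = ["cats", "dog", "glass"]
mismatches = find_mismatches(toy_reference, toy_candidate, sample_words)
for word, expected, actual in mismatches:
    print(f"{word}: reference={expected!r} candidate={actual!r}")
```

With pystemmer installed, the reference callable could be
Stemmer.Stemmer('english').stemWord, and the candidate the corresponding
method of the pure-Python stemmer. An empty result for a word list means
the two agree on every word in it.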
The performance of purestemmer is currently not great: on average it's
about 100x slower than pystemmer. Nevertheless it might come in handy
for some people. There's also probably a lot of room for improvement,
since I didn't actively optimize the Python code that sbl2py generates.
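If you want to measure the slowdown on your own word list, a rough
sketch using the standard timeit module could look like the following.
The function name slowdown_factor and the two stand-in callables are
hypothetical; this is not the benchmark behind the figure above, just
one way such a ratio could be estimated.

```python
import timeit


def slowdown_factor(fast_stem, slow_stem, words, repeat=5, number=20):
    """Estimate how many times slower slow_stem is than fast_stem on words."""
    def make_run(stem):
        # Each timed call stems the whole word list once.
        return lambda: [stem(w) for w in words]

    # Take the best of several repeats to reduce noise from other processes.
    fast_time = min(timeit.repeat(make_run(fast_stem), repeat=repeat, number=number))
    slow_time = min(timeit.repeat(make_run(slow_stem), repeat=repeat, number=number))
    return slow_time / fast_time


# Hypothetical stand-ins; replace them with the real stemmer callables.
def stand_in_fast(word):
    return word.lower()


def stand_in_slow(word):
    return "".join(ch.lower() for ch in word)


words = ["Running", "Stemmers", "Generously", "Translated"] * 250
factor = slowdown_factor(stand_in_fast, stand_in_slow, words)
print(f"slowdown: about {factor:.1f}x")
```

The result depends heavily on the machine, the Python version and the
words used, so treat any single number as a rough indication only.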
If you're interested, the code for both sbl2py and purestemmer is on GitHub: