[Snowball-discuss] Pure-Python implementation of the Snowball stemmers

Florian Brucker mail at florianbrucker.de
Thu Aug 7 22:42:45 BST 2014


The performance is indeed far from optimal. So far my focus was on
getting everything working correctly rather than fast, so there should
be some low-hanging fruit w.r.t. performance. However, most platforms
do support Python C extensions, and there is no doubt that a pure Python
implementation will never be as fast as pystemmer. The main point of
purestemmer is to provide good stemming algorithms on platforms where you
cannot use pystemmer. Better a slow stemmer than no stemmer :)

Still, I'll see what I can do to speed things up a little.


On 07.08.2014 10:13, Martin Porter wrote:
> Florian,
> That is very nice news.
> About a decade ago, I'd have created (or tried to create) tarballs of
> your work on the snowball site, with announcements and links for
> downloading. Nowadays it is much easier and more useful to find it on
> GitHub, and just to have a significant mention on the snowball site
> with a link. I'll try and put that in place in the next few days.
> The performance is a little disappointing however. I realise Python is
> slower than C, but I thought it was still reasonably competitive with
> C.
> There was an external Pascal code generator done a few years ago, but
> apart from that this is the only code generator other than the
> "native" ones to C and Java.
> Martin
