[Snowball-discuss] Lithuanian stemmer in snowballstemmer Python library
Olly Betts
olly at survex.com
Wed Mar 4 19:24:03 GMT 2020
On Wed, Mar 04, 2020 at 08:52:40AM +0100, Konstantin Gavras wrote:
> I found the solution to my problem. I had the PyStemmer library
> installed on my conda system as well.
Glad you managed to resolve this.
> This caused snowballstemmer to use PyStemmer functions, as mentioned in
> the repo by shibukawa
>
> "if PyStemmer is installed, snowballstemmer.stemmer returns
> PyStemmer's Stemmer objects. This Stemmer object has same methods
> (Stemmer.stemWord(), Stemmer.stemWords())."
>
> Unfortunately, PyStemmer only supports a fraction of the snowballstemmer
> languages.
The PyStemmer on PyPI is unfortunately really out of date. We really
need someone to volunteer to act as PyPI maintainer for it, but so far
nobody has. It's not a task I want to take on myself, especially as
I don't use Python much.
Meanwhile you can install a version which wraps Snowball 2.0.0 direct
from the GitHub repo:

    pip install git+git://github.com/snowballstem/pystemmer

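To check which implementation snowballstemmer is actually handing back in a given environment, a rough sketch along these lines may help. The helper names pystemmer_installed and backend_of are made up for illustration, and the KeyError handling assumes that is what the selected backend raises for an unknown language:

```python
import importlib.util


def pystemmer_installed():
    """Return True if a module named 'Stemmer' (PyStemmer) is importable."""
    return importlib.util.find_spec("Stemmer") is not None


def backend_of(obj):
    """Return the name of the module that defines the class of obj."""
    return type(obj).__module__


if __name__ == "__main__":
    print("PyStemmer importable:", pystemmer_installed())
    try:
        import snowballstemmer
    except ImportError:
        print("snowballstemmer is not installed")
    else:
        try:
            stemmer = snowballstemmer.stemmer("lithuanian")
        except KeyError as exc:
            # Assumption: an unknown language is reported as a KeyError.
            print("lithuanian not available:", exc)
        else:
            print("stemmer class comes from:", backend_of(stemmer))
```

If the reported module is not part of the snowballstemmer package, the stemmer is coming from PyStemmer, and the set of supported languages is whatever that installed PyStemmer provides.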
I think ideally I'd prefer somebody else to take over PyStemmer
entirely: we don't try to support bindings to the C stemmer from any
other languages and it's only for historical reasons that PyStemmer is
under snowballstem. With limited resources I think it makes more sense
to focus snowballstem.org on the algorithms and the compiler.
Part of that could be trying to further close the gap between the pure
Python stemmers and PyStemmer. Currently the pure Python stemmers are
something like 15 times slower than PyStemmer (but that used to be 30
times), or 2-2.6 times slower if you use PyPy (used to be 9 times), but
if you look at the generated code it seems likely that could be further
improved.
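For anyone wanting to measure the gap on their own machine, here is a minimal timing sketch. The helper words_per_second and the sample word list are invented for illustration and this is not how the figures above were obtained. Which backend gets measured depends on whether PyStemmer is installed, so it would need running once in an environment with it and once without:

```python
import time


def words_per_second(stem_words, words, repeats=5):
    """Return the best throughput (words/second) of stem_words over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        stem_words(words)
        best = min(best, time.perf_counter() - start)
    return len(words) / best if best > 0 else float("inf")


if __name__ == "__main__":
    try:
        import snowballstemmer
    except ImportError:
        print("snowballstemmer is not installed")
    else:
        # Hypothetical sample; a real test should use a large, varied word list.
        sample = ["running", "stemmers", "generously", "quickly"] * 25000
        stemmer = snowballstemmer.stemmer("english")
        print("%.0f words/s" % words_per_second(stemmer.stemWords, sample))
```

A sample this repetitive is only a smoke test. A realistic vocabulary would give a fairer comparison between the two implementations.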
Cheers,
Olly