[Snowball-discuss] Lithuanian stemmer

Gerrit De Meulder de_meulder_gerrit at hotmail.com
Sat Mar 9 18:43:50 GMT 2019


Hello Jakub,

What you need is a compiled version of the Snowball compiler (snowball.exe, if you're on Windows like me),
which you then use to convert the Snowball file into a Python script.

I'm no expert on Python or C, but maybe I can help you along the way:

1) I assume you cloned the source from https://github.com/snowballstem/pystemmer ?
Note that there is also https://github.com/snowballstem/snowball/tree/master/python/snowballstemmer
(a newer version?) under the Snowball project itself.

2) From what I understand of the README.md file of the PyStemmer project,
it relies on a C compiler, but that may no longer be necessary
if you compile the *.sbl Snowball file to *.py, as follows:

3) You will have to build the Snowball compiler with "make" and a makefile,
and do the same for any Snowball language algorithm you generate.

4) I'm on Windows, so if you're using Linux or macOS I don't know how this works there, and you will have to adjust what follows for your OS.
When I compiled a local copy of the standalone "snowball.exe" compiler
from https://github.com/snowballstem/snowball/
I used GCC (the GNU Compiler Collection), which comes with Android Studio,
by running a batch file in the directory that contains the C sources and headers.
The batch file contains this (a single line; use it without the [ ] brackets):

[gcc space.c tokeniser.c analyser.c generator.c driver.c generator_csharp.c generator_java.c generator_js.c generator_python.c generator_rust.c generator_go.c generator_pascal.c -o snowball.exe]

5) Quick solution (otherwise you have to set up the PATH variable etc.):
Copy or move this "snowball.exe" to the directory with the algorithms. It can then compile Snowball algorithms to Python, Java, C, etc.,
like so (also a line from a batch file; use it without the [ ] brackets; add more lines if you compile more languages):

[snowball lithuanian.sbl -PY -u -o LithuanianStemmer -n LithuanianStemmer -p org.tartarus.snowball.SnowballStemmer -U 20190309]

Note: -PY is used to generate Python files, and 20190309 (2019-03-09) is the date I used to serialize.
Note 2: -u was for using UTF-8, but in the latest version this seems to be necessary only for C output, so you can probably remove it.
-> If it works well, you should now have "LithuanianStemmer.py".
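
For anyone who prefers a script over batch files, the two command lines from steps 4 and 5 can be assembled in Python. This is only a sketch: the source file list and the flags are copied from this message and are not checked against any particular Snowball release, and the function names gcc_command and snowball_command are made up for illustration.

```python
# C sources of the Snowball compiler as listed in step 4 of this message.
# Other Snowball versions may ship a different set of files.
COMPILER_SOURCES = [
    "space.c", "tokeniser.c", "analyser.c", "generator.c", "driver.c",
    "generator_csharp.c", "generator_java.c", "generator_js.c",
    "generator_python.c", "generator_rust.c", "generator_go.c",
    "generator_pascal.c",
]


def gcc_command(output="snowball.exe"):
    """Return the argument list that builds the standalone compiler (step 4)."""
    return ["gcc", *COMPILER_SOURCES, "-o", output]


def snowball_command(sbl_file, name, date_stamp):
    """Return the argument list that generates a Python stemmer (step 5).

    The flags are taken verbatim from the message; -u is reported there
    as probably unnecessary for Python output.
    """
    return [
        "snowball", sbl_file, "-PY", "-u",
        "-o", name, "-n", name,
        "-p", "org.tartarus.snowball.SnowballStemmer",
        "-U", date_stamp,
    ]


if __name__ == "__main__":
    # Only print the commands; nothing is executed here.
    print(" ".join(gcc_command()))
    print(" ".join(snowball_command("lithuanian.sbl", "LithuanianStemmer", "20190309")))
```

To actually run a command, pass the list to subprocess.run(cmd, check=True) from the directory that contains the sources or the algorithms.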

6) From https://github.com/snowballstem/snowball/tree/master/python/snowballstemmer you need two files,
among.py (https://github.com/snowballstem/snowball/blob/master/python/snowballstemmer/among.py) and basestemmer.py (https://github.com/snowballstem/snowball/blob/master/python/snowballstemmer/basestemmer.py), because they are referenced in the "LithuanianStemmer.py" file.
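
Once the three files are in place, a quick check could look like the sketch below. It assumes the generated module is importable as "LithuanianStemmer", that it defines a class of the same name, and that this class offers the stemWord method from basestemmer.py; none of this has been tested against the generated file, and load_stemmer is a hypothetical helper name.

```python
import importlib


def load_stemmer(module_name="LithuanianStemmer", class_name="LithuanianStemmer"):
    """Import the generated module and return a stemmer instance.

    Returns None if the module cannot be imported or the class is missing,
    so the caller can report the problem instead of crashing.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    stemmer_class = getattr(module, class_name, None)
    return stemmer_class() if stemmer_class is not None else None


if __name__ == "__main__":
    stemmer = load_stemmer()
    if stemmer is None:
        print("Generated stemmer not found; check the file and class names.")
    else:
        # stemWord is assumed to be provided by the base class in basestemmer.py.
        for word in ["namas", "namai", "namuose"]:
            print(word, "->", stemmer.stemWord(word))
```

If the generated file imports among.py and basestemmer.py with relative imports, the three files may have to sit together in one package directory; in that case adjust module_name accordingly.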

Happy Coding,

Gerrit De Meulder

________________________________
From: Snowball-discuss <snowball-discuss-bounces at lists.tartarus.org> on behalf of Jakub Młynarz <mlynarzsrem at gmail.com>
Sent: Wednesday, 27 February 2019 17:57
To: snowball-discuss at lists.tartarus.org
Subject: [Snowball-discuss] Lithuanian stemmer

Hi,
I'm going to create a project in Python which should use a Lithuanian stemmer.
I have read that the Lithuanian stemmer algorithm was created in 2018, but when I tried PyStemmer and the pure Python snowballstemmer library, neither offered that algorithm. Is there any way to use it? Could I get any hints? I also have a problem building the project with the 'make' command: I need an 'algorithms.mk' file and I have no idea how to create it. How can I do that?

Thank you in advance
Jakub Młynarz