[Snowball-discuss] A New Stemmer for Pali
Khemarato Bhikkhu
khemarato.bhikkhu at gmail.com
Sun Apr 28 11:11:58 BST 2024
Dear Snowball,
I'm a volunteer for SuttaCentral.net working on improving the site's search.
The search currently uses ArangoDB, so I thought a good first step might be
to teach ArangoDB to natively "understand" Pali by adding a Pali stemmer to
Snowball.
Here's my first stab at it:
https://github.com/snowballstem/snowball/pull/197
Any and all feedback would be greatly appreciated. I'm especially curious
to know whether Snowball supports separating compound words (by adding a
space between components?) and how polished an algorithm should be before
it is merged. Do you want algorithms to be polished and stable first, or do
you support a process of more continuous improvement?
Best regards from Thailand,
Khemarato Bhikkhu