[Snowball-discuss] A New Stemmer for Pali

Khemarato Bhikkhu khemarato.bhikkhu at gmail.com
Sun Apr 28 11:11:58 BST 2024


Dear Snowball,

I'm a volunteer for SuttaCentral.net working on improving its search.
The search currently uses ArangoDB, so I thought a good first step might be to
teach ArangoDB to natively "understand" Pali by adding a Pali stemmer to
Snowball.

Here's my first stab at it:
https://github.com/snowballstem/snowball/pull/197

Any and all feedback would be greatly appreciated.  I'm especially curious
to know if Snowball supports separating compound words (by adding a space
between components?) and also how polished an algorithm should be to get
checked in.  Do you want the algorithms to be polished and stable before
they get merged, or do you support a process of more continuous improvement?

Best regards from Thailand,
Khemarato Bhikkhu