[Snowball-discuss] Fwd: How to add Tamil Support to stemmer?

Richard Boulton richard at tartarus.org
Wed Mar 27 13:01:38 GMT 2013


On 27 March 2013 12:03, Shrinivasan T <tshrinivasan at gmail.com> wrote:
> The patch for stemmer for tamil language is here.
> https://github.com/rdamodharan/tamil-stemmer/blob/master/snowball-tamil.patch
>
> We apply the patch and compile stemmer to make it work with tamil language.
>
> How to add the patch to the upstream stemmer?

"rdamodharan" has already done exactly what's needed here by
submitting a pull request to our repository on GitHub:
https://github.com/snowballstem/snowball/pull/2  Unfortunately, I
haven't had a chance to look at it yet; I will make time to do so
over the next few days.

I have no way of evaluating the results of this stemmer, but am
willing to take the word of Tamil speakers as to whether the algorithm
is of use.  As Martin mentioned, some changes to the code may be
needed to improve performance.  One thing that would be of great use
is a sample dataset, similar to that in
https://github.com/snowballstem/snowball-data/blob/master/english/voc.txt,
together with a sample file containing the corresponding expected
output.
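To illustrate how such a pair of files might be used, here is a minimal
sketch of a line-by-line check. It assumes the vocabulary file and the
expected-output file are line-aligned (line i of one holds the expected
stem of line i of the other) and that the stemmer is exposed as a plain
Python function. `find_mismatches` and `toy_stem` are hypothetical names
for illustration only, not part of Snowball; `toy_stem` is a stand-in
that just strips a trailing "s".

```python
from typing import Callable, Iterable, List, Tuple


def find_mismatches(
    vocabulary: Iterable[str],
    expected: Iterable[str],
    stem: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (word, expected_stem, actual_stem) for each disagreeing line.

    Assumes the inputs are line-aligned: line i of `expected` is the
    expected stem of line i of `vocabulary`.
    """
    vocab = [w.strip() for w in vocabulary]
    wanted = [s.strip() for s in expected]
    if len(vocab) != len(wanted):
        raise ValueError(
            f"vocabulary has {len(vocab)} lines but expected output has {len(wanted)}"
        )
    mismatches = []
    for word, want in zip(vocab, wanted):
        got = stem(word)
        if got != want:
            mismatches.append((word, want, got))
    return mismatches


def toy_stem(word: str) -> str:
    # Hypothetical stand-in for a real stemmer: strips a trailing "s" only.
    return word[:-1] if word.endswith("s") else word


if __name__ == "__main__":
    vocab = ["cats", "running", "dogs"]
    expected = ["cat", "run", "dog"]
    for word, want, got in find_mismatches(vocab, expected, toy_stem):
        print(f"{word}: expected {want!r}, got {got!r}")
```

With real data, the two lists would come from reading the files, e.g.
`open("voc.txt", encoding="utf-8")`; reading as UTF-8 matters for Tamil
script. An empty result means the stemmer reproduces the expected output
exactly.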

-- 
Richard


