[Snowball-discuss] Re: stemming addition

Martin Porter martin_porter@SoftHome.net
Fri May 30 10:39:01 2003


Michael,

I think -ist was omitted from the original algorithm mainly because few
words in the test vocabulary had that ending. Looking at the evidence again,
I think it could usefully be added in the way you recommend. I regard the
original stemmer as "frozen", but as you probably know, there is a more
developed one at http://snowball.tartarus.org/english/stemmer.html. It
also lacks -ist, which I may add in. (I am planning some revisions to
the stemmer, perhaps later this year.)

Thanks for your interest and help,

Martin

 
At 16:51 21/05/2003 -0400, Michael Holmes wrote:
>Mr. Porter,
>
>The addition of
>
>case 't': if (ends("\03" "ist")) { r("\00" ""); break; }
>
>in step 3 of your algorithm (C version) allows it to handle 'economist',
>'archaeologist', etc. without messing up 'gist', 'fist', and the like.
>Do you see any problems with that inclusion?
>
>Michael Holmes
>Georgia Institute of Technology
>mph@cc.gatech.edu
>
>
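
The quoted rule relies on the replacement only firing when the stem left after removing the suffix has a measure m greater than zero, which is why 'economist' loses -ist while 'gist' and 'fist' do not. The following is a minimal standalone sketch of that check, not the code from the C version of the stemmer: the function names is_consonant, measure and strip_ist are hypothetical, and the sketch applies the -ist rule in isolation, ignoring every other step of the algorithm.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if word[i] is a consonant in Porter's sense: any letter other
   than a, e, i, o, u, and other than a 'y' that follows a consonant. */
static int is_consonant(const char *word, size_t i)
{
    switch (word[i]) {
    case 'a': case 'e': case 'i': case 'o': case 'u':
        return 0;
    case 'y':
        return i == 0 ? 1 : !is_consonant(word, i - 1);
    default:
        return 1;
    }
}

/* Measure m of word[0..len-1]: the number of vowel-run to consonant-run
   transitions, i.e. the m in the form [C](VC)^m[V]. */
static int measure(const char *word, size_t len)
{
    int m = 0;
    int prev_vowel = 0;
    for (size_t i = 0; i < len; i++) {
        int cons = is_consonant(word, i);
        if (cons && prev_vowel)
            m++;
        prev_vowel = !cons;
    }
    return m;
}

/* Hypothetical helper: removes a trailing "ist" in place, but only when the
   remaining stem has m > 0. Returns 1 if the suffix was removed, else 0. */
static int strip_ist(char *word)
{
    assert(word != NULL);
    size_t len = strlen(word);
    if (len <= 3 || strcmp(word + len - 3, "ist") != 0)
        return 0;
    if (measure(word, len - 3) == 0)
        return 0;               /* e.g. "gist": stem "g" has m = 0 */
    word[len - 3] = '\0';
    return 1;
}
```

Called on a writable buffer holding "economist", strip_ist leaves "econom" (the stem has m = 3); called on "gist" or "fist", it leaves the word unchanged because the one-letter stem has m = 0.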