[Snowball-discuss] Possible Error in Porter2 Example Stems

Shane Taylor shanet at webassign.net
Thu May 1 17:58:03 BST 2014


Thanks! I think I see now why my implementation differed; I incorrectly
removed "li" after the "ently" match failed because "ently" was not in R1.
I'm correcting my implementation.
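
For anyone else who hits the same mismatch, here is a minimal Python sketch of the difference (not the XSL code itself). The function names are hypothetical, the suffix table is only a small subset of the step 2 list, and the R1 calculation ignores the special-case prefixes and the consonant-Y handling in the full definition. It assumes the word has already been through step 1c, so "fluently" arrives as "fluentli" and the suffix in question is "entli".

```python
VOWELS = set("aeiouy")
VALID_LI_ENDINGS = set("cdeghkmnrt")

# Small subset of the step 2 suffix table, for illustration only.
STEP2_SUBSET = {"entli": "ent", "ousli": "ous", "fulli": "ful", "li": ""}


def r1_start(word):
    """Index where R1 begins: just after the first non-vowel that follows a vowel."""
    for i in range(1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)  # no such non-vowel: R1 is the empty region at the end


def step2_sketch(word, longest_only=True):
    """Apply the subset of step 2 rules to a word.

    longest_only=True follows the defined behaviour: only the longest
    matching suffix is considered. longest_only=False reproduces the
    fall-through to shorter suffixes described above.
    """
    matches = sorted((s for s in STEP2_SUBSET if word.endswith(s)),
                     key=len, reverse=True)
    r1 = r1_start(word)
    for suffix in matches:
        stem_len = len(word) - len(suffix)
        ok = stem_len >= r1  # the suffix must lie entirely inside R1
        if suffix == "li":
            # "li" is deleted only after a valid li-ending
            ok = ok and stem_len > 0 and word[stem_len - 1] in VALID_LI_ENDINGS
        if ok:
            return word[:stem_len] + STEP2_SUBSET[suffix]
        if longest_only:
            return word  # longest suffix failed its condition: stop here
    return word


print(step2_sketch("fluentli"))                      # fluentli
print(step2_sketch("fluentli", longest_only=False))  # fluent
print(step2_sketch("recentli"))                      # recent
```

In "fluentli", R1 is "tli", so "entli" is not inside R1 and nothing happens; trying "li" afterwards is what produced "fluent". In "recentli", R1 is "entli", so the replacement applies.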

*Shane Taylor*
WebAssign | Technical Writer
Website <http://www.webassign.net>
Instructor Help <http://www.webassign.net/manual/instructor_guide/>
Student Help <http://www.webassign.net/manual/student_guide/>


On Thu, May 1, 2014 at 12:33 PM, Martin Porter <martin.f.porter at gmail.com> wrote:

> Shane,
>
> That is interesting. As the algorithm is defined, fluently should go
> to fluentli, although the way you've done it seems to give a better
> result!
>
> The reason is that in step 2 the longest suffix found is "ently", and
> that is not removed because the residual stem "fl" is then too short.
> And in step 2 (like many of the other steps), you only take action on
> the longest found suffix.
>
> One could redo my algorithm at this point, although since there's only
> one word difference between your implementation and the defined
> algorithm (at least in the sample vocabulary), it is not of course
> desperately important.
>
> Incidentally, I did have doubts about the treatment of -ly after
> introducing it into Porter2, but I think it's too late to backtrack
> now. (For example, the surname "Carly" stems to "car", which is
> embarrassing.)
>
> I bet there's a JavaScript Porter2 somewhere, but I've lost track of
> all the implementations of the stemmers that are now around. Does
> anyone else on snowball-discuss know of one?
>
> Martin
>
> -------------------------------------------
>
> On 4/30/14, Shane Taylor <shanet at webassign.net> wrote:
> > I've implemented an XSL version of the Porter2 algorithm for use in
> > generating a stemmed help index.
> >
> > I've used the example vocabulary & stems to test my implementation
> > (http://snowball.tartarus.org/algorithms/english/diffs.txt), and everything
> > passes except for one word: fluently.
> >
> > My implementation stems it to "fluent" but the example shows "fluentli".
> > Looking at other "-ently" words, they all get stemmed in the example
> > vocabulary to the same stem as the root word, except for "fluently". And,
> > as I follow along the algorithm manually, I can't figure out why fluently
> > wouldn't do the same.
> >
> > So, I think the example stems are incorrect in this one instance.
> >
> > Thank you!
> >
> > PS: If you're aware of an open source and fully compliant JavaScript
> > implementation of Porter2, that would save me work on the next phase of my
> > project: implementing the same stemming for search terms entered in the
> > online help.
> >
> > *Shane Taylor*
> > WebAssign | Technical Writer
> > Website <http://www.webassign.net>
> > Instructor Help <http://www.webassign.net/manual/instructor_guide/>
> > Student Help <http://www.webassign.net/manual/student_guide/>
> >
>