[Snowball-discuss] Possible Error in Porter2 Example Stems
Martin Porter
martin.f.porter at gmail.com
Thu May 1 17:33:05 BST 2014
Shane,
That is interesting. As the algorithm is defined, fluently should go
to fluentli, although the way you've done it seems to give a better
result!
The reason is that in step 2 the longest suffix found is "ently", and
that is not removed because the residual stem "fl" is then too short.
And in step 2 (like many of the other steps), you only take action on
the longest found suffix.
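
A minimal Python sketch of that longest-suffix rule, for anyone following along. It is not the reference Snowball code. The function names are made up for illustration, only two of the step 2 suffixes ("entli" and "li") are included, and the R1 special cases and the consonant-y handling are left out, since neither matters for these words. By step 2 the final y has already become i, so the input is "fluentli". The longest matching suffix is "entli", which starts before R1 ("tli"), so nothing happens. The second function is only a guess at how an implementation could arrive at "fluent": it falls through to the shorter suffix "li".

```python
VOWELS = set("aeiouy")
VALID_LI_ENDINGS = set("cdeghkmnrt")

# Only two of the step 2 suffixes, enough to show the behaviour.
STEP2_SUBSET = {"entli": "ent", "li": ""}


def r1_start(word):
    """Index where R1 begins: just after the first non-vowel that follows a vowel."""
    for i in range(1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def apply_suffix(word, suffix, r1):
    """Apply one suffix rule, or return None if its condition fails."""
    pos = len(word) - len(suffix)
    if pos < r1:
        return None  # suffix is not inside R1
    if suffix == "li":
        # "li" is deleted only after a valid li-ending
        if pos > 0 and word[pos - 1] in VALID_LI_ENDINGS:
            return word[:pos]
        return None
    return word[:pos] + STEP2_SUBSET[suffix]


def step2_as_defined(word):
    """Act on the longest matching suffix only; never fall back to a shorter one."""
    matches = [s for s in STEP2_SUBSET if word.endswith(s)]
    if not matches:
        return word
    result = apply_suffix(word, max(matches, key=len), r1_start(word))
    return word if result is None else result


def step2_with_fallback(word):
    """Hypothetical variant: try shorter suffixes when the longest one fails."""
    r1 = r1_start(word)
    for suffix in sorted(STEP2_SUBSET, key=len, reverse=True):
        if word.endswith(suffix):
            result = apply_suffix(word, suffix, r1)
            if result is not None:
                return result
    return word


print(step2_as_defined("fluentli"))     # fluentli
print(step2_with_fallback("fluentli"))  # fluent
print(step2_as_defined("differentli"))  # different
```

For "differentli", R1 starts right after "dif", so "entli" lies inside it and both versions give "different". That fits the observation that the other "-ently" words in the sample vocabulary behave as expected.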
One could redo my algorithm at this point, although since there's only
one word difference between your implementation and the defined
algorithm (at least in the sample vocabulary), it is not of course
desperately important.
Incidentally, I did have doubts about the treatment of -ly after
introducing it into Porter2, but I think it's too late to backtrack
now. (For example, the surname "Carly" stems to "car", which is
embarrassing.)
I bet there's a JavaScript Porter2 somewhere, but I've lost track of
all the implementations of the stemmers that are now around. Does
anyone else on snowball-discuss know of one?
Martin
-------------------------------------------
On 4/30/14, Shane Taylor <shanet at webassign.net> wrote:
> I've implemented an XSL version of the Porter2 algorithm for use in
> generating a stemmed help index.
>
> I've used the example vocabulary & stems to test my implementation
> (http://snowball.tartarus.org/algorithms/english/diffs.txt), and everything
> passes except for one word: fluently.
>
> My implementation stems it to "fluent" but the example shows "fluentli".
> Looking at other "-ently" words, they all get stemmed in the example
> vocabulary to the same stem as the root word, except for "fluently". And,
> as I follow along the algorithm manually, I can't figure out why fluently
> wouldn't do the same.
>
> So, I think the example stems are incorrect in this one instance.
>
> Thank you!
>
> PS: If you're aware of an open source and fully compliant JavaScript
> implementation of Porter2, that would save me work on the next phase of my
> project: implementing the same stemming for search terms entered in the
> online help.
>
> *Shane Taylor*
> WebAssign | Technical Writer
> Website <http://www.webassign.net> |
> Instructor Help <http://www.webassign.net/manual/instructor_guide/> |
> Student Help <http://www.webassign.net/manual/student_guide/>
>