[Snowball-discuss] Porter2 algorithm question
Martin Holmes
mholmes at uvic.ca
Thu May 30 18:00:50 BST 2019
Hi all,
I'm in the process of implementing the Porter2 algorithm in XSLT 3 based
on the description here:
<https://snowballstem.org/algorithms/english/stemmer.html>
and I'm a little confused by the use of the word "longest" in Step 2 and
Step 3. Step 3, for example, says:
<quote>
Step 3:
Search for the longest among the following suffixes, and, if found
and in R1, perform the action indicated.
tional: replace by tion
ational: replace by ate
...
</quote>
My initial assumption was that "longest" meant that you would search
first for the longest suffix, and replace that; only if that didn't
succeed would you search for a shorter instance. However, in the first
two suffixes in this list, the second is longer and contains the first.
I'm puzzled as to why they're presented in this order (and they seem to
be processed in that order too, if I'm reading the Snowball code
correctly). So the instruction would seem to suggest that you test for
"ational" first and fall back to "tional" only if that fails, but the
order of items contradicts this.
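
To make the question concrete, here is a minimal sketch of the reading in which "longest" refers to the suffix that matches, so the order of the list is irrelevant. It is written in Python rather than XSLT for brevity. The names r1_start, STEP3_RULES and apply_longest_suffix are hypothetical, the table holds only the two suffixes quoted above, and r1_start uses only the basic definition of R1 (no special-case prefixes). None of it is taken from the Snowball source.

```python
VOWELS = set("aeiouy")


def r1_start(word):
    """Basic R1 (assumption: special-case prefixes are ignored).

    R1 begins after the first non-vowel that follows a vowel,
    or at the end of the word if there is no such non-vowel.
    """
    for i in range(1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


# Only the two Step 3 entries quoted above; the full list is longer.
# Deliberately kept in the same order as the description.
STEP3_RULES = {"tional": "tion", "ational": "ate"}


def apply_longest_suffix(word, rules):
    """Apply the rule for the longest suffix that ends the word.

    Candidates are tried longest first, so the order in which the
    rules are listed does not affect the result.
    """
    for suffix in sorted(rules, key=len, reverse=True):
        if word.endswith(suffix):
            start = len(word) - len(suffix)
            # Act only if the matched suffix lies entirely in R1.
            if start >= r1_start(word):
                return word[:start] + rules[suffix]
            # Longest match found but not in R1: no action, and no
            # fallback to a shorter suffix under this reading.
            return word
    return word
```

Under this reading, apply_longest_suffix("relational", STEP3_RULES) gives "relate" and apply_longest_suffix("conditional", STEP3_RULES) gives "condition", whichever way round the two rules are listed.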
Could anyone clarify this?
All help appreciated,
Martin Holmes
UVic Humanities Computing and Media Centre