[Snowball-discuss] Porter2 algorithm question

Martin Holmes mholmes at uvic.ca
Thu May 30 18:00:50 BST 2019


Hi all,

I'm in the process of implementing the Porter2 algorithm in XSLT 3 based 
on the description here:

<https://snowballstem.org/algorithms/english/stemmer.html>

and I'm a little confused by the use of the word "longest" in Step 2 and 
Step 3. Step 3, for example, says:


<quote>
Step 3:
     Search for the longest among the following suffixes, and, if found
     and in R1, perform the action indicated.

     tional:   replace by tion
     ational:   replace by ate
...
</quote>

My initial assumption was that "longest" meant you test for the longest 
suffix first and replace it if it matches; only if that fails do you 
try a shorter one. However, of the first two suffixes in this list, the 
second is longer and ends with the first. I'm puzzled as to why they're 
presented in this order (and they seem to be processed in that order 
too, if I'm reading the Snowball code correctly). The instruction seems 
to imply that you test for "ational" first and only then for "tional", 
but the order of the items contradicts this.
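
To make the "longest wins" reading concrete, here is a minimal Python 
sketch of it. This is not the Snowball code: the function name, the 
two-rule table, and passing the start of R1 in as a precomputed index 
are all assumptions made for illustration. It also assumes that when 
the longest match is not in R1, nothing is replaced and no shorter 
suffix is tried.

```python
# Illustrative subset of the Step 3 rules, kept in the order the
# description lists them (shorter "tional" before longer "ational").
STEP3_RULES = [
    ("tional", "tion"),
    ("ational", "ate"),
]


def apply_longest_suffix(word, rules, r1_start):
    """Apply the rule whose suffix is the longest one ending `word`.

    `r1_start` is the index at which R1 begins; it is assumed to have
    been computed elsewhere (hypothetical parameter for this sketch).
    """
    # Collect every rule whose suffix matches; the listing order of the
    # rules plays no part in which one is chosen.
    matches = [(suffix, repl) for suffix, repl in rules if word.endswith(suffix)]
    if not matches:
        return word

    # Pick the longest matching suffix.
    suffix, repl = max(matches, key=lambda rule: len(rule[0]))

    # Assumption: only the longest match is tested against R1; if it
    # starts before R1, the word is left unchanged (no fallback).
    cut = len(word) - len(suffix)
    if cut < r1_start:
        return word
    return word[:cut] + repl
```

Under this reading, "relational" (R1 starting at index 3) matches both 
suffixes, "ational" is chosen because it is longer, and the result is 
"relate", even though "tional" comes first in the table. 
"conditional" only matches "tional" and becomes "condition".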

Could anyone clarify this?

All help appreciated,
Martin Holmes
UVic Humanities Computing and Media Centre



