[Snowball-discuss] Small changes to English stemmer
Martin Porter
martin.porter at grapeshot.co.uk
Tue Jan 10 15:02:54 GMT 2006
There have been two small changes to the English (Porter2) stemming algorithm.
The first is that the Rule
ied ies
replace by ie if preceded by just one letter, otherwise by i
has been changed to
ied ies
replace by i if preceded by more than one letter, otherwise by ie
There is a corresponding change in the Snowball script, from
'ied' 'ies'
((next atlimit <-'ie') or <-'i')
to
'ied' 'ies'
((hop 2 <-'i') or <-'ie')
This ONLY affects the two 'words' ied and ies. Formerly they stemmed to i, now
they stem to ie.
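
As an illustration only, the old and new forms of the ied/ies rule can be sketched in Python. This is not the Snowball implementation. The function names are hypothetical, and the sketch applies the suffix rule in isolation, ignoring every other step of the stemmer.

```python
def replace_ied_ies_old(word: str) -> str:
    """Former rule: 'ie' if preceded by just one letter, otherwise 'i'."""
    if word.endswith(("ied", "ies")):
        prefix = word[:-3]  # letters before the suffix
        return prefix + ("ie" if len(prefix) == 1 else "i")
    return word


def replace_ied_ies_new(word: str) -> str:
    """Revised rule: 'i' if preceded by more than one letter, otherwise 'ie'."""
    if word.endswith(("ied", "ies")):
        prefix = word[:-3]  # letters before the suffix
        return prefix + ("i" if len(prefix) > 1 else "ie")
    return word


if __name__ == "__main__":
    # Compare both versions on a few inputs, including the bare suffixes.
    for w in ("ties", "cries", "ied", "ies"):
        print(w, replace_ied_ies_old(w), replace_ied_ies_new(w))
```

The two functions agree whenever at least one letter precedes the suffix, and differ only when nothing precedes it, which is why only ied and ies themselves are affected.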
The second is that the line,
do ( ['y'] v <-'Y' set Y_found)
which did not match the Rule
Set initial y ... to Y,
has been changed to
do ( ['y'] <-'Y' set Y_found)
which does.
(The problem was whether to make the rule match the coding or the coding match
the rule. The point is that in English initial y, when followed by a consonant,
is a vowel, but that only archaic words have this shape:- yclept and so on. I
have decided to keep things simple and treat initial y as a consonant in all
cases.)
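
The effect of dropping the v test can be sketched in Python as follows. This is my reading of the two Snowball lines, not a translation produced by the Snowball compiler. The function names are hypothetical, and the vowel set aeiouy is an assumption about how the grouping v is defined in the script. Only the initial-y part of the rule is modelled.

```python
# Assumed definition of the Snowball grouping `v`; not stated in this post.
VOWELS = set("aeiouy")


def mark_initial_y_old(word: str) -> str:
    """Old coding: initial y becomes Y only when a vowel follows it."""
    if len(word) >= 2 and word[0] == "y" and word[1] in VOWELS:
        return "Y" + word[1:]
    return word


def mark_initial_y_new(word: str) -> str:
    """New coding: initial y always becomes Y (treated as a consonant)."""
    if word.startswith("y"):
        return "Y" + word[1:]
    return word


if __name__ == "__main__":
    # The versions differ only when initial y is followed by a consonant.
    for w in ("yes", "youth", "yclept"):
        print(w, mark_initial_y_old(w), mark_initial_y_new(w))
```

Under these assumptions the two versions differ only for words such as yclept, where initial y is followed by a consonant.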
Both these changes are trivial.
There is a rule to remove initial apostrophe in the stemmer, which I have come
to think is a bit feeble, but it can be left in for now.
Martin