[Snowball-discuss] -ize and -ise, -ization and -isation
Martin Porter
martin.f.porter at gmail.com
Wed Jul 10 12:08:55 BST 2013
Milan,
Yes, it is intended, and your point is made from time to time. It has
come up on snowball-discuss before, although I can't now find a thread
that treats it more than superficially. Here is an email reply of mine
that dates back to 2001:
-------------------------------------------------
FROM martin.porter at muscat.com (Martin Porter)
TO "Andre McQuaid" <andre_mcquaid at hotmail.com>
ON Thu Feb 22 09:37:59 2001
Re: Stemming American English vs. English
-------------------------
Dear André,
I don't think you need worry too much about English/American spelling
differences, as far as the Porter stemming algorithm is concerned. The main
difference is that -ize and -ise endings are (as you note) applied
differently in American and English usage, and the algorithm treats -ize as
an ending but not -ise.
Many people have adapted the algorithm by adding -ise to the list of
endings, but on balance I think that is a mistake. There are too many words
ending -ise where -ise should not be removed.
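
The effect of such an adaptation can be sketched in a few lines of Python.
This is not the Porter stemmer itself, only an illustration under
assumptions: measure() is a simplified version of the algorithm's measure
m, strip_if_long() is a hypothetical helper that removes a suffix when the
remaining stem has m > 1 (the condition the algorithm attaches to -ize),
and the sample words are chosen for illustration, not taken from this email.

```python
VOWELS = "aeiou"


def is_consonant(word, i):
    """Porter-style consonant test: 'y' counts as a vowel after a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count vowel-to-consonant transitions, i.e. m in [C](VC)^m[V]."""
    m = 0
    prev_was_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_was_vowel:
            m += 1
        prev_was_vowel = not cons
    return m


def strip_if_long(word, suffix):
    """Hypothetical helper: drop `suffix` only if the stem left has m > 1."""
    if word.endswith(suffix):
        stem = word[: -len(suffix)]
        if measure(stem) > 1:
            return stem
    return word


if __name__ == "__main__":
    # Illustrative words only; the -ise rule is the adaptation under discussion.
    for w in ["organize", "organise", "otherwise", "merchandise", "promise"]:
        suffix = "ize" if w.endswith("ize") else "ise"
        print(w, "->", strip_if_long(w, suffix))
```

Under these assumptions "organize" and "organise" both reduce to "organ",
which is the conflation people want, but "otherwise" is cut to "otherw" and
"merchandise" to "merchand", where -ise is not a verb-forming ending at all.
"promise" survives only because the stem left behind is too short.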
American spelling is much more logical than English, and -ize/-ise usage is
no exception. So in fact the Porter stemmer probably does better with
American English than with English English!
As a matter of fact -ize usage in England used to be much closer to the
American style than it now is. Here are Thackeray's -ize endings from
Vanity Fair (published 1847):
agonized
apologize apologized
authorized
capitalized
characterize
cicatrized
civilized
harmonized
idolizes
particularize
patronize patronized patronizes
proselytizer
realize realized
recognize recognized
tyrannize tyrannized
victimized victimizer
Today many of these words would have to be spelled -ise in England, e.g.
characterise, realise, recognise ....
Hope this helps,
Martin
-------------------------------------------------end of quote
If exactly equal treatment between -ise and -ize is important, I'd be
inclined to respell -ize as -ise within the stemmer, even though that
results in the ending being left on.
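
A minimal sketch of that respelling in Python, written as a separate
normalisation step and not as a patch to any actual stemmer source. The
function name respell_ize() and the list of inflected endings are
assumptions made for the example; where exactly such a rule would sit
inside the stemmer is left open here.

```python
import re

# Hypothetical rule: rewrite a trailing -ize family ending to its -ise
# spelling. The set of endings covered is an assumption for illustration.
_IZE_ENDING = re.compile(r"iz(e|es|ed|ing|er|ers|ation|ations)$")


def respell_ize(word):
    """Return `word` with a final -ize/-ization style ending respelled with s."""
    return _IZE_ENDING.sub(r"is\1", word)


if __name__ == "__main__":
    for w in ["realize", "realise", "recognized", "organization", "horizon"]:
        print(w, "->", respell_ize(w))
```

After this step both spellings reach the rest of the stemmer in the same
form, so they are treated alike, and the ending simply stays on. Note that
the pattern is deliberately crude: it also respells short words such as
"size" and "prize", so a real rule would probably want a guard on the
length of the stem.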
Martin
On Wed, Jul 10, 2013 at 10:04 AM, Milan Bouchet-Valat
<nalimilan at club-internet.fr> wrote:
> Hi!
> I'm wondering whether it is intended that both the original Porter and the newer English stemmers consider US forms ending with -ize or -ization different from GB forms ending with -ise or -isation. . . . . . .