[Snowball-discuss] -ize and -ise, -ization and -isation

Martin Porter martin.f.porter at gmail.com
Wed Jul 10 12:08:55 BST 2013


Milan,

Yes, it is intended, and your point is made from time to time. It has
come up on Snowball-discuss, although I can't find an example of it
now that is more than superficial. Here is an email reply from me
that dates back to 2001:

-------------------------------------------------

FROM martin.porter at muscat.com (Martin Porter)
TO "Andre McQuaid" <andre_mcquaid at hotmail.com>
ON Thu Feb 22 09:37:59 2001

Re: Stemming American English vs. English
-------------------------


 Dear André,

 I don't think you need worry too much about English/American spelling
 differences, as far as the Porter stemming algorithm is concerned. The main
 difference is that -ize and -ise endings are (as you note) applied
 differently in American and English usage, and the algorithm treats -ize as
 an ending but not -ise.

 Many people have adapted the algorithm by adding -ise to the list of
 endings, but on balance I think that is a mistake. There are too many words
 ending -ise where -ise should not be removed.

 American spelling is much more logical than English, and -ize/-ise usage is
 no exception. So in fact the Porter stemmer probably does better with
 American English than with English English!

 As a matter of fact -ize usage in England used to be much closer to the
 American style than it now is. Here are Thackeray's -ize endings from
 Vanity Fair (published 1847):

 agonized
 apologize apologized
 authorized
 capitalized
 characterize
 cicatrized
 civilized
 harmonized
 idolizes
 particularize
 patronize patronized patronizes
 proselytizer
 realize realized
 recognize recognized
 tyrannize tyrannized
 victimized victimizer

 Today many of these words would have to be spelled -ise in England, e.g.
 characterise, realise, recognise ....

 Hope this helps,

 Martin


-------------------------------------------------end of quote

If exactly equal treatment between -ise and -ize is important, I'd be
inclined to respell -ize as -ise within the stemmer, even though that
results in the ending being left on.
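
The respelling idea can be sketched as a small preprocessing step in
Python. This is only an illustration under stated assumptions, not code
from Snowball: the helper names respell_ize_as_ise and
stem_with_respelling are hypothetical, the list of endings is a guess at
the common cases, and the stemmer is passed in as an ordinary callable
rather than tied to any particular library.

```python
import re

# Hypothetical helper, not part of Snowball.
# Matches a word-final -ize style ending (assumed list of common forms).
_IZE_ENDING = re.compile(r"iz(e|es|ed|er|ers|ing|ation|ations)$")


def respell_ize_as_ise(word):
    """Rewrite a word-final -ize style ending in its -ise spelling."""
    return _IZE_ENDING.sub(r"is\1", word)


def stem_with_respelling(word, stem):
    """Lowercase and respell the word, then hand it to the given stemmer.

    `stem` is any callable taking and returning a string.
    """
    return stem(respell_ize_as_ise(word.lower()))
```

With this in front of the stemmer, "realize" and "realise" reach it as
the same string, so they are treated identically whatever the stemmer
then does with -ise. The pattern is deliberately crude: it also respells
short words such as "size" and "prize", so a real adaptation would
probably want a length or vowel check before rewriting.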

Martin

On Wed, Jul 10, 2013 at 10:04 AM, Milan Bouchet-Valat
<nalimilan at club-internet.fr> wrote:
> Hi!

> I'm wondering whether it is intended that both the original Porter and the newer English stemmers consider US forms ending with -ize or -ization different from GB forms ending with -ise or -isation. . . . . . .


