[Snowball-discuss] -ize and -ise, -ization and -isation
Milan Bouchet-Valat
nalimilan at club-internet.fr
Wed Jul 10 13:20:38 BST 2013
On Wednesday 10 July 2013 at 12:08 +0100, Martin Porter wrote:
> Milan,
>
> Yes, it is intended, and your point is made from time to time. It has
> come up on snowball discuss, although I can't find an example of it
> now which is more than superficial. Here is an email reply from me
> that dates back to 2001,
OK, thanks for the pointers. I suspected it had been raised already, but I could not find a reference.
> -------------------------------------------------
>
> FROM martin.porter at muscat.com (Martin Porter)
> TO "Andre McQuaid" <andre_mcquaid at hotmail.com>
> ON Thu Feb 22 09:37:59 2001
>
> Re: Stemming American English vs. English
> -------------------------
>
>
> Dear André,
>
> I don't think you need worry too much about English/American spelling
> differences, as far as the Porter stemming algorithm is concerned. The main
> difference is that -ize and -ise endings are (as you note) applied
> differently in American and English usage, and the algorithm treats -ize as
> an ending but not -ise.
>
> Many people have adapted the algorithm by adding -ise to the list of
> endings, but on balance I think that is a mistake. There are too many words
> ending -ise where -ise should not be removed.
>
> American spelling is much more logical than English, and -ize/-ise usage is
> no exception. So in fact the Porter stemmer probably does better with
> American English than with English English!
>
> As a matter of fact -ize usage in England used to be much closer to the
> American style than it now is. Here are Thackeray's -ize endings from
> Vanity Fair (published 1847):
>
> agonized
> apologize apologized
> authorized
> capitalized
> characterize
> cicatrized
> civilized
> harmonized
> idolizes
> particularize
> patronize patronized patronizes
> proselytizer
> realize realized
> recognize recognized
> tyrannize tyrannized
> victimized victimizer
>
> Today many of these words would have to be spelled -ise in England, e.g.
> characterise, realise, recognise ....
>
> Hope this helps,
>
> Martin
>
>
> -------------------------------------------------end of quote
>
> If exactly equal treatment between -ise and -ize is important, I'd be
> inclined to respell -ize as -ise within the stemmer, even though that
> results in the ending being left on.
You're the expert, not me. ;-)
It seems to me that equal treatment is a good thing, but on the other hand it would mean lowering the quality of US English stemming, so I do not know which is best. If more cases like this exist, it could be interesting to have two versions of the English stemmer: one optimized for US English, and one that ensures consistency. But maybe that's not worth it.
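As a rough illustration of the respelling idea quoted above (mapping -ize to -ise before stemming, so that both spellings are treated the same and the ending is left on), here is a minimal Python sketch. It is not the Snowball implementation: the helper name respell_ize_to_ise, the list of suffixes, and the "at least one vowel before the ending" guard are all assumptions made for the example.

```python
import re

# Hypothetical pre-processing step, not part of Snowball itself.
# Matches a word whose part before "iz" contains at least one vowel,
# followed by one of a small, assumed set of endings.
_IZ_PATTERN = re.compile(
    r"([a-z]*[aeiouy][a-z]*)iz(e|es|ed|ing|er|ers|ation|ations)"
)


def respell_ize_to_ise(word):
    """Return the lowercased word with an -iz- ending respelled as -is-."""
    word = word.lower()
    match = _IZ_PATTERN.fullmatch(word)
    if match is None:
        return word  # no -ize style ending, leave the word alone
    return match.group(1) + "is" + match.group(2)


if __name__ == "__main__":
    for w in ["realize", "recognized", "organization", "advertise", "size"]:
        print(w, "->", respell_ize_to_ise(w))
```

The vowel guard keeps short words such as "size" and "prize" untouched, but longer words like "capsize" or "seize" would still be respelled, so a real version would probably need an exception list.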
Regards