[Snowball-discuss] -ize and -ise, -ization and -isation

Milan Bouchet-Valat nalimilan at club-internet.fr
Wed Jul 10 13:20:38 BST 2013


Le mercredi 10 juillet 2013 à 12:08 +0100, Martin Porter a écrit :
> Milan,
> 
> Yes, it is intended, and your point is made from time to time. It has
> come up on Snowball-discuss, although the only discussions of it I can
> find now are fairly superficial. Here is an email reply from me
> that dates back to 2001:
OK, thanks for the pointers. I suspected it had been raised already, but I could not find a reference.

> -------------------------------------------------
> 
> FROM martin.porter at muscat.com (Martin Porter)
> TO "Andre McQuaid" <andre_mcquaid at hotmail.com>
>     ON Thu Feb 22 09:37:59 2001
> 
> Re: Stemming American English vs. English
> -------------------------
> 
> 
>  Dear André,
> 
>  I don't think you need worry too much about English/American spelling
>  differences, as far as the Porter stemming algorithm is concerned. The main
>  difference is that -ize and -ise endings are (as you note) applied
>  differently in American and English usage, and the algorithm treats -ize as
>  an ending but not -ise.
> 
>  Many people have adapted the algorithm by adding -ise to the list of
>  endings, but on balance I think that is a mistake. There are too many words
>  ending -ise where -ise should not be removed.
> 
>  American spelling is much more logical than English, and -ize/-ise usage is
>  no exception. So in fact the Porter stemmer probably does better with
>  American English than with English English!
> 
>  As a matter of fact -ize usage in England used to be much closer to the
>  American style than it now is. Here are Thackeray's -ize endings from
>  Vanity Fair (published 1847):
> 
>  agonized
>  apologize apologized
>  authorized
>  capitalized
>  characterize
>  cicatrized
>  civilized
>  harmonized
>  idolizes
>  particularize
>  patronize patronized patronizes
>  proselytizer
>  realize realized
>  recognize recognized
>  tyrannize tyrannized
>  victimized victimizer
> 
>  Today many of these words would have to be spelled -ise in England, e.g.
>  characterise, realise, recognise ...
> 
>  Hope this helps,
> 
>  Martin
> 
> 
> -------------------------------------------------end of quote
> 
> If exactly equal treatment of -ise and -ize is important, I'd be
> inclined to respell -ize as -ise within the stemmer, even though that
> results in the ending being left on.
You're the expert, not me. ;-)

It seems to me that equal treatment is a good thing, but on the other hand it would lower the quality of US English stemming. So I do not know what is best. If more cases like this exist, it could be interesting to have two versions of the English stemmer: one optimized for US English, and one ensuring consistency between US and British spellings. But maybe that's not worth it.
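The following toy sketch in Python makes the trade-off in this thread concrete. It is not the Porter or Snowball English stemmer. The three functions (strip_both, us_oriented, consistent), the ending list, and the MIN_STEM length guard are all hypothetical. They model only how -ize/-ise endings would be handled under the three options discussed: stripping -ise as well (the adaptation Martin considers a mistake), stripping only -ize (roughly the current behaviour), and respelling -ize as -ise while leaving the ending on.

```python
# Toy model of -ize/-ise handling; NOT the real Porter or Snowball stemmer.
# All names, the ending list and MIN_STEM are hypothetical, for illustration.

# (US ending, British ending) pairs, longest first so "-ization" wins over "-ize".
ENDING_PAIRS = [
    ("ization", "isation"),
    ("izing", "ising"),
    ("izer", "iser"),
    ("izes", "ises"),
    ("ized", "ised"),
    ("ize", "ise"),
]

# Hypothetical guard: only touch an ending if at least this many letters remain,
# so short words such as "size" or "prize" are left alone.
MIN_STEM = 3


def _has_ending(word: str, ending: str) -> bool:
    return word.endswith(ending) and len(word) - len(ending) >= MIN_STEM


def strip_both(word: str) -> str:
    """Option 1: strip -ize and -ise endings alike (over-stems many -ise words)."""
    for us, uk in ENDING_PAIRS:
        for ending in (us, uk):
            if _has_ending(word, ending):
                return word[: -len(ending)]
    return word


def us_oriented(word: str) -> str:
    """Option 2: strip -ize endings only; -ise spellings are left untouched."""
    for us, _ in ENDING_PAIRS:
        if _has_ending(word, us):
            return word[: -len(us)]
    return word


def consistent(word: str) -> str:
    """Option 3: respell -ize as -ise and keep the ending on."""
    for us, uk in ENDING_PAIRS:
        if _has_ending(word, us):
            return word[: -len(us)] + uk
    return word


if __name__ == "__main__":
    words = ["realize", "realise", "realization", "real", "promise", "precise"]
    for w in words:
        print(f"{w:12} {strip_both(w):12} {us_oriented(w):12} {consistent(w)}")
```

Under these assumptions, strip_both conflates realize and realise with real but also cuts promise down to prom. us_oriented links realize and realization to real but leaves realise on its own. consistent maps realize and realise to the same form but loses the link to real, which is the drop in US English quality mentioned above.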


Regards


