[Snowball-discuss] English stemmer and 'ian' suffix

jf at dockes.org
Sat May 25 09:06:06 BST 2013


Martin Porter writes:
 > Dear J F,
 > 
 > Thanks for your enquiry.
 > 
 > There is no special reason, except that removing -ian often does not
 > help things much. Think of agrarian, patrician, prussian, utilitarian
 > etc. A real problem is that with -ian, you sometimes want to remove
 > the whole three letters, as in orwellian, keynesian, which you cite,
 > sometimes just -an, as in antiquarian, historian, italian, and
 > sometimes just -n, as in indian, persian, bolivian. In general, the
 > snowball stemmers avoid dealing with the rarer suffixes, and this is
 > discussed in the introductory document, so in that sense I guess it
 > has come up before.
 > 
 > Martin
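
As a rough illustration of the three cases described above, here is a
minimal Python sketch. It is not part of the Snowball English stemmer:
the rule names, the grouping of words and the helper functions are all
made up for this example, and the "wanted" rule for each word simply
follows the grouping given in the message. It shows that no single
rule for -ian gives the wanted result for every word.

```python
# Illustrative only: three candidate ways to treat a trailing "-ian".
# The value is the number of trailing letters each rule strips.
RULES = {"-ian": 3, "-an": 2, "-n": 1}

# Words taken from the message above, grouped by the rule that suits them.
EXAMPLES = {
    "-ian": ["orwellian", "keynesian"],
    "-an": ["antiquarian", "historian", "italian"],
    "-n": ["indian", "persian", "bolivian"],
}


def strip_suffix(word, rule):
    """Apply one candidate rule; words not ending in 'ian' are unchanged."""
    if not word.endswith("ian"):
        return word
    return word[: len(word) - RULES[rule]]


def candidates(word):
    """Return the result of every candidate rule for one word."""
    return {rule: strip_suffix(word, rule) for rule in RULES}


if __name__ == "__main__":
    for wanted, words in EXAMPLES.items():
        for word in words:
            results = candidates(word)
            print(f"{word}: wanted rule {wanted} gives {results[wanted]!r}; "
                  f"all rules give {results}")
```

Choosing the right rule for a given word needs knowledge about the word
itself, which a purely suffix-based stemmer does not have.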

Thank you for this very clear explanation. I was going to ask about using a
dictionary, but then, at last, I found the introductory document, which
comes up on the first page of Google results for "stemming dictionary". For
someone supposedly dealing with searches, I don't seem to be too good at
performing them :)

J.F.


 > On Fri, May 24, 2013 at 8:07 AM,  <jf at dockes.org> wrote:
 > > Hello,
 > >
 > > Is there a reason why the English stemmer does not seem to
 > > handle an 'ian' suffix: politician, orwellian, keynesian...?
 > >
 > > I guess that the question has already come up?
 > >
 > > Regards,
 > >
 > > J.F. Dockès


