[Snowball-discuss] English stemmer and 'ian' suffix
jf at dockes.org
Sat May 25 09:06:06 BST 2013
Martin Porter writes:
> Dear J F,
>
> Thanks for your enquiry.
>
> There is no special reason, except that removing -ian often does not
> help things much. Think of agrarian, patrician, prussian, utilitarian
> etc. A real problem is that with -ian, you sometimes want to remove
> the whole three letters, as in orwellian, keynesian, which you cite,
> sometimes just -an, as in antiquarian, historian, italian, and
> sometimes just -n, as in indian, persian, bolivian. In general, the
> snowball stemmers avoid dealing with the rarer suffixes, and this is
> discussed in the introductory document, so in that sense I guess it
> has come up before.
>
> Martin
Thank you for this very clear explanation. I was going to ask about using a
dictionary, but then, at last, I found the introductory document, which
comes up on the first Google page for "stemming dictionary". For someone
supposedly dealing with searches, I don't seem to be too good at performing
them :)
J.F.
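
For illustration, the three cases Martin describes can be sketched in a few
lines of Python. This is a hypothetical toy and not Snowball code: the names
RULES, EXPECTED, strip_suffix and score are made up for the example, and the
target stems are simply what results from removing the letters named in the
message above. It counts how many of the cited words each fixed rule handles
as intended.

```python
# Hypothetical illustration only; this is not part of the Snowball stemmers.
# Each candidate rule strips a fixed number of trailing letters.
RULES = {"-ian": 3, "-an": 2, "-n": 1}

# Cited words mapped to the stems obtained by removing the letters
# named in the message above (an assumption of this sketch).
EXPECTED = {
    "orwellian": "orwell",   # remove -ian
    "keynesian": "keynes",   # remove -ian
    "historian": "histori",  # remove -an
    "italian": "itali",      # remove -an
    "indian": "india",       # remove -n
    "persian": "persia",     # remove -n
    "bolivian": "bolivia",   # remove -n
}


def strip_suffix(word, n):
    """Drop the last n letters of word."""
    return word[:-n]


def score(n):
    """Count the words for which stripping n letters gives the wanted stem."""
    return sum(strip_suffix(w, n) == stem for w, stem in EXPECTED.items())


for name, n in RULES.items():
    print(f"remove {name}: {score(n)} of {len(EXPECTED)} as intended")
```

No single rule covers all of the words, which is the difficulty described
above: picking the right cut would need more information than the suffix
itself provides.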
> On Fri, May 24, 2013 at 8:07 AM, <jf at dockes.org> wrote:
> > Hello,
> >
> > Is there a reason why the English stemmer does not seem to
> > handle the 'ian' suffix: politician, orwellian, keynesian... ?
> >
> > I guess that the question has already come up?
> >
> > Regards,
> >
> > J.F. Dockès