[Snowball-discuss] french stemmer

Martin Porter martin at porterloo.wanadoo.co.uk
Thu Feb 17 08:27:16 GMT 2011


Dear François-Xavier,

Yes, I saw the post, was not sure how to reply, and so procrastinated. (I
was intending to write quite a long note.) But your reminder prompts me to
say something.

There is, as you probably know, an English stemmer, which is (or is supposed
to be) an improvement on the original Porter stemmer. The reason I felt
competent to produce the second stemmer is that I am a native speaker of
English, and so could make judgements about the performance of the earlier
stemmer. You could say there are two forms A and B: the Porter stemmer is
form A, put together quickly after looking at the grammar and morphology of
English, and the English stemmer is form B, a refinement of form A, done by
a native speaker after some years' practical experience of using form A.

The problem with the other snowball stemmers made by me is that they are all
in form A, and actually delicate improvements (going towards a form B)
should be done by people with better knowledge of the languages concerned
than I have. Consequently your email was important, but it's not clear (here
at snowball) what to do with it at the moment.

An idea I had was to collect these suggestions, language by language, and
publish them in a significant place on the snowball site, as a resource to
others wishing to make further improvements. The truth is we do not get many
such suggestions. Here is one regarding the Russian stemmer from 13 Feb 2004:

"May be you can help me, how to add just one exception: stem Kiev => Kiev." 

This has been stuck in my mind for the past seven years! Occasionally there
are more general criticisms, for example,

http://article.gmane.org/gmane.comp.search.snowball/1046/match=swedish

but they require the same treatment, and at present are similarly left
unresolved. Any suggestions on the way forward by "snowball regulars" would
be useful here.

Martin

At 05:39 PM 2/16/2011 +0100, fxbois at kernix.com wrote:
>
>Hi Martin,
>
>I just wonder if you had seen my post in December 2010:
>http://lists.tartarus.org/mailman/private/snowball-discuss/2010-December/thread.html
>
>Thank you in advance and sorry to contact you like this.
>
>-- 
>François-Xavier BOIS
>KerniX Software
>15, rue Cels
>75014 PARIS
>fxbois at kernix.com
>tel : 01 53 98 73 43
>





