[Snowball-discuss] french stemmer

François-Xavier Bois fxbois at kernix.com
Thu Feb 17 08:40:58 GMT 2011


Hi Martin,

I fully understand your point of view. I think, as you write, that
listing all the possible "errors" language by language would be great
because we could perhaps find new rules/behaviours. (Moreover, it would
naturally build an exception thesaurus.)
With this listing, I would be happy to try to add/modify rules for the
French stemmer.

Thank you for your wonderful work.

François-Xavier BOIS
KerniX Software
15, rue Cels
75014 PARIS
fxbois at kernix.com
tel : 01 53 98 73 43


Le 17/02/2011 09:27, Martin Porter a écrit :
> Dear François-Xavier,
>
> Yes, I saw the post, was not sure how to reply, and so procrastinated. (I
> was intending to write quite a long note.) But your reminder prompts me to
> say something.
>
> There is, as you probably know, an English stemmer, which is (or is supposed
> to be) an improvement on the original Porter stemmer. The reason I felt
> competent to produce the second stemmer is that I am a native speaker of
> English, and so could make judgements about the performance of the earlier
> stemmer. You could say there are two forms A and B: the Porter stemmer is
> form A, put together quickly after looking at the grammar and morphology of
> English, and the English stemmer is form B, a refinement of form A, done by
> a native speaker after some years' practical experience with using form A.
>
> The problem with the other snowball stemmers made by me is that they are all
> in form A, and actually delicate improvements (going towards a form B)
> should be done by people with better knowledge of the languages concerned
> than I have. Consequently your email was important, but it's not clear (here
> at snowball) what to do with it at the moment.
>
> An idea I had was to collect these suggestions, language by language, and
> publish them in a significant place on the snowball site, as a resource to
> others wishing to make further improvements. The truth is we do not get many
> such suggestions. Here is one regarding the Russian stemmer from 13 Feb 2004,
>
> "May be you can help me, how to add just one exception: stem Kiev =>  Kiev."
>
> This has been stuck in my mind for the past seven years! Occasionally there
> are more general criticisms, for example,
>
> http://article.gmane.org/gmane.comp.search.snowball/1046/match=swedish
>
> but they require the same treatment, and at present are similarly left
> unresolved. Any suggestions on the way forward by "snowball regulars" would
> be useful here,
>
> Martin
>
> At 05:39 PM 2/16/2011 +0100, fxbois at kernix.com wrote:
>> Hi Martin,
>>
>> I just wonder if you have seen my post from December 2010:
>> http://lists.tartarus.org/mailman/private/snowball-discuss/2010-December/thread.html
>> Thank you in advance and sorry to contact you like this.
>>
>> -- 
>> François-Xavier BOIS
>> KerniX Software
>> 15, rue Cels
>> 75014 PARIS
>> fxbois at kernix.com
>> tel : 01 53 98 73 43
>>
>
>
>