[Snowball-discuss] Removing été from French stop words

Philippe Ouellet philippe at camellia-sinensis.com
Thu Apr 16 01:14:55 BST 2020


Could you give me the link to the file in the repo? I have no idea where that is.

I only noticed été because one of our product names contains that word, but you are right about aura and avions.

Do we need to remove avions from the stop words if it gets changed to its singular form during analysis?

I am having second thoughts: removing été could have a big impact on search results. Someone searching for “summer” would then get every document containing the past participle of “to be”: the impact is huge.

Is there a way to make “a été” the stop word instead?
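
To make the question concrete, here is a minimal sketch of phrase-level filtering. It assumes a plain whitespace tokenizer, and both the `STOP_PHRASES` set and the `drop_stop_phrases` helper are hypothetical names invented for illustration; nothing here is part of Snowball, whose stop word lists contain single words only. Logic like this would have to live in the search application's own analysis chain.

```python
# Hypothetical illustration: drop the two-word sequence "a été" (verb use)
# while keeping a standalone "été" (noun use, "summer").

STOP_PHRASES = {("a", "été")}  # assumed phrase list, not from any real stop list


def drop_stop_phrases(tokens, stop_phrases=STOP_PHRASES):
    """Return tokens with every listed two-token phrase removed."""
    kept = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in stop_phrases:
            i += 2  # skip both tokens of the matched phrase
        else:
            kept.append(tokens[i])
            i += 1
    return kept


verb_use = drop_stop_phrases("il a été malade".split())
noun_use = drop_stop_phrases("un été chaud".split())
print(verb_use)
print(noun_use)
```

This only covers the "a été" construction; other forms ("ont été", "avait été", and so on) would each need their own entry, which is one reason this may not be a complete answer.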

--
Philippe Ouellet
Web Developer
https://camellia-sinensis.com

> On Apr 15, 2020, at 20:05, Olly Betts <olly at survex.com> wrote:
> 
> On Wed, Apr 15, 2020 at 05:44:35PM -0400, Philippe Ouellet wrote:
>> I propose to remove “été” and “étés” from the French stop words. It is
>> true that they are forms of the verb “to be”, but they also mean
>> “summer”, which should not be a stop word.
> 
> Sounds good to me.  It looks like there are other entries with the same
> problem - e.g.  "aura" and "avions" are both also nouns.
> 
> But my French is rudimentary at best.  Could you review the whole list
> and open a PR against the snowball-website repo with your proposed
> changes?
> 
> I'd suggest we comment out such entries ("|" is the comment character
> here) with a note as to why they are omitted.  That should help avoid
> future requests to add the apparently missing entries.
> 
> Cheers,
>    Olly
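
The commenting-out convention suggested in the quoted message can be sketched as follows. The excerpt in `SAMPLE` is a made-up fragment, not the real French stop word file, and `load_stop_words` is a hypothetical loader that assumes everything from the first "|" to the end of a line is a comment.

```python
# Hypothetical excerpt: "été" is commented out with a note explaining why,
# so it stays visible in the file but is not loaded as a stop word.
SAMPLE = """\
 | forms of être (illustrative excerpt only)
suis
es
 | été    omitted: also the noun "summer"
étant
"""


def load_stop_words(text):
    """Collect words from a list where '|' starts a comment (assumed format)."""
    words = set()
    for line in text.splitlines():
        content = line.split("|", 1)[0]  # keep only the part before any comment
        words.update(content.split())
    return words


stop_words = load_stop_words(SAMPLE)
print("été" in stop_words)
```

Keeping the entry as a comment rather than deleting it preserves the reason for the omission next to the word itself.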
