[Snowball-discuss] Stemming French words which have a plural in "x"

Yann Barsamian Yann.Barsamian at ulb.ac.be
Tue Apr 30 13:25:28 BST 2019


Hi everybody,

I'm a French researcher in computer science, and I am using Snowball
for my research (textual indexing of documents). First of all, thank
you for providing such a tool!


I see that Snowball's French stemming algorithm handles plurals in "eaux" 
("châteaux" is the plural of "château") and in "aux" ("chevaux" is 
the plural of "cheval"), together with adjectives in "eux", but not 
other plurals in "x". French has many irregular plurals, and I was 
wondering whether the French stemmer of Snowball could be updated to 
handle them?

For example, I noticed that "voeux" is not stemmed as "voeu" (likewise, 
"jeux" is not stemmed as "jeu", etc.). Can something be done to improve 
the stemming algorithm? It could also cover the following irregular 
plurals (a non-exhaustive list, just a baseline for further 
discussion):

* 7 nouns (hibou, caillou, chou, bijou, genou, joujou, pou) form their 
plural in "x" (hiboux, cailloux, choux, bijoux, genoux, joujoux, poux)
* "yeux" is the plural of "oeil"
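
To make the proposal concrete, here is a minimal Python sketch of an 
exception table that maps the irregular plurals listed above to their 
singular before any suffix stripping. The names IRREGULAR_PLURALS and 
normalize_plural are hypothetical, and this is only an illustration of 
the idea, not how the Snowball French stemmer is implemented.

```python
# Hypothetical exception table, applied before the stemmer proper.
# It only covers the irregular plurals listed in this message.
IRREGULAR_PLURALS = {
    "hiboux": "hibou",
    "cailloux": "caillou",
    "choux": "chou",
    "bijoux": "bijou",
    "genoux": "genou",
    "joujoux": "joujou",
    "poux": "pou",
    "yeux": "oeil",
}


def normalize_plural(word: str) -> str:
    """Return the singular of a listed irregular plural.

    Words that are not in the table are returned lowercased and
    otherwise unchanged, so they can be passed on to the stemmer.
    """
    lowered = word.lower()
    return IRREGULAR_PLURALS.get(lowered, lowered)


if __name__ == "__main__":
    for sample in ("Hiboux", "yeux", "maison"):
        print(sample, "->", normalize_plural(sample))
```

A lookup table keeps the exceptions explicit and easy to extend, at the 
cost of having to list every word by hand.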


I also noticed some missed stemmings:

* "illustrent" and "illustre" are not stemmed as "illustr", as they 
could be


I wanted to check the archives of the mailing list to see whether this 
has already been discussed, but the archives are not very convenient to 
search, so I apologize if it has.


If this seems interesting to you, and if you have unit tests for the 
French stemming algorithm (if I update the algorithm, there should at 
least be some tests to make sure everything else still works as 
intended!), I would be willing to contribute to the algorithm to 
improve it, even though for now I do not really know how to do it.
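
As a sketch of the kind of test I have in mind: run the stemmer over a 
fixed vocabulary, compare each result with a stored expected stem, and 
report every word whose stem differs. Everything below is hypothetical 
(the function names, the toy stemmer and the sample data); "stem" 
stands in for whatever stemmer is under test.

```python
from typing import Callable, Dict, List, Tuple


def find_regressions(
    stem: Callable[[str], str],
    expected: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Return (word, expected_stem, actual_stem) for every mismatch."""
    mismatches = []
    for word, want in sorted(expected.items()):
        got = stem(word)
        if got != want:
            mismatches.append((word, want, got))
    return mismatches


def toy_stem(word: str) -> str:
    """Placeholder stemmer for the demo: strips one trailing 's'."""
    return word[:-1] if word.endswith("s") else word


# Invented sample data, only to show the shape of the check.
SAMPLE_EXPECTED = {"chats": "chat", "chiens": "chien", "jeux": "jeu"}

if __name__ == "__main__":
    for word, want, got in find_regressions(toy_stem, SAMPLE_EXPECTED):
        print(f"{word}: expected {want}, got {got}")
```

With such a check in place, a change to the plural rules can be judged 
by the list of words whose stems it alters.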


Have a nice day,

Yann


