[Snowball-discuss] Stemming French words which have a plural in "x"

Martin Porter martin.f.porter at gmail.com
Wed May 1 09:10:22 BST 2019


Dear Yann,

(I am fairly certain that -x plurals in French have not come up in
snowball-discuss hitherto.)

In any new rule in a stemmer you have to balance the success it may
have in correctly stemming one part of the vocabulary against the
failure it may incur in incorrectly stemming another part. To do that you
generate a 3-column version of

http://snowball.tartarus.org/algorithms/french/diffs.txt

where the third column is like the second, but with the addition of
the new rule. You then study the changes between columns 2 and 3. And
you will get problems because the endings -eux and -oux are
adjectival endings (like English -ous) as well as noun-plural endings.
So "hiboux" is the plural of "hibou" but "jaloux" is not the plural of
"jalou". In developing the French stemmer I would have allowed for
safe removal of -x in the -eaux context, but, for the reason above, not
in the contexts you mention. By assuming -eux is adjectival, the stemmer also
tries to conflate masculine and feminine forms: so "nerveuse" stems to
"nerveux" etc. -- another complication.

This is not arguing against the introduction of further general rules,
but if added, they do need to be tested with great care.
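
The three-column comparison described above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
the function names are hypothetical, the candidate rule is a naive
"drop -x after -ou or -eu", and the second column below is placeholder
data that simply repeats the word; it is not taken from diffs.txt or
from the real French stemmer.

```python
def candidate_rule(stem):
    """Hypothetical new rule: drop a final -x after -ou or -eu."""
    if stem.endswith(("oux", "eux")):
        return stem[:-1]
    return stem


def three_columns(rows, new_rule):
    """Turn (word, current_stem) pairs into (word, current_stem, new_stem)."""
    return [(word, stem, new_rule(stem)) for word, stem in rows]


def changed_rows(triples):
    """Keep only the rows where columns 2 and 3 differ."""
    return [t for t in triples if t[1] != t[2]]


# Placeholder data: column 2 just repeats the word; it is NOT real stemmer output.
rows = [
    ("hibou", "hibou"),
    ("hiboux", "hiboux"),
    ("jaloux", "jaloux"),
    ("chevaux", "chevaux"),
]

triples = three_columns(rows, candidate_rule)
for word, old, new in changed_rows(triples):
    print(f"{word:<10}{old:<10}{new}")
```

With this toy data both "hiboux" and "jaloux" appear among the changed
rows: the first is the conflation you want, the second is the kind of
casualty the comparison is meant to expose.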

You could build in exception lists for nouns ending -ou or -eu, and
there are notes in the English stemmer suggesting how to do this. This
is often worth doing, but I think such exception lists are usually
application-specific, and it is not worthwhile trying to make them
universal. (So failing to conflate "hibou"/"hiboux" would not matter
too much, unless the text it was applied to was in some way
"ornithological".)

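An application-specific exception list of the kind mentioned above
could be wrapped around an existing stemmer roughly as follows. This is
a sketch, not how Snowball itself handles exceptions: the names
with_exceptions, OU_PLURALS and placeholder_stem are hypothetical, and
placeholder_stem merely stands in for whatever French stemmer the
application actually calls.

```python
# Illustrative list: French nouns in -ou that take a plural in -x.
OU_PLURALS = {
    "bijoux": "bijou",
    "cailloux": "caillou",
    "choux": "chou",
    "genoux": "genou",
    "hiboux": "hibou",
    "joujoux": "joujou",
    "poux": "pou",
}


def with_exceptions(base_stem, exceptions):
    """Wrap base_stem so listed words are replaced before stemming."""
    def stem(word):
        key = word.lower()
        # Substitute the listed singular, then stem as usual, so that
        # singular and plural are guaranteed to end up with the same stem.
        return base_stem(exceptions.get(key, key))
    return stem


def placeholder_stem(word):
    """Stand-in for a real French stemmer; returns the word unchanged."""
    return word


stem = with_exceptions(placeholder_stem, OU_PLURALS)
print(stem("hiboux") == stem("hibou"))  # listed plural conflates with singular
print(stem("jaloux"))                   # unlisted word goes straight to the stemmer
```

Because the lookup happens before the stemmer runs, words that are not
in the list (such as "jaloux") are left entirely to the general rules.
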
Martin


