[Snowball-discuss] French stemmer test data wrong?
Martin Porter
martin.f.porter at gmail.com
Fri Jan 22 23:58:36 GMT 2021
Martin,
Yes, for your first point, I think the sentence "In steps 2a and 2b
all tests are confined to the RV region" makes the intention clear,
albeit very tersely. But the fact that Y and I have to be marked by
looking at each letter in left-to-right order, to get the right
result with a word like "croyiez", is indeed an omission in the
algorithm description. I did not realise at the time I wrote it
down that it mattered.
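
To illustrate the left-to-right point, here is a minimal Python sketch.
It assumes the marking rule as given in the published French stemmer
description (u or i between two vowels, y next to a vowel, u after q)
and leaves out every other part of the stemmer. The function name
mark_vowels and the sequential flag are hypothetical, chosen only for
this illustration.

```python
# Lower-case vowels of the French stemmer; the upper-case markers
# Y, I and U are deliberately absent, so a marked letter counts as
# a consonant for any later test.
VOWELS = set("aeiouyâàëéêèïîôûù")


def mark_vowels(word, sequential=True):
    """Upper-case the u, i and y that should be treated as consonants.

    sequential=True  : each test sees the letters already marked to
                       its left (left-to-right processing).
    sequential=False : every test is made against the unmodified
                       word, as if all letters were marked at once.
    """
    original = word.lower()
    chars = list(original)
    # The list the tests look at: the one being modified, or a frozen copy.
    context = chars if sequential else list(original)

    for i, ch in enumerate(original):
        prev_is_vowel = i > 0 and context[i - 1] in VOWELS
        next_is_vowel = i + 1 < len(context) and context[i + 1] in VOWELS

        if ch in ("u", "i") and prev_is_vowel and next_is_vowel:
            chars[i] = ch.upper()
        elif ch == "y" and (prev_is_vowel or next_is_vowel):
            chars[i] = "Y"
        elif ch == "u" and i > 0 and context[i - 1] == "q":
            chars[i] = "U"

    return "".join(chars)


print(mark_vowels("croyiez"))                    # croYiez
print(mark_vowels("croyiez", sequential=False))  # croYIez
```

Scanning in order, the y becomes Y first, so the following i no longer
has a vowel on its left and stays in lower case. Testing every letter
against the unmodified word marks the i as well, which gives a
different result.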
It is quite a subtle point. The idea of course is to mark certain
letters as consonants when they are usually vowels, and the problem
does not occur in the other Romance languages, as Y doesn't have this
double function. Indeed in Italian Y is not part of the alphabet of
the written language. I suspect it is only the French stemmer where
this care is needed.
Thank you for spotting this,
Martin