[Snowball-discuss] French stemmer test data wrong?
Martin Porter
martin.f.porter at gmail.com
Sun Jan 24 20:19:59 GMT 2021
To be more precise in the French algorithm definition, we should replace
"Then put into upper case u or i preceded and followed by a vowel, and
y preceded or followed by a vowel. u after q is also put into upper
case."
with this:
"Then, taking the letters in turn from the beginning to end of the
word, put u or i into upper case when it is both preceded and followed
by a vowel; put y into upper case when it is either preceded or
followed by a vowel; and put u into upper case when it follows q."
Then (perhaps) after the example
quand→qUand
add this one,
croyiez→croYiez
with the final note, "In the last example, y becomes Y because it is
between two vowels, but i does not become I because it is between Y
and e, and Y is not in the class of vowels."
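
The revised wording can be sketched in Python as follows. This is only an illustration, not the Snowball source: the function name mark_non_vowels is hypothetical, and the vowel set is assumed to be the one listed in the French algorithm definition. The point it shows is that the letters are handled one at a time, left to right, so a letter already put into upper case no longer counts as a vowel for the letter that follows it.

```python
# Assumed vowel class for French (taken to match the algorithm definition).
VOWELS = set("aeiouyâàëéêèïîôûù")


def mark_non_vowels(word: str) -> str:
    """Hypothetical helper: apply the upper-casing rule letter by letter."""
    chars = list(word)
    for k, ch in enumerate(chars):
        # Neighbours are read from the working copy, so an earlier Y, U or I
        # (already upper case) is not treated as a vowel here.
        prev_is_vowel = k > 0 and chars[k - 1] in VOWELS
        next_is_vowel = k + 1 < len(chars) and chars[k + 1] in VOWELS
        if ch in ("u", "i") and prev_is_vowel and next_is_vowel:
            chars[k] = ch.upper()
        elif ch == "y" and (prev_is_vowel or next_is_vowel):
            chars[k] = "Y"
        elif ch == "u" and k > 0 and chars[k - 1] == "q":
            chars[k] = "U"
    return "".join(chars)


print(mark_non_vowels("quand"))    # qUand
print(mark_non_vowels("croyiez"))  # croYiez
```

In croyiez the y is marked first, so by the time i is examined its left neighbour is Y and the "preceded by a vowel" test fails.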
Martin Holmes' corrected code does not work like this, but still
arrives at the correct answer. This is because in French the sequence
yi is common enough, while iy never occurs.