[Snowball-discuss] French stemmer test data wrong?

Martin Porter martin.f.porter at gmail.com
Sun Jan 24 20:19:59 GMT 2021


To be more precise in the French algorithm definition, we should replace

"Then put into upper case u or i preceded and followed by a vowel, and
y preceded or followed by a vowel. u after q is also put into upper
case."

with this:

"Then, taking the letters in turn from the beginning to end of the
word, put u or i into upper case when it is both preceded and followed
by a vowel; put y into upper case when it is either preceded or
followed by a vowel; and put u into upper case when it follows q."

Then (perhaps) after the example

quand→qUand

add this one,

croyiez→croYiez

with the final note, "In the last example, y becomes Y because it is
between two vowels, but i does not become I because it is between Y
and e, and Y is not in the class of vowels."
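
As an illustration only, the proposed wording could be sketched in Python as below. This is not the Snowball source and not Martin Holmes' code. The function name mark_non_vowels is hypothetical, and the vowel list is an assumption taken from the French stemmer description, so it should be checked against that text. The point of the sketch is that each letter is tested against the word as already modified, so a y turned into Y earlier in the pass no longer counts as a vowel.

```python
# Vowel class assumed from the French stemmer description (an assumption
# of this sketch). Upper-case U, I and Y are deliberately not in the set.
VOWELS = set("aeiouyâàëéêèïîôûù")


def mark_non_vowels(word):
    """Hypothetical helper: one left-to-right pass over the word.

    Changes are made in place, so each test sees the marks already
    applied at earlier positions.
    """
    chars = list(word)
    for pos, ch in enumerate(chars):
        before = pos > 0 and chars[pos - 1] in VOWELS
        after = pos + 1 < len(chars) and chars[pos + 1] in VOWELS
        if ch in "ui" and before and after:
            # u or i both preceded and followed by a vowel
            chars[pos] = ch.upper()
        elif ch == "y" and (before or after):
            # y either preceded or followed by a vowel
            chars[pos] = "Y"
        elif ch == "u" and pos > 0 and chars[pos - 1] == "q":
            # u following q
            chars[pos] = "U"
    return "".join(chars)
```

Under these assumptions, mark_non_vowels("quand") gives qUand and mark_non_vowels("croyiez") gives croYiez: by the time i is tested, the letter before it is already Y.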

Martin Holmes' corrected code does not work this way, but it still
arrives at the correct answer. This is because in French the sequence
yi is common enough, while iy never occurs.


