[Snowball-discuss] French stemmer test data wrong?

Martin Holmes mholmes at uvic.ca
Sun Jan 24 18:06:18 GMT 2021


Ah, thanks for the clarification! I didn't realize the replacements were 
designed to happen letter-by-letter, left-to-right. That means it's 
basically fortuitous that all the tests pass for my process, but having 
got it working I think I might leave it alone. Have to do the equivalent 
JS version now. :-)

Cheers,
Martin

On 2021-01-22 3:58 p.m., Martin Porter wrote:
> Martin,
> 
> Yes, for your first point, I think the sentence "In steps 2a and 2b
> all tests are confined to the RV region" makes the intention clear,
> albeit very tersely, but the fact that marking Y and I has to be done
> by looking at each letter in left-to-right order to get the right
> result with a word like "croyiez" is indeed an omission in the
> algorithm description, and I did not realise at the time I wrote it
> down that it mattered.
> 
> It is quite a subtle point. The idea of course is to mark certain
> letters as consonants when they are usually vowels, and the problem
> does not occur in the other Romance languages, as Y doesn't have this
> double function. Indeed in Italian Y is not part of the alphabet of
> the written language. I suspect it is only the French stemmer where
> this care is needed.
> 
> Thank you for spotting this,
> 
> Martin
> 
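
To illustrate the left-to-right point in the quoted message, here is a
minimal Python sketch. It assumes the marking rule as worded in the
published French stemmer description: u or i goes to upper case between
two vowels, y goes to upper case when next to a vowel, and u goes to
upper case after q. The vowel set and the function names are assumptions
made for this illustration. It is not a port of the Snowball source, and
it may differ from it on other edge cases.

```python
# Vowel set assumed from the published French stemmer description.
VOWELS = set("aeiouyâàëéêèïîôûù")


def _marked(ch, prev, nxt):
    """Return ch, upper-cased if the (assumed) marking rule applies."""
    if ch in ("u", "i") and prev in VOWELS and nxt in VOWELS:
        return ch.upper()
    if ch == "y" and (prev in VOWELS or nxt in VOWELS):
        return "Y"
    if ch == "u" and prev == "q":
        return "U"
    return ch


def mark_sequential(word):
    """Mark letter by letter, left to right. Each test sees the letters
    already marked, so an upper-case Y no longer counts as a vowel."""
    chars = list(word)
    for i in range(len(chars)):
        prev = chars[i - 1] if i > 0 else ""
        nxt = chars[i + 1] if i + 1 < len(chars) else ""
        chars[i] = _marked(chars[i], prev, nxt)
    return "".join(chars)


def mark_simultaneous(word):
    """Mark every letter by testing against the original, unmarked word."""
    out = []
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        out.append(_marked(ch, prev, nxt))
    return "".join(out)


print(mark_sequential("croyiez"))    # croYiez
print(mark_simultaneous("croyiez"))  # croYIez
```

In "croyiez" the y follows a vowel and becomes Y. Working left to right,
the following i is then preceded by a marked consonant and stays lower
case. Testing all letters against the original word marks the i as well,
which is where the two approaches part ways.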
