[Snowball-discuss] French stemmer test data wrong?
Martin Holmes
mholmes at uvic.ca
Sun Jan 24 18:06:18 GMT 2021
Ah, thanks for the clarification! I didn't realize the replacements were
designed to happen letter-by-letter, left-to-right. That means it's
basically fortuitous that all the tests pass for my process, but having
got it working I think I might leave it alone. Have to do the equivalent
JS version now. :-)
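For anyone else who trips over this, here is a minimal Python sketch of why the order matters. It is not the Snowball source and not the full prelude. The function name, the `left_to_right` flag and the simplified rule set (u or i between vowels, y next to a vowel, u after q) are my own assumptions, based on my reading of the algorithm description.

```python
# Vowel set as I read it in the French algorithm description (assumption).
VOWELS = set("aeiouyâàëéêèïîôûù")


def mark_vowels_as_consonants(word: str, left_to_right: bool = True) -> str:
    """Upper-case u, i and y where they act as consonants.

    left_to_right=True  : each test sees the marks already made to its left.
    left_to_right=False : every test sees only the untouched input word.
    """
    original = word
    chars = list(word)
    for i, ch in enumerate(original):
        # Left context: the partially marked word, or the original input.
        left = chars if left_to_right else original
        prev = left[i - 1] if i > 0 else ""
        # Right context is always still unmarked at this point.
        nxt = original[i + 1] if i + 1 < len(original) else ""

        if ch in ("u", "i") and prev in VOWELS and nxt in VOWELS:
            chars[i] = ch.upper()   # u or i between two vowels
        elif ch == "y" and (prev in VOWELS or nxt in VOWELS):
            chars[i] = "Y"          # y preceded or followed by a vowel
        elif ch == "u" and prev == "q":
            chars[i] = "U"          # u after q
    return "".join(chars)


print(mark_vowels_as_consonants("croyiez"))                       # croYiez
print(mark_vowels_as_consonants("croyiez", left_to_right=False))  # croYIez
```

In the left-to-right run the y becomes Y first, so the following i no longer sits between two vowels and stays lower case. Testing every letter against the original word marks both.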
Cheers,
Martin
On 2021-01-22 3:58 p.m., Martin Porter wrote:
> Martin,
>
> Yes, for your first point, I think the sentence "In steps 2a and 2b
> all tests are confined to the RV region" makes the intention clear,
> albeit very tersely, but the fact that marking Y and I has to be done
> by looking at each letter in left-to-right order to get the right
> result with a word like "croyiez" is indeed an omission in the
> algorithm description, and I did not realise at the time I wrote it
> down that it mattered.
>
> It is quite a subtle point. The idea of course is to mark certain
> letters as consonants when they are usually vowels, and the problem
> does not occur in the other Romance languages, as Y doesn't have this
> double function. Indeed in Italian Y is not part of the alphabet of
> the written language. I suspect it is only the French stemmer where
> this care is needed.
>
> Thank you for spotting this,
>
> Martin
>