[Snowball-discuss] Another puzzling French stemmer question
Martin Holmes
mholmes at uvic.ca
Tue Jan 26 21:43:53 GMT 2021
Hi all,
I'm having trouble with the word égoïsme. The test data says that it
should be stemmed to égoïsm, but I get égo. Here's the process:
Preflight: Replace ë and ï with He and Hi.
Result: égoHisme
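A minimal Python sketch of this substitution, covering only the two replacements quoted above. The function name mark_diaeresis is hypothetical, and any other preprocessing the stemmer may perform is deliberately left out:

```python
def mark_diaeresis(word: str) -> str:
    """Replace ë with He and ï with Hi, as described in the step above.

    Assumes the input is lower case and uses precomposed (NFC) characters,
    so that each accented letter is a single code point.
    """
    return word.replace("ë", "He").replace("ï", "Hi")


print(mark_diaeresis("égoïsme"))
```

For égoïsme this prints égoHisme, matching the result above.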
Calculating RV, R1, R2:
RV: "If the word begins with two vowels, RV is the region after the
third letter, otherwise the region after the first vowel not at the
beginning of the word..."
Result: RV = Hisme
R1: "R1 is the region after the first non-vowel following a vowel..."
Result: R1 = oHisme
R2: "R2 is the region after the first non-vowel following a vowel in R1..."
Result: R2 = isme
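The three region calculations above can be sketched in Python as follows. This follows only the parts of the definitions quoted in this message: the RV quotation is truncated, so whatever follows the "..." is not implemented. The vowel set, the end-of-word fallback, and the function names are assumptions made for illustration:

```python
# Assumed vowel set for French; the upper-case H marker is not a vowel.
VOWELS = set("aeiouyâàëéêèïîôûù")


def after_first_nonvowel_following_vowel(word: str, start: int) -> int:
    """Index just past the first non-vowel that follows a vowel,
    searching from position `start`; len(word) if there is none."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def regions(word: str) -> tuple[str, str, str]:
    """Return (RV, R1, R2) as substrings of `word`."""
    if len(word) >= 2 and word[0] in VOWELS and word[1] in VOWELS:
        rv = 3  # word begins with two vowels: region after the third letter
    else:
        rv = len(word)  # assumed fallback when no such vowel exists
        for i in range(1, len(word)):
            if word[i] in VOWELS:  # first vowel not at the beginning
                rv = i + 1
                break
    r1 = after_first_nonvowel_following_vowel(word, 0)
    r2 = after_first_nonvowel_following_vowel(word, r1)
    return word[rv:], word[r1:], word[r2:]


print(regions("égoHisme"))
```

For égoHisme this gives RV = Hisme, R1 = oHisme and R2 = isme, the same regions as worked out by hand above.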
Step 1: Search for the longest among the following suffixes, and perform
the action indicated.
    ance   iqUe   isme   able   iste   eux   ances   iqUes   ismes   ables   istes
        delete if in R2
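A small Python sketch of this rule as read here: find the longest listed suffix and delete it if it lies entirely within R2. The function name and the r2_start parameter (the index at which R2 begins) are hypothetical, and this reproduces the reading described in this message, not necessarily the behaviour of the reference implementation:

```python
SUFFIXES = ["ance", "iqUe", "isme", "able", "iste", "eux",
            "ances", "iqUes", "ismes", "ables", "istes"]


def delete_if_in_r2(word: str, r2_start: int) -> str:
    """Delete the longest matching suffix if it starts at or after r2_start."""
    # Try longer suffixes first so the longest match wins.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix):
            if len(word) - len(suffix) >= r2_start:
                return word[:-len(suffix)]
            return word  # longest suffix found, but not inside R2
    return word


# For égoHisme, R2 = isme begins at index 4.
print(delete_if_in_r2("égoHisme", 4))
```

With R2 beginning at index 4 this returns égoH. The handling of the leftover H marker belongs to a later step that is not quoted in this message.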
"isme" appears in R2, so we should get égo.
What am I misunderstanding here?
All help appreciated,
Martin