[Snowball-discuss] Another puzzling French stemmer question

Martin Holmes mholmes at uvic.ca
Tue Jan 26 22:23:57 GMT 2021


Sorry! I had my tests set up using the old test data here:

http://snowball.tartarus.org/algorithms/french/diffs.txt

instead of the current test data in the repo. My apologies.

Cheers,
Martin

On 2021-01-26 1:43 p.m., Martin Holmes wrote:
> Hi all,
> 
> I'm having trouble with the word égoïsme. The test data says that it 
> should be stemmed to égoïsm, but I get égo. Here's the process:
> 
> Preflight: Replace ë and ï with He and Hi.
> Result: égoHisme
> 
> Calculating RV, R1, R2:
> 
> RV: "If the word begins with two vowels, RV is the region after the 
> third letter, otherwise the region after the first vowel not at the 
> beginning of the word..."
> 
> Result: RV = Hisme
> 
> R1: "R1 is the region after the first non-vowel following a vowel..."
> 
> Result: R1 = oHisme
> 
> R2: "R2 is the region after the first non-vowel following a vowel in R1..."
> 
> Result: R2 = isme
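
The region calculation above can be checked with a short Python sketch. It covers only the rules quoted in this message. The names VOWELS, preflight, rv and region_after are made up for illustration, the vowel list is assumed from the published French algorithm, and the other prelude rules and whatever the elided part of the RV rule adds are left out.

```python
# Assumed vowel set for French; upper-case markers such as H are not vowels.
VOWELS = set("aeiouyâàëéêèïîôûù")


def preflight(word):
    # Replace ë and ï with He and Hi, as in the quoted preflight step.
    return word.replace("ë", "He").replace("ï", "Hi")


def region_after(text):
    # Region after the first non-vowel that follows a vowel,
    # or the empty string if there is no such position.
    for i in range(1, len(text)):
        if text[i] not in VOWELS and text[i - 1] in VOWELS:
            return text[i + 1:]
    return ""


def rv(word):
    # Two leading vowels: RV is the region after the third letter.
    if len(word) >= 2 and word[0] in VOWELS and word[1] in VOWELS:
        return word[3:]
    # Otherwise: the region after the first vowel not at the beginning.
    for i in range(1, len(word)):
        if word[i] in VOWELS:
            return word[i + 1:]
    return ""


word = preflight("égoïsme")
r1 = region_after(word)   # R1 is computed on the whole word
r2 = region_after(r1)     # R2 applies the same rule inside R1
print(word, rv(word), r1, r2)
```

Running it prints égoHisme Hisme oHisme isme, which matches the hand calculation.
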
> 
> Step 1: Search for the longest among the following suffixes, and perform 
> the action indicated.
> ance   iqUe   isme   able   iste   eux   ances   iqUes   ismes   ables   istes
>      delete if in R2
> 
> "isme" appears in R2, so we should get égo.
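
Step 1 as quoted can be sketched in the same way. This is an illustration, not the Snowball implementation: delete_if_in_r2 and undo_marking are hypothetical names, r2_start = 4 is the hand-computed offset of R2 in égoHisme, and the final cleanup of the H marker is an assumption about the postlude, which this message does not quote.

```python
SUFFIXES = ["ance", "iqUe", "isme", "able", "iste", "eux",
            "ances", "iqUes", "ismes", "ables", "istes"]


def delete_if_in_r2(word, r2_start):
    # Try the longest suffixes first, so the longest match wins.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix):
            # Delete only if the suffix lies entirely inside R2.
            if len(word) - len(suffix) >= r2_start:
                return word[:-len(suffix)]
            return word
    return word


def undo_marking(word):
    # Assumed postlude: restore ë and ï, then drop any leftover H marker.
    return word.replace("He", "ë").replace("Hi", "ï").replace("H", "")


r2_start = 4  # R2 = "isme" begins at index 4 of "égoHisme"
stem = delete_if_in_r2("égoHisme", r2_start)
print(stem, undo_marking(stem))
```

With these assumptions the deletion leaves égoH, and removing the marker gives égo, the result reported above.
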
> 
> What am I misunderstanding here?
> 
> All help appreciated,
> Martin




