[Snowball-discuss] Snowball French stemming

Fred Fung fred.fung@versaterm.com
Fri Dec 12 17:35:02 2003


Hi Martin,

Thanks for the reply. I just read the introductory page on the Snowball
site this morning and learned that the stemmers are not perfect... :o)

I guess the problem I encountered is one of consistency. As I mentioned in
my original email, I am using this French stemmer with OpenFTS. Once all
the texts have been indexed and stored, I would expect (perhaps wrongly)
that a search for the word "français" would bring up every document
containing either "français" or "française", and, conversely, that a search
for "française" would also bring up documents containing "français". But
because of how the algorithm works, these two words are stemmed
differently, so the search results were not what I expected.
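To make the consistency problem concrete, here is a minimal sketch of stem-based lookup, assuming a simple inverted index keyed by stem (this is not OpenFTS's actual API). The function `toy_stem` is a hypothetical stand-in, NOT the Snowball French algorithm: it strips -ais only as a verb ending and otherwise drops a final -e, which is just enough to reproduce the mismatch described above.

```python
from collections import defaultdict


def toy_stem(word: str) -> str:
    """Hypothetical toy stemmer (NOT Snowball French).

    Treats -ais purely as a verb ending; otherwise drops a final -e.
    """
    w = word.lower()
    if w.endswith("ais"):
        return w[:-3]
    if w.endswith("e"):
        return w[:-1]
    return w


def build_index(docs, stem):
    """Map each stem to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[stem(token)].add(doc_id)
    return index


def search(index, query, stem):
    """Stem the query with the same function used at index time, then look it up."""
    return sorted(index.get(stem(query), set()))


docs = {1: "le vin français", 2: "la cuisine française"}
index = build_index(docs, toy_stem)
print(search(index, "français", toy_stem))   # [1]
print(search(index, "française", toy_stem))  # [2]
```

Because the query goes through the same stemmer as the indexed text, a search only matches documents whose words share the query's stem. When the masculine and feminine forms reduce to different stems ("franç" versus "français" here), each query finds only its own form.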

So perhaps both words should be stemmed to "franç" or "franc", just to be
consistent? On the other hand, maybe this word and its feminine form are
just a special case (e.g. I tried "provençale" and "provençal", and both
were stemmed to "provençal"). In any case, I have already made a note that
this may be something I have to live with once my application is implemented.


Fred


----- Original Message -----
From: <martin.porter@grapeshot.co.uk>
To: "Fred Fung" <fred.fung@versaterm.com>; <snowball-discuss@lists.tartarus.org>
Sent: Friday, December 12, 2003 11:04 AM
Subject: Re: [Snowball-discuss] Snowball French stemming


> Fred,
>
> Of course, the stemmers are not perfect, so errors of this type will
> happen. Even so, there does seem to be room for improvement. -ais is a
> verb ending, which is why it is taken off français (the stemmer does not
> know this is not a verb form). But it is also a common adjectival ending:
> japonais, anglais, français etc., and might be removed accordingly
> (mauvais is an exception here). If I made this change, would you be
> interested?
>
> Martin
>
>
>
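The change Martin proposes can be illustrated with a minimal sketch: treat -ais, together with its feminine and plural forms -aise and -aises, as an adjectival ending, so that "français", "française" and "françaises" all reduce to the same stem. Everything here is an assumption for illustration. The function `strip_adjectival_ais`, the suffix list, the exception set built around mauvais, and the minimum-stem-length guard are hypothetical. The sketch also ignores the region conditions (RV, R1, R2) that the actual Snowball French stemmer applies when removing suffixes.

```python
# Hypothetical sketch of an adjectival -ais rule; NOT the Snowball French stemmer.

# Adjectival endings, longest first so that -aises wins over -aise and -ais.
ADJ_SUFFIXES = ("aises", "aise", "ais")

# Assumed exception list: words whose -ais ending should be left alone.
EXCEPTIONS = {"mauvais", "mauvaise", "mauvaises"}

# Assumed guard so that short words such as "mais" are not stripped.
MIN_STEM_LEN = 3


def strip_adjectival_ais(word: str) -> str:
    """Remove an adjectival -ais/-aise/-aises ending unless the word is an exception."""
    w = word.lower()
    if w in EXCEPTIONS:
        return w
    for suffix in ADJ_SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM_LEN:
            return w[: -len(suffix)]
    return w


for w in ["français", "française", "françaises", "japonais", "anglaise", "mauvais", "mais"]:
    print(w, "->", strip_adjectival_ais(w))
```

With this rule the masculine, feminine and plural forms share the stem "franç", so a search for any one of them would match the others under the stem-based lookup Fred describes. Whether such a rule is worth its false positives (and how long the exception list needs to be) is exactly the trade-off raised in the thread.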