[Snowball-discuss] Some possible improvements in English

Tolkin, Steve Steve.Tolkin@FMR.COM
Tue, 20 Nov 2001 08:57:18 -0500


Dear Martin,

Thanks so much for creating snowball and having it be open source!

On http://snowball.sourceforge.net/english/stemmer.html
you said: "Incidentally, this illustrates how much feedback to expect from the
real users of a stemming algorithm: five words in twenty years!"

I never knew you were soliciting feedback.

Here are a few quick suggestions. (More later if I get the time.)

1. Some words that end in "s" but are singular need special handling.
Examples:
atlas -> atla  # But we want "atlas", so that it conflates with "atlases".
cosmos -> cosmo  # Bad: "cosmo" is probably a search for Cosmopolitan magazine.

2. Certain words end in -ive but have a stem that is a common word in its own right.
Conflating these with that stem is likely to decrease precision.
Examples:
respective -> respect
productive -> product
conductive -> conduct
possessive -> possess

For these words, I think it would be better to conflate the -ivity form with
the -ive form, but not to reduce all the way.
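Both suggestions can be sketched in Python. This is a minimal, hypothetical illustration, not the Snowball English stemmer: the exception tables (SINGULAR_S_EXCEPTIONS, PROTECTED_IVE), the function names, and the simplified Porter-style plural rule are all assumptions made for the example, and the measure conditions a real stemmer applies before stripping -ive are ignored.

```python
# Hypothetical exception table for suggestion 1: singular words ending in
# "s" (and their plurals) mapped to a fixed stem so they are not truncated.
SINGULAR_S_EXCEPTIONS = {
    "atlas": "atlas",
    "atlases": "atlas",
    "cosmos": "cosmos",
}

# Hypothetical protected list for suggestion 2: -ive words whose bare stem
# is a common, different word, so the -ive ending is kept.
PROTECTED_IVE = {"respective", "productive", "conductive", "possessive"}


def strip_plural(word: str) -> str:
    """Simplified Porter-style plural removal (sses->ss, ies->i, ss->ss, s->'')."""
    if word.endswith("sses") or word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def stem_plural(word: str) -> str:
    """Suggestion 1: consult the exception table before stripping a final s."""
    word = word.lower()
    if word in SINGULAR_S_EXCEPTIONS:
        return SINGULAR_S_EXCEPTIONS[word]
    return strip_plural(word)


def reduce_ive(word: str) -> str:
    """Suggestion 2: map -ivity to -ive, then stop there for protected words."""
    word = word.lower()
    if word.endswith("ivity"):
        word = word[:-5] + "ive"  # e.g. productivity -> productive
    if word in PROTECTED_IVE:
        return word
    if word.endswith("ive"):
        return word[:-3]  # crude: ignores any measure conditions on the stem
    return word


if __name__ == "__main__":
    for w in ["atlas", "atlases", "cosmos", "cats"]:
        print(w, "->", strip_plural(w), "/", stem_plural(w))
    for w in ["productivity", "productive", "product", "activity"]:
        print(w, "->", reduce_ive(w))
```

With the tables in place, "atlas" and "atlases" both reduce to "atlas" (plain strip_plural gives "atla" and "atlase", so they would not conflate), and "productivity" conflates with "productive" without collapsing into "product".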
 
Hopefully helpfully yours,
Steve
-- 
Steven Tolkin          steve.tolkin@fmr.com      617-563-0516 
Fidelity Investments   82 Devonshire St. V1D     Boston MA 02109
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.


_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss
