[Snowball-discuss] Some possible improvements in English
Tolkin, Steve
Steve.Tolkin@FMR.COM
Tue, 20 Nov 2001 08:57:18 -0500
Dear Martin,
Thanks so much for creating snowball and having it be open source!
On http://snowball.sourceforge.net/english/stemmer.html
you said: "Incidentally, this illustrates how much feedback to expect from the
real users of a stemming algorithm: five words in twenty years!"
I never knew you were soliciting feedback.
Here are a few quick suggestions. (More later if I get the time.)
1. Certain words that end in "s" but are singular need special handling.
Example:
atlas -> atla # But want it to be atlas, to conflate with atlases
cosmos -> cosmo # bad; cosmo is probably a search for Cosmopolitan magazine.
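One way to handle item 1 is a small exception list that is checked before the normal plural-stripping step. The Python below is a minimal sketch assuming that approach; `SINGULAR_S_WORDS`, `toy_strip_plural` and `stem_with_exceptions` are hypothetical names, and `toy_strip_plural` is a deliberately crude stand-in, not Snowball's actual rules.

```python
# Hypothetical exception list: singular nouns that happen to end in "s".
SINGULAR_S_WORDS = {"atlas", "cosmos"}


def toy_strip_plural(word: str) -> str:
    """Crude stand-in for a stemmer's plural step (NOT Snowball's rules)."""
    if word.endswith("ss"):
        return word  # e.g. "glass" is not a plural
    if word.endswith("s"):
        return word[:-1]
    return word


def stem_with_exceptions(word: str) -> str:
    """Consult the exception list before falling back to the plural step."""
    w = word.lower()
    if w in SINGULAR_S_WORDS:
        return w  # singular form: leave it whole
    if w.endswith("es") and w[:-2] in SINGULAR_S_WORDS:
        return w[:-2]  # plural of a protected word: atlases -> atlas
    return toy_strip_plural(w)


for word in ["atlas", "atlases", "cosmos", "cats"]:
    print(word, "->", stem_with_exceptions(word))
# atlas -> atlas
# atlases -> atlas
# cosmos -> cosmos
# cats -> cat
```

With this, atlas and atlases conflate on "atlas", and cosmos is never cut down to "cosmo".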
2. Certain words end in -ive but have a stem that is a common word.
Reducing them to that stem is likely to decrease precision.
Example:
respective -> respect
productive -> product
conductive -> conduct
possessive -> possess
For these, I think it would be better to conflate the -ivity form with the
-ive form, but not to reduce either one all the way to the base word.
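A minimal sketch of item 2 in Python, assuming a hand-maintained list of protected -ive words: -ivity is mapped to -ive for those words and reduction stops there, while other words fall through to full reduction. `PROTECTED_IVE`, `toy_full_reduce` and `stem_ive` are hypothetical names, and `toy_full_reduce` is a crude stand-in rather than Snowball's actual suffix rules.

```python
# Hypothetical set of -ive words whose stem is itself a common word.
PROTECTED_IVE = {"respective", "productive", "conductive", "possessive"}


def toy_full_reduce(word: str) -> str:
    """Crude stand-in for full reduction: -ivity -> -ive -> stem."""
    if word.endswith("ivity"):
        word = word[:-5] + "ive"
    if word.endswith("ive"):
        word = word[:-3]
    return word


def stem_ive(word: str) -> str:
    """Conflate -ivity with -ive for protected words, then stop there."""
    w = word.lower()
    if w.endswith("ivity") and (w[:-5] + "ive") in PROTECTED_IVE:
        return w[:-5] + "ive"  # productivity -> productive
    if w in PROTECTED_IVE:
        return w  # productive stays productive, not product
    return toy_full_reduce(w)


for word in ["productivity", "productive", "respective", "sensitivity"]:
    print(word, "->", stem_ive(word))
# productivity -> productive
# productive -> productive
# respective -> respective
# sensitivity -> sensit
```

The trade-off is that productive no longer matches product, which is the point of the suggestion, but every protected word has to be listed by hand.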
Hopefully helpfully yours,
Steve
--
Steven Tolkin steve.tolkin@fmr.com 617-563-0516
Fidelity Investments 82 Devonshire St. V1D Boston MA 02109
There is nothing so practical as a good theory. Comments are by me,
not Fidelity Investments, its subsidiaries or affiliates.
_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss