[Snowball-discuss] Common strings ending in s that may not be
ordinary words
Tolkin, Steve
Steve.Tolkin@FMR.COM
Mon, 26 Nov 2001 08:34:54 -0500
Dear Martin,
I was not going to bring this up, but since you did,
there are many common short abbreviations, acronyms,
initialisms, etc. that end with "s".
For example (common in the U.S.) IRS, INS, EDS, etc.
These, and many others, will not be handled
by the new approach either. The first two above are
especially interesting because the first will conflate
with "ir" as in information retrieval, and the second
will become the probable stopword "in".
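To make the point concrete, here is a minimal Python sketch of the revised -s rule as it is described in the quoted message below ("a vowel somewhere before the letter before the s"). It covers only that one rule, not the full porter2 algorithm. The function name `strip_final_s`, the lowercasing of input, the treatment of y as a vowel, and the -ss check are assumptions made for illustration.

```python
# Sketch of the revised -s removal rule only; not the full porter2 stemmer.
VOWELS = set("aeiouy")  # assumption: y counts as a vowel here


def strip_final_s(word: str) -> str:
    """Remove a trailing s if a vowel occurs before the letter preceding it."""
    w = word.lower()  # assumption: input is case-folded before stemming
    # -us is left alone (per the quoted message); -ss is an added assumption.
    if not w.endswith("s") or w.endswith(("us", "ss")):
        return w
    # w[:-2] is everything before the letter that precedes the final s.
    if any(ch in VOWELS for ch in w[:-2]):
        return w[:-1]
    return w


if __name__ == "__main__":
    for term in ["gas", "this", "dogs", "kiwis", "cvs", "IRS", "INS", "EDS"]:
        print(term, "->", strip_final_s(term))
```

Under these assumptions 'gas', 'this' and 'cvs' keep the s, 'dogs' and 'kiwis' lose it, and IRS, INS and EDS come out as "ir", "in" and "ed".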
Hopefully helpfully yours,
Steve
--
Steven Tolkin steve.tolkin@fmr.com 617-563-0516
Fidelity Investments 82 Devonshire St. V1D Boston MA 02109
There is nothing so practical as a good theory. Comments are by me,
not Fidelity Investments, its subsidiaries or affiliates.
You said:
> Message: 8
> To: snowball-discuss@lists.sourceforge.net
> From: martin_porter@softhome.net (Martin Porter)
> Date: Thu, 22 Nov 2001 03:06:26 -0700
> Subject: [Snowball-discuss] Changes to Porter2
>
>
> I have made some changes to the porter2 algorithm.
>
> The documentation errors noticed by Andrew Aksyonoff have
> been corrected.
>
> -s removal has been changed. You now need a vowel somewhere before the
> letter before the s. So 'gas', 'this', 'has', 'was' keep the s, 'dogs',
> 'cats', 'woos', 'kiwis' lose the s. Usefully, the s is not removed from
> non-words like 'cvs', 'spss', 'lms' etc.
>
> In general there is a problem identifying plurals of words ending Xs, where
> X is vowel other than e. As you know, porter2 leaves -us alone but removes s
> after a,i,o. This works fairly well.
>
> I have added a few more exceptions in following suggestions
> from Steve Tolkin.
>
...
> _______________________________________________
> Snowball-discuss mailing list
> Snowball-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/snowball-discuss
>