[Snowball-discuss] no words ending on alism in Porter diffs txt file

Martin Porter martin.f.porter at gmail.com
Mon Oct 24 11:03:37 BST 2011


Ward, the quick answer to your two points is:

On Fri, Oct 21, 2011 at 7:58 PM, Ward Bekker (TTY) <ward at tty.nl> wrote:
> Hi,
> Two questions:
> 1) While coverage testing the Erlang implementation of the Porter algorithm
> using the "Vocabulary + stemmed equivalent" file, I noticed that there are no
> words included that end in "alism". Is this on purpose?
> See http://snowball.tartarus.org/algorithms/porter/diffs.txt

No, it is an accident.
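
For anyone wanting to repeat that kind of coverage check, here is a minimal
Python sketch. It assumes the vocabulary has already been read into a list of
words; the function names, the sample words and the suffix list are all
hypothetical and only there for illustration.

```python
def suffix_coverage(words, suffixes):
    """Count how many words in the vocabulary end in each suffix."""
    counts = {s: 0 for s in suffixes}
    for w in words:
        for s in suffixes:
            if w.endswith(s):
                counts[s] += 1
    return counts


def uncovered(words, suffixes):
    """Return the suffixes that no word in the vocabulary exercises."""
    return [s for s, n in suffix_coverage(words, suffixes).items() if n == 0]


# Hypothetical sample data, not taken from the real vocabulary file.
sample_vocab = ["relational", "conditional", "terribly", "hopefulness"]
sample_suffixes = ["ational", "alism", "fulness"]

print(uncovered(sample_vocab, sample_suffixes))  # prints ['alism']
```

Run against the real word list, any suffix reported here is one that the test
data never reaches.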

> 2) In the "Vocabulary + stemmed equivalent" file I noticed that e.g.
> "terribly" is stemmed to "terribli". In the Erlang version this is stemmed
> to "terribl", which matches the way "terrible" is stemmed. That looks useful
> to the untrained eye. Is this a side effect of replacing the abli  →  able
> rule with bli  →  ble?

terribly stems to terribl in the Porter algorithm, and to terribli in
the 'improved' English algorithm; these are separate algorithms on the
Snowball site. Your Erlang version (is that the one by Alden Dima from
tartarus.org/~martin/PorterStemmer/?) is clearly following the Porter
stemmer.
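
To make the rule change in question 2 concrete, here is a rough Python sketch
of a single suffix rule guarded by the usual m > 0 condition on the stem. It is
not a full stemmer: it assumes the word has already had its final y turned
into i by an earlier step, and the helper names are hypothetical.

```python
VOWELS = set("aeiou")


def is_consonant(word, i):
    """Porter's definition: y counts as a consonant only at the start of the
    word or directly after a vowel."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Number of vowel-run/consonant-run (VC) pairs in the stem, i.e. m."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


def apply_rule(word, suffix, replacement):
    """Apply '(m > 0) suffix -> replacement'; return the word unchanged if the
    suffix does not match or the remaining stem is too short."""
    if word.endswith(suffix):
        stem = word[: -len(suffix)]
        if measure(stem) > 0:
            return stem + replacement
    return word


print(apply_rule("terribli", "abli", "able"))  # prints terribli
print(apply_rule("terribli", "bli", "ble"))    # prints terrible
```

With the abli rule the word is left alone because it ends in ibli, whereas the
bli rule rewrites it to terrible, which a later step can then reduce to terribl
by dropping the final e.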

-------


But you may well ask how the English stemmer is supposed to be
improving on the Porter stemmer here. I'll investigate further and get
back to you,

Martin


