[Snowball-discuss] no words ending on alism in Porter diffs txt file

Martin Porter martin.f.porter at gmail.com
Mon Oct 24 15:54:16 BST 2011


Ward,

Okay, forget my last answer, second part --- I got 'porter' and
'english' the wrong way round, and the correct point is the following
one:

The 1980 paper describing the Porter stemmer differs from the way it
was initially implemented, and from the way it has always been used.
The differences are explained in the note 'Points of difference from
the published algorithm' in

http://tartarus.org/~martin/PorterStemmer/

But the implementation at

http://snowball.tartarus.org/algorithms/porter/stemmer.html

is of the original 1980 paper. As it says, "This is an exact
implementation of the algorithm described in the 1980 paper, unlike
the other implementations distributed by the author, which have, and
have always had, three small points of difference (clearly indicated)
from the original algorithm." This explains the difference.
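
As a rough illustration of how one of those points of difference
produces the "terribly" discrepancy raised in the quoted message below,
here is a minimal Python sketch. It is not any of the distributed
implementations: the names (measure, stem_sketch, PUBLISHED, REVISED)
are made up for this example, and it covers only step 1c, a single
step 2 rule, and the (m>1) half of step 5a. The remaining steps are
left out on the assumption, which holds for these two words, that they
change nothing.

```python
VOWELS = "aeiou"


def is_consonant(word, i):
    """True if word[i] is a consonant in Porter's sense."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        # 'y' counts as a vowel only when preceded by a consonant.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Return m, the number of VC sequences in [C](VC)^m[V]."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


def step1c(word):
    """(*v*) Y -> I: turn a final y into i if the stem has a vowel."""
    stem = word[:-1]
    has_vowel = any(not is_consonant(stem, i) for i in range(len(stem)))
    if word.endswith("y") and has_vowel:
        return stem + "i"
    return word


def step2_single_rule(word, suffix, replacement):
    """(m>0) suffix -> replacement, for one step 2 rule only."""
    if word.endswith(suffix):
        stem = word[: -len(suffix)]
        if measure(stem) > 0:
            return stem + replacement
    return word


def step5a_m_gt_1(word):
    """(m>1) E -> (empty). The (m=1 and not *o) case is omitted here."""
    if word.endswith("e") and measure(word[:-1]) > 1:
        return word[:-1]
    return word


# The step 2 rule as printed in the 1980 paper, and as revised.
PUBLISHED = ("abli", "able")
REVISED = ("bli", "ble")


def stem_sketch(word, step2_rule):
    """Apply only the steps that matter for 'terribly' / 'terrible'."""
    suffix, replacement = step2_rule
    word = step1c(word)
    word = step2_single_rule(word, suffix, replacement)
    return step5a_m_gt_1(word)


print(stem_sketch("terribly", PUBLISHED))  # terribli
print(stem_sketch("terribly", REVISED))    # terribl
print(stem_sketch("terrible", REVISED))    # terribl
```

With the published rule, "terribli" does not end in "abli", so step 2
leaves it alone. With the revised rule it becomes "terrible", and
step 5a then drops the final e, which is why "terribly" and "terrible"
end up with the same stem.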

In fact, follow the versions at

http://tartarus.org/~martin/PorterStemmer/

--- the snowball version is really just an academic exercise.

Sorry for the confusion!

Martin




On Fri, Oct 21, 2011 at 7:58 PM, Ward Bekker (TTY) <ward at tty.nl> wrote:
> Hi,
> Two questions:
> 1) While coverage testing the Erlang implementation of the Porter algorithm
> using the "Vocabulary + stemmed equivalent" file, I noticed that there are no
> words included that end in "alism". Is this on purpose?

(no, pure accident)

> See http://snowball.tartarus.org/algorithms/porter/diffs.txt
> 2) In the "Vocabulary + stemmed equivalent" file I noticed that e.g.
> "terribly" is stemmed to "terribli". In the Erlang version this is stemmed
> to "terribl", which matches the way "terrible" is stemmed. That looks useful
> to the untrained eye. Is this a side effect of the change that replaced
> abli → able with bli → ble?

(see above)

> Regards,
> Ward Bekker


