[Snowball-discuss] no words ending on alism in Porter diffs txt file
Ward Bekker
ward at tty.nl
Mon Oct 24 21:57:09 BST 2011
Hi Martin,
Thanks for the clarification. I did indeed use the version from Alden Dima, which contains the differences listed in "Points of difference from the published algorithm". I'm in the process of adding test coverage and minor performance optimizations, and I'll post a link to the updated version to the newsgroup when finished.
With kind regards,
Ward Bekker
_____
From: Martin Porter [mailto:martin.f.porter at gmail.com]
To: Ward Bekker (TTY) [mailto:ward at tty.nl]
Cc: snowball-discuss at lists.tartarus.org, Michel Rijnders [mailto:mies at tty.nl]
Sent: Mon, 24 Oct 2011 16:54:16 +0200
Subject: Re: [Snowball-discuss] no words ending on alism in Porter diffs txt file
Ward,
Okay, forget my last answer, second part --- I got 'porter' and
'english' the wrong way round, and the correct point is the following
one:
The 1980 paper describing the Porter stemmer differs from the way it
was initially implemented, and the way it's always used. The
differences are explained in the note 'Points of difference from the
published algorithm' in
http://tartarus.org/~martin/PorterStemmer/
But the implementation at
http://snowball.tartarus.org/algorithms/porter/stemmer.html
is of the original 1980 paper. As it says, "This is an exact
implementation of the algorithm described in the 1980 paper, unlike
the other implementations distributed by the author, which have, and
have always had, three small points of difference (clearly indicated)
from the original algorithm." This explains the difference.
In practice, follow the versions at
http://tartarus.org/~martin/PorterStemmer/
--- the snowball version is really just an academic exercise.
Sorry for the confusion!
Martin
On Fri, Oct 21, 2011 at 7:58 PM, Ward Bekker (TTY) <ward at tty.nl> wrote:
> Hi,
> Two questions:
> 1) While coverage testing the Erlang implementation of the Porter algorithm
> using the "Vocabulary + stemmed equivalent" file, I noticed that there are no
> words included that end in "alism". Is this on purpose?
(no, pure accident)
> See http://snowball.tartarus.org/algorithms/porter/diffs.txt
> 2) In the "Vocabulary + stemmed equivalent" file I noticed that e.g.
> "terribly" is stemmed to "terribli". In the Erlang version this is stemmed
> to "terribl", which matches the way "terrible" is stemmed. That looks useful
> to the untrained eye. Is this a side effect of the abli → able replaced
> by bli → ble change?
(see above)
> Regards,
> Ward Bekker
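
The "terribly" case from question 2 can be traced with a small Python sketch. This is not a full Porter stemmer and not code from either distribution: it implements only step 1c, the single step 2 rule under discussion, and the m > 1 branch of step 5a. The skipped steps happen not to change these two words. All function names here are hypothetical, chosen for illustration.

```python
VOWELS = "aeiou"

def is_consonant(word, i):
    """Porter's definition: 'y' is a consonant unless preceded by a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True

def measure(stem):
    """Count the VC sequences, i.e. m in the form [C](VC)^m[V]."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m

def contains_vowel(stem):
    return any(not is_consonant(stem, i) for i in range(len(stem)))

def step1c(word):
    # (*v*) Y -> I
    if word.endswith("y") and contains_vowel(word[:-1]):
        return word[:-1] + "i"
    return word

def step2_bli_rule(word, revised):
    # 1980 paper: (m>0) ABLI -> ABLE; distributed versions: (m>0) BLI -> BLE
    suffix, repl = ("bli", "ble") if revised else ("abli", "able")
    if word.endswith(suffix):
        stem = word[: -len(suffix)]
        if measure(stem) > 0:
            return stem + repl
    return word

def step5a_partial(word):
    # Only the (m>1) E -> "" branch; the m == 1 branch is omitted in this sketch.
    if word.endswith("e") and measure(word[:-1]) > 1:
        return word[:-1]
    return word

def partial_stem(word, revised):
    """Apply just the three rules above, in Porter's order."""
    return step5a_partial(step2_bli_rule(step1c(word), revised))

for w in ("terribly", "terrible"):
    print(w, partial_stem(w, revised=False), partial_stem(w, revised=True))
```

With the 1980 rule, "terribli" does not end in "abli", so step 2 leaves it alone. With the revised rule it becomes "terrible", and step 5a then drops the final "e", which is why it ends up with the same stem as "terrible".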