[Snowball-discuss] no words ending on alism in Porter diffs txt file

Ward Bekker ward at tty.nl
Mon Oct 24 21:57:09 BST 2011


Hi Martin,


Thanks for the clarification. I did indeed use the version from Alden Dima. It contains the differences listed in "Points of difference from the published algorithm". I'm in the process of adding test coverage and minor performance optimizations. I'll post a link to the updated version to the newsgroup when finished.


With kind regards,


Ward Bekker
  _____  

From: Martin Porter [mailto:martin.f.porter at gmail.com]
To: Ward Bekker (TTY) [mailto:ward at tty.nl]
Cc: snowball-discuss at lists.tartarus.org, Michel Rijnders [mailto:mies at tty.nl]
Sent: Mon, 24 Oct 2011 16:54:16 +0200
Subject: Re: [Snowball-discuss] no words ending on alism in Porter diffs txt file

Ward,
  
  Okay, forget my last answer, second part --- I got 'porter' and
  'english' the wrong way round, and the correct point is the following
  one:
  
  The 1980 paper describing the Porter stemmer differs from the way it
  was initially implemented, and the way it's always used. These
  differences are explained in the note 'Points of difference from the
  published algorithm' in
  
  http://tartarus.org/~martin/PorterStemmer/
  
  But the implementation at
  
  http://snowball.tartarus.org/algorithms/porter/stemmer.html
  
  is of the original 1980 paper. As it says, "This is an exact
  implementation of the algorithm described in the 1980 paper, unlike
  the other implementations distributed by the author, which have, and
  have always had, three small points of difference (clearly indicated)
  from the original algorithm." This explains the difference.
  
  In practice, follow the versions at
  
  http://tartarus.org/~martin/PorterStemmer/
  
  --- the snowball version is really just an academic exercise.
  
  Sorry for the confusion!
  
  Martin
  
  
  
  
  On Fri, Oct 21, 2011 at 7:58 PM, Ward Bekker (TTY) <ward at tty.nl> wrote:
  > Hi,
  > Two questions:
  > 1) While coverage testing the Erlang implementation of the Porter algorithm
  > using the "Vocabulary + stemmed equivalent" file, I noticed that there are no
  > words included that end in "alism". Is this on purpose?
  
  (no, pure accident)
  
  > See http://snowball.tartarus.org/algorithms/porter/diffs.txt
  > 2) In the "Vocabulary + stemmed equivalent" file I noticed that e.g.
  > "terribly" is stemmed to "terribli". In the Erlang version this is stemmed
  > to "terribl", which matches the way "terrible" is stemmed. That looks useful
  > to the untrained eye. Is this a side effect of the change that replaced
  > abli → able with bli → ble?
  
  (see above)
  
  > Regards,
  > Ward Bekker
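
For readers following the second question, here is a minimal Python sketch of the single rule under discussion. It is not taken from any of the implementations mentioned in this thread. The function names (`measure`, `apply_rule`, `step2_published`, `step2_distributed`) are made up for illustration, and only this one step 2 rule is modelled, under the assumption that the input is the word as it looks after step 1 (where "terribly" has already become "terribli").

```python
VOWELS = "aeiou"


def is_consonant(word, i):
    """Porter's definition: 'y' counts as a vowel only when preceded by a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count vowel-to-consonant transitions, i.e. m in the form [C](VC)^m[V]."""
    m = 0
    prev_was_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_was_vowel:
            m += 1
        prev_was_vowel = not cons
    return m


def apply_rule(word, suffix, replacement):
    """Apply a '(m > 0) suffix -> replacement' rule; return the word unchanged otherwise."""
    if word.endswith(suffix):
        stem = word[: -len(suffix)]
        if measure(stem) > 0:
            return stem + replacement
    return word


def step2_published(word):
    # Rule as discussed for the 1980 paper: abli -> able
    return apply_rule(word, "abli", "able")


def step2_distributed(word):
    # Rule as discussed for the distributed versions: bli -> ble
    return apply_rule(word, "bli", "ble")


if __name__ == "__main__":
    for w in ("terribli", "conformabli"):
        print(w, step2_published(w), step2_distributed(w))
```

With the abli rule, "terribli" is left alone because it ends in "ibli", whereas the bli rule turns it into "terrible", which the later steps of the full algorithm can then reduce further. Words ending in "abli" come out the same under both rules.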
    