[Snowball-discuss] porter2 stemmer overstemming the letter e

Martin Porter martin at porterloo.wanadoo.co.uk
Mon Nov 24 10:05:45 GMT 2008


At 01:13 24/11/2008 -0000, Vincent Li wrote:
>
>Hi there, I have a quick question about the porter2 stemmer overstemming
>the letter 'e' at the end of english words. At a glance, this appears to
>be quite common as I noticed two from the sample vocab on
>
>http://snowball.tartarus.org/algorithms/english/stemmer.html
>
>console -> consol
>conspire -> conspir
>
>vintage -> vintag
>
>
>I won't be surprised if there is something I am missing here, and would be
>glad if someone can enlighten me as to why the stemmer does this.

It's so that conspir/ing conflates with conspir/e, and so on. For single-syllable
words, reinstating the -e is important. For words of two or more syllables,
reinstating the -e rarely matters.
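To make the conflation concrete, here is a minimal Python sketch of just two Porter2 rules, based on the description on the English stemmer page: the -ing removal in Step 1b (including its "if the word is short, add e" clause) and the final -e deletion in Step 5, both driven by the regions R1 and R2. It is a simplified illustration, not the Snowball implementation: all other steps, the other Step 1b suffixes, Y-marking and the special R1 prefixes are omitted, and every function name in it is hypothetical.

```python
# Minimal sketch of two Porter2 (English) rules only: "-ing" removal from
# Step 1b and final "-e" deletion from Step 5. All other steps, the other
# Step 1b suffixes, Y-marking and the special R1 prefixes are omitted, so
# this is NOT a full stemmer. Function names are hypothetical.

VOWELS = set("aeiouy")
DOUBLES = ("bb", "dd", "ff", "gg", "mm", "nn", "pp", "rr", "tt")


def region_start(word: str, start: int) -> int:
    """Index just after the first non-vowel that follows a vowel, searching from start."""
    for i in range(start + 1, len(word)):
        if word[i - 1] in VOWELS and word[i] not in VOWELS:
            return i + 1
    return len(word)  # the region is empty


def ends_in_short_syllable(w: str) -> bool:
    """Non-vowel + vowel + non-vowel (not w/x) at the end, or a 2-letter vowel + non-vowel start."""
    if len(w) == 2:
        return w[0] in VOWELS and w[1] not in VOWELS
    if len(w) >= 3:
        return (w[-3] not in VOWELS and w[-2] in VOWELS
                and w[-1] not in VOWELS and w[-1] not in "wx")
    return False


def step1b_ing(word: str, r1: int) -> str:
    """Remove -ing if the remaining stem contains a vowel, then tidy the stem."""
    if not word.endswith("ing"):
        return word
    stem = word[:-3]
    if not any(c in VOWELS for c in stem):
        return word  # e.g. "sing" is left alone
    if stem.endswith(("at", "bl", "iz")):
        return stem + "e"
    if stem.endswith(DOUBLES):
        return stem[:-1]  # hopping -> hopp -> hop
    if r1 >= len(stem) and ends_in_short_syllable(stem):
        return stem + "e"  # short word: hoping -> hop -> hope
    return stem  # conspiring -> conspir


def step5_final_e(word: str, r1: int, r2: int) -> str:
    """Delete a final e if it is in R2, or in R1 and not preceded by a short syllable."""
    if word.endswith("e"):
        pos = len(word) - 1
        if pos >= r2 or (pos >= r1 and not ends_in_short_syllable(word[:-1])):
            return word[:-1]
    return word


def sketch_stem(word: str) -> str:
    word = word.lower()
    if len(word) <= 2:
        return word
    r1 = region_start(word, 0)   # R1, computed once on the input word
    r2 = region_start(word, r1)  # R2 lies inside R1
    word = step1b_ing(word, r1)
    return step5_final_e(word, r1, r2)


if __name__ == "__main__":
    for w in ["conspire", "conspiring", "console", "consoling",
              "vintage", "hope", "hoping", "hopping"]:
        print(w, "->", sketch_stem(w))
```

Run as a script, the sketch reproduces the outputs quoted in the question (console -> consol, conspire -> conspir, vintage -> vintag), puts conspiring alongside conspire at conspir, and shows the single-syllable case: the final e of hope sits in R1 after the short syllable "hop", so it is kept, and hoping gets its e back, meeting hope rather than hop (which is where hopping ends up).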

(I guess there should be an FAQ entry for this question on the PorterStemmer
page.)




>P.S. Is there a way to search through the archive of this email list?
>Apologies if this question was addressed before; I tried but failed to
>find a search.
>

Don't worry, I was caught out by this recently (the search mechanism used to
be internal -- now it's on gmane). We ought to explain how this is done on
the website. Here is the answer in a recent email from Richard Boulton to me:

"....can't you just use the archives on gmane?

http://news.gmane.org/gmane.comp.search.snowball

These are complete archives of the mailing list, and the search box at 
the bottom is powered by xapian.  Indeed a search for "ise ending" 
returned the article you quoted.

Links of the form
http://article.gmane.org/gmane.comp.search.snowball/200
are permanent links to articles, which you can use to refer to them by."

-- end of quote.



Martin