[Snowball-discuss] porter2 stemmer overstemming the letter e
Piers Taylor
piers-taylor at 2vu.com
Mon Nov 24 12:59:55 GMT 2008
Hi Vincent,
>It would be great for synonym searching if there
>was some general rule for putting the letter 'e'
>back into some of these words.
Um, I may be missing something here, but shouldn't
you be stemming the synonym and *then* doing the search?
That way the stemmed synonym will match the stemmed terms in the index.
That's the way I would be doing it.
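
A minimal sketch of that ordering, in Python. Everything here is hypothetical: toy_stem is a stand-in that only drops a trailing 'e' (it is not the porter2 algorithm), and the synonym table stands in for a WordNet lookup. The point is only that synonyms are looked up on the unstemmed word and then passed through the same stemmer that was used to build the index.

```python
from typing import Callable, Dict, Iterable, List, Set


def toy_stem(word: str) -> str:
    """Stand-in stemmer for illustration only; NOT the porter2 algorithm."""
    word = word.lower()
    # Drop a single trailing 'e' from words longer than three letters.
    return word[:-1] if len(word) > 3 and word.endswith("e") else word


def build_index(docs: Dict[int, str], stem: Callable[[str], str]) -> Dict[str, Set[int]]:
    """Map each stemmed word to the ids of the documents containing it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(stem(word), set()).add(doc_id)
    return index


def expand_query(
    terms: Iterable[str],
    synonyms: Dict[str, List[str]],
    stem: Callable[[str], str],
) -> Set[str]:
    """Look up synonyms on the unstemmed terms, then stem every term."""
    expanded: Set[str] = set()
    for term in terms:
        expanded.add(stem(term))
        for synonym in synonyms.get(term.lower(), []):
            expanded.add(stem(synonym))
    return expanded


def search(index: Dict[str, Set[int]], stemmed_terms: Iterable[str]) -> Set[int]:
    """Return ids of documents matching any of the stemmed terms."""
    hits: Set[int] = set()
    for term in stemmed_terms:
        hits |= index.get(term, set())
    return hits


# Hypothetical data: a tiny corpus and a one-entry synonym table.
docs = {1: "a vintage car", 2: "a classic car", 3: "console games"}
synonyms = {"vintage": ["classic"]}

index = build_index(docs, toy_stem)
query_terms = expand_query(["vintage"], synonyms, toy_stem)
results = search(index, query_terms)
```

Because the index and the query go through the same stemmer, it does not matter that "vintag" is not a dictionary word; it only has to be the same string on both sides.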
With best regards,
Piers
On 24 Nov 2008, at 01:13, Vincent Li wrote:
> Hi there, I have a quick question about the porter2 stemmer overstemming
> the letter 'e' at the end of English words. At a glance, this appears to
> be quite common, as I noticed two examples in the sample vocabulary at
>
> http://snowball.tartarus.org/algorithms/english/stemmer.html
>
> console -> consol
> conspire -> conspir
>
> vintage -> vintag
>
>
> I won't be surprised if there is something I am missing here, and would be
> glad if someone could enlighten me as to why the stemmer does this.
>
> I discovered this while trying to inject WordNet synonyms into stemmed
> search queries and noticed that the search didn't pick up any synonyms for
> vintage. I thought about adding this as an exception, but I noticed the two
> entries in the sample vocabulary on the English stemmer site and thought it
> might be a common thing.
>
> Just here to check whether this is more of a feature than a bug, really. It
> would be great for synonym searching if there was some general rule for
> putting the letter 'e' back into some of these words. :)
>
> Many thanks in advance,
>
> Vincent
>
> P.S. is there a way to search through the archive of this email list?
> Apologies if this question was addressed before, I tried but failed to
> find a search.