[Snowball-discuss] porter2 stemmer overstemming the letter e

Piers Taylor piers-taylor at 2vu.com
Mon Nov 24 12:59:55 GMT 2008


Hi Vincent,

 >It would be great for synonym searching if there
 >was some general rule for putting the letter 'e'
 >back into some of these words.

Um, I may be missing something here, but shouldn't
you be stemming the synonyms and *then* doing the search?
That way your synonyms will match the index.
That's how I would do it.
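The following is a minimal Python sketch of that order of operations. The thread itself contains no code. The final 'e' is dropped on purpose: Porter2's Step 5 deletes a final e when it lies in region R2, or when it lies in R1 and the letters before it do not end in a short syllable. Under that rule "hope" keeps its e, while "console", "conspire" and "vintage" lose theirs.

The stemmer below implements only that one rule. It has no y/Y handling, none of the earlier steps and no special-case prefixes. It stands in for Porter2 and is not Porter2. SYNONYMS is a hypothetical table that stands in for a WordNet lookup. build_index, search and DOCS are illustrative names. The idea is to look up synonyms on the unstemmed query word and then pass every term through the same stemmer the index used. With that order, nothing has to be put back.

```python
"""Sketch: stem synonyms *before* searching a stemmed index.

The stemmer here is NOT full Porter2; it implements only the final-'e'
rule of Porter2's Step 5, which is enough to reproduce console -> consol.
"""

VOWELS = "aeiouy"  # simplification: 'y' is always treated as a vowel here


def region_start(word: str, start: int) -> int:
    """Index just after the first non-vowel that follows a vowel, scanning
    from `start`; len(word) if there is none.
    region_start(w, 0) gives R1, region_start(w, R1) gives R2."""
    for i in range(start + 1, len(word)):
        if word[i - 1] in VOWELS and word[i] not in VOWELS:
            return i + 1
    return len(word)


def ends_with_short_syllable(part: str) -> bool:
    """Porter2 'short syllable' test applied to the end of `part`."""
    if len(part) == 2:  # a vowel at the start of the word, then a non-vowel
        return part[0] in VOWELS and part[1] not in VOWELS
    if len(part) >= 3:  # non-vowel, vowel, non-vowel other than w, x or Y
        return (part[-3] not in VOWELS and part[-2] in VOWELS
                and part[-1] not in VOWELS and part[-1] not in "wxY")
    return False


def strip_final_e(word: str) -> str:
    """Step 5 'e' rule: delete a final e if it is in R2, or if it is in R1
    and the preceding letters do not end in a short syllable."""
    if not word.endswith("e"):
        return word
    r1 = region_start(word, 0)
    r2 = region_start(word, r1)
    e_pos = len(word) - 1
    if e_pos >= r2 or (e_pos >= r1 and not ends_with_short_syllable(word[:-1])):
        return word[:-1]
    return word


def stem(word: str) -> str:
    """Stand-in stemmer; the point is to use the SAME one at index and query time."""
    return strip_final_e(word.lower())


# Hypothetical synonym table standing in for a WordNet lookup.
SYNONYMS = {"vintage": ["classic", "antique"]}


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index from stem to the ids of documents containing it."""
    index: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(stem(token), set()).add(doc_id)
    return index


def search(index: dict[str, set[int]], query_word: str,
           stem_first: bool = False) -> set[int]:
    """Expand the query with synonyms, stem every term, then look each up.
    stem_first=True mimics the failing order: the synonym lookup is keyed
    on the stem ('vintag'), which the synonym table does not contain."""
    key = stem(query_word) if stem_first else query_word.lower()
    terms = [query_word] + SYNONYMS.get(key, [])
    hits: set[int] = set()
    for term in terms:
        hits |= index.get(stem(term), set())
    return hits


DOCS = {1: "a vintage guitar", 2: "an antique console", 3: "classic rock"}

if __name__ == "__main__":
    idx = build_index(DOCS)
    print(search(idx, "vintage"))                   # {1, 2, 3}
    print(search(idx, "vintage", stem_first=True))  # {1}
```

Passing stem_first=True reproduces the symptom from the original message. The table has no entry for "vintag", so only the literal match comes back. Looking up synonyms on the raw word and stemming them afterwards finds all three documents.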

With best regards,
		  Piers

Piers Taylor
01752 822572
07815 155301
piers-taylor at 2vu.com



On 24 Nov 2008, at 01:13, Vincent Li wrote:

> Hi there, I have a quick question about the porter2 stemmer overstemming
> the letter 'e' at the end of English words. At a glance, this appears to
> be quite common, as I noticed two examples in the sample vocabulary on
>
> http://snowball.tartarus.org/algorithms/english/stemmer.html
>
> console -> consol
> conspire -> conspir
>
> vintage -> vintag
>
>
> I wouldn't be surprised if there is something I am missing here, and
> would be glad if someone could enlighten me as to why the stemmer does
> this.
>
> I discovered this while trying to inject WordNet synonyms into stemmed
> search queries and noticed that the search didn't pick up any synonyms
> for vintage. I thought about adding it as an exception, but then I
> noticed the two entries in the sample vocabulary on the English stemmer
> site and thought it might be a common thing.
>
> I'm just checking whether this is more of a feature than a bug, really.
> It would be great for synonym searching if there was some general rule
> for putting the letter 'e' back into some of these words. :)
>
> Many thanks in advance,
>
> Vincent
>
> P.S. Is there a way to search through the archive of this email list?
> Apologies if this question has been addressed before; I tried but failed
> to find a search.
>
>
> _______________________________________________
> Snowball-discuss mailing list
> Snowball-discuss at lists.tartarus.org
> http://lists.tartarus.org/mailman/listinfo/snowball-discuss



