[Snowball-discuss] Portuguese Stemmer

Bernardo Brandão bernardo at lumis.com.br
Tue Oct 27 18:21:21 GMT 2009


Hi Guys,

 

I was investigating Lucene and came upon the Snowball package.  I was
testing the PortugueseStemmer to see how applicable the SnowballAnalyzer
would be in our portal (which uses Lucene).  I tried stemming the
Portuguese word for airplane, “avião”, and its plural, “aviões”.
Apparently the PortugueseStemmer does not reduce the two words to the same
stem: “aviões” is stemmed to “aviõ” and “avião” is stemmed to “aviã”.
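
For reference, the check described above can be sketched in Java roughly as
follows.  This is only an illustration, not the actual test code: the class
name StemConflationCheck and the helper conflates are hypothetical, and the
stemmer here is a stand-in lookup table holding just the two outputs
reported above.  In a real test, the stand-in would be replaced by a call
into the generated PortugueseStemmer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class StemConflationCheck {

    // Returns true when the given stemmer maps both words to the same stem.
    static boolean conflates(UnaryOperator<String> stemmer, String a, String b) {
        return stemmer.apply(a).equals(stemmer.apply(b));
    }

    public static void main(String[] args) {
        // Stand-in for the real stemmer (an assumption for illustration):
        // a table containing only the stems reported in this message.
        Map<String, String> reported = new LinkedHashMap<>();
        reported.put("avião", "aviã");
        reported.put("aviões", "aviõ");

        // Words missing from the table are returned unchanged.
        UnaryOperator<String> stemmer = w -> reported.getOrDefault(w, w);

        // Singular and plural should ideally share one stem.
        System.out.println(conflates(stemmer, "avião", "aviões"));
    }
}
```

With the stems reported above, the two forms do not conflate, so a search
for one form would not match documents containing the other.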

 

I downloaded the Lucene contrib source files and noticed that the
PortugueseStemmer carries the following comments: “This file was generated
automatically by the Snowball to Java compiler” and “Generated class
implementing code defined by a snowball script”.

Is there any way you guys can improve this stemmer so that both forms
reduce to the same stem?

 

Thanks,

Bernardo Brandão - bernardo at lumis.com.br
Software Architect - Product


Lumis Tecnologia da Informação
Tel [21] 3094-7500
http://www.lumis.com.br/

 
