[Snowball-discuss] Portuguese Stemmer
Bernardo Brandão
bernardo at lumis.com.br
Tue Oct 27 18:21:21 GMT 2009
Hi Guys,
I was investigating Lucene and came upon the Snowball package. I was
testing the PortugueseStemmer to see how applicable the SnowballAnalyzer
would be in our portal (which uses Lucene). I tested stemming the
Portuguese word for airplane, avião, and its plural, aviões. Apparently
the PortugueseStemmer does not stem the two words to the same form:
aviões is stemmed to aviõ, and avião is stemmed to aviã.
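For reference, the comparison can be sketched roughly as below. To keep the snippet free of a Lucene dependency, reportedStem is a hypothetical stand-in that only echoes the two outputs quoted above; it is not the real stemmer, and the class and method names are made up for illustration. In an actual test, the stand-in would be swapped for a call into the PortugueseStemmer.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

public class StemComparison {
    // Stand-in for the real stemmer (assumption): it knows only the two
    // outputs reported in this message and returns any other word unchanged.
    static final Map<String, String> REPORTED = Map.of(
        "avião", "aviã",
        "aviões", "aviõ");

    static String reportedStem(String word) {
        return REPORTED.getOrDefault(word, word);
    }

    // True when both words reduce to the same stem under the given stemmer.
    static boolean sameStem(UnaryOperator<String> stemmer, String a, String b) {
        return stemmer.apply(a).equals(stemmer.apply(b));
    }

    public static void main(String[] args) {
        boolean match = sameStem(StemComparison::reportedStem, "avião", "aviões");
        System.out.println("avião / aviões share a stem: " + match);
    }
}
```

Passing the stemmer as a function keeps the check itself independent of which stemmer implementation is plugged in.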
I downloaded the Lucene contrib source files and noticed that the
PortugueseStemmer carries the following comments: "This file was generated
automatically by the Snowball to Java compiler" and "Generated class
implementing code defined by a snowball script".
Is there any way you guys can improve this Stemmer?
Thanks,
Bernardo Brandão - bernardo at lumis.com.br
Software Architect - Product
Lumis Tecnologia da Informação
Tel [21] 3094-7500
http://www.lumis.com.br/