[Snowball-discuss] Spanish stemmer with accents stripped before stemming

Martin Porter martin.porter at grapeshot.co.uk
Mon May 21 12:49:51 BST 2007


I find I can't connect to 200.67.231.185, so I'm not too sure what's
going on here. Obviously, for us it is a bit easier to look at the
problem from the Snowball angle, rather than think about the generated
Java after it's been put inside Lucene! As far as the Snowball script is
concerned, I believe you could strip out accents from the source,
eliminate the duplicate strings that would result in the among(..)
lists, and recompile, getting the effect you want.
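
As a rough illustration of the strip-and-deduplicate step, here is a minimal Python sketch. It is not part of Snowball: the helper names strip_acute and dedupe_unaccented are hypothetical, the suffix list is made up for the example, and it assumes that only the acute accent should be removed (so ñ is left alone). The Snowball source itself would still need to be edited by hand and recompiled.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent (assumption: the only mark removed)


def strip_acute(s):
    """Return s with acute accents removed; other marks (e.g. the tilde on ñ) are kept."""
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(ch for ch in decomposed if ch != ACUTE)
    return unicodedata.normalize("NFC", stripped)


def dedupe_unaccented(suffixes):
    """Strip acute accents from each string and drop duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for suffix in suffixes:
        plain = strip_acute(suffix)
        if plain not in seen:
            seen.add(plain)
            result.append(plain)
    return result


# Illustrative strings only, not copied from the Spanish stemmer source.
suffixes = ["ara", "ar\u00e1", "aras", "ar\u00e1s", "er\u00eda", "a\u00f1o"]
print(dedupe_unaccented(suffixes))
```

Here "ará" collapses onto "ara" and "arás" onto "aras", which is the kind of duplicate that would have to be removed from an among(..) list after the accents are stripped.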

(Incidentally, I have hit this problem with Spanish stemming before, but
it was a long while ago -- before the development of snowball.)

Also, I'm not familiar with the generated Java output. I don't know
if Richard Boulton (who wrote the Java code generator) has anything more
to add at this stage?

Martin