[Snowball-discuss] Making accented letters equivalent to their unaccented letters
Olly Betts
olly at survex.com
Tue May 17 21:29:32 BST 2011
On Tue, May 17, 2011 at 04:07:36PM -0300, Tiago wrote:
> But I'm having a problem, for example, with the words "última" and
> "ultima". As search terms they should return the same objects.
> I was looking at libstemmer_pt and I would like to know if it's possible
> to make them the same symbol.
> If it is, can someone give me an example, please?
I think libunac is the most popular way to strip accents - you can apply
it after stemming to give what you want.
Unfortunately the homepage seems to be down:
http://www.senga.org/unac/
The best link I can offer instead is the Debian source package page:
http://packages.debian.org/source/sid/unac
The unac_1.8.0.orig.tar.gz link is the original source code.
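
To illustrate the idea of stripping accents after stemming, here is a minimal
Python sketch. It does not use libunac: it swaps in the standard library's
unicodedata module, decomposing each character (NFD) and dropping the
combining marks. The names strip_accents and normalize_term are made up for
this example, and the stem argument is a hypothetical placeholder (identity by
default) standing in for whatever stemmer call you actually use.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with combining marks (accents) removed."""
    # NFD splits "ú" into "u" followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_term(term: str, stem=lambda word: word) -> str:
    """Stem first, then strip accents, as suggested above.

    `stem` is a hypothetical placeholder; pass in your real stemmer here.
    """
    return strip_accents(stem(term))


if __name__ == "__main__":
    # Both spellings should end up as the same index/search term.
    print(normalize_term("última"), normalize_term("ultima"))
```

Apply the same normalization to both the indexed text and the search terms,
otherwise the two sides will not match.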
Cheers,
Olly