[Snowball-discuss] Making accented letters equivalent to their unaccented letters

Olly Betts olly at survex.com
Tue May 17 21:29:32 BST 2011


On Tue, May 17, 2011 at 04:07:36PM -0300, Tiago wrote:
> But I'm having a problem: take, for example, the words "última" and
> "ultima". As search terms they should return the same objects.
> I was looking at libstemmer_pt and I would like to know if it's possible
> to make them the same symbol.
> If it is, can someone give me an example, please?

I think libunac is the most popular way to strip accents - you can apply
it to the stemmer's output so that "última" and "ultima" end up as the
same term.
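
For illustration, here is a minimal sketch of that accent-stripping step.
It does not use libunac. Instead it swaps in Python's standard-library
unicodedata module (Unicode NFD decomposition, then dropping combining
marks). This handles the Portuguese accents discussed here but is not
guaranteed to match unac's tables exactly. The function names
strip_accents and normalise_terms are hypothetical, and the input list
stands in for whatever your Snowball stemmer returns.

```python
import unicodedata


def strip_accents(term: str) -> str:
    """Remove combining diacritical marks, e.g. "última" -> "ultima".

    NFD splits each accented letter into a base letter followed by
    combining marks; the marks are dropped and the rest recomposed (NFC).
    Note: this is a stand-in for libunac, not its actual algorithm.
    """
    decomposed = unicodedata.normalize("NFD", term)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def normalise_terms(stemmed_terms):
    """Strip accents from terms that have ALREADY been stemmed (hypothetical helper)."""
    return [strip_accents(t) for t in stemmed_terms]


# These strings stand in for stemmer output.
print(normalise_terms(["última", "ultima", "ação"]))
# → ['ultima', 'ultima', 'acao']
```

The same normalisation has to be applied both when indexing and when
processing query terms, otherwise the two forms will still not match.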

Unfortunately the homepage seems to be down:

http://www.senga.org/unac/

Best link I can offer instead is the debian source package page:

http://packages.debian.org/source/sid/unac

The unac_1.8.0.orig.tar.gz link is the original source code.

Cheers,
    Olly


