[Xapian-discuss] indexing words with alternative spellings

Oliver Flimm flimm at ub.uni-koeln.de
Tue May 11 14:46:44 BST 2010


On Tue, May 11, 2010 at 03:18:38PM +0200, Per Jessen wrote:
> Some languages (e.g. German and Danish) have special letters that are
> often written using two-letter combinations when the appropriate
> keyboard or medium is not available:
> ä = ae
> As a user of an index, I would like to be able to search for
> e.g. "schaefer" and get matches on both 'ae' and 'ä' returned. Same if
> I searched on 'schäfer'.  Is this something I would need to take into
> account when I do the indexing or?

You have to take it into account both when indexing and when searching.

I'm using Xapian in a library catalogue and convert these "special"
characters to their two-letter combinations, both when generating terms
or postings and when processing user input.
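
A minimal sketch of that idea in Python, not the actual catalogue code:
the same normalisation function is applied to the text before it is
turned into terms and to the query string before it is parsed. The
function names and the mapping table are assumptions for illustration.
Only "ä = ae" comes from the question above; the other German and Danish
entries follow the usual conventions and should be adjusted to your data.

```python
import unicodedata

# Hypothetical mapping table; extend or trim it for your own collection.
TRANSLITERATION = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "æ": "ae", "ø": "oe", "å": "aa",
    "Æ": "Ae", "Ø": "Oe", "Å": "Aa",
})


def normalize(text: str) -> str:
    """Fold special letters to their two-letter forms and lowercase."""
    # NFC first, so that "a" + combining diaeresis becomes a single "ä"
    # and is caught by the mapping table.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(TRANSLITERATION).lower()


def index_terms(document_text: str) -> list[str]:
    """Terms to store in the index (whitespace tokenisation for brevity)."""
    return normalize(document_text).split()


def query_terms(user_input: str) -> list[str]:
    """Terms to look up; must use the same normalisation as indexing."""
    return normalize(user_input).split()


if __name__ == "__main__":
    indexed = set(index_terms("Der Schäfer und die Schafe"))
    for query in ("schaefer", "Schäfer"):
        print(query, all(term in indexed for term in query_terms(query)))
```

With Xapian itself, the normalised string would be what gets handed to
the term generator at indexing time and to the query parser at search
time. Note that this folding also makes a word that genuinely contains
"ae" indistinguishable from one written with "ä".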


O. Flimm

Universitaet zu Koeln :: Universitaets- und Stadtbibliothek
IT-Dienste :: Abteilung Universitaetsgesamtkatalog
Universitaetsstr. 33 :: D-50931 Koeln
Tel.: +49 221 470-3330 :: Fax: +49 221 470-5166
flimm at ub.uni-koeln.de :: www.ub.uni-koeln.de
