[Xapian-discuss] TermGenerator question for the single quote character

Olly Betts olly at survex.com
Mon Apr 6 14:31:53 BST 2009


On Sun, Apr 05, 2009 at 07:18:08PM -0400, tata 668 wrote:
> I use the TermGenerator to index the French text "Cela m'excite"
> (without the quotes). When I search for "excite" after indexing,
> I need it to be found. "excite" is a word on its own.
> 
> Currently "excite" is not found but "m'excite" is...

In 1.0.0, we changed to treating apostrophes as part of a word, and
updated to a newer version of Snowball where the English stemmer
deals with them.

I think the correct way for this to work is for the other stemmers
to also handle apostrophes (at least if their languages use them),
since otherwise the word tokenisation required depends on which
stemmer is in use.

> Is there a setting I'm missing so that the single quote character acts as
> a word delimiter?

No, there's no such setting currently.
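
One possible workaround (a sketch only, not something built into Xapian)
is to replace apostrophes with spaces yourself before handing the text
to TermGenerator's index_text(). The helper name split_on_apostrophes
below is made up for illustration, and it assumes you're happy to treat
both the ASCII apostrophe and the typographic one (U+2019) as word breaks:

```python
import re

# Assumption: both the ASCII apostrophe (U+0027) and the typographic
# apostrophe (U+2019) should act as word delimiters.
APOSTROPHES = re.compile("['\u2019]")


def split_on_apostrophes(text):
    """Replace each apostrophe with a space so both sides become separate words.

    Hypothetical helper: the returned string is what you would then pass
    to the term generator instead of the raw text.
    """
    return APOSTROPHES.sub(" ", text)


print(split_on_apostrophes("Cela m'excite"))  # prints: Cela m excite
```

You'd want to apply the same filter to query strings, otherwise a query
for "m'excite" would no longer match anything. Note that this also
splits English words like "don't" into "don" and "t", so it probably
only makes sense for text you know isn't English.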

Cheers,
    Olly


