[Xapian-discuss] Stemming problem

Olly Betts olly at survex.com
Wed Jul 4 14:36:35 BST 2007


On Wed, Jul 04, 2007 at 01:59:53PM +0100, James Aylett wrote:
> Most short -er words shouldn't stem the -er off, I suspect. (In
> general, verbs?) I think we're stemming if the prefix >= 6 characters?

It's generally defined in terms of consonants and vowels rather than a
character count, where a run of consecutive consonants or of consecutive
vowels counts once:

http://snowball.tartarus.org/texts/r1r2.html
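
As a rough illustration of the R1/R2 definition on that page, here is a
minimal Python sketch. It is not the Snowball implementation: the function
names are made up for this example, the vowel set (a, e, i, o, u, y) is
assumed to be the one used by the English stemmer, and the English stemmer's
special cases are ignored.

```python
# Simplified sketch of the R1/R2 regions described on the Snowball page.
# Assumption: vowels are a, e, i, o, u, y; real stemmers add exceptions
# that are not handled here.
VOWELS = set("aeiouy")


def region_start(word, start=0):
    """Index just past the first non-vowel that follows a vowel,
    searching from `start`; len(word) if there is no such non-vowel."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word):
    """Return (R1, R2) as strings; R2 is found by applying the same
    rule again inside R1."""
    r1 = region_start(word)
    r2 = region_start(word, r1)
    return word[r1:], word[r2:]


if __name__ == "__main__":
    for w in ("beautiful", "beauty", "mother", "computer"):
        print(w, r1_r2(w))
```

With this sketch, "mother" has an empty R2 while "computer" has R2 = "er",
so a rule that only removes a suffix lying inside R2 would treat the two
differently without any fixed length threshold.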

I think I'd rather avoid maintaining variants of the snowball
algorithms, so if there are useful changes to make I think they should
be proposed to the snowball project, and if accepted, we'd then import
the updated version in due course.

Cheers,
    Olly


