[Snowball-discuss] Suggestion: 'aci'(<-'at')

James Aylett james-xapian at tartarus.org
Mon Mar 31 15:41:21 BST 2014


On 30 Mar 2014, at 18:27, Chris Hennick <christopherhe at trentu.ca> wrote:

> In English, the suffix 'acy' almost always corresponds to a cognate ending 'ate' (piracy, privacy, literacy, accuracy) or 'atic' (democracy, lunacy, trichromacy), so it'd be helpful if, like those, it stemmed to -at. (The only word I can think of where this isn't true is pharmacy, but I don't think any words derived from it would be affected.)

I think these are also counter-cases:

 * conspiracy
 * episcopacy
 * fallacy
 * lacy
 * papacy
 * racy
 * supremacy

This is out of 35 words ending in -acy in Moby, so a little less than a quarter. GCIDE has a much longer list, but I doubt it would change the ratio significantly.

I think the only two likely to cause problems would be l-acy -> l-at and r-acy -> r-at, which could be handled with a minimum length in the rule. (There's also pacy, and probably some others; the word lists I have that are easily searched with regular expressions aren't remotely complete.)
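
The minimum-length idea above could be sketched roughly as follows. This is only an illustration in Python, not Snowball code: the function name `stem_acy` and the `min_stem_len` parameter are made up here, and it works on the raw -acy spelling rather than the 'aci' form mentioned in the subject line.

```python
def stem_acy(word: str, min_stem_len: int = 2) -> str:
    """Rewrite a trailing 'acy' to 'at' if enough letters precede it.

    Hypothetical helper: min_stem_len is an assumed threshold, not a
    value taken from the Snowball English stemmer.
    """
    suffix = "acy"
    if word.endswith(suffix):
        stem = word[: -len(suffix)]
        # Short stems such as 'l' (lacy) or 'r' (racy) are left untouched.
        if len(stem) >= min_stem_len:
            return stem + "at"
    return word


if __name__ == "__main__":
    for w in ["piracy", "democracy", "lacy", "racy", "pacy", "papacy"]:
        print(w, "->", stem_acy(w))
```

With a threshold of 2, lacy, racy and pacy are left alone, while longer counter-cases such as conspiracy and papacy would still be rewritten to -at.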

J

-- 
 James Aylett, occasional trouble-maker
 xapian.org



