[Snowball-discuss] Suggestion: 'aci'(<-'at')
James Aylett
james-xapian at tartarus.org
Mon Mar 31 15:41:21 BST 2014
On 30 Mar 2014, at 18:27, Chris Hennick <christopherhe at trentu.ca> wrote:
> In English, the suffix 'acy' almost always corresponds to a cognate ending 'ate' (piracy, privacy, literacy, accuracy) or 'atic' (democracy, lunacy, trichromacy), so it'd be helpful if, like those, it stemmed to -at. (The only word I can think of where this isn't true is pharmacy, but I don't think any words derived from it would be affected.)
I think these are also counter-cases:
* conspiracy
* episcopacy
* fallacy
* lacy
* papacy
* racy
* supremacy
That's 7 out of the 35 words ending in -acy in Moby, so a little less than a quarter. GCIDE has a much longer list, but I doubt it would change the ratio significantly.
I think the only two likely to cause problems would be l-acy -> l-at and r-acy -> r-at, which could be handled with a minimum length in the rule. (There's also pacy, and probably some others; the word lists I have that are easily susceptible to regular expressions aren't remotely complete.)
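As a rough illustration of the minimum-length idea, here is a minimal Python sketch. It is not the Snowball rule itself: the function name stem_acy and the min_stem_len parameter are made up for this example, and it works on the raw -acy spelling rather than on the 'aci' form and regions a real English stemmer would use at that step.

```python
def stem_acy(word, min_stem_len=2):
    """Rewrite a trailing 'acy' to 'at' when enough of the word is left.

    Hypothetical helper for illustration only. min_stem_len is the
    assumed minimum number of characters that must precede 'acy'.
    """
    suffix = "acy"
    if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
        return word[: -len(suffix)] + "at"
    # Too short (e.g. l-acy, r-acy, p-acy) or no 'acy' ending: leave as is.
    return word


if __name__ == "__main__":
    for w in ["piracy", "privacy", "democracy", "lacy", "racy", "pacy"]:
        print(w, "->", stem_acy(w))
```

With a minimum of two characters before the suffix, lacy, racy and pacy are left alone, while longer counter-cases such as fallacy or papacy would still be rewritten, which matches the expectation that only the very short words are likely to cause trouble.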
J
--
James Aylett, occasional trouble-maker
xapian.org