[Snowball-discuss] Irregular Plurals and Ablaut Plurals

Kasun Gajasinghe kasunbg at gmail.com
Sat Aug 14 08:21:09 BST 2010


On Sat, Aug 14, 2010 at 12:44 AM, Nathan Wilson <Nathan.Wilson at ppc.com> wrote:

>  I am using CIS Documentum 6.5 SP3 which is using Snowball for its
> stemming logic.  I have noticed that Snowball does not handle Irregular or
> Ablaut Plurals.
>
> Can these instances be handled in the exception1 routine for the English
> stemmer?
>
> ‘men’      (<- ‘man’)
> ‘women’    (<- ‘woman’)
> ‘children’ (<- ‘child’)
> ‘oxen’     (<- ‘ox’)
> ‘ran’      (<- ‘run’)
>
> There are more and care should be taken to not explode the exception1
> routine, but inclusion of the more common occurrences may be useful.
>
> Is this where Irregular and Ablaut Plurals can be handled? If not, is there
> a place to handle such stemming?
>
> If these instances can be handled here, is there any idea of when, or
> whether, this will be included?
>

What you describe is closer to lemmatization than to stemming. There is a
project called "MorphAdorner" that does lemmatization. I don't know much
about it, but you may want to have a look. It correctly lemmatizes all the
words you listed to their singular/root forms.

http://morphadorner.northwestern.edu/morphadorner/lemmatizer/example/
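
To illustrate the difference, here is a minimal Python sketch of the
"exception table first, rules second" idea. Everything in it is an
assumption made for illustration: the IRREGULAR table, toy_stem() and
normalize() are hypothetical names, the toy suffix rule merely stands in
for a real stemmer, and none of this is Snowball's exception1 routine or
MorphAdorner's actual method.

```python
# Hypothetical table of irregular forms (the examples from the question).
# This is NOT Snowball's exception1 list; it is only an illustration.
IRREGULAR = {
    "men": "man",
    "women": "woman",
    "children": "child",
    "oxen": "ox",
    "ran": "run",
}


def toy_stem(word):
    """Crude stand-in for a rule-based stemmer: strip one trailing 's'.

    A real stemmer applies many more rules; this only shows that suffix
    rules alone cannot relate 'men' to 'man'.
    """
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(word):
    """Look the word up in the exception table, else fall back to rules."""
    lower = word.lower()
    if lower in IRREGULAR:
        return IRREGULAR[lower]
    return toy_stem(lower)


if __name__ == "__main__":
    for w in ["men", "women", "children", "oxen", "ran", "cats"]:
        print(w, "->", normalize(w))
```

The table lookup is the part that behaves like a lemmatizer: it needs an
explicit entry for each word, which is why such a list can grow quickly.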

HTH
--Kasun

~~~*******'''''''''''''*******~~~
Kasun Gajasinghe,
University of Moratuwa,
Sri Lanka.
Blog: http://kasunbg.blogspot.com
Twitter: http://twitter.com/kasunbg