[Snowball-discuss] Can snowball be run backwards to generate words?
Martin Porter
martin_porter@softhome.net
Sat, 22 Dec 2001 14:56:28 -0700
You can turn the Porter stemmer inside out and generate all endings that
the stemmer will recognise, but there are several problems. One is that the
endings go in circles, e.g.
ize + ation as in realization
ation + al as in operational
al + ize as in normalize
- suggesting endlessly long endings such as izationalizational... You can break the loop
by noting that four is the upper limit on the number of derivational
suffixes that can be attached to a word in English.
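As a rough illustration of why the cap matters, here is a minimal Python sketch that walks a suffix graph containing only the three transitions above (ize -> ation, ation -> al, al -> ize). The FOLLOWS table, the join() spelling rule (drop a final e before a vowel) and the function names are hypothetical simplifications, not the Porter stemmer's actual rule set. Without MAX_SUFFIXES the walk would never terminate; with it, the three-edge loop yields twelve endings.

```python
from collections import deque

# Hypothetical follow-on table: which suffix may be attached after which.
# It holds only the three transitions from the examples, not Porter's real rules.
FOLLOWS = {
    "ize": ["ation"],
    "ation": ["al"],
    "al": ["ize"],
}

# Upper limit on the number of derivational suffixes on an English word.
MAX_SUFFIXES = 4


def join(ending: str, suffix: str) -> str:
    """Append suffix, dropping a final 'e' before a vowel (ize + ation -> ization)."""
    if ending.endswith("e") and suffix[0] in "aeiou":
        return ending[:-1] + suffix
    return ending + suffix


def generate_endings(follows=FOLLOWS, max_suffixes=MAX_SUFFIXES) -> set:
    """Breadth-first walk of the suffix graph, cut off at max_suffixes suffixes."""
    results = set()
    # Each queue entry: (ending built so far, last suffix attached, suffix count).
    queue = deque((s, s, 1) for s in follows)
    while queue:
        text, last, depth = queue.popleft()
        results.add(text)
        if depth == max_suffixes:
            continue  # the cap is what breaks the ize -> ation -> al -> ize loop
        for nxt in follows.get(last, []):
            queue.append((join(text, nxt), nxt, depth + 1))
    return results
```

For example, `sorted(generate_endings(), key=len)` lists them from ize and al up to four-suffix forms such as alizational and ationalization, but never izationalizational, which would need five suffixes.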
If you do this, you end up with really quite a lot of endings. Here is a
list I put together recently,
Inflexional: ed ing ings s
Derivational:
ic ioned *ationed *icationed
*izationed *alizationed ered *izered
*alizered *icalizered *ionalizered ated
icated ized alized *icalized
*ionalized *ationalized ance ence
able ible ate icate
ive ative icative ize
alize *icalize *ionalize *ationalize
ioning *ationing *icationing *izationing
*alizationing ering *izering *alizering
*icalizering *ionalizering ating icating
izing *alizing *icalizing *ionalizing
*ationalizing al ical ional
ational *icational *izational ful
ism alism *icalism *ionalism
*ationalism ion ation ication
ization alization er izer
*alizer *icalizer *ionalizer ator
ics ances ences ancies
encies ities icities alities
*icalities ionalities *ationalities abilities
ibilities *ivities *ativities *icativities
ables ibles nesses *ivenesses
*ativenesses *icativenesses *alnesses *icalnesses
*ionalnesses *ationalnesses *fulnesses *ousnesses
ates icates ives atives
*icatives izes *alizes *icalizes
*ionalizes *ationalizes als icals
ionals *ationals *icationals *izationals
isms *alisms *icalisms *ionalisms
*ationalisms ions ations ications
izations *alizations ers izers
*alizers *icalizers *ionalizers ators
ness iveness *ativeness *icativeness
alness *icalness ionalness *ationalness
fulness ousness ants ents
ments ements ous ant
ent ment ement ancy
ency ly ably ibly
ately *icately ively atively
*icatively ally ically ionally
ationally ously ently *mently
*emently ity icity ality
icality ionality *ationality ability
ibility ivity *ativity *icativity
- sorted by ending and arranged in 4 columns. The endings marked * are very
rare or non-existent and could be ignored. There are some extra rules, for
example: endings beginning with ion should follow s or t in the stem. This is
a minimum list: you can argue for other forms (ableness, for example).
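The two filters just described (ignoring the starred endings, and the ion rule) can be sketched in Python as follows, assuming the table is read as whitespace-separated tokens with a leading * marking the rare forms. SAMPLE_ROWS copies three rows of the table; parse_endings(), ion_rule_ok() and allowed_endings() are hypothetical helper names, not part of Snowball.

```python
# Three rows copied from the table above; '*' marks very rare or non-existent endings.
SAMPLE_ROWS = """
ic ioned *ationed *icationed
*ionalized *ationalized ance ence
ioning *ationing *icationing *izationing
"""


def parse_endings(text: str, keep_rare: bool = False) -> list:
    """Split the whitespace-separated table into endings, dropping '*' entries by default."""
    endings = []
    for token in text.split():
        if token.startswith("*") and not keep_rare:
            continue
        endings.append(token.lstrip("*"))
    return endings


def ion_rule_ok(stem: str, ending: str) -> bool:
    """Endings beginning with 'ion' should follow a stem ending in s or t."""
    if ending.startswith("ion"):
        return stem.endswith(("s", "t"))
    return True


def allowed_endings(stem: str, endings: list) -> list:
    """Keep only the endings that pass the ion rule for this stem."""
    return [e for e in endings if ion_rule_ok(stem, e)]
```

With these rows, `allowed_endings("adopt", parse_endings(SAMPLE_ROWS))` keeps all five common endings, while a stem such as run loses ioned and ioning.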
It follows that if a word is se (stem s followed by ending e), looking up all
the forms s*, where * is any of these endings, could be quite expensive.
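To make the cost concrete, the sketch below compares two hypothetical ways of expanding a stem against a word list: trying every stem + ending pair (one lookup per ending, so cost grows with the length of the ending list) and, as an alternative not proposed in the post, a single binary search into a sorted lexicon followed by a scan of the words that share the stem as a prefix. The lexicon, the stem gener and the function names are illustrative assumptions, and plain concatenation ignores spelling changes at the stem boundary.

```python
import bisect

# Illustrative word list and ending list (assumptions, not data from the post).
LEXICON = ["generation", "general", "genre", "generate", "generalize"]
ENDINGS = ["al", "ate", "ation", "alize", "ize", "ic"]


def expand_by_endings(stem, endings, lexicon):
    """Try stem + ending for every ending: one lookup per ending."""
    words = set(lexicon)
    found, lookups = [], 0
    for ending in endings:
        lookups += 1
        if stem + ending in words:
            found.append(stem + ending)
    return found, lookups


def expand_by_prefix(stem, endings, sorted_lexicon):
    """Binary-search a sorted lexicon for the stem, then scan words with that prefix."""
    ending_set = set(endings)
    found = []
    start = bisect.bisect_left(sorted_lexicon, stem)
    for word in sorted_lexicon[start:]:
        if not word.startswith(stem):
            break  # sorted order: no later word can share the prefix
        if word[len(stem):] in ending_set:
            found.append(word)
    return found
```

Both return general, generate, generation and generalize for the stem gener; the first spends six lookups to do so, and would spend one per entry for the full ending list above, while the second's cost depends on how many lexicon words share the prefix.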
Sometimes classes of endings can be eliminated on grammatical grounds. For
example, ness forms nouns from adjectives, and able forms adjectives from
nouns, so you would not expect them to attach to the same word. But there
are many exceptions to rules like this.
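One way to encode this kind of pruning is a table recording which word class each ending attaches to; two endings are only expected on the same stem if they require the same class. The sketch below is hypothetical: the ness and able entries follow the post, the ly and ity entries are added for illustration, endings missing from the table are never ruled out, and an explicit exceptions set stands in for the many exceptions mentioned above.

```python
# Hypothetical, simplified table: ending -> word class of the base it attaches to.
# 'ness' and 'able' follow the post; 'ly' and 'ity' are illustrative additions.
BASE_CLASS = {
    "ness": "adjective",  # ness forms nouns from adjectives
    "able": "noun",       # able forms adjectives from nouns
    "ly": "adjective",    # ly forms adverbs from adjectives
    "ity": "adjective",   # ity forms nouns from adjectives
}


def may_share_stem(e1, e2, exceptions=frozenset()):
    """Return False only when both endings are known to need different base classes."""
    if frozenset((e1, e2)) in exceptions:
        return True  # a listed exception overrides the grammatical rule
    c1, c2 = BASE_CLASS.get(e1), BASE_CLASS.get(e2)
    if c1 is None or c2 is None:
        return True  # unknown ending: cannot rule it out
    return c1 == c2


def prune(candidates, observed, exceptions=frozenset()):
    """Drop candidate endings incompatible with an ending already seen on the stem."""
    return [e for e in candidates if may_share_stem(e, observed, exceptions)]
```

For instance, `prune(["ness", "able", "ly", "ity", "ful"], "ness")` drops only able; ful is kept because the table says nothing about it.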
I think ending generation helps in understanding stemmers, but I'm not sure
that classes of endings are utilizable by IR systems, if only because there
are so many of them.
Martin
_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss