Re[2]: [Snowball-discuss] Porter strem question

Martin Porter martin_porter@softhome.net
Wed Jan 29 09:37:01 2003


>Dear Martin!
>
>Thank you for your reply.
>I'm very sorry for my English. I mean that I need an "Anti English Porter
>stemming" algorithm
>for the following translation:
>consign         consign
>consign         consigned
>consign         consigning
>consign         consignment
>consist         consist
>consist     =>  consisted
>consist         consistency
>consist         consistent
>consist         consistently
>consist         consisting
>consist         consists
>consol          consolation
>consol          consolations
>
>Purpose: for my own search engine
>
>Thank you
>-- 
>with best regard, RedStar

Now I see what you mean.

It is possible to do what you want, but only with the aid of a dictionary.
This is because you cannot deduce the part of speech, and therefore the
class of possible endings, from the stem of the word.

Setting such a dictionary up could be done as follows:

A) from a large sample vocabulary get the set of endings corresponding to
each reduced stem, and give the set an identifiable code: e.g.

    V = -ed, -s, -ing, -ings, -able, -ability, -abilities, -ment

V would be a basic verb form, and cover words like govern, arrange, induce,
consign ...

B) (the tricky part) Collapse all these different sets to a small number of
forms. So there would be codes V, V2, N, X ... for different classes of
ending. If a word's endings are nearly the same as X, put it into class X,
and so on. 

C) Make a dictionary of stems where you look up the word by its stem, get
the ending class, and from that generate all the forms.
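
Steps A to C could be sketched roughly as follows in Python. This is only an
illustration under several assumptions: the sample vocabulary is the short
table quoted above, every word is assumed to begin with its stem (not always
true of Porter stems), the empty ending is added to the verb classes so the
bare form is generated too, the class V2 and the contents of N are invented
for the example, and "nearly the same" is measured with Jaccard similarity,
which is just one possible choice.

```python
from collections import defaultdict

# (stem, word) pairs taken from the table quoted above.
PAIRS = [
    ("consign", "consign"), ("consign", "consigned"),
    ("consign", "consigning"), ("consign", "consignment"),
    ("consist", "consist"), ("consist", "consisted"),
    ("consist", "consistency"), ("consist", "consistent"),
    ("consist", "consistently"), ("consist", "consisting"),
    ("consist", "consists"),
    ("consol", "consolation"), ("consol", "consolations"),
]

# Ending classes. V follows the example in the text, plus the empty ending
# (an assumption). V2 and N are hypothetical classes made up for this sketch.
CLASSES = {
    "V": frozenset({"", "ed", "s", "ing", "ings", "able", "ability",
                    "abilities", "ment"}),
    "V2": frozenset({"", "ed", "s", "ing", "ent", "ently", "ency"}),
    "N": frozenset({"ation", "ations"}),
}


def endings_by_stem(pairs):
    """Step A: collect the set of endings observed for each stem."""
    table = defaultdict(set)
    for stem, word in pairs:
        # Assumption: the word starts with its stem; other words are skipped.
        if word.startswith(stem):
            table[stem].add(word[len(stem):])
    return table


def jaccard(a, b):
    """Similarity of two ending sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def nearest_class(endings, classes):
    """Step B: pick the class whose endings are most similar to `endings`."""
    return max(classes, key=lambda name: jaccard(endings, classes[name]))


def build_dictionary(pairs, classes):
    """Step C, part 1: map each stem to its ending-class code."""
    return {stem: nearest_class(ends, classes)
            for stem, ends in endings_by_stem(pairs).items()}


def generate_forms(stem, dictionary, classes):
    """Step C, part 2: look up the stem and generate all forms of its class."""
    code = dictionary.get(stem)
    if code is None:
        return []  # stem not in the dictionary: nothing can be generated
    return sorted(stem + ending for ending in classes[code])


if __name__ == "__main__":
    dictionary = build_dictionary(PAIRS, CLASSES)
    for stem in dictionary:
        print(stem, dictionary[stem], generate_forms(stem, dictionary, CLASSES))
```

Note that collapsing to a few classes over-generates: with these classes
"consign" also yields forms such as "consignable" and "consignings", which
never appeared in the sample. For a search engine that expands query terms
this may be acceptable, but it is the price of keeping the number of classes
small.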



The issue of ending generation has come up before in snowball-discuss. Type

    backwards stemmer generation

in the search box, and look at the top four emails.