[Snowball-discuss] Some advice needed - to Snowball or not to Snowball

Martin Porter martin.f.porter at gmail.com
Tue Dec 27 21:05:57 GMT 2011


Hi Keith,

Some ideas ......

The words in the dictionary may of course include variant forms that
can be brought together by a stemming process. It may contain
'responsibility' as well as 'responsible', and may even contain plural
forms as separate entries.

So here is what you do ...

You stem the dictionary headwords, and put the forms with a common
stem in the same 'bucket'. So 'responsibility', 'responsible' .... go
in the same bucket. Given a text word, e.g. 'responsibilities', you
stem it and find the bucket of dictionary words it goes in, and then
pick out the word from the bucket to which it is most similar (you'll
need a string similarity measure). Then take that word's definition as
the meaning to offer the user.
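A minimal sketch of the bucket scheme, assuming a dictionary held as a plain headword-to-definition mapping. `toy_stem` is a hypothetical stand-in for a real Snowball stemmer: it only strips a handful of suffixes. `difflib.SequenceMatcher.ratio()` is swapped in here as the string similarity measure. All function names are illustrative.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical stand-in for a real Snowball stemmer: strips the longest
# matching suffix from a short, illustrative list (longest suffixes first).
_SUFFIXES = sorted(
    ["ibilities", "ibility", "ible", "ities", "ity", "ies",
     "ing", "ed", "es", "s"],
    key=len, reverse=True)

def toy_stem(word):
    word = word.lower()
    for suffix in _SUFFIXES:
        # Only strip if at least three characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_buckets(dictionary, stem=toy_stem):
    """Group dictionary headwords by stem: stem -> list of headwords."""
    buckets = defaultdict(list)
    for headword in dictionary:
        buckets[stem(headword)].append(headword)
    return buckets

def look_up(text_word, dictionary, buckets, stem=toy_stem):
    """Return (headword, definition) for the headword in the text word's
    bucket that is most similar to it, or None if no bucket matches."""
    bucket = buckets.get(stem(text_word))
    if not bucket:
        return None
    # Pick the headword with the highest similarity ratio to the text word.
    best = max(bucket,
               key=lambda h: SequenceMatcher(None, text_word.lower(), h).ratio())
    return best, dictionary[best]
```

With a dictionary containing 'responsibility' and 'responsible', both land in the bucket 'respons', and looking up 'responsibilities' picks 'responsibility' as the closer headword. A real stemmer could be passed in through the `stem` parameter; with the `snowballstemmer` Python package, for example, `snowballstemmer.stemmer('english').stemWord` should fit, though that is an assumption about the reader's setup.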

The similarity measure could be just an edit distance such as
Levenshtein distance (Hamming distance only applies to strings of equal
length; see the Wikipedia descriptions), or something that takes ending
relations into account (e.g. that -ities is the plural of -ity).
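A sketch of an ending-aware similarity measure. Because Hamming distance is only defined for strings of equal length, this uses Levenshtein edit distance instead. `ENDING_RELATIONS` is a hypothetical, deliberately tiny table of endings; the distance is taken as the minimum over the plain comparison and the comparison after rewriting the text word's ending.

```python
# Hypothetical table of ending relations: a text-word ending mapped to the
# dictionary ending it usually corresponds to (plural -> singular, etc.).
ENDING_RELATIONS = [("ities", "ity"), ("ies", "y"), ("es", "e"), ("s", "")]

def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions and
    substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def ending_aware_distance(text_word, headword):
    """Minimum of the plain edit distance and the distance after rewriting
    the text word's ending according to ENDING_RELATIONS."""
    best = levenshtein(text_word, headword)
    for ending, replacement in ENDING_RELATIONS:
        if text_word.endswith(ending):
            rewritten = text_word[: len(text_word) - len(ending)] + replacement
            best = min(best, levenshtein(rewritten, headword))
    return best
```

Here 'responsibilities' is at distance 0 from 'responsibility' (via the -ities/-ity relation) but at a positive distance from 'responsible', so the smallest distance picks the right headword.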

Irregular forms (brought/bring, sung/sing) can be handled by expanding
the dictionary to include all variants. For English verbs, only about
200 extra entries are required.
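A sketch of the dictionary expansion, assuming the same headword-to-definition mapping as above. `IRREGULAR_FORMS` is a hypothetical five-entry sample, not the full list of roughly 200 verb forms; each irregular form is added as its own entry reusing the base form's definition, so a direct lookup finds it without relying on the stemmer.

```python
# Hypothetical sample of irregular English verb forms -> base form.
# A real table would hold the full set of irregular forms.
IRREGULAR_FORMS = {
    "brought": "bring",
    "sang": "sing",
    "sung": "sing",
    "went": "go",
    "gone": "go",
}

def expand_dictionary(dictionary, irregular=IRREGULAR_FORMS):
    """Return a copy of dictionary in which each irregular form whose base
    is present gets an extra entry reusing the base's definition."""
    expanded = dict(dictionary)
    for form, base in irregular.items():
        # Skip forms whose base is missing or that already have an entry.
        if base in dictionary and form not in expanded:
            expanded[form] = dictionary[base]
    return expanded
```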

Martin



On Tue, Dec 27, 2011 at 5:16 PM, Keith Whittingham
<kwhittingham at gmail.com> wrote:
> I'm looking for some advice.
>
> I'm just starting on a project to help people learn languages. I would
> like users, while looking at a body of text, to be able to click on a
> given word and have the program give its meaning. For example, clicking
> on the word "meaning" might display a dictionary definition of "[to]
> mean".
>
> . . . . . . . . . .


