[Snowball-discuss] Snowball-discuss Digest, Vol 61, Issue 3

Patrick Moran patrick.a.moran at gmail.com
Fri Feb 26 03:08:54 GMT 2010


John,

    If I may jump into this discussion, there is a related project
that may give you what you want.  WordNet (out of Princeton, I think)
is essentially the most complete dictionary I've ever seen and, most
importantly, it is hyperlinked.  Words are all connected to related
words.  For example, "cheese" is connected to "food", since cheese is
a type of food.  "Derivationally related form" is probably the
relationship you want.  WordNet has a nice web interface, as well as
a GUI client for the unices and a portable C API.

    That said, WordNet has a couple of drawbacks compared to the
approach you mentioned.  It is English only, and it won't properly
associate things that aren't proper English words (slang that isn't
in WordNet, proper nouns, etc.).  But for the words it does cover, it
will connect you to the derivationally related forms, and all the
connections are valid, as the dictionary was built by hand.  I'm sure
you can think of other strengths and weaknesses of a dictionary-based
approach.
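
    To make that trade-off concrete, here is a minimal Python sketch
of a dictionary-based lookup.  The LEXICON table, its entries, the
relation names and the related() helper are all made up for
illustration; this is not WordNet's data or its API, just the general
shape of a hand-built dictionary with typed links.

```python
# Toy, hand-built lexicon for illustration only.
# These entries and relation names are assumptions, not WordNet's data or API.
LEXICON = {
    "cheese": {"hypernym": ["food"]},
    "food": {"hyponym": ["cheese"]},
    "decide": {"derivationally_related": ["decision", "decisive"]},
    "decision": {"derivationally_related": ["decide"]},
    "decisive": {"derivationally_related": ["decide"]},
}


def related(word, relation):
    """Return the words linked to `word` by `relation`.

    Returns an empty list when the word (or the relation) is not in the
    lexicon, which is the main weakness of a dictionary-based approach.
    """
    entry = LEXICON.get(word.lower(), {})
    return list(entry.get(relation, []))


if __name__ == "__main__":
    print(related("cheese", "hypernym"))
    print(related("decide", "derivationally_related"))
    # A word missing from the lexicon yields no links at all.
    print(related("googling", "derivationally_related"))
```

    Every link that comes back was put there by hand, so it can be
trusted, but a word missing from the table gets nothing at all, where
an algorithmic stemmer would still produce some answer.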

    If the web or GUI interfaces are enough, then great.  But fair
warning: having programmed with both, I find the libstemmer API much
nicer to work with.  Even as a software developer, I had to read the
documentation a few times over to really get WordNet's interface.

Hope some of that was helpful,
Patrick M


