[Snowball-discuss] Snowball-discuss Digest, Vol 61, Issue 3
John Gage
jsmgage at gmail.com
Fri Feb 26 03:23:09 GMT 2010
I think that I may have given an incorrect impression by saying
"orthographic" thesaurus. The French, among other remarkable traits, study
spelling as part of the school curriculum. It is called "orthography". I
am really not interested in a semantic thesaurus, merely one that groups
words that are all from the same stem and which are spelled in similar ways
together. I also am not interested in 100% accuracy. In fact, 90% accuracy
would probably be more than enough.
One could say that looking up words in just about any dictionary would do
the trick. Not quite. I want the table: a many-to-one table, the many
being the words associated with a particular stem and the one being that stem.
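To make the idea concrete, here is a rough sketch in Python of the kind of table I mean. The function names and the `toy_stem` placeholder are made up for illustration; it only strips a few English suffixes and is not a real stemmer. A proper stemming function (a Snowball stemmer, say) would be passed in its place.

```python
from collections import defaultdict


def toy_stem(word):
    """Placeholder stemmer (an assumption for this sketch, not Snowball):
    strips a few common English suffixes, keeping at least 3 letters."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_stem_table(words, stem=toy_stem):
    """Build the many-to-one table: each stem maps to the sorted list
    of (lowercased) words that reduce to it."""
    table = defaultdict(set)
    for word in words:
        lowered = word.lower()
        table[stem(lowered)].add(lowered)
    return {s: sorted(ws) for s, ws in table.items()}


table = build_stem_table(
    ["connect", "connected", "connecting", "connects", "cheese"]
)
```

With this word list, `table["connect"]` holds all four forms of "connect", while "cheese" sits alone under its own stem. Occasional wrong groupings from a crude stemmer would be acceptable given the roughly 90% accuracy target.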
Thank you for following along with me on this. I realize that I am both out
of my element and probably not on the correct list for these questions.
John
On Fri, Feb 26, 2010 at 4:08 AM, Patrick Moran <patrick.a.moran at gmail.com> wrote:
> John,
>
> If I may jump into this discussion - there is another related
> project that may give you what you want. WordNet (out of Princeton, I
> think) is essentially the most complete dictionary I've ever seen and,
> most importantly, it is hyperlinked. Words are all connected to
> related words, for example "cheese" is connected to "food", since
> cheese is a type of food. "Derivationally related form" is probably
> the relationship you want. It has a nice web interface, as well as a
> GUI client for the unices and a portable C API.
>
> That said, WordNet has a couple of drawbacks compared to the approach
> you mentioned. It is English only, and it won't properly associate things
> that aren't proper English words (slang that isn't in WordNet, proper
> nouns etc). But it will connect you to those forms and all the
> connections are valid, as the dictionary was built by hand. I'm sure
> you can think of other strengths and weaknesses of a dictionary-based
> approach.
>
> If the web or GUI interfaces are enough, then great. But fair
> warning, having programmed with both, the libstemmer API is much nicer
> to work with. Even as a software developer I had to read the
> documentation a few times over to really get WordNet's interface.
>
> Hope some of that was helpful,
> Patrick M