[Snowball-discuss] problems using the English stemmer in java
Martin Porter
martin.porter at grapeshot.co.uk
Sun Aug 29 16:23:35 BST 2004
Shai,
>is there a way that u know of to get the proper english word that results
>from the generated stem ?
You need to use a complete English vocabulary (assuming the language of
application is English). For each word in the vocab, find the stem,
horses->hors
This gives a file that can be inverted,
hors->horses
There will be >=1 unstemmed words for a given stem:
hors->horse
hors->horses
hors->horsed
hors->horsing
('horse' can be a verb: to horse around etc). Choose the shortest:
hors->horse
This gives a mapping of stemmed form to real word, which can be used to
reconstruct a proper English word from a stemmed form.
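
As a rough illustration, the procedure above might be sketched in Java as follows. This is only a sketch under stated assumptions: the class name StemInverter and its methods are made up for the example, and toyStem is a crude stand-in that strips a few suffixes so the example runs on its own. It is not the Snowball English stemmer. The stemmer is passed in as a function, so a real stemmer could be substituted. Ties between words of equal length are broken alphabetically, which is an extra assumption not stated above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

class StemInverter {

    // Hypothetical stand-in for a real stemmer (NOT the Snowball algorithm):
    // strips the first matching suffix, leaving at least three characters.
    public static String toyStem(String word) {
        for (String suffix : new String[] {"ing", "ed", "es", "s", "e"}) {
            if (word.endsWith(suffix) && word.length() > suffix.length() + 2) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Of two words sharing a stem, keep the shorter one;
    // equal lengths are resolved alphabetically (an assumption).
    public static String preferShorter(String a, String b) {
        if (a.length() != b.length()) {
            return a.length() < b.length() ? a : b;
        }
        return a.compareTo(b) <= 0 ? a : b;
    }

    // Stem every vocabulary word and invert the result:
    // each stem maps to the shortest word that produced it.
    public static Map<String, String> invert(Iterable<String> vocabulary,
                                             UnaryOperator<String> stemmer) {
        Map<String, String> stemToWord = new TreeMap<>();
        for (String word : vocabulary) {
            String stem = stemmer.apply(word);
            stemToWord.merge(stem, word, StemInverter::preferShorter);
        }
        return stemToWord;
    }

    public static void main(String[] args) {
        List<String> vocabulary = List.of("horses", "horsing", "horse", "horsed");
        Map<String, String> stemToWord = invert(vocabulary, StemInverter::toyStem);
        System.out.println(stemToWord.get("hors")); // prints horse
    }
}
```

With a full word list, the vocabulary would be read from the file one word per line rather than given inline, and the resulting map could be written out once and reused.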
There are several word lists of English available on the Internet. See for
example,
http://www.gtoal.com/wordgames/yawl/word.list
-- Martin