[Snowball-discuss] Question about adding rules to Snowball Stemmer
Martin Porter
martin.porter at grapeshot.co.uk
Thu Jul 15 10:40:59 BST 2004
Olga,
Thank you for your interest.
For your various questions,
1. How we can add a list of "exceptional cases" to the Stemmer -
See the section beginning with the heading "Exceptional forms in general" in
the page
http://snowball.tartarus.org/english/stemmer.html
(Various approaches are possible, but this would be my approach.)
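As a rough illustration of the general idea of exceptional forms, here is a minimal sketch in Python that consults an exception table before falling back to an ordinary stemmer. It wraps the stemmer from the outside and is not necessarily the approach described on the page above. The names `make_stemmer`, `toy_suffix_stripper` and `EXCEPTIONS` are hypothetical, the table entries are illustrative only, and the toy suffix stripper merely stands in for a real Snowball-generated stemmer.

```python
from typing import Callable, Dict


def make_stemmer(base_stem: Callable[[str], str],
                 exceptions: Dict[str, str]) -> Callable[[str], str]:
    """Return a stemmer that checks an exception table before base_stem."""
    def stem(word: str) -> str:
        key = word.lower()
        # Exceptional forms take priority over the general rules.
        if key in exceptions:
            return exceptions[key]
        return base_stem(key)
    return stem


def toy_suffix_stripper(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a few suffixes."""
    for suffix in ("ing", "es", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Illustrative exception table (assumed entries, not taken from the thread).
EXCEPTIONS = {"news": "news", "skies": "sky", "dying": "die"}

stem = make_stemmer(toy_suffix_stripper, EXCEPTIONS)
```

With this arrangement `stem("news")` is returned unchanged from the table, whereas the toy stripper alone would reduce it to `new`. Words not in the table, such as `cats`, still go through the general rules.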
4. Is there a Java API available for Snowball?
Yes. See the section "Java generation" in
http://snowball.tartarus.org/q/use.html
5. Could you perhaps point me to some other publicly available stemmers I
could look at and play with?
There is a great deal of work on language processing (and stemming),
but unfortunately most of it is proprietary, and therefore difficult to
review or assess. An example is
http://www.teragram.com/oem/euro_lang.htm#stemming
For stemming freeware, there is not much available. For English, there is
the Lovins stemmer, see
http://www.cs.waikato.ac.nz/~eibe/stemmers/
and the Paice stemmer, see
http://www.comp.lancs.ac.uk/computing/research/stemming/
But the Lovins stemmer is also available on the Snowball site in snowball form:
http://snowball.tartarus.org/lovins/stemmer.html
The Paice stemmer does not easily translate into Snowball, otherwise it
would be there too.
For foreign language stemmers, there are often references, but almost never
proper algorithmic descriptions. For example,
http://www.cs.put.poznan.pl/dweiss/xml/projects/lametyzator/index.xml?lang=en#id2600260
describes work done in Polish.
There does not appear to be a question 3. For question 2, I am not sure what
you have in mind, and perhaps you could explain a little more fully.
Martin