[Snowball-discuss] Multiple errors in generated Java sources for Latin algorithm

Olly Betts olly at survex.com
Mon Jun 5 22:45:29 BST 2017


On Sun, Jun 04, 2017 at 09:51:13AM +0100, Martin Porter wrote:
> A question did once come up about the stemmer itself (I can't remember
> now what it was)

The question is actually on the Snowball website's page about the stemmer:

| Fig 5 of the 1996 Schinke paper doesn't correspond to the algorithm of
| fig 7, but to the algorithm with the extra rules concerning -ba-,
| -bi-, -sse- mentioned on page 182. Which is the "correct" algorithm -
| with or without those rules? If with, what is the exact criterion for
| their removal? A bigger problem is why the -nt is not removed from
| 'Apparebunt', given -nt as an ending in 6(a). Is -nt a misprint?

https://snowballstem.org/otherapps/schinke/

> Again I was once asked for the paper, which then was not traceable
> anywhere on the web. I photocopied my offprint and sent it through the
> post. If necessary I could scan it and upload it somewhere if anyone
> was desperate for a copy.

It seems to be online now:

http://caio.ueberalles.net/a_stemming_algorithm_for_latin_text_databases-schinke_et_al.pdf

Cheers,
    Olly


