[Snowball-discuss] Possible issue with generated Java code

Olly Betts olly at survex.com
Tue Nov 10 02:03:51 GMT 2020


On Sat, Apr 25, 2020 at 06:51:41PM +0200, Stefan Petkovic wrote:
> I hope this is the right place to report my findings.

Apologies - I must have missed this message at the time, and only just
noticed it in the list archives.

This is meant to be a suitable place for such things, though opening an
issue on GitHub is harder to accidentally miss.

> While I was learning Snowball language and trying out different things
> you can do with it, I wrote a small Snowball script which when
> translated to Java language results in an endless loop.
> 
> Script in question:
> 
> > routines (
> >     Step_1
> >     Step_2
> > )
> >
> > strings ( str1 str2 )
> >
> > externals ( stem )
> >
> > stringescapes {}
> >
> > define Step_1 as (
> >     $str1 = 'animation'
> >     $str1 ( [
> >              loop 2 gopast 'a'
> >              ]
> >              -> str2
> >            )
> > )

I've tracked this down to a bug in the Java code Snowball generates for
"goto" and "gopast" which matters inside "string-$" (I think this may be
the only situation where it matters, but I haven't fully analysed yet).
The failure string should get saved and cleared on entry to goto/gopast
and restored on exit.  It is saved and restored, but the "clear" is
missing.
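
As a rough illustration of the save/clear/restore pattern described above,
here is a minimal Java model of a code generator. It is a sketch under stated
assumptions, not Snowball's actual generator source: the class
FailureStringSketch, its methods, the emitted line format, and the placeholder
failure action restoreOuterState() are all hypothetical names chosen for the
example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a code generator; NOT Snowball's actual source.
class FailureStringSketch {
    // Code to emit on every failure path; set by an enclosing construct.
    private String failureStr = "";
    private final List<String> out = new ArrayList<>();

    // Emits the failure path for a literal match, prefixed by the
    // current failure string.
    private void emitLiteralMatch(String literal) {
        out.add("if (!eq_s(\"" + literal + "\")) { " + failureStr + "break lab; }");
    }

    // Models gopast: a failed match inside the loop only means
    // "advance and retry", so the enclosing failure code should not
    // be emitted there.
    void generateGopast(String literal, boolean clearOnEntry) {
        String saved = failureStr;          // save
        if (clearOnEntry) failureStr = "";  // clear (the step reported as missing)
        emitLiteralMatch(literal);
        failureStr = saved;                 // restore
    }

    // Generates one gopast inside an outer construct that has set a
    // (placeholder) failure action, and returns the emitted lines.
    static List<String> demo(boolean clearOnEntry) {
        FailureStringSketch g = new FailureStringSketch();
        g.failureStr = "restoreOuterState(); ";  // hypothetical outer failure action
        g.generateGopast("a", clearOnEntry);
        return g.out;
    }

    public static void main(String[] args) {
        System.out.println("without clear: " + demo(false));
        System.out.println("with clear:    " + demo(true));
    }
}
```

In this model, omitting the clear lets the outer construct's failure action
leak into the retry path of the loop, while clearing on entry keeps the
emitted line free of it. Whether the real leaked code has exactly this
effect is an assumption of the sketch.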

Looking at the code I think there's a similar bug with "try" inside
"string-$" when used on a command which doesn't change the cursor, but
I've not attempted to test that yet.  Also "try" seems to be missing a
save and restore (unless this isn't needed for some reason I'm not
seeing).

I suspect other languages may be affected, since most copied Java's
approach (because the one we use for C doesn't work for most other
languages).

> Not sure how common this scenario is, where you want to use tests like
> loop and gopast on local strings, but it seems that it does not work
> when translated to Java, or maybe I made a mistake somewhere in my
> script.

None of the standard stemmers we ship use dollar on strings at all.
There's a Latin stemmer implementation which does, but we don't
currently include that in the distribution because it produces two
stems for some inputs, and that doesn't fit the pattern of the other
stemmers.

But features in the language should work as advertised in all languages
(or failing that, at least fail noisily, ideally at compile time), so
thanks for raising this.

Cheers,
    Olly


