[Snowball-discuss] Multiple errors in generated Java sources for Latin algorithm
Olly Betts
olly at survex.com
Sat Jun 3 23:11:53 BST 2017
On Thu, Jun 01, 2017 at 10:10:21AM +0300, Alexander Myltsev wrote:
> I’m trying to generate Java sources for
> http://snowballstem.org/otherapps/schinke/ algorithm. I added stem.sbl from
> schinke.tgz to “snowball/algorithms/latin/stem.sbl” sources. Then updated
> GNUmakefile:
That doesn't seem to work for me. I added it as "stem_ISO_8859_1.sbl"
instead.
> [error]
> ./src/main/java/org/tartarus/snowball/ext/latinStemmer.java:260:
> missing return statement
The Java backend attempts to avoid writing out unreachable code, because
the designers of Java decided that unreachable code should be a
compile-time error. While that may make sense for human-written code,
it's unhelpful when generating code, but that's the situation we have to
work with.
There's a bug with this currently, as the end of this function clearly
can be reached. If I disable the elision of unreachable code, the
generated latinStemmer.java has a "return true;" at the end of that
function (and that's the only difference).
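To illustrate the kind of control flow involved, here is a minimal sketch. The class and method names are hypothetical and this is not the actual generated latinStemmer.java; it only assumes a labelled do/while(false) block of the sort generated code tends to use. The labelled break makes the end of the method reachable, so javac insists on the final return.

```java
// Hypothetical illustration only; not taken from the generated stemmer.
class ReachabilitySketch {
    static boolean endsWithQue(String word) {
        lab0:
        do {
            if (word.endsWith("que")) {
                // Leaves the labelled block, so control reaches the code after it.
                break lab0;
            }
            return false;
        } while (false);
        // This point is reachable via "break lab0". If a code generator wrongly
        // treated it as unreachable and omitted the next line, javac would report
        // "missing return statement". Conversely, any statement placed after an
        // unconditional return is rejected as "unreachable statement".
        return true;
    }
}
```

Calling endsWithQue("atque") takes the break and returns true, while endsWithQue("datum") returns false from inside the block.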
I'll try to fix this, but it may take me a while to get to as there's a
backlog of issues and PRs currently.
> Even if I stub the error with `return true` or `return false`, stemmer
> produces weird results. When I launch `TestApp latin in.txt -o
> out.txt` for input `datum` it produces string `datum
> datum`, but should just `dat`.
The other stemmers produce one stem which is left in the same buffer the
input is passed in, and that's what the test framework is set up to test.
However, the Latin stemmer produces two stems which are left in two
string variables:
    /* the stemmed words are left in noun-form and verb-form, and can
       be picked up as C strings at z->S[0] and z->S[1] through the API. */
In Java the string variables are private members and no getters are
currently generated for them, so these values aren't currently
accessible by the caller.
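As a rough sketch of what exposing the two stems could look like, assuming the string variables are held in private StringBuilder fields: the class name, field names, setStems helper and getter names below are all hypothetical and are not something the Snowball compiler currently generates.

```java
// Hypothetical sketch; the real generated class and its field names may differ.
class TwoStemSketch {
    // Stand-ins for the private string variables the stemmer would fill in.
    private final StringBuilder nounForm = new StringBuilder();
    private final StringBuilder verbForm = new StringBuilder();

    // Placeholder for the real algorithm: it only records the stems handed to it.
    void setStems(String noun, String verb) {
        nounForm.setLength(0);
        nounForm.append(noun);
        verbForm.setLength(0);
        verbForm.append(verb);
    }

    // Getters of the kind that would let a caller read both results.
    String getNounForm() {
        return nounForm.toString();
    }

    String getVerbForm() {
        return verbForm.toString();
    }
}
```

With accessors like these, a test harness could print both stems instead of reading back the unchanged input buffer.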
Cheers,
Olly