[Snowball-discuss] Minor(?) errors in snowball's generated C code

Olly Betts olly at survex.com
Sun Sep 30 21:28:01 BST 2018


On Sun, Sep 30, 2018 at 12:55:40PM -0400, Tom Lane wrote:
> /srv/coverity/git/pgsql-git/postgresql/src/backend/snowball/libstemmer/stem_UTF_8_indonesian.c: 370 in indonesian_UTF_8_stem()
> 364                         z->lb = z->c; z->c = z->l; /* backwards, line 185 */
> 365     
> 366                         {   int ret = r_remove_suffix(z); /* call remove_suffix, line 185 */
> 367                             if (ret == 0) goto lab6;
> 368                             if (ret < 0) return ret;
> 369                         }
> >>>     CID 1439829:  Incorrect expression  (UNUSED_VALUE)
> >>>     Assigning value from "z->lb" to "z->c" here, but that stored value is overwritten before it can be used.
> 370                         z->c = z->lb;
> 371                         z->c = c_test8;
[...]
> This is not much of a problem --- any decent compiler will just discard
> the dead store on line 370.

That's true for C, though perhaps not for all the languages we can
generate.

> But maybe it's an easily fixable mistake in the Snowball compiler?  Or
> a bug in the indonesian stemmer algorithm?

This comes from the end of this line:

            test ($measure > 2 backwards remove_suffix)

"backwards X" swaps c and l, then does X, then swaps them back, which
gives us the first restore of z->c.

"test X" performs X and then restores the cursor, which gives us the
second restore of z->c.

I think the "test" is actually redundant here because remove_suffix
can't change z->lb, and if "test" is removed the test vocabulary still
stems the same way, though the algorithm code is arguably clearer with
it.  

We actually have machinery in the compiler to try to eliminate redundant
cursor restoring, but this doesn't currently handle "backwards" so makes
the conservative assumption that the restore is needed.  Probably this
is the best place to address this.

> /srv/coverity/git/pgsql-git/postgresql/src/backend/snowball/libstemmer/stem_UTF_8_arabic.c: 972 in r_Normalize_post()
> 966     
> 967             z->ket = z->c; /* [, line 321 */
> 968             if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 5 || !((124 >> (z->p[z->c - 1] & 0x1f)) & 1)) goto lab0; /* substring, line 321 */
> 969             if (!(find_among_b(z, a_1, 5))) goto lab0;
> 970             z->bra = z->c; /* ], line 321 */
> 971             {   int ret = slice_from_s(z, 2, s_50); /* <-, line 322 */
> >>>     CID 1439828:  Control flow issues  (MISSING_RESTORE)
> >>>     Value of non-local "z->c" that was saved in "c1" is not restored as it was along other paths.
> 972                 if (ret < 0) return ret;
> 973             }
> 974             z->c = z->lb;
> 975         lab0:
> 976             z->c = c1;
> 977         }
> 
> It's not clear to me if this is actually a bug, or it's more of a case of
> "we don't care whether this happens in this code path".  Coverity does have
> a lot of heuristic checks that sometimes give false positives.

It's not a bug - it gives signal f, which will propagate upwards until
either something handles it (and anything which does will restore c to
the saved value at that point), or we return to the (non-snowball) caller
(in which case the position of c isn't important).

I guess we could always restore z->c here since the value will get reset
later when it matters anyway, and this is not a code path that's taken
in normal operation.  But this seems a somewhat dubious check and I'm
hesitant to start changing the code generation for such things.

Cheers,
    Olly



More information about the Snowball-discuss mailing list