[Snowball-discuss] Quasi-infinite recursion in Turkish stemmer

Tom Lane tgl at sss.pgh.pa.us
Wed Aug 24 15:31:26 BST 2022


Some folks doing static code analysis on Postgres [1] discovered that
Snowball's Turkish stemmer contains unchecked recursion: the function
r_stem_suffix_chain_before_ki() can recurse into itself with no bound
other than the length of the input word.
It's possible to exploit that to drive the Postgres server to a stack
overflow crash:

postgres=# SELECT ts_lexize('turkish_stem', repeat('lerdeki', 1000000));
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
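To make the failure mode concrete, here is a toy model in Python. It is hypothetical and is not Snowball's generated code. The function `strip_chain_recursive()` (an illustrative name) strips one trailing `lerdeki` and calls itself on the remainder, so the recursion depth equals the number of repetitions in the input. CPython stops such a chain at its recursion limit (1000 frames by default) and raises `RecursionError`. Per the report above, the generated C code has no equivalent guard, so a deep enough chain overruns the stack instead.

```python
SUFFIX = "lerdeki"  # the repeated suffix from the psql example


def strip_chain_recursive(word: str, end: int) -> int:
    """Hypothetical toy, not the real Snowball rule: if word[:end] ends
    with SUFFIX, strip it and recurse on what is left. Returns the number
    of suffixes removed, which equals the recursion depth reached."""
    start = end - len(SUFFIX)
    if start >= 0 and word.startswith(SUFFIX, start, end):
        return 1 + strip_chain_recursive(word, start)
    return 0


word = SUFFIX * 100
print(strip_chain_recursive(word, len(word)))  # prints 100

big = SUFFIX * 1_000_000  # same input shape as the ts_lexize() call
try:
    strip_chain_recursive(big, len(big))
except RecursionError:
    # CPython refuses to go past its recursion limit; C code just crashes
    print("RecursionError")
```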

We're not too happy about that, as it's on the edge of being something
we'd consider a security issue.  Ordinarily we'd just stick in a stack
depth check and call it good, but since this is generated code, that
doesn't seem like a workable long-term fix.  Besides, other users of
the stemmer might also be concerned about this.

I have zero competence in Snowball, so I have no ability to write
an algorithmic fix; but I wonder if this recursion could be converted
to iteration, or else bounded somehow.  Surely real-world cases
wouldn't need to recurse more than a few levels.
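Both options can be sketched on the same kind of toy model. This is again hypothetical Python, not the Snowball rule itself. The function names and the cap `MAX_CHAIN = 10` are illustrative assumptions; a real limit would need input from someone who knows Turkish morphology.

- `strip_chain_iterative()` does the same stripping with a loop, so stack use stays constant whatever the input.
- `strip_chain_bounded()` keeps the recursion but stops descending after `MAX_CHAIN` levels.

```python
SUFFIX = "lerdeki"
MAX_CHAIN = 10  # hypothetical cap on chained suffixes


def strip_chain_iterative(word: str) -> int:
    """Recursion converted to iteration: strip SUFFIX repeatedly from the
    end of the word, using constant stack space."""
    end = len(word)
    removed = 0
    while end >= len(SUFFIX) and word.startswith(SUFFIX, end - len(SUFFIX), end):
        end -= len(SUFFIX)
        removed += 1
    return removed


def strip_chain_bounded(word: str, end: int, depth: int = 0) -> int:
    """Recursion kept, but bounded: give up once depth reaches MAX_CHAIN,
    so at most MAX_CHAIN frames are ever used."""
    start = end - len(SUFFIX)
    if depth < MAX_CHAIN and start >= 0 and word.startswith(SUFFIX, start, end):
        return 1 + strip_chain_bounded(word, start, depth + 1)
    return 0


big = SUFFIX * 1_000_000
print(strip_chain_iterative(big))          # prints 1000000, no deep stack
print(strip_chain_bounded(big, len(big)))  # prints 10, stops at the cap
```

The loop preserves the output for every input. The cap changes results only for words with more than `MAX_CHAIN` chained suffixes, which, as argued above, should not occur in real text. Whether the actual rule can become a plain loop depends on what it does after its recursive call, and this toy does not model that.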

(I believe we're using Snowball v2.2.0, if it matters.)

			regards, tom lane

[1] https://www.postgresql.org/message-id/flat/1661334672.728714027%40f473.i.mail.ru


