[Snowball-discuss] Quasi-infinite recursion in Turkish stemmer
Tom Lane
tgl at sss.pgh.pa.us
Tue Aug 30 17:32:28 BST 2022
Olly Betts <olly at survex.com> writes:
> On Wed, Aug 24, 2022 at 10:31:26AM -0400, Tom Lane wrote:
>> I have zero competence in Snowball, so I have no ability to write
>> an algorithmic fix; but I wonder if this recursion could be converted
>> to iteration, or else bounded somehow. Surely real-world cases
>> wouldn't need to recurse more than a few levels.
> The input string for a snowball stemmer should be a word from a
> human language, so as a simple way to avoid deep recursion here I'd
> suggest imposing a maximum length on the input you pass to Snowball
> (if you just leave anything longer alone then to the caller it's as if
> the stemmer didn't change it).
Thanks for the suggestion! On reflection it seems like putting an
a-priori limit on the word length is a good idea to protect ourselves
from other performance issues that might exist in the stemmers, not
only this one problem with Turkish.
> As to how we could address this in Snowball, I think the easy option
> would be to bail out early based on input length in turkish.sbl.
> Refactoring stem_suffix_chain_before_ki to iterate instead of recurse
> looks complicated. Modifying the Snowball compiler to turn
> self-recursion into iteration also looks complicated.
Probably none of that is really necessary. What could be helpful
perhaps is to suggest in the documentation that input length be
limited. (Maybe it already says that somewhere?)
regards, tom lane
More information about the Snowball-discuss
mailing list