I really don't think this is an issue. The old Porter stemmer, for example, when run on pure binary data with the occasional hex 20 marking a word break, runs happily to completion, and the same holds for the other stemmers. Suggesting a limit on input length, it seems to me, would ring unnecessary alarm bells.
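
For anyone who wants to repeat that kind of check, here is a rough sketch of the harness in Python. It is only an illustration: `toy_stem` is a hypothetical stand-in that strips a few suffixes, not the Porter stemmer, and `stem_binary` is a name made up for this example. To test a real stemmer you would pass its stemming function as the `stem` argument.

```python
import random


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if len(word) > len(suffix) + 2 and word.endswith(suffix):
            return word[: -len(suffix)]
    return word


def stem_binary(data: bytes, stem=toy_stem) -> list:
    """Split raw bytes on hex 20 and stem every chunk, whatever it contains."""
    # latin-1 maps each byte to exactly one code point, so decoding never fails.
    words = data.decode("latin-1").split("\x20")
    return [stem(w) for w in words]


# Pure pseudo-random binary data; hex 20 turns up now and then as a word break.
rng = random.Random(0)
blob = bytes(rng.randrange(256) for _ in range(100_000))

stems = stem_binary(blob)
print(len(stems), "chunks stemmed without error")
```

Nothing here limits the length of a chunk; some of the "words" between two hex 20 bytes will be several hundred bytes long.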