simon-git: putty (main): Simon Tatham

Fri Feb 17 17:20:04 GMT 2023

TL;DR:
  9e01de7c decode_utf8: add an enumeration of failure reasons.

Repository:     https://git.tartarus.org/simon/putty.git
On the web:     https://git.tartarus.org/?p=simon/putty.git
Branch updated: main
Committer:      Simon Tatham <anakin at pobox.com>
Date:           2023-02-17 17:20:04

commit 9e01de7c2b2903412822f3285da1d692d1474524
web diff https://git.tartarus.org/?p=simon/putty.git;a=commitdiff;h=9e01de7c2b2903412822f3285da1d692d1474524;hp=9d308b39da715f77c9c07ea28eeb3016d1be12b1
Author: Simon Tatham <anakin at pobox.com>
Date:   Fri Feb 17 16:39:09 2023 +0000

    decode_utf8: add an enumeration of failure reasons.

    Now you can optionally get back an enum value indicating whether the
    character was successfully decoded, or whether U+FFFD was substituted
    due to some kind of problem, and if the latter, what problem.

    For a start, this allows distinguishing 'real' U+FFFD (encoded
    legitimately in the input) from one invented by the decoder. Also, it
    allows the recipient of the decode to treat failures differently,
    either by passing on a useful error report to the user (as
    utf8_unknown_char now does) or by doing something special.

    In particular, there are two distinct error codes for a truncated
    UTF-8 encoding, depending on whether it was truncated by the end of
    the input or by encountering a non-continuation byte. The former code
    means that the string is not legal UTF-8 _as it is_, but doesn't rule
    out it being a (bytewise) prefix of a legal UTF-8 string - so if a
    client is receiving UTF-8 data a byte at a time, they can treat that
    error code specially and not make it a fatal error.
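
    The distinction between the two truncation cases can be sketched in
    C as follows. This is a minimal illustration, not the code from this
    commit: the enum, its value names and the function sketch_decode are
    all hypothetical, and every failure other than the two truncation
    cases is lumped into a single SK_INVALID code for brevity.

```c
#include <stddef.h>

/* Hypothetical failure reasons. These names are illustrative only and
 * are not the identifiers used in PuTTY's misc.h. */
typedef enum {
    SK_OK,          /* a character was decoded successfully */
    SK_OUT_OF_DATA, /* sequence cut short by the end of the input */
    SK_TRUNCATED,   /* sequence cut short by a non-continuation byte */
    SK_INVALID      /* any other problem (lumped together in this sketch) */
} SketchFailure;

/*
 * Decode one character starting at buf[*pos]. The caller must ensure
 * *pos < len. Advances *pos past the bytes consumed. Returns the code
 * point, or 0xFFFD on failure. If err is non-NULL, the reason is
 * written there, so a legitimately encoded U+FFFD (reported as SK_OK)
 * can be told apart from a substituted one.
 */
static unsigned sketch_decode(const unsigned char *buf, size_t len,
                              size_t *pos, SketchFailure *err)
{
    SketchFailure dummy;
    unsigned char lead;
    unsigned ch, min;
    int ncont;

    if (!err)
        err = &dummy;           /* the error report is optional */

    lead = buf[(*pos)++];

    if (lead < 0x80) {
        *err = SK_OK;           /* plain ASCII */
        return lead;
    } else if (lead < 0xC0) {
        *err = SK_INVALID;      /* continuation byte with no lead byte */
        return 0xFFFD;
    } else if (lead < 0xE0) {
        ch = lead & 0x1F; ncont = 1; min = 0x80;
    } else if (lead < 0xF0) {
        ch = lead & 0x0F; ncont = 2; min = 0x800;
    } else if (lead < 0xF8) {
        ch = lead & 0x07; ncont = 3; min = 0x10000;
    } else {
        *err = SK_INVALID;      /* 0xF8..0xFF never start a sequence */
        return 0xFFFD;
    }

    while (ncont-- > 0) {
        unsigned char c;
        if (*pos >= len) {
            /* Ran out of input: not valid as it stands, but it could
             * still be a prefix of a valid string. */
            *err = SK_OUT_OF_DATA;
            return 0xFFFD;
        }
        c = buf[*pos];
        if ((c & 0xC0) != 0x80) {
            /* Interrupted by a non-continuation byte, which is left
             * unconsumed so the next call can decode it. */
            *err = SK_TRUNCATED;
            return 0xFFFD;
        }
        ch = (ch << 6) | (c & 0x3F);
        (*pos)++;
    }

    /* Reject overlong forms, surrogates and out-of-range values. */
    if (ch < min || ch > 0x10FFFF || (ch >= 0xD800 && ch <= 0xDFFF)) {
        *err = SK_INVALID;
        return 0xFFFD;
    }

    *err = SK_OK;
    return ch;
}
```

    Under these assumptions, a client reading a byte at a time could
    treat SK_OUT_OF_DATA as "wait for more input and retry from the same
    position", while treating SK_TRUNCATED and SK_INVALID as real
    errors.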

 misc.h                       |  24 ++++++-
 utils/decode_utf8.c          | 155 ++++++++++++++++++++++++++++++++-----------
 utils/decode_utf8_to_wchar.c |   5 +-
 utils/unicode-known.c        |   8 ++-
 utils/unicode-norm.c         |   2 +-
 windows/unicode.c            |   2 +-
 6 files changed, 149 insertions(+), 47 deletions(-)