simon-git: putty (main): Simon Tatham
Commits to Tartarus hosted VCS
tartarus-commits at lists.tartarus.org
Fri Feb 17 17:20:04 GMT 2023
TL;DR:
9e01de7c decode_utf8: add an enumeration of failure reasons.
Repository: https://git.tartarus.org/simon/putty.git
On the web: https://git.tartarus.org/?p=simon/putty.git
Branch updated: main
Committer: Simon Tatham <anakin at pobox.com>
Date: 2023-02-17 17:20:04
commit 9e01de7c2b2903412822f3285da1d692d1474524
web diff https://git.tartarus.org/?p=simon/putty.git;a=commitdiff;h=9e01de7c2b2903412822f3285da1d692d1474524;hp=9d308b39da715f77c9c07ea28eeb3016d1be12b1
Author: Simon Tatham <anakin at pobox.com>
Date: Fri Feb 17 16:39:09 2023 +0000
decode_utf8: add an enumeration of failure reasons.
Now you can optionally get back an enum value indicating whether the
character was successfully decoded, or whether U+FFFD was substituted
due to some kind of problem, and if the latter, what problem.
For a start, this allows distinguishing 'real' U+FFFD (encoded
legitimately in the input) from one invented by the decoder. Also, it
allows the recipient of the decode to treat failures differently,
either by passing on a useful error report to the user (as
utf8_unknown_char now does) or by doing something special.
In particular, there are two distinct error codes for a truncated
UTF-8 encoding, depending on whether it was truncated by the end of
the input or by encountering a non-continuation byte. The former code
means that the string is not legal UTF-8 _as it is_, but doesn't rule
out it being a (bytewise) prefix of a legal UTF-8 string - so if a
client is receiving UTF-8 data a byte at a time, they can treat that
error code specially and not make it a fatal error.
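
The distinction drawn above can be illustrated with a minimal sketch. This is not the code from utils/decode_utf8.c: the enum, its member names, and the function sk_decode_utf8 are all hypothetical, chosen only to show how a decoder might report a 'real' U+FFFD separately from a substituted one, and how it might separate the two kinds of truncation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; NOT PuTTY's actual identifiers. */
typedef enum {
    SK_OK,                    /* character decoded successfully */
    SK_SPURIOUS_CONTINUATION, /* continuation byte with no lead byte */
    SK_ILLEGAL_BYTE,          /* byte that can never begin a sequence */
    SK_TRUNCATED_BY_EOF,      /* input ended mid-sequence */
    SK_TRUNCATED_BY_BYTE,     /* non-continuation byte mid-sequence */
    SK_OVERLONG,              /* value encoded in too many bytes */
    SK_SURROGATE,             /* U+D800..U+DFFF is not a valid scalar */
    SK_TOO_BIG                /* value above U+10FFFF */
} sk_utf8_status;

/*
 * Decode one character starting at s[*pos], advancing *pos past the
 * bytes consumed. Returns U+FFFD on any failure. 'status' may be NULL
 * if the caller does not care why a failure happened.
 */
unsigned long sk_decode_utf8(const unsigned char *s, size_t len,
                             size_t *pos, sk_utf8_status *status)
{
    sk_utf8_status dummy;
    unsigned long ch, min;
    unsigned char c;
    int ncont;

    if (!status)
        status = &dummy;
    assert(*pos < len);       /* caller must supply at least one byte */

    c = s[(*pos)++];
    if (c < 0x80) {
        *status = SK_OK;
        return c;
    } else if (c < 0xC0) {
        *status = SK_SPURIOUS_CONTINUATION;
        return 0xFFFD;
    } else if (c < 0xE0) {
        ncont = 1; ch = c & 0x1F; min = 0x80;
    } else if (c < 0xF0) {
        ncont = 2; ch = c & 0x0F; min = 0x800;
    } else if (c < 0xF8) {
        ncont = 3; ch = c & 0x07; min = 0x10000;
    } else {
        *status = SK_ILLEGAL_BYTE;
        return 0xFFFD;
    }

    while (ncont-- > 0) {
        if (*pos >= len) {
            /* Not legal as it stands, but could still be a prefix
             * of a legal string if more bytes arrive later. */
            *status = SK_TRUNCATED_BY_EOF;
            return 0xFFFD;
        }
        if ((s[*pos] & 0xC0) != 0x80) {
            /* Definitely malformed. The offending byte is left
             * unconsumed so it can start the next character. */
            *status = SK_TRUNCATED_BY_BYTE;
            return 0xFFFD;
        }
        ch = (ch << 6) | (s[(*pos)++] & 0x3F);
    }

    if (ch < min) { *status = SK_OVERLONG; return 0xFFFD; }
    if (ch >= 0xD800 && ch <= 0xDFFF) { *status = SK_SURROGATE; return 0xFFFD; }
    if (ch > 0x10FFFF) { *status = SK_TOO_BIG; return 0xFFFD; }

    *status = SK_OK;
    return ch;
}
```

Under these assumptions, a caller reading from a stream could treat SK_TRUNCATED_BY_EOF as "wait for more data" and every other non-SK_OK status as a genuine error, while a return value of 0xFFFD with SK_OK means the input really contained U+FFFD.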
 misc.h                       |  24 ++++++-
 utils/decode_utf8.c          | 155 ++++++++++++++++++++++++++++++++-----------
 utils/decode_utf8_to_wchar.c |   5 +-
 utils/unicode-known.c        |   8 ++-
 utils/unicode-norm.c         |   2 +-
 windows/unicode.c            |   2 +-
 6 files changed, 149 insertions(+), 47 deletions(-)