simon-git: putty (main): Simon Tatham

Commits to Tartarus hosted VCS tartarus-commits at lists.tartarus.org
Wed May 15 17:36:15 BST 2024


TL;DR:
  640c7028 More Unicode samples for utf8.txt, most of which fail.
  b6ef4f18 Support Unicode flag glyphs in terminal.c (works in GTK).

Repository:     https://git.tartarus.org/simon/putty.git
On the web:     https://git.tartarus.org/?p=simon/putty.git
Branch updated: main
Committer:      Simon Tatham <anakin at pobox.com>
Date:           2024-05-15 17:36:15

commit 640c7028f8b228b8228e98bcc7addefd7629cf80
web diff https://git.tartarus.org/?p=simon/putty.git;a=commitdiff;h=640c7028f8b228b8228e98bcc7addefd7629cf80;hp=6b10eaa245f05c938085d61e9de4cc27d5bb1611
Author: Simon Tatham <anakin at pobox.com>
Date:   Mon May 6 08:58:38 2024 +0100

    More Unicode samples for utf8.txt, most of which fail.
    
    These samples all come from the 'emoji' parts of Unicode, although I
    use the word a bit loosely because I'm not sure that flags count (they
    have their own special system). But they're all things that ought to
    display via a separate font, likely in colour.
    
    The second line of this extra test already looks correct in PuTTY:
    three code points each representing an emoji, for which wcwidth()
    correctly reports that they occupy 2 cells each. On GTK, the emoji
    even appear in colour; on Windows they come out in black and
    white. (And I don't know what I can do to fix that; the problem is not
    that I don't have any emoji font installed. I do.)
    
    The first line consists of 'simpler' emoji in the sense of being more
    common, but technically more complicated, because they're ordinary
    Unicode characters such as U+2764 HEAVY BLACK HEART, modified into
    emoji by U+FE0F VARIATION SELECTOR-16. This goes badly because
    wcwidth() measures the primary character as having width 1 (which it
    would do, by itself), and the variation selector as width 0 (also not
    unreasonable), but the total is 1, where you'd like it to be 2. This
    is also difficult to fix, because if we unilaterally changed it then
    every curses-type library would mispredict the cursor position and
    produce display corruption during partial screen redraws!
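
    The arithmetic behind that mismatch can be sketched as follows. This
    is an illustration only, not code from PuTTY: toy_width() is a
    hypothetical stand-in for wcwidth() that models just the two code
    points discussed above, and naive_string_width() is likewise an
    invented name for the "sum the per-character widths" approach.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for wcwidth(): models only the two code points
 * discussed above and returns -1 for anything else. A real wcwidth()
 * consults full Unicode width tables. */
static int toy_width(unsigned long cp)
{
    if (cp == 0x2764UL)
        return 1;               /* HEAVY BLACK HEART: narrow by itself */
    if (cp == 0xFE0FUL)
        return 0;               /* VARIATION SELECTOR-16: zero width */
    return -1;                  /* not modelled in this sketch */
}

/* Sum the per-code-point widths, as a terminal or curses-style library
 * would when predicting where the cursor ends up. */
static int naive_string_width(const unsigned long *cps, size_t n)
{
    int total = 0;
    assert(cps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int w = toy_width(cps[i]);
        assert(w >= 0);         /* only modelled code points are allowed */
        total += w;
    }
    return total;
}
```

    Feeding it the sequence U+2764 U+FE0F gives a total of 1, whereas
    the emoji presentation of the heart wants 2 cells.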
    
    The third line uses a mechanism I've only found out about recently:
    U+200D ZERO WIDTH JOINER glues together two code points that would
    each be a valid emoji on its own, to make a single combined one. In
    this case, WOMAN + PERSONAL COMPUTER ought to combine into a woman
    using a computer. Again this doesn't work in PuTTY, which knows
    nothing about ZWJ. But it comes out as expected in other tools viewing
    this file, such as 'gedit', or Firefox.
    
    The fourth line shows another complex emoji case: the WOMAN code point
    is followed by U+1F3FB EMOJI MODIFIER FITZPATRICK TYPE-1-2, and
    another one is followed by U+1F3FF EMOJI MODIFIER FITZPATRICK TYPE-6,
    in each case selecting the woman's skin tone. PuTTY mishandles that
    too, because it doesn't know that those should act as modifiers (again
    because wcwidth() gives them width 2 rather than 0), and so each one
    occupies an extra two character cells.
    
    And the last line contains some sample flags, each of which is
    obtained by writing a 2-letter code for a country or region (here GB,
    UA, EU) with each Latin letter replaced by the appropriate 'regional
    indicator symbol letter' from the 26-code-point range U+1F1E6 to
    U+1F1FF inclusive. PuTTY doesn't know anything about those either, but
    they at least occupy the right number of cells if handled naïvely, so
    _that_ one might be possible to fix!
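
    The letter-to-code-point mapping described above is a fixed offset
    from U+1F1E6. A minimal sketch, assuming an ASCII-compatible
    execution character set so that 'A' to 'Z' are contiguous; the names
    RI_BASE, regional_indicator() and flag_code_points() are invented
    for this example and do not come from PuTTY:

```c
#include <assert.h>

#define RI_BASE 0x1F1E6UL   /* REGIONAL INDICATOR SYMBOL LETTER A */

/* Map an uppercase Latin letter to its regional indicator code point.
 * Assumes 'A'..'Z' are contiguous, as they are in ASCII. */
static unsigned long regional_indicator(char letter)
{
    assert(letter >= 'A' && letter <= 'Z');
    return RI_BASE + (unsigned long)(letter - 'A');
}

/* Turn a two-letter code such as "GB" into the pair of code points
 * that together denote the corresponding flag. */
static void flag_code_points(const char *code, unsigned long out[2])
{
    out[0] = regional_indicator(code[0]);
    out[1] = regional_indicator(code[1]);
}
```

    For "GB" this yields U+1F1EC followed by U+1F1E7, and 'Z' maps to
    U+1F1FF, the top of the range.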

 test/utf8.txt | 7 +++++++
 1 file changed, 7 insertions(+)

commit b6ef4f18d51bf6ea5467814a7de586472aebc8fe
web diff https://git.tartarus.org/?p=simon/putty.git;a=commitdiff;h=b6ef4f18d51bf6ea5467814a7de586472aebc8fe;hp=640c7028f8b228b8228e98bcc7addefd7629cf80
Author: Simon Tatham <anakin at pobox.com>
Date:   Mon May 6 11:07:12 2024 +0100

    Support Unicode flag glyphs in terminal.c (works in GTK).
    
    This is the only one of the newly added cases in test/utf8.txt which I
    can (try to) fix unilaterally just by changing PuTTY's display code,
    because it doesn't change the number of character cells occupied by
    the text, only the appearance of those cells.
    
    In this commit I make the necessary changes in terminal.c, which makes
    flags start working in GTK PuTTY and pterm, but not on Windows.
    
    The system of encoding flags in Unicode is that there's a space of 26
    regional-indicator letter code points (U+1F1E6 to U+1F1FF inclusive)
    corresponding to the unaccented Latin alphabet, and an adjacent pair
    of those letters represents the flag associated with that two-letter
    code (usually a nation, although at least one non-nation pair exists,
    namely EU).
    
    There are two plausible ways we could handle this in terminal.c:
    
      (a) leave the regional indicators as they are in the internal data
          model, so that each RI letter occupies its own character cell,
          and at display time have do_paint() spot adjacent pairs of them
          and send each pair to the frontend as a combined glyph.
    
      (b) combine the pairs _in_ the internal data model, by
          special-casing them in term_display_graphic_char().
    
    This choice makes a semantic difference. What if a flag is displayed
    in the terminal and something overprints one of its two character
    cells? With option (a), overprinting one cell of an RI pair with a
    different RI letter would change it into a different flag; with
    option (b), flags behave like any other wide character, in that
    overprinting one of the two cells blanks the other as a side effect.
    
    I think we need (a), because not all terminal redraw systems
    (curses-style libraries) will understand the Unicode flag glyph system
    at all. So if a full-screen application genuinely wants to do a screen
    redraw in which a flag changes to a different flag while keeping one
    of its constituent letters the same (say, swapping between BA and CA,
    or between AC and AD), then the redraw library might very well
    implement that screen update by redrawing only the changed letter, and
    we must not corrupt the flag.
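
    Option (a) can be illustrated with a small sketch that keeps one
    code point per cell and only looks for pairs when scanning a row
    for display. This is not the code added to terminal.c: the function
    names are invented, and pairing greedily from the left of each run
    of RI letters is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool is_regional_indicator(unsigned long cp)
{
    return cp >= 0x1F1E6UL && cp <= 0x1F1FFUL;
}

/* Scan one row of cells (one code point per cell) and record the index
 * at which each adjacent pair of regional indicators starts. Pairs are
 * formed greedily from the left, which is an assumption of this sketch.
 * Returns the number of pairs recorded, at most max_pairs. */
static size_t find_flag_pairs(const unsigned long *cells, size_t ncells,
                              size_t *pair_starts, size_t max_pairs)
{
    size_t npairs = 0;
    assert(cells != NULL && pair_starts != NULL);
    for (size_t i = 0; i + 1 < ncells && npairs < max_pairs; i++) {
        if (is_regional_indicator(cells[i]) &&
            is_regional_indicator(cells[i + 1])) {
            pair_starts[npairs++] = i;
            i++;                /* skip the second half of this pair */
        }
    }
    return npairs;
}
```

    Because the cells themselves are untouched, overwriting only the
    first cell of a B,A pair with C leaves a C,A pair at the same
    position, which the next scan again reports as a single flag.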
    
    All of this is now implemented in terminal.c. The effect is that pairs
    of RI characters are passed to the TermWin draw_text() method as if
    they were a wide character with a combining mark: that is, you get a
    two-character (or four-surrogate) string, with TATTR_COMBINING
    indicating that it represents a single glyph, and ATTR_WIDE indicating
    that that glyph occupies two character cells rather than one.
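
    The "four-surrogate" case arises because every regional indicator
    lies outside the Basic Multilingual Plane, so each one needs a
    UTF-16 surrogate pair. A minimal sketch of that encoding, using
    invented helper names (utf16_encode() and flag_to_utf16() are not
    PuTTY functions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-16. Returns the number of
 * 16-bit units written to out: 1 for the BMP, 2 for anything above. */
static size_t utf16_encode(uint32_t cp, uint16_t *out)
{
    assert(cp <= 0x10FFFFu && !(cp >= 0xD800u && cp <= 0xDFFFu));
    if (cp < 0x10000u) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000u;
    out[0] = (uint16_t)(0xD800u + (cp >> 10));      /* high surrogate */
    out[1] = (uint16_t)(0xDC00u + (cp & 0x3FFu));   /* low surrogate */
    return 2;
}

/* Encode a pair of regional indicators as one UTF-16 string.
 * out must have room for four units. Returns the units written. */
static size_t flag_to_utf16(uint32_t ri1, uint32_t ri2, uint16_t *out)
{
    size_t n = utf16_encode(ri1, out);
    n += utf16_encode(ri2, out + n);
    return n;
}
```

    For the GB flag (U+1F1EC U+1F1E7) this produces the four units
    D83C DDEC D83C DDE7.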
    
    In GTK, that's enough to make flag display Just Work. But on
    Windows (at least the Win10 machine I have to test on), that doesn't
    make flags start working all by itself. But then, the rest of the new
    emoji tests also look a bit confused on Windows. Help would be
    welcome from someone who knows how Windows emoji display is supposed
    to work!

 putty.h             |  3 +++
 terminal/terminal.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 test/utf8.txt       |  2 +-
 3 files changed, 71 insertions(+), 3 deletions(-)
