simon-git: putty (main): Simon Tatham

Commits to Tartarus hosted VCS tartarus-commits at lists.tartarus.org
Sun Oct 10 15:02:59 BST 2021


TL;DR:
  0377c689 Start a 'terminal' source subdirectory.
  d7548d04 Move bidi gettype main() into its own file.
  804f3276 Make bidi type enums into list macros.
  3a3b264e wcwidth.c: reflow existing lookup table.
  53e84b89 wcwidth.c: update to Unicode 14.0.0.
  caa16deb bidi.c: update the API.
  b8be01ad Complete rewrite of the bidi algorithm.
  93ba7457 Test rig for the new bidi algorithm.

Repository:     https://git.tartarus.org/simon/putty.git
On the web:     https://git.tartarus.org/?p=simon/putty.git
Branch updated: main
Committer:      Simon Tatham <anakin at pobox.com>
Date:           2021-10-10 15:02:59

commit 0377c689f24a704814ae180264c3cf075548032c
web diff https://git.tartarus.org/?p=simon/putty.git;a=commitdiff;h=0377c689f24a704814ae180264c3cf075548032c;hp=e7dd2421cfaf6dd75b316c011e4b10843d2007a6
Author: Simon Tatham <anakin at pobox.com>
Date:   Sun Oct 10 14:30:37 2021 +0100

    Start a 'terminal' source subdirectory.
    
    This contains terminal.c, bidi.c (formerly minibidi.c), and
    terminal.h. I'm about to make a couple more bidi-related source files,
    so it seems worth starting by making a place to put them that won't be
    cluttering up the top level.

 CMakeLists.txt                    | 5 ++++-
 putty.h                           | 2 +-
 minibidi.c => terminal/bidi.c     | 0
 terminal.c => terminal/terminal.c | 0
 terminal.h => terminal/terminal.h | 0
 5 files changed, 5 insertions(+), 2 deletions(-)

commit d7548d044923a991914486432439612f0ddb982a
web diff https://git.tartarus.org/?p=simon/putty.git;a=commitdiff;h=d7548d044923a991914486432439612f0ddb982a;hp=0377c689f24a704814ae180264c3cf075548032c
Author: Simon Tatham <anakin at pobox.com>
Date:   Sun Oct 10 14:31:04 2021 +0100

    Move bidi gettype main() into its own file.
    
    That's what I've usually been doing with any main()s I find under
    ifdef; there's no reason this should be an exception. If we're keeping
    it in the code at all, we should ensure it carries on compiling.
    
    I've also created a new header file bidi.h, containing pieces of the
    bidi definitions shared between bidi.c and the new source file.

 CMakeLists.txt          |   4 ++
 terminal/bidi.c         | 122 +++++-------------------------------------------
 terminal/bidi.h         |  60 ++++++++++++++++++++++++
 terminal/bidi_gettype.c |  53 +++++++++++++++++++++
 4 files changed, 128 insertions(+), 111 deletions(-)

commit 804f32765fd909018ec22b27d5e3d7bffe802d72
web diff https://git.tartarus.org/?p=simon/putty.git;a=commitdiff;h=804f32765fd909018ec22b27d5e3d7bffe802d72;hp=d7548d044923a991914486432439612f0ddb982a
Author: Simon Tatham <anakin at pobox.com>
Date:   Sun Oct 10 14:32:03 2021 +0100

    Make bidi type enums into list macros.
    
    This makes it easier to create the matching array of type names in
    bidi_gettype.c, and eliminates the need for an assertion to check the
    array matched the enum. And I'm about to need to add more types, so
    let's start by making that trivially easy.

 terminal/bidi.h         | 63 ++++++++++++++++++++++++++-----------------------
 terminal/bidi_gettype.c | 30 ++++-------------------
 2 files changed, 39 insertions(+), 54 deletions(-)
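The 'list macro' pattern this commit adopts is the C idiom usually called an X-macro. One macro enumerates the entries, and each use site expands it with a different per-entry macro. Below is a minimal sketch. The macro name BIDI_TYPE_LIST, the shortened set of types and the helper bidi_type_name are illustrative assumptions, not the actual contents of terminal/bidi.h.

```c
/* Hypothetical list macro: each entry is expanded by whatever macro X the
 * caller supplies. Only a handful of the UAX #9 bidi classes are shown. */
#define BIDI_TYPE_LIST(X) \
    X(L)                  \
    X(R)                  \
    X(AL)                 \
    X(EN)                 \
    X(ON)

/* Expansion 1: the enum itself, plus a count of its members. */
#define ENUM_DECL(name) name,
typedef enum { BIDI_TYPE_LIST(ENUM_DECL) N_BIDI_TYPES } BidiType;
#undef ENUM_DECL

/* Expansion 2: the matching array of type names. It is generated from the
 * same list, so it cannot fall out of step with the enum, and no separate
 * assertion is needed to check that the two agree. */
#define NAME_STRING(name) #name,
static const char *const bidi_type_names[] = { BIDI_TYPE_LIST(NAME_STRING) };
#undef NAME_STRING

/* Look up the printable name of a type, with a fallback for bad input. */
const char *bidi_type_name(BidiType t)
{
    return (unsigned)t < (unsigned)N_BIDI_TYPES ? bidi_type_names[t] : "?";
}
```

Adding a new type then means adding one X(...) line, and both the enum and the name table pick it up automatically.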

commit 3a3b264e9dc801a39cb9fba8aad65b9c71cc5908
web diff https://git.tartarus.org/?p=simon/putty.git;a=commitdiff;h=3a3b264e9dc801a39cb9fba8aad65b9c71cc5908;hp=804f32765fd909018ec22b27d5e3d7bffe802d72
Author: Simon Tatham <anakin at pobox.com>
Date:   Sun Oct 10 14:32:33 2021 +0100

    wcwidth.c: reflow existing lookup table.
    
    With one entry per line, it now takes up more vertical space, but it
    will be easier to see changes when I update it for a later Unicode
    version.

 utils/wcwidth.c | 188 ++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 141 insertions(+), 47 deletions(-)
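PuTTY's wcwidth.c is based on Markus Kuhn's public-domain implementation. Its tables are sorted arrays of inclusive code point intervals, searched by bisection. The sketch below shows that shape with one interval per line, as in the reflowed table. The entries are a small illustrative subset of combining marks rather than the real Unicode 14.0.0 data, and the names in_table and is_combining are hypothetical.

```c
#include <stddef.h>

/* One inclusive range of code points. */
struct interval {
    unsigned int first;
    unsigned int last;
};

/* Sorted, non-overlapping ranges, one per line (illustrative subset). */
static const struct interval combining[] = {
    { 0x0300, 0x036F },
    { 0x0483, 0x0489 },
    { 0x0591, 0x05BD },
    { 0x05BF, 0x05BF },
    { 0x05C1, 0x05C2 },
};

/* Binary search: returns 1 if ucs lies inside any interval in the table. */
static int in_table(unsigned int ucs, const struct interval *table, size_t n)
{
    size_t lo = 0, hi = n;   /* search the half-open index range [lo, hi) */

    if (n == 0 || ucs < table[0].first || ucs > table[n - 1].last)
        return 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ucs > table[mid].last)
            lo = mid + 1;
        else if (ucs < table[mid].first)
            hi = mid;
        else
            return 1;
    }
    return 0;
}

int is_combining(unsigned int ucs)
{
    return in_table(ucs, combining, sizeof(combining) / sizeof(*combining));
}
```

With one entry per line, adding or splitting a range for a new Unicode version shows up in a diff as a single added or changed line rather than a reflowed paragraph of numbers.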

commit 53e84b893323d2cae3a789bfa6560a5e1a92ccdd
web diff https://git.tartarus.org/?p=simon/putty.git;a=commitdiff;h=53e84b893323d2cae3a789bfa6560a5e1a92ccdd;hp=3a3b264e9dc801a39cb9fba8aad65b9c71cc5908
Author: Simon Tatham <anakin at pobox.com>
Date:   Sun Oct 10 14:33:21 2021 +0100

    wcwidth.c: update to Unicode 14.0.0.
    
    I wasn't able to find the 'uniset' program that, according to the
    existing comment, generated one of the tables (or at least I wasn't
    confident that I'd found the right program of that name). So I
    reimplemented the semantics of that command line in my own Perl and
    have included that in the revised version of the comment.

 utils/wcwidth.c | 315 +++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 275 insertions(+), 40 deletions(-)

commit caa16deb1cca045e88065b86ac39826fcbee84fb
web diff https://git.tartarus.org/?p=simon/putty.git;a=commitdiff;h=caa16deb1cca045e88065b86ac39826fcbee84fb;hp=53e84b893323d2cae3a789bfa6560a5e1a92ccdd
Author: Simon Tatham <anakin at pobox.com>
Date:   Sun Oct 10 14:40:51 2021 +0100

    bidi.c: update the API.
    
    The input length field is now a size_t rather than an int, on general
    principles. The return value is now void (we weren't using the
    previous return value at all). And we now require the client to have
    previously allocated a BidiContext, which will allow allocated storage
    to be reused between runs, saving a lot of churn on malloc.
    
    (However, the current BidiContext doesn't contain anything
    interesting. I could have moved the existing mallocs into it, but
    there's no point, since I'm about to rewrite the whole thing anyway.)

 defs.h              |  2 ++
 putty.h             |  4 +++-
 terminal/bidi.c     | 22 +++++++++++++++++++---
 terminal/terminal.c |  6 +++++-
 terminal/terminal.h |  2 ++
 5 files changed, 31 insertions(+), 5 deletions(-)
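The benefit of a caller-allocated context is that scratch buffers can outlive a single call. Below is a minimal sketch of that idea, assuming hypothetical names (BidiContextSketch, bidi_sketch_new, do_bidi_sketch). It shows only the calling convention the commit describes, with a size_t length, a void return and a context supplied by the caller. Its body is a placeholder rather than any real bidi processing.

```c
#include <stdlib.h>

/* Hypothetical context: scratch storage that persists between calls, so
 * repeated calls on lines of similar length need not call malloc each time. */
typedef struct BidiContextSketch {
    unsigned char *levels;   /* per-character scratch storage */
    size_t capacity;         /* number of elements currently allocated */
} BidiContextSketch;

BidiContextSketch *bidi_sketch_new(void)
{
    return calloc(1, sizeof(BidiContextSketch));
}

void bidi_sketch_free(BidiContextSketch *ctx)
{
    if (ctx) {
        free(ctx->levels);
        free(ctx);
    }
}

/* Grow the buffer only when a longer input arrives; shorter inputs reuse
 * the existing allocation. Returns 0 on allocation failure. */
static int bidi_sketch_reserve(BidiContextSketch *ctx, size_t n)
{
    if (n <= ctx->capacity)
        return 1;
    size_t newcap = ctx->capacity ? ctx->capacity : 16;
    while (newcap < n)
        newcap *= 2;
    unsigned char *p = realloc(ctx->levels, newcap);
    if (!p)
        return 0;
    ctx->levels = p;
    ctx->capacity = newcap;
    return 1;
}

/* Entry point in the shape the commit describes. The placeholder body
 * just assigns every character level 0; on allocation failure this
 * sketch simply gives up. */
void do_bidi_sketch(BidiContextSketch *ctx, const unsigned int *text,
                    size_t len)
{
    (void)text;
    if (!bidi_sketch_reserve(ctx, len))
        return;
    for (size_t i = 0; i < len; i++)
        ctx->levels[i] = 0;
}
```

A caller such as the terminal would create one context up front, pass it to every call as each line is redrawn, and free it on shutdown.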

commit b8be01adca7f9b70d04cbd967628136398a7abaa
web diff https://git.tartarus.org/?p=simon/putty.git;a=commitdiff;h=b8be01adca7f9b70d04cbd967628136398a7abaa;hp=caa16deb1cca045e88065b86ac39826fcbee84fb
Author: Simon Tatham <anakin at pobox.com>
Date:   Sun Oct 10 14:51:17 2021 +0100

    Complete rewrite of the bidi algorithm.
    
    A user reported that PuTTY's existing bidi algorithm will generate
    misordered text in cases like this (assuming UTF-8):
    
      echo -e '12 A \xD7\x90\xD7\x91 B'
    
    The hex codes in the middle are the Hebrew letters aleph and beth.
    Appearing in the middle of a line whose primary direction is
    left-to-right, those two letters should appear in the opposite order,
    but not cause the rest of the line to move around. That is, you expect
    the displayed text in this situation to be
    
      12 A <beth><aleph> B
    
    But in fact, the digits '12' were erroneously reversed, so you would
    actually see '21 A <beth><aleph> B'.
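The following sketch makes the expected output concrete. The two-byte UTF-8 sequences D7 90 and D7 91 decode to U+05D0 (aleph) and U+05D1 (beth). The code decodes the example string and then applies a deliberately naive reordering: it reverses each run of Hebrew-block characters and leaves everything else in place. This is not the UAX #9 algorithm and is wrong in general. For instance, it reverses two Hebrew words separated by a space individually, when they should be reversed as one run. It only reproduces the expected display for this one input, with the digits '12' staying where they are. Both function names are hypothetical.

```c
#include <stddef.h>

/* Decode UTF-8 containing only 1- and 2-byte sequences (enough for ASCII
 * plus Hebrew). Returns the number of code points written to out, or
 * (size_t)-1 on any sequence this sketch does not handle. */
size_t decode_utf8_2byte(const unsigned char *s, size_t n, unsigned int *out)
{
    size_t i = 0, k = 0;
    while (i < n) {
        if (s[i] < 0x80) {
            out[k++] = s[i++];
        } else if ((s[i] & 0xE0) == 0xC0 && i + 1 < n &&
                   (s[i + 1] & 0xC0) == 0x80) {
            out[k++] = ((unsigned int)(s[i] & 0x1F) << 6) | (s[i + 1] & 0x3F);
            i += 2;
        } else {
            return (size_t)-1;
        }
    }
    return k;
}

static int is_hebrew(unsigned int c) { return c >= 0x0590 && c <= 0x05FF; }

/* Toy reordering for a left-to-right line, NOT UAX #9: reverse each
 * maximal run of Hebrew-block characters, leaving all others in place. */
void toy_reorder_ltr(unsigned int *cps, size_t n)
{
    size_t i = 0;
    while (i < n) {
        if (!is_hebrew(cps[i])) {
            i++;
            continue;
        }
        size_t j = i;
        while (j < n && is_hebrew(cps[j]))
            j++;
        for (size_t a = i, b = j - 1; a < b; a++, b--) {
            unsigned int t = cps[a];
            cps[a] = cps[b];
            cps[b] = t;
        }
        i = j;
    }
}
```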
    
    I tried to debug the existing bidi algorithm, but it was very hard,
    because the Unicode bidi spec has been extensively changed since
    Arabeyes contributed that code, and I couldn't even reliably work out
    which version of the spec the code was intended to implement. I found
    some problems, notably that the resolution phase was running once on
    the whole line instead of separately on runs of characters at the same
    level, and also that the 'sor' and 'eor' values were being wrongly
    computed. But I had no way to test any fix to ensure it hadn't
    introduced another bug somewhere else.
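For reference, this is how rule X10 of the current UAX #9 determines these boundary types, which recent revisions call sos and eos. The process works on each isolating run sequence. On each side, it compares the sequence's level with the level of the adjacent character. The paragraph embedding level is used instead if there is no adjacent character. On the end side, it is also used if the sequence ends with an isolate initiator that lacks a matching PDI. The rule takes the higher of the two levels and uses R if it is odd, L if it is even. Below is a minimal sketch with hypothetical names. It assumes the caller has already found the sequence and its neighbours' levels.

```c
/* Boundary direction: the only two values X10 can produce. */
typedef enum { DIR_L, DIR_R } StrongDir;

/* Take the higher of two embedding levels: odd -> R, even -> L. */
static StrongDir dir_of_higher(int a, int b)
{
    int hi = a > b ? a : b;
    return (hi & 1) ? DIR_R : DIR_L;
}

/*
 * seq_level:   embedding level shared by the characters of the sequence.
 * prev_level:  level of the character preceding the sequence (ignoring
 *              characters removed by X9), or -1 if there is none.
 * next_level:  level of the character following it, or -1 if none.
 * ends_with_unmatched_isolate: nonzero if the last character is an
 *              isolate initiator with no matching PDI.
 * para_level:  the paragraph embedding level.
 */
void compute_sos_eos(int seq_level, int prev_level, int next_level,
                     int ends_with_unmatched_isolate, int para_level,
                     StrongDir *sos, StrongDir *eos)
{
    *sos = dir_of_higher(seq_level,
                         prev_level < 0 ? para_level : prev_level);
    *eos = dir_of_higher(seq_level,
                         (next_level < 0 || ends_with_unmatched_isolate)
                             ? para_level : next_level);
}
```

For example, a Hebrew run at level 1 inside a left-to-right line (paragraph level 0) gets R at both ends, because the higher of levels 1 and 0 is odd.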
    
    Unicode provides a set of conformance tests in the UCD. That was just
    what I wanted - but they're too up-to-date to run against the old
    algorithm and expect to pass!
    
    So, paradoxically, it seemed to me that the _easiest_ way to fix this
    bidi bug would be to bring absolutely everything up to date. But the
    revised bidi algorithm is significantly more complicated, so I also
    didn't think it would be sensible to try to gradually evolve the
    existing code into it. Instead, I've done a complete rewrite of my
    own.
    
    The new code implements the full UAX#9 rev 44 algorithm, including in
    particular support for the new 'directional isolate' control
    characters, and also special handling for matched pairs of brackets in
    the text (see rule N0 in the spec). I've managed to get it to pass the
    entire UCD conformance test suite, so I'm reasonably confident it's
    right, or at the very least a lot closer to right than the old
    algorithm was.
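Rule N0 works on bracket pairs found by definition BD16 of UAX #9, which uses a stack of at most 63 openers. Below is a simplified sketch of that pairing step, which rests on three assumptions. First, only the ASCII pairs (), [] and {} are recognised. Second, every bracket is treated as having bidi class ON. Third, canonical equivalents (such as U+2329/U+232A versus U+3008/U+3009) are ignored. The names are hypothetical, and the code is not taken from the new bidi.c.

```c
#include <stddef.h>

#define BRACKET_STACK_MAX 63   /* stack depth specified by BD16 */

typedef struct { size_t open, close; } BracketPair;

/* Return the closing bracket matching an opener, or 0 if c is not one. */
static unsigned int closing_for(unsigned int c)
{
    switch (c) {
    case '(': return ')';
    case '[': return ']';
    case '{': return '}';
    default:  return 0;
    }
}

static int is_closing(unsigned int c)
{
    return c == ')' || c == ']' || c == '}';
}

/* Writes pairs sorted by opening position; returns how many were found. */
size_t find_bracket_pairs(const unsigned int *text, size_t n,
                          BracketPair *pairs, size_t max_pairs)
{
    struct { unsigned int want; size_t pos; } stack[BRACKET_STACK_MAX];
    size_t sp = 0, npairs = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned int want = closing_for(text[i]);
        if (want) {
            if (sp == BRACKET_STACK_MAX)
                break;              /* BD16: stop processing this sequence */
            stack[sp].want = want;
            stack[sp].pos = i;
            sp++;
        } else if (is_closing(text[i])) {
            /* Search downwards for a matching opener. If found, record the
             * pair and pop it plus everything above it; if not, ignore
             * this closer and leave the stack alone. */
            for (size_t k = sp; k-- > 0;) {
                if (stack[k].want == text[i]) {
                    if (npairs < max_pairs) {
                        pairs[npairs].open = stack[k].pos;
                        pairs[npairs].close = i;
                        npairs++;
                    }
                    sp = k;
                    break;
                }
            }
        }
    }

    /* Sort by opening position (insertion sort; the list is short). */
    for (size_t a = 1; a < npairs; a++) {
        BracketPair p = pairs[a];
        size_t b = a;
        while (b > 0 && pairs[b - 1].open > p.open) {
            pairs[b] = pairs[b - 1];
            b--;
        }
        pairs[b] = p;
    }
    return npairs;
}
```

Note that in "a(b[c)d]" only the parentheses pair up. Matching ')' pops the unclosed '[' with it, so the final ']' has nothing left to match.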
    
    So the upshot is: the test case shown at the top of this commit
    message now passes, but other details of bidi handling may also have
    changed. Certainly some cases involving brackets will behave
    differently, and perhaps other things too, where the old behaviour was
    either a bug in the old algorithm or predates an update to the standard.

 terminal/bidi.c | 3673 ++++++++++++++++++++++++++++++++++++++++---------------
 terminal/bidi.h |   69 ++
 2 files changed, 2735 insertions(+), 1007 deletions(-)

commit 93ba74579a22a8976e49bbb1c9e45b5bbd0d35bf
web diff https://git.tartarus.org/?p=simon/putty.git;a=commitdiff;h=93ba74579a22a8976e49bbb1c9e45b5bbd0d35bf;hp=b8be01adca7f9b70d04cbd967628136398a7abaa
Author: Simon Tatham <anakin at pobox.com>
Date:   Sun Oct 10 14:52:17 2021 +0100

    Test rig for the new bidi algorithm.
    
    This standalone CLI program runs the UCD bidi tests in the form
    provided in Unicode 14.0.0. You can run it by just saying
    
      bidi_test --class BidiTest.txt --char BidiCharacterTest.txt
    
    assuming those two UCD files are in the current directory.

 CMakeLists.txt       |   4 +
 terminal/bidi.c      |   8 ++
 terminal/bidi.h      |  13 ++
 terminal/bidi_test.c | 365 +++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 390 insertions(+)
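The second of those files, BidiCharacterTest.txt, holds one test per line with semicolon-separated fields, as described in that file's own header. Below is a hypothetical sketch that parses the first four fields of such a line. The struct and function names are illustrative assumptions, and the sketch makes no claim about how bidi_test.c does it.

```c
#include <stdlib.h>

/* Fields of a BidiCharacterTest.txt data line, separated by ';':
 *   0: code points in hex, separated by spaces
 *   1: paragraph direction (0 = LTR, 1 = RTL, 2 = auto)
 *   2: resolved paragraph embedding level
 *   3: resolved level of each character, 'x' for those removed by X9
 *   4: visual ordering (not parsed in this sketch) */
#define MAX_CHARS 256

typedef struct {
    unsigned long cps[MAX_CHARS];
    int levels[MAX_CHARS];      /* -1 stands for 'x' */
    size_t n;
    int para_dir, para_level;
} CharTestCase;

/* Returns 1 on success, 0 for a comment, blank or malformed line. */
int parse_char_test_line(const char *line, CharTestCase *tc)
{
    const char *p = line;
    char *end;

    /* Field 0: hex code points up to the first ';'. */
    tc->n = 0;
    while (*p && *p != ';' && *p != '#') {
        while (*p == ' ')
            p++;
        if (*p == ';')
            break;
        if (tc->n == MAX_CHARS)
            return 0;
        unsigned long v = strtoul(p, &end, 16);
        if (end == p)
            return 0;
        tc->cps[tc->n++] = v;
        p = end;
    }
    if (*p != ';' || tc->n == 0)
        return 0;
    p++;

    /* Fields 1 and 2: single decimal integers. */
    tc->para_dir = (int)strtol(p, &end, 10);
    if (end == p || *end != ';')
        return 0;
    p = end + 1;
    tc->para_level = (int)strtol(p, &end, 10);
    if (end == p || *end != ';')
        return 0;
    p = end + 1;

    /* Field 3: one level (or 'x') per code point. */
    for (size_t i = 0; i < tc->n; i++) {
        while (*p == ' ')
            p++;
        if (*p == 'x') {
            tc->levels[i] = -1;
            p++;
            continue;
        }
        tc->levels[i] = (int)strtol(p, &end, 10);
        if (end == p)
            return 0;
        p = end;
    }
    return 1;
}
```

A test driver would read the file line by line, skip lines for which this returns 0, run the algorithm on cps with the given paragraph direction, and compare the result against para_level and levels.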


