simon-git: puzzles (main): Simon Tatham

Commits to Tartarus hosted VCS tartarus-commits at lists.tartarus.org
Sun May 23 08:56:56 BST 2021


TL;DR:
  f729f51 WASM: move save file encoding from JS into C.

Repository:     https://git.tartarus.org/simon/puzzles.git
On the web:     https://git.tartarus.org/?p=simon/puzzles.git
Branch updated: main
Committer:      Simon Tatham <anakin at pobox.com>
Date:           2021-05-23 08:56:56

commit f729f51e475ff98d0caf529f0723ef810b1c88ef
web diff https://git.tartarus.org/?p=simon/puzzles.git;a=commitdiff;h=f729f51e475ff98d0caf529f0723ef810b1c88ef;hp=1c760b2ee808ba68781a68a57292cc841b3df5a0
Author: Simon Tatham <anakin at pobox.com>
Date:   Sun May 23 08:45:55 2021 +0100

    WASM: move save file encoding from JS into C.
    
    The previous fix worked OK, but it was conceptually wrong. Puzzles
    save files are better regarded as binary, not text: the length fields
    are measured in bytes, so translating the file into a different
    multibyte character encoding would invalidate them.
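
    As a rough illustration of the length-field point, here is a minimal
    sketch, not taken from the puzzles source. The helper name
    utf8_len_of_latin1 is hypothetical, and ISO 8859-1 is chosen only as
    an example of a source encoding. It counts how many bytes a buffer
    would occupy after being re-encoded as UTF-8:

```c
#include <stddef.h>

/* Hypothetical helper for illustration only: the number of bytes the
 * input would occupy after being re-encoded from ISO 8859-1 to UTF-8.
 * Bytes below 0x80 stay one byte long; every other byte becomes two. */
size_t utf8_len_of_latin1(const unsigned char *data, size_t len)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++)
        n += (data[i] < 0x80) ? 1 : 2;
    return n;
}
```

    A length field written as a byte count before such a translation
    would no longer match the data after it, unless every byte was ASCII.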
    
    So it was wrong to fetch a C byte string containing exactly the right
    binary data, then translate it into a JavaScript string as if decoding
    from UTF-8, then retranslate to a representation of a bytewise
    encoding via encodeURIComponent, and then label the result as
    application/octet-stream.
    
    This probably wouldn't have caused any problems in practice, because I
    don't remember any situation in which my save files use characters
    outside printable ASCII (plus newline). But it's not actually
    forbidden, and a save file might choose to do that some day, so the
    UTF-8 decode/reencode hidden in the JS was a latent bug.
    
    Now the URI-encoding is done on the C side, while we still know
    exactly what the binary data ought to look like and can be sure we're
    translating it byte for byte into the output encoding for the data:
    URI. By the time the JS receives a string pointer from get_save_file,
    it's already URI-encoded, which _guarantees_ that it's in ASCII and
    won't be messed about with by Emscripten's UTF8ToString.
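
    A minimal sketch of byte-for-byte URI encoding on the C side follows.
    It is not the code from emcc.c: the name uri_encode_bytes and the
    choice to leave only the RFC 3986 "unreserved" characters unescaped
    are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: true for the RFC 3986 "unreserved" characters,
 * which may appear literally in a URI. */
static int is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '_' || c == '.' || c == '~';
}

/* Hypothetical helper: percent-encode a binary buffer byte for byte.
 * Returns a malloc'ed, NUL-terminated ASCII string that the caller
 * must free, or NULL if allocation fails. */
char *uri_encode_bytes(const unsigned char *data, size_t len)
{
    static const char hex[] = "0123456789ABCDEF";
    /* Worst case: every byte becomes "%XX", plus the terminator. */
    char *out = malloc(len * 3 + 1);
    char *p = out;
    size_t i;

    if (!out)
        return NULL;

    for (i = 0; i < len; i++) {
        unsigned char c = data[i];
        if (is_unreserved(c)) {
            *p++ = (char)c;
        } else {
            *p++ = '%';
            *p++ = hex[c >> 4];
            *p++ = hex[c & 0xF];
        }
    }
    *p = '\0';
    return out;
}
```

    Every output character is either an unreserved ASCII character or
    part of a %XX escape, so a later UTF-8 decode of the string maps each
    byte to the same character and cannot alter the encoded data.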

 emcc.c     | 49 +++++++++++++++++++++++++++++++++++++++++++++----
 emccpre.js |  3 +--
 2 files changed, 46 insertions(+), 6 deletions(-)


