simon-git: halibut (main): Simon Tatham

Commits to Tartarus hosted VCS tartarus-commits at lists.tartarus.org
Wed Apr 23 10:17:08 BST 2025


TL;DR:
  ce14e37 load_pfb_file: better edge-case handling.
  570407a Merge libcharset updates, including CMake version fix.
  cb8a083 Update CMake version spec to 3.7...3.28.
  dc2ccb0 Fix buffer underrun in lz77_compress().
  76ac98e in_sfnt.c: fix some UBsan warnings.
  a954b4f Fix nasty mess in bk_paper font configuration.

Repository:     https://git.tartarus.org/simon/halibut.git
On the web:     https://git.tartarus.org/?p=simon/halibut.git
Branch updated: main
Committer:      Simon Tatham <anakin at pobox.com>
Date:           2025-04-23 10:17:08

commit ce14e373b7e6532c0dfa1908fe6030c5667cf79a
web diff https://git.tartarus.org/?p=simon/halibut.git;a=commitdiff;h=ce14e373b7e6532c0dfa1908fe6030c5667cf79a;hp=91e5d73320bb90a85c2b413174e1dba810fde476
Author: Simon Tatham <anakin at pobox.com>
Date:   Wed Apr 23 08:43:24 2025 +0100

    load_pfb_file: better edge-case handling.
    
    A bogus PFB file containing no data would provoke a crash on return,
    when the code tried to write through 'tail', which would still be
    null.

 in_pf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
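
For illustration only, here is a minimal sketch of the kind of failure
described above. It is not the code from in_pf.c: the struct, the
function name and the list-building loop are all assumptions. It shows
a list built through a 'tail' pointer, where the final write through
'tail' has to be guarded in case no records were read at all.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record type; the real PFB structures may differ. */
struct chunk {
    int type;
    struct chunk *next;
};

/* Build a linked list from n type codes. Returns NULL for empty input. */
static struct chunk *build_list(const int *types, size_t n)
{
    struct chunk *head = NULL, *tail = NULL;
    size_t i;

    assert(types != NULL || n == 0);

    for (i = 0; i < n; i++) {
        struct chunk *c = malloc(sizeof *c);
        if (!c)
            break;                  /* stop early on allocation failure */
        c->type = types[i];
        if (tail)
            tail->next = c;         /* append after the current last node */
        else
            head = c;               /* first node becomes the head */
        tail = c;
    }

    /* Terminate the list. With empty input 'tail' is still NULL, so an
     * unconditional 'tail->next = NULL' here would dereference NULL. */
    if (tail)
        tail->next = NULL;

    return head;
}
```

Under these assumptions, calling build_list(NULL, 0) simply returns
NULL instead of crashing.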

commit 570407a40bdde2a9bb50c16aa47711202ade8923
web diff https://git.tartarus.org/?p=simon/halibut.git;a=commitdiff;h=570407a40bdde2a9bb50c16aa47711202ade8923;hp=ce14e373b7e6532c0dfa1908fe6030c5667cf79a
Merge: ce14e37 4094c95
Author: Simon Tatham <anakin at pobox.com>
Date:   Wed Apr 23 08:44:47 2025 +0100

    Merge libcharset updates, including CMake version fix.

commit cb8a083d084b3464d72c4d3d247085389b2e5874
web diff https://git.tartarus.org/?p=simon/halibut.git;a=commitdiff;h=cb8a083d084b3464d72c4d3d247085389b2e5874;hp=570407a40bdde2a9bb50c16aa47711202ade8923
Author: Simon Tatham <anakin at pobox.com>
Date:   Wed Apr 23 08:45:36 2025 +0100

    Update CMake version spec to 3.7...3.28.
    
    This allows building on distros as far back as Debian stretch (the
    earliest Debian still just-about in support, with CMake 3.7) and as
    far forward as current sid (running 3.31) without provoking any
    warning of the form "Compatibility with CMake < 3.x will be removed
    from a future version of CMake."

 CMakeLists.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

commit dc2ccb079514751c3483362becc5b42bf399c015
web diff https://git.tartarus.org/?p=simon/halibut.git;a=commitdiff;h=dc2ccb079514751c3483362becc5b42bf399c015;hp=cb8a083d084b3464d72c4d3d247085389b2e5874
Author: Simon Tatham <anakin at pobox.com>
Date:   Wed Apr 23 08:58:43 2025 +0100

    Fix buffer underrun in lz77_compress().
    
    The CHARAT macro was calculating (st->winpos + k) % st->winsize, where
    in one out of two calls k is negative (since it represents the
    position before the current input location that we're examining for a
    match). But under C's semantics for %, a negative operand gives a
    negative (or zero) result, so the computed index could fall before
    the start of the buffer.
    
    While I'm here, also added some missing parens around the use of k.

 lz77.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
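
To make the % issue concrete, here is a small sketch of one way to
reduce a possibly negative offset into a valid window index. The helper
name wrap_index is hypothetical, and this is not necessarily how the
CHARAT macro in lz77.c was actually fixed; it only demonstrates the
arithmetic.

```c
#include <assert.h>

/* Hypothetical helper (not the real CHARAT macro): reduce pos + k into
 * the range [0, size), even when k is negative. */
static int wrap_index(int pos, int k, int size)
{
    int r;

    assert(size > 0);

    /* Since C99, % truncates toward zero, so r takes the sign of
     * (pos + k) and lies strictly between -size and size. */
    r = (pos + k) % size;
    if (r < 0)
        r += size;              /* shift a negative remainder into range */
    return r;
}
```

For example, with a window size of 8, a window position of 2 and
k = -5, the naive expression (2 + -5) % 8 evaluates to -3, whereas
wrap_index(2, -5, 8) returns 5.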

commit 76ac98ec3e54deb255c5a2b218668bf4805c77e4
web diff https://git.tartarus.org/?p=simon/halibut.git;a=commitdiff;h=76ac98ec3e54deb255c5a2b218668bf4805c77e4;hp=dc2ccb079514751c3483362becc5b42bf399c015
Author: Simon Tatham <anakin at pobox.com>
Date:   Wed Apr 23 09:03:20 2025 +0100

    in_sfnt.c: fix some UBsan warnings.
    
    Even if you're committed to platforms where int is at least 32 bits,
    it's dangerous to shift an unsigned char left by 24, because it
    auto-promotes to int rather than unsigned int, so the shift can end up
    in the sign bit.
    
    And shifting a negative number left is undefined behaviour in the
    first place, so decode_int16 was wrong to use a left shift to combine
    the signed version of its high byte with the unsigned version of its
    low byte. Ordinary multiplication is safer.

 in_sfnt.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
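
As a sketch of the two points made in that commit message, the helpers
below decode big-endian integers without shifting into the sign bit and
without left-shifting a negative value. The names read_u32_be and
read_i16_be are assumptions for illustration; they are not the
functions in in_sfnt.c, whose exact form is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: big-endian unsigned 32-bit read. Each byte is
 * converted to uint32_t before shifting, so the shift is never done in
 * (signed) int and cannot reach a sign bit. */
static uint32_t read_u32_be(const unsigned char *p)
{
    assert(p);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: big-endian signed 16-bit read. The high byte is
 * sign-extended to the range -128..127 and then multiplied by 256
 * rather than shifted, because left-shifting a negative value is
 * undefined behaviour. */
static int read_i16_be(const unsigned char *p)
{
    int hi;

    assert(p);
    hi = (p[0] ^ 0x80) - 0x80;  /* 0x00..0x7F stay as-is, 0x80..0xFF go negative */
    return hi * 256 + p[1];
}
```

With these definitions the bytes FF FE decode to -2 as a signed 16-bit
value, and FF 00 00 01 decodes to 0xFF000001 as an unsigned 32-bit
value.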

commit a954b4fbd8e6650528128e5a33e70bddd476baf1
web diff https://git.tartarus.org/?p=simon/halibut.git;a=commitdiff;h=a954b4fbd8e6650528128e5a33e70bddd476baf1;hp=76ac98ec3e54deb255c5a2b218668bf4805c77e4
Author: Simon Tatham <anakin at pobox.com>
Date:   Wed Apr 23 09:44:09 2025 +0100

    Fix nasty mess in bk_paper font configuration.
    
    The documentation of \cfg{paper-*-fonts} didn't match what Halibut
    actually did, because the behaviour accidentally changed when I
    introduced support for \s{strong text}. The unintended behaviour
    change has been in all versions of Halibut since 1.1, so it's too late
    to change it back. Instead, I've changed the documentation to match
    reality, and papered over the worst result by introducing a new font
    configuration directive for code paragraphs with a more sensible
    syntax.
    
    The sequence of events leading to this disaster:
    
    1. In the original bk_paper code, I had an array of font names for
    each context (headings, body text etc) indexed by a tiny enum
    containing the values FONT_NORMAL, FONT_EMPH, FONT_CODE in that order.
    
    But code paragraphs had a special highlighting feature that allowed
    bold, and on the other hand, didn't have FONT_NORMAL at all (the whole
    point is that everything is code!), so I reused the FONT_NORMAL entry
    in the code-paragraphs array to hold the font used for bold code.
    
    This was a bit of a hack, but it was purely internal to the Halibut
    source code, not visible externally.
    
    2. In commit 7e9483d0ffc2a40, Ben added support for configuring the
    fonts used in PostScript and PDF output, by introducing the general
    schema \cfg{paper-foo-fonts}{a}{b}{c}, causing the font names a,b,c to
    replace the ones previously configured for that context, in the same
    order as above: FONT_NORMAL, FONT_EMPH, FONT_CODE.
    
    Due to my reuse of FONT_NORMAL, this meant that the code-paragraph
    fonts were specified in a slightly weird order, which wasn't what
    you'd probably have chosen if you'd been designing the config syntax
    from scratch: bold first, then emphasised, then normal code. Ben
    documented this.
    
    3. In commit dcf080aa0e011de, I introduced bold text in normal
    paragraphs, via the \s{...} syntax. I inserted an extra element
    FONT_STRONG into the enumeration of font types.
    
    I inserted FONT_STRONG in the place where I thought it made most sense
    to a reader of the code: between FONT_EMPH and FONT_CODE. I completely
    failed to notice that this changed behaviour, because the order of
    that enumeration was no longer an implementation detail, but had
    leaked into the configuration language as a result of Ben's work.
    
    So it now became possible to specify _four_ fonts on a config line,
    and they'd be taken in the order FONT_NORMAL, FONT_EMPH, FONT_STRONG,
    FONT_CODE.
    
    So font configuration for non-code contexts now expects the order
      \cfg{paper-foo-fonts}{normal}{emph}{bold}{code}
    which I think is a reasonably sensible order, but breaks any previous
    input files that had specified {normal}{emph}{code}.
    
    But for code contexts it's a disaster, because FONT_NORMAL is still
    the array entry used for bold code, so you get
      \cfg{paper-code-fonts}{bold code}{italic code}{IGNORED}{normal code}
    which no sensible human would write on purpose, or expect the code to
    support!
    
    As I said above, I'm not _changing_ either of these syntaxes. I've
    invalidated old input files once, and once was enough.
    
    I've dealt with the non-code contexts by simply updating the
    documentation to match reality, because the new version of the
    directive is reasonably sensible. But I can't leave the code font
    configuration in that horrible state, now that I know about it, so
    I've introduced a differently spelled version of the directive which
    has the fonts in a consistent (to the user's way of thinking) order:
      \cfg{paper-codepara-fonts}{normal code}{italic code}{bold code}
    
    The 'paper-code-font-size' option now has a preferred spelling
    'paper-codepara-font-size' too, to go with the new -fonts option.
    
    This is implemented internally by giving paper_cfg_fonts() a little
    list of array indices in the font structure, showing which array entry
    each font in the config directive corresponds to. As a result, the
    enum of font indices now _is_ a purely internal implementation detail
    again: the user-visible behaviour lives in those fontindices_*[] arrays.

 bk_paper.c     | 66 +++++++++++++++++++++++++++++++++++++++-------------------
 doc/index.but  | 10 +++++++--
 doc/output.but | 50 ++++++++++++++++++++++++++++++++------------
 3 files changed, 90 insertions(+), 36 deletions(-)
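
The index-list approach described at the end of that commit message can
be sketched roughly as follows. This is an illustration, not the code
in bk_paper.c: the enum member names and the fontindices_ prefix come
from the commit message, but the array contents, the -1 terminator, the
NFONTS count and the assign_fonts function are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Internal font slots; their order is an implementation detail. */
enum { FONT_NORMAL, FONT_EMPH, FONT_STRONG, FONT_CODE, NFONTS };

/* For each directive, the slot filled by each successive argument.
 * A -1 entry marks the end of the list (an assumption of this sketch). */
static const int fontindices_normal[] = {
    FONT_NORMAL, FONT_EMPH, FONT_STRONG, FONT_CODE, -1
};
/* {normal code}{italic code}{bold code}: bold code lives in the
 * FONT_NORMAL slot, as the commit message explains. */
static const int fontindices_codepara[] = {
    FONT_CODE, FONT_EMPH, FONT_NORMAL, -1
};

/* Hypothetical stand-in for the config handler: store each argument in
 * the slot named by the index list. Returns how many were stored. */
static size_t assign_fonts(const char *slots[NFONTS], const int *indices,
                           const char *const *args, size_t nargs)
{
    size_t i;

    assert(slots && indices && (args || nargs == 0));

    for (i = 0; i < nargs && indices[i] >= 0; i++)
        slots[indices[i]] = args[i];
    return i;
}
```

The point of the design is that the user-visible argument order lives
in the index lists, so reordering or extending the enum no longer
changes what a config directive means.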
