simon-git: charset (master): Simon Tatham

Commits to Tartarus hosted VCS tartarus-commits at lists.tartarus.org
Tue Dec 26 07:47:48 GMT 2017


TL;DR:
  d33e458 New character sets: ISO/IEC 6937 and a variant.
  0a45f74 Alternative CMake-based build script.
  0a81212 Add extern "C" in charset.h.

Repository:     https://git.tartarus.org/simon/charset.git
On the web:     https://git.tartarus.org/?p=simon/charset.git
Branch updated: master
Committer:      Simon Tatham <anakin at pobox.com>
Date:           2017-12-26 07:47:48

commit d33e45816f8b3e6bc1ede926514eb780de9382ed
web diff https://git.tartarus.org/?p=simon/charset.git;a=commitdiff;h=d33e45816f8b3e6bc1ede926514eb780de9382ed;hp=8718813d32346b14917df1348b61ba3ad329ddd5
Author: Simon Tatham <anakin at pobox.com>
Date:   Tue Dec 26 07:46:24 2017 +0000

    New character sets: ISO/IEC 6937 and a variant.
    
    These are _mostly_ single-byte character sets, except that bytes in
    the 0xC0-0xCF range are introducer characters for two-byte
    encodings of accented letters - but you'd be forgiven for mistaking
    them for something more like combining characters, since each
    introducer character consistently adds the same diacritic to a
    (defined) selection of permissible follow-up letters.
    
    Here I support ISO 6937 itself (assuming Wikipedia's transcription of
    it to be accurate), and also a variant form I found in EN 300 468 (one
    of the standards for DVB digital broadcast television) which is used
    in broadcast episode-guide metadata and extends the standard version
    of the character set by adding the euro sign.
    
    To make it easier to handle things that are mostly single-byte but
    with special cases, I've extended sbcsgen.pl to be able to output a
    full sbcs_data structure containing two-way translation tables, but
    _not_ also generate a charset_spec and an ENUM_CHARSET to match them.
    This partial output is triggered by replacing the keyword 'charset'
    with 'tables' at the start of an SBCS definition section.
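
As a rough illustration of the two-byte scheme described in this commit message, here is a minimal C sketch of a decoder. It is not the code in iso6937.c: the names accent_pair, pairs, DECODE_ERROR and decode_one are hypothetical, the table is a four-entry subset chosen for illustration, and treating every byte below 0x80 as ASCII is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry: (introducer, base letter) -> code point. */
struct accent_pair {
    unsigned char introducer;   /* byte in the 0xC0-0xCF range */
    unsigned char base;         /* a permitted follow-up letter */
    unsigned long unicode;      /* precomposed Unicode code point */
};

/* Illustrative subset only; the real repertoire is much larger. */
static const struct accent_pair pairs[] = {
    {0xC1, 'a', 0x00E0},        /* grave + a */
    {0xC2, 'e', 0x00E9},        /* acute + e */
    {0xC3, 'o', 0x00F4},        /* circumflex + o */
    {0xC8, 'u', 0x00FC},        /* diaeresis + u */
};

#define DECODE_ERROR 0xFFFDUL   /* U+FFFD REPLACEMENT CHARACTER */

/*
 * Decode one character from in[0..len) into *out.
 * Returns the number of bytes consumed (0 only when len is 0).
 */
static size_t decode_one(const unsigned char *in, size_t len,
                         unsigned long *out)
{
    size_t i;

    assert(in != NULL && out != NULL);
    if (len == 0)
        return 0;

    if (in[0] < 0xC0 || in[0] > 0xCF) {
        /* Single-byte case. Assumption: only ASCII is passed through
         * here; a full decoder would consult a translation table. */
        *out = in[0] < 0x80 ? (unsigned long)in[0] : DECODE_ERROR;
        return 1;
    }

    /* Introducer byte: a follow-up letter from the defined set must
     * come next, otherwise the sequence is invalid. */
    if (len >= 2) {
        for (i = 0; i < sizeof pairs / sizeof pairs[0]; i++) {
            if (pairs[i].introducer == in[0] && pairs[i].base == in[1]) {
                *out = pairs[i].unicode;
                return 2;
            }
        }
    }

    /* Consume only the introducer so the caller can resynchronise. */
    *out = DECODE_ERROR;
    return 1;
}
```

Because each introducer must be followed by one of a fixed set of letters, a lookup keyed on the pair is enough here; nothing is composed at run time, which is what distinguishes this from true combining characters.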

 Makefile.am |   8 +-
 charset.h   |   2 +
 enum.h      |   1 +
 iso6937.c   | 336 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 localenc.c  |   5 +
 sbcs.dat    |  55 ++++++++++
 sbcsgen.pl  |  19 ++--
 7 files changed, 414 insertions(+), 12 deletions(-)

commit 0a45f74685aa1465b279d63c9b5f053bbeffbd84
web diff https://git.tartarus.org/?p=simon/charset.git;a=commitdiff;h=0a45f74685aa1465b279d63c9b5f053bbeffbd84;hp=d33e45816f8b3e6bc1ede926514eb780de9382ed
Author: Simon Tatham <anakin at pobox.com>
Date:   Tue Dec 26 07:46:24 2017 +0000

    Alternative CMake-based build script.
    
    This is subsidiary to the autotools one, in the sense that it works by
    _reading_ Makefile.am to get the lists of source files, so that I
    don't have to maintain those in more than one place. But it means that
    now CMake-based superprojects as well as autotools-based ones can
    include libcharset as a subdirectory or git submodule, and incorporate
    libcharset's build-time needs into their own just by saying something
    like this:
    
      add_subdirectory(charset EXCLUDE_FROM_ALL)
      target_include_directories(some_target PRIVATE charset)
      target_link_libraries(some_target charset)

 .gitignore     |  2 ++
 CMakeLists.txt | 39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

commit 0a81212ae48131db761890fb058111ae2f2ce59f
web diff https://git.tartarus.org/?p=simon/charset.git;a=commitdiff;h=0a81212ae48131db761890fb058111ae2f2ce59f;hp=0a45f74685aa1465b279d63c9b5f053bbeffbd84
Author: Simon Tatham <anakin at pobox.com>
Date:   Tue Dec 26 07:47:31 2017 +0000

    Add extern "C" in charset.h.
    
    Now I can include it in a C++ program and still successfully link and
    run against a libcharset static library compiled in the normal way.
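
For readers unfamiliar with the idiom, the usual shape of such a guard is sketched below. This is the conventional pattern, not a copy of the 14 lines added to charset.h; EXAMPLE_H and example_add are hypothetical names, and the header part is shown inline so that the sketch is a single file.

```c
/* --- what would normally live in a header file --- */
#ifndef EXAMPLE_H
#define EXAMPLE_H

#ifdef __cplusplus
extern "C" {            /* seen only by C++ compilers */
#endif

/* Declared with C linkage, so a C++ caller refers to the unmangled
 * symbol that a C compiler emits for the definition. */
int example_add(int a, int b);

#ifdef __cplusplus
}                       /* closes the extern "C" block */
#endif

#endif /* EXAMPLE_H */

/* --- what would normally live in a .c file, compiled as C --- */
int example_add(int a, int b)
{
    return a + b;
}
```

A C compiler never defines __cplusplus, so it skips both guarded lines and sees an ordinary declaration; only a C++ translation unit that includes the header gets the linkage specification.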

 charset.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)


