simon-git: charset (master): charset.git

Commits to Tartarus hosted VCS tartarus-commits at lists.tartarus.org
Thu Oct 27 18:46:54 BST 2016


TL;DR:
  af8ff5c Add easier-to-type localencs for lots of charsets.
  afa3cca New query function, charset_is_single_byte().
  7657a77 New command-line tool, 'csshow'.

Repository:     https://git.tartarus.org/simon/charset.git
On the web:     https://git.tartarus.org/?p=simon/charset.git
Branch updated: master
Committer:      charset.git
Date:           2016-10-27 18:46:54

commit af8ff5c6f03b60a8610c2f81534931ae40bbcc9b
web diff https://git.tartarus.org/?p=simon/charset.git;a=commitdiff;h=af8ff5c6f03b60a8610c2f81534931ae40bbcc9b;hp=89123b0bdbb3b2780a1bc16d95813f08f43801e5
Author: Simon Tatham <anakin at pobox.com>
Date:   Thu Oct 27 18:44:19 2016 +0100

    Add easier-to-type localencs for lots of charsets.
    
    "Mac Roman (Pirard encoding)" is exceptionally unpleasant to have to
    type on a convcs command line, when I want to convert text from it to
    UTF-8 (since, despite the weirdness and obscurity of that character
    set, it actually comes up quite commonly in email in my experience).
    I've added the aliases 'Mac Pirard', 'Mac-Pirard' and 'MacPirard' for
    it (all three, because I've no idea which of those I might have
    thought to try first the next time I needed it :-).
    
    While I'm at it, I've added similar aliases for all the other Mac
    charsets, since even the nicely named ones like 'Mac Roman' still have
    a space in them which makes them annoying to type as Unix command-line
    arguments. And then I did the same for a few other charsets that only
    had spacey names. Now _everything_ has at least one actually sensible
    identifier, by which I mean it's reasonably short, and contains no
    character that needs shell-escaping. (In fact, according to a quick
    Perl check, everything has at least one name consisting only of
    alphanumerics and hyphens.)

 localenc.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

commit afa3cca9f603efec48a02f095e7dc5dc1acc35f4
web diff https://git.tartarus.org/?p=simon/charset.git;a=commitdiff;h=afa3cca9f603efec48a02f095e7dc5dc1acc35f4;hp=af8ff5c6f03b60a8610c2f81534931ae40bbcc9b
Author: Simon Tatham <anakin at pobox.com>
Date:   Thu Oct 27 18:44:46 2016 +0100

    New query function, charset_is_single_byte().
    
    Mostly for use in the new command-line utility I'm about to commit,
    but I can easily imagine it having other uses too.

 charset.h | 5 +++++
 slookup.c | 6 ++++++
 2 files changed, 11 insertions(+)

commit 7657a77dc0a8079f43ff11d349eccf07e3df2c0b
web diff https://git.tartarus.org/?p=simon/charset.git;a=commitdiff;h=7657a77dc0a8079f43ff11d349eccf07e3df2c0b;hp=afa3cca9f603efec48a02f095e7dc5dc1acc35f4
Author: Simon Tatham <anakin at pobox.com>
Date:   Thu Oct 27 18:44:51 2016 +0100

    New command-line tool, 'csshow'.
    
    Run with no arguments, it prints out the ASCII table, with hex
    character codes down the side and along the top. But you can also ask
    it for any single-byte character set we know about (by giving any
    recognised name for it as an argument, e.g. 'csshow Win1252'), or for
    a 256-byte sub-region of Unicode (e.g. 'csshow U+2500').
    
    This is intended to replace a tiny command-line utility I used to use,
    which printed a full table of single-byte character codes that I then
    had to pipe through convcs whenever I wanted the table for some
    particular SBCS. The replacement is massively overengineered: it auto-
    detects characters that are missing or unprintable for various reasons
    (illegal encodings, terminal control codes), it attempts to compensate
    at display time for double-width East Asian characters and zero-width
    things like combining characters, and it generally tries to arrange to
    be maximally clever with minimal user guidance.

 .gitignore |   1 +
 Makefile   |   9 +-
 csshow.c   | 326 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 335 insertions(+), 1 deletion(-)


