simon-git: cvt-utf8 (master): cvt-utf8.git

Commits to Tartarus hosted VCS tartarus-commits at lists.tartarus.org
Sat Jan 21 19:06:50 GMT 2017


TL;DR:
  3551777 Rewrite the Unihan zip-file untangling.

Repository:     https://git.tartarus.org/simon/cvt-utf8.git
On the web:     https://git.tartarus.org/?p=simon/cvt-utf8.git
Branch updated: master
Committer:      cvt-utf8.git
Date:           2017-01-21 19:06:50

commit 35517774facc527c523730220d92905003a1059d
web diff https://git.tartarus.org/?p=simon/cvt-utf8.git;a=commitdiff;h=35517774facc527c523730220d92905003a1059d;hp=0013973562ef5099eab40802aec4f8f5df93ce99
Author: Simon Tatham <anakin at pobox.com>
Date:   Sat Jan 21 19:02:04 2017 +0000

    Rewrite the Unihan zip-file untangling.
    
    Jacob Nevins pointed out that unicode.org has changed their zip file
    organisation so as to divide up the giant Unihan.txt into multiple
    files. So we now need to iterate over all members of the zip file, not
    just the first one.
    
    The simplest way to achieve that in turn is to completely throw out my
    old code that would unpack a zip file in a streamed presentation even
    after having already read its first character, and replace it with the
    really simple approach of just slurping the whole file into memory and
    passing it to the standard Python zipfile module. I think these days
    that's not an unreasonable demand on the computer running this build
    step.

 cvt-utf8 | 88 ++++++++++++++++------------------------------------------------
 1 file changed, 21 insertions(+), 67 deletions(-)
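
For illustration only, the approach the commit message describes (read the whole zip file into memory, hand it to the standard Python zipfile module, and visit every member rather than just the first) might look roughly like the sketch below. This is not the code from the commit: the function names, the member names, and the assumption that members are UTF-8 text are all hypothetical, and it is written for Python 3.

```python
import io
import zipfile


def read_all_members(stream):
    """Return (member_name, text) for every file in a zip read from stream.

    Hypothetical helper, not taken from the cvt-utf8 script.
    """
    # Slurp the whole archive into memory. zipfile needs a seekable
    # file object, which BytesIO provides even if `stream` is a pipe.
    data = stream.read()
    members = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # Iterate over all members, not just the first one.
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Assumption: each member is UTF-8 encoded text.
            members.append((info.filename, archive.read(info).decode("utf-8")))
    return members


def build_demo_zip():
    """Build a small in-memory zip with two made-up members for testing."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("Unihan_A.txt", "U+3400\tkExample\tone\n")
        archive.writestr("Unihan_B.txt", "U+3401\tkExample\ttwo\n")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for name, text in read_all_members(build_demo_zip()):
        print(name, len(text.splitlines()), "line(s)")
```

The trade-off is the one the commit message names: the whole archive sits in memory at once, in exchange for not having to hand-roll streamed zip unpacking.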


