diff options
author | Joe Tsai <joetsai@digital-static.net> | 2017-11-02 13:53:16 -0700 |
---|---|---|
committer | Joe Tsai <thebrokentoaster@gmail.com> | 2017-11-06 21:35:59 +0000 |
commit | 4fcc835971ad63cf913ebe074ef6191e35a44ab9 (patch) | |
tree | 21b132a256d3f8652416b9eb9d4a4fe5a25473d5 /src/archive/zip/reader.go | |
parent | 0c554957483aaf1f62c6251dbe62ab5fce3e219c (diff) | |
download | go-4fcc835971ad63cf913ebe074ef6191e35a44ab9.tar.gz go-4fcc835971ad63cf913ebe074ef6191e35a44ab9.zip |
archive/zip: add FileHeader.NonUTF8 field
The NonUTF8 field provides users with a way to explictly tell the
ZIP writer to avoid setting the UTF-8 flag.
This is necessary because many readers:
1) (Still) do not support UTF-8
2) And use the local system encoding instead
Thus, even though character encodings other than CP-437 and UTF-8
are not officially supported by the ZIP specification, pragmatically
the world has permitted use of them.
When a non-standard encoding is used, it is the user's responsibility
to ensure that the target system is expecting the encoding used
(e.g., producing a ZIP file you know is used on a Chinese version of Windows).
We adjust the detectUTF8 function to account for Shift-JIS and EUC-KR
not being identical to ASCII for two characters.
We don't need an API for users to explicitly specify that they are encoding
with UTF-8 since all single byte characters are compatible with all other
common encodings (Windows-1256, Windows-1252, Windows-1251, Windows-1250,
IEC-8859, EUC-KR, KOI8-R, Latin-1, Shift-JIS, GB-2312, GBK) except for
the non-printable characters and the backslash character (all of which
are invalid characters in a path name anyways).
Fixes #10741
Change-Id: I9004542d1d522c9137973f1b6e2b623fa54dfd66
Reviewed-on: https://go-review.googlesource.com/75592
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Diffstat (limited to 'src/archive/zip/reader.go')
-rw-r--r-- | src/archive/zip/reader.go | 18 |
1 files changed, 18 insertions, 0 deletions
diff --git a/src/archive/zip/reader.go b/src/archive/zip/reader.go index ae01786386..7417b8f36a 100644 --- a/src/archive/zip/reader.go +++ b/src/archive/zip/reader.go @@ -281,6 +281,24 @@ func readDirectoryHeader(f *File, r io.Reader) error { f.Extra = d[filenameLen : filenameLen+extraLen] f.Comment = string(d[filenameLen+extraLen:]) + // Determine the character encoding. + utf8Valid1, utf8Require1 := detectUTF8(f.Name) + utf8Valid2, utf8Require2 := detectUTF8(f.Comment) + switch { + case !utf8Valid1 || !utf8Valid2: + // Name and Comment definitely not UTF-8. + f.NonUTF8 = true + case !utf8Require1 && !utf8Require2: + // Name and Comment use only single-byte runes that overlap with UTF-8. + f.NonUTF8 = false + default: + // Might be UTF-8, might be some other encoding; preserve existing flag. + // Some ZIP writers use UTF-8 encoding without setting the UTF-8 flag. + // Since it is impossible to always distinguish valid UTF-8 from some + // other encoding (e.g., GBK or Shift-JIS), we trust the flag. + f.NonUTF8 = f.Flags&0x800 == 0 + } + needUSize := f.UncompressedSize == ^uint32(0) needCSize := f.CompressedSize == ^uint32(0) needHeaderOffset := f.headerOffset == int64(^uint32(0)) |