gh-72680: Fix false positives when using zipfile.is_zipfile()#134250
Merged
gpshead merged 4 commits intopython:mainfrom May 21, 2025
Merged
gh-72680: Fix false positives when using zipfile.is_zipfile()#134250gpshead merged 4 commits intopython:mainfrom
gpshead merged 4 commits intopython:mainfrom
Conversation
The zipfile.is_zipfile function would only search for the EndOfZipfile section header. This failed to correctly identify non-zipfiles that contained this header. Now the zipfile.is_zipfile function verifies the first central directory entry. Changes: * Extended zipfile.is_zipfile to verify zipfile catalog * Added tests to validate failure of binary non-zipfiles
gpshead
approved these changes
May 19, 2025
|
🤖 New build scheduled with the buildbot fleet by @gpshead for commit 4bfec3e 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F134250%2Fmerge If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
Comment on lines
1
to
13
| Improve Zip file validation in :func:`zipfile.is_zipfile`. | ||
|
|
||
| Before this change :func:`zipfile.is_zipfile` only checked the End Central Directory | ||
| signature. If the signature could be found in the last 64k of the file, | ||
| success! This produced false positives on any file with ``'PK\x05\x06'`` in the | ||
| last 64k of the file - including PDFs and PNGs. | ||
|
|
||
| This is now corrected by actually validating the Central Directory location | ||
| and size based on the information provided by the End of Central Directory | ||
| along with verifying the Central Directory signature of the first entry. | ||
|
|
||
| This should be sufficient for the vast number of Zip files with fewer | ||
| false positives. |
Member
There was a problem hiding this comment.
This is a very beefy NEWS, could it be condensed. We generally keep them quite brief, details go in issues or docs.
Member
There was a problem hiding this comment.
I suggest to move most of the NEWS entry into the commit message, and keep the NEWS entry short.
gpshead
approved these changes
May 21, 2025
miss-islington
pushed a commit
to miss-islington/cpython
that referenced
this pull request
May 21, 2025
…ythonGH-134250) bpo-28494: Improve zipfile.is_zipfile reliability The zipfile.is_zipfile function would only search for the EndOfZipfile section header. This failed to correctly identify non-zipfiles that contained this header. Now the zipfile.is_zipfile function verifies the first central directory entry. Changes: * Extended zipfile.is_zipfile to verify zipfile catalog * Added tests to validate failure of binary non-zipfiles * Reuse 'concat' handling for is_zipfile (cherry picked from commit 1298511) Co-authored-by: Tim Hatch <timhatch@netflix.com> Co-authored-by: John Jolly <john.jolly@gmail.com>
|
GH-134401 is a backport of this pull request to the 3.14 branch. |
gpshead
pushed a commit
that referenced
this pull request
May 21, 2025
…H-134250) (#134401) gh-72680: Fix false positives when using zipfile.is_zipfile() (GH-134250) bpo-28494: Improve zipfile.is_zipfile reliability The zipfile.is_zipfile function would only search for the EndOfZipfile section header. This failed to correctly identify non-zipfiles that contained this header. Now the zipfile.is_zipfile function verifies the first central directory entry. Changes: * Extended zipfile.is_zipfile to verify zipfile catalog * Added tests to validate failure of binary non-zipfiles * Reuse 'concat' handling for is_zipfile (cherry picked from commit 1298511) Co-authored-by: Tim Hatch <timhatch@netflix.com> Co-authored-by: John Jolly <john.jolly@gmail.com>
lkollar
pushed a commit
to lkollar/cpython
that referenced
this pull request
May 26, 2025
…ythonGH-134250) bpo-28494: Improve zipfile.is_zipfile reliability The zipfile.is_zipfile function would only search for the EndOfZipfile section header. This failed to correctly identify non-zipfiles that contained this header. Now the zipfile.is_zipfile function verifies the first central directory entry. Changes: * Extended zipfile.is_zipfile to verify zipfile catalog * Added tests to validate failure of binary non-zipfiles * Reuse 'concat' handling for is_zipfile Co-authored-by: John Jolly <john.jolly@gmail.com>
Pranjal095
pushed a commit
to Pranjal095/cpython
that referenced
this pull request
Jul 12, 2025
…ythonGH-134250) bpo-28494: Improve zipfile.is_zipfile reliability The zipfile.is_zipfile function would only search for the EndOfZipfile section header. This failed to correctly identify non-zipfiles that contained this header. Now the zipfile.is_zipfile function verifies the first central directory entry. Changes: * Extended zipfile.is_zipfile to verify zipfile catalog * Added tests to validate failure of binary non-zipfiles * Reuse 'concat' handling for is_zipfile Co-authored-by: John Jolly <john.jolly@gmail.com>
taegyunkim
pushed a commit
to taegyunkim/cpython
that referenced
this pull request
Aug 4, 2025
…ythonGH-134250) bpo-28494: Improve zipfile.is_zipfile reliability The zipfile.is_zipfile function would only search for the EndOfZipfile section header. This failed to correctly identify non-zipfiles that contained this header. Now the zipfile.is_zipfile function verifies the first central directory entry. Changes: * Extended zipfile.is_zipfile to verify zipfile catalog * Added tests to validate failure of binary non-zipfiles * Reuse 'concat' handling for is_zipfile Co-authored-by: John Jolly <john.jolly@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Rebased #5053 and fixed the impl to pass tests. Original PR and description below by @jjolly
Fix zipfile validation issue by ... providing more validation!
Originally, zipfile.is_zipfile() only checked the End Central Directory
signature. If the signature could be found in the last 64k of the file,
success! This produced false positives on any file with 'PK\x05\x06' in the
last 64k of the file - including PDFs and PNGs.
This is now corrected by actually validating the Central Directory location
and size based on the information provided by the End Central Directory
along with verifying the Central Directory signature of the first entry.
This should be sufficient for the vast number of zipfiles, but more could be
done to absolutely validate the zipfile content - such as validating all
local file headers and Central Directory entries.