
I believe a zip file could be streamed: most of the file metadata is duplicated in both the 'central directory record' trailer and a header in front of each file. In other words, the first thing in the zip file is a header that you can use to extract the first file, followed by that file, followed by the next file's header...
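Here is a minimal sketch of that front-to-back reading in Python, using only `struct` and `zlib`. `stream_members` is a hypothetical helper, not a library API. It handles only stored and deflated members and ignores ZIP64. It also gives up when general-purpose flag bit 3 is set, because in that case the sizes are written in a data descriptor *after* the data, and the local header alone doesn't say where the member ends.

```python
import io
import struct
import zipfile
import zlib
from typing import BinaryIO, Iterator, Tuple

# Fixed 30-byte part of a ZIP local file header (little-endian):
# signature, version, flags, method, mtime, mdate, crc32,
# compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIG = 0x04034B50  # b"PK\x03\x04"


def stream_members(f: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, data) for each member, reading local headers front to back.

    Hypothetical helper: it only ever calls f.read(), never seek(),
    so it also works on a non-seekable stream.
    """
    while True:
        raw = f.read(LOCAL_HEADER.size)
        if len(raw) < LOCAL_HEADER.size:
            return
        (sig, _version, flags, method, _mtime, _mdate,
         _crc, comp_size, _size, name_len, extra_len) = LOCAL_HEADER.unpack(raw)
        if sig != LOCAL_SIG:
            return  # reached the central directory: no more local entries
        if flags & 0x08:
            # Sizes live in a data descriptor after the data; not handled here.
            raise ValueError("member uses a data descriptor")
        # Flag bit 11 marks UTF-8 names; otherwise names are CP437.
        name = f.read(name_len).decode("utf-8" if flags & 0x800 else "cp437")
        f.read(extra_len)  # skip the extra field
        payload = f.read(comp_size)
        if method == 0:  # stored
            yield name, payload
        elif method == 8:  # raw deflate stream, no zlib header
            yield name, zlib.decompress(payload, -15)
        else:
            raise ValueError(f"unsupported compression method {method}")


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("a.txt", b"hello " * 100)
        zf.writestr("b.txt", b"world")
    buf.seek(0)
    for name, data in stream_members(buf):
        print(name, len(data))
```

Going through the central directory, as `zipfile.ZipFile` does, is what allows random access. This sketch gives that up in exchange for a single forward pass.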



You can, yes. You can even download the header, open the zip file, choose which files to extract, and only download those. This is possible with HTTP today, and has been for decades.
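Here is a minimal, network-free sketch of that access pattern. `RangeLoggingReader` is a hypothetical class standing in for an HTTP client that turns each read into a Range request, and it assumes the server honours `Range`. Python's standard `zipfile.ZipFile` accepts any seekable file-like object with `read`, `seek` and `tell`, so it can be pointed at this reader. Note that the directory `zipfile` reads first is the central directory at the end of the archive, not a header at the start.

```python
import io
import zipfile
from typing import List, Tuple


class RangeLoggingReader:
    """Seekable view over in-memory bytes that records every read.

    Hypothetical stand-in for an HTTP client: each recorded (offset, length)
    pair corresponds to one `Range: bytes=offset-(offset+length-1)` request.
    """

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0
        self.ranges: List[Tuple[int, int]] = []

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos,
                io.SEEK_END: len(self._data)}[whence]
        if base + offset < 0:
            raise OSError("negative seek position")  # like a real file
        self._pos = base + offset
        return self._pos

    def read(self, size: int = -1) -> bytes:
        stop = len(self._data) if size is None or size < 0 else self._pos + size
        chunk = self._data[self._pos:stop]
        if chunk:
            self.ranges.append((self._pos, len(chunk)))  # one "request"
        self._pos += len(chunk)
        return chunk


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:  # ZIP_STORED by default
        zf.writestr("big.bin", b"x" * 200_000)
        zf.writestr("small.txt", b"world")
    reader = RangeLoggingReader(buf.getvalue())
    with zipfile.ZipFile(reader) as zf:
        print(zf.namelist(), zf.read("small.txt"))
    print(reader.ranges)  # small reads of the trailer, then the small member
```

With a real server, the slicing in `read` would become a request for the same byte range. The point is that the 200 KB member is never fetched.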


> You can even download the header, open the zip file, choose which files to extract, and only download those.

That bit isn't true, unfortunately; the file list is stored at the end of the file. You either have to (sequentially, file-by-file) scan the zip to find the file you want, or look at the end. Even more unfortunately, there are several variable-length fields between the interesting stuff like the file list and the end of the archive, and the length of those fields is stored before the field itself, so you can't simply use a Range request to retrieve the last <x> bytes from the end, either. (And even more more unfortunately, the very last thing in the file is a "comment" field, which could conceivably contain the magic number that you have to look for in order to read the trailer/footer record.)
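The comment length is a 16-bit field, so the end-of-central-directory record must start somewhere in the last 22 + 65,535 = 65,557 bytes. A common workaround is to fetch that bounded tail (for example with a suffix request, `Range: bytes=-65557`) and scan it backwards. Here is a minimal sketch, assuming a single-disk, non-ZIP64 archive. `find_eocd` is a hypothetical helper. It accepts a candidate signature only if that record's own comment length runs exactly to the end of the file. This rejects most false matches, but as noted above, a comment crafted to contain a well-formed fake record can still fool it.

```python
import io
import struct
import zipfile
from typing import Optional

EOCD_SIG = b"PK\x05\x06"
# End-of-central-directory record: signature, four 16-bit disk/entry counts,
# 32-bit central directory size and offset, then the 16-bit comment length.
EOCD = struct.Struct("<4sHHHHIIH")  # 22 bytes
MAX_COMMENT = 0xFFFF


def find_eocd(tail: bytes) -> Optional[int]:
    """Return the offset of the EOCD record inside `tail`, or None.

    `tail` is assumed to be the last min(file_size, 22 + 65535) bytes of the
    archive. Hypothetical helper; ignores ZIP64 and multi-disk archives.
    """
    end = len(tail)
    while True:
        pos = tail.rfind(EOCD_SIG, 0, end)
        if pos < 0:
            return None
        if len(tail) - pos >= EOCD.size:
            comment_len = EOCD.unpack_from(tail, pos)[-1]
            # A genuine record's comment runs exactly to the end of the file.
            if pos + EOCD.size + comment_len == len(tail):
                return pos
        # False match (e.g. inside the comment): keep scanning backwards.
        end = pos + len(EOCD_SIG) - 1


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", b"hello")
        zf.comment = b"note PK\x05\x06 here"  # decoy signature in the comment
    data = buf.getvalue()
    tail = data[-(EOCD.size + MAX_COMMENT):]  # what `bytes=-65557` returns
    pos = find_eocd(tail)
    print("EOCD at file offset", len(data) - len(tail) + pos)
```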

The Zip format is truly awful.

https://en.wikipedia.org/wiki/ZIP_(file_format)#Central_dire...

https://github.com/python/cpython/blob/7e465a6b8273dc0b6cb48...


Something I’ve wondered for a while: is there a good modern alternative to zip?



