
Are you able to seek and selectively extract from squashfs archives using range headers if stored in object storage systems like S3?

Example: https://alexwlchan.net/2019/working-with-large-s3-objects/




Certainly, squashfs is designed to be random-access.


But S3 isn't.


It has to be if you can "seek and selectively extract from" a zip file: the ability to do that relies on the ability to read the end of the archive for the central directory, then read the offset and size you get from that to get at the file you need.

squashfs may or may not be able to do it with as few roundtrips (I don't know the details of its layout), but S3 necessarily provides random access; otherwise you'd have to download the entire content either way and the original question would be moot.
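
To make the zip case concrete, here is a rough Python sketch of the first step: read only the tail of the archive, find the end-of-central-directory record, and pull out where the central directory lives. `read_range` is a hypothetical callable standing in for a ranged GET against the object store, and the sketch assumes a plain (non-zip64) archive.

```python
import struct

EOCD_SIG = b"PK\x05\x06"   # end-of-central-directory signature
EOCD_MIN = 22              # size of the record without its trailing comment


def find_central_directory(read_range, total_size):
    """Return (offset, size, entry_count) of the central directory.

    read_range(start, end) is a hypothetical helper that returns bytes
    start..end inclusive, e.g. backed by an HTTP Range request.
    """
    # The record sits at the very end, followed by at most 65535 comment bytes.
    tail_len = min(total_size, EOCD_MIN + 0xFFFF)
    tail_start = total_size - tail_len
    tail = read_range(tail_start, total_size - 1)

    pos = tail.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")

    # sig, disk no., cd start disk, entries on disk, total entries,
    # cd size, cd offset, comment length
    fields = struct.unpack("<4sHHHHIIH", tail[pos:pos + EOCD_MIN])
    entries, cd_size, cd_offset = fields[4], fields[5], fields[6]
    return cd_offset, cd_size, entries
```

A second ranged read of `cd_size` bytes at `cd_offset` then gives you the directory entries, each of which carries the offset of the file you want, so a typical lookup costs about three requests.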


You can read sequentially through a zip file and dynamically build up a central directory yourself and do whatever desired operations.

The caveat is that the zipfile itself may contain entries that aren't listed in its central directory.


> You can read sequentially through a zip file and dynamically build up a central directory yourself and do whatever desired operations.

First, zip files already have a central directory so why would you do that?

Second, you seem to be missing the subject of this subthread entirely: the point is being able to selectively access S3 content without downloading the whole archive. If you sequentially read through the entire zip file, you are in fact downloading the whole archive.


Sorry, I wasn't clear before. You don't need the central directory to process a zipfile. You don't need random access to a zipfile to process it.

A zipfile can be treated as a stream of data and processed as each individual zip entry is seen in the download/read. NO random access is required.

You need just enough memory for the local file header and buffers for the zipped data and unzipped data. The first two items should be covered by the downloaded zipfile buffer.

If you want to process the zipfile ASAP, or don't have the resources to download the entire zipfile before processing it, then this is a valid way to handle it. If the data you want shows up before the entire zipfile has been downloaded, you can stop the download.

A zipfile can also be treated as a randomly accessed file, as you mentioned. Some operations are faster that way, like browsing each zip entry's metadata.
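
Roughly what the streaming approach looks like, as a Python sketch rather than a complete reader: it walks the local file headers front to back and never seeks. It assumes every entry records its sizes in the local header (no trailing data descriptor, no zip64, no encryption) and only handles stored and deflated entries; `stream_entries` is a name made up for this example.

```python
import struct
import zlib

LOCAL_SIG = b"PK\x03\x04"
LOCAL_FMT = "<4sHHHHHIIIHH"   # fixed 30-byte local file header


def stream_entries(stream):
    """Yield (name, data) for each entry, reading strictly forward."""
    header_size = struct.calcsize(LOCAL_FMT)
    while True:
        header = stream.read(header_size)
        if len(header) < header_size or header[:4] != LOCAL_SIG:
            return  # hit the central directory or the end of the data

        # sig, version, flags, method, time, date, crc32,
        # compressed size, uncompressed size, name length, extra length
        (_, _, flags, method, _, _, _,
         csize, _, name_len, extra_len) = struct.unpack(LOCAL_FMT, header)
        if flags & 0x08:
            # Sizes live in a data descriptor after the data; not handled here.
            raise ValueError("entry uses a data descriptor")

        # Bit 11 marks UTF-8 names; otherwise the spec says CP437.
        encoding = "utf-8" if flags & 0x800 else "cp437"
        name = stream.read(name_len).decode(encoding)
        stream.read(extra_len)          # skip the extra field
        raw = stream.read(csize)

        if method == 0:                 # stored
            data = raw
        elif method == 8:               # raw deflate stream
            data = zlib.decompress(raw, -15)
        else:
            raise ValueError(f"unsupported compression method {method}")
        yield name, data
```

Because it is a generator, the caller can break out of the loop as soon as the wanted entry turns up and drop the rest of the download. It will also yield entries that are present in the file but missing from the central directory.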


It is. S3 supports fetching arbitrary byte ranges from objects.
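
For illustration, a minimal sketch of a ranged read in Python. It assumes `client` is a boto3 S3 client (for example `boto3.client("s3")`), whose `get_object` call accepts an HTTP-style `Range` string; `fetch_range` and the bucket and key names are made up for this example.

```python
def fetch_range(client, bucket, key, start, end):
    """Return bytes start..end (inclusive) of an object.

    `client` is assumed to be a boto3 S3 client; only that byte range
    is transferred, not the whole object.
    """
    response = client.get_object(
        Bucket=bucket,
        Key=key,
        Range=f"bytes={start}-{end}",   # same syntax as the HTTP Range header
    )
    return response["Body"].read()


# Hypothetical usage (needs boto3 and credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   first_kb = fetch_range(s3, "my-bucket", "archive.squashfs", 0, 1023)
```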



