
The tar file format is REALLY bad. It's pretty much impossible to multithread because it's just metadata followed by content, repeatedly concatenated.

I.e.

    /foo.txt 21
    This is the foo file
    /bar.txt 21
    This is the bar file

That makes it super hard to deal with as you essentially need to navigate the entire tar file before you can list the directories in a tar file. To add a file you have to wait for the previous file to be added.
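
A rough sketch of why listing means walking the whole file, assuming a plain ustar archive with short names (no pax or GNU extension headers). The helper names `build_archive` and `list_members` are made up for illustration; the header offsets used (name in bytes 0-99, octal size in bytes 124-135, 512-byte blocks) come from the ustar layout.

```python
import io
import tarfile

BLOCK = 512  # tar stores headers and content in 512-byte blocks


def build_archive(files):
    """Build a small ustar archive in memory from a {name: bytes} dict."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_members(archive):
    """Walk the archive header by header; there is no index to jump to."""
    members = []
    offset = 0
    while offset + BLOCK <= len(archive):
        header = archive[offset:offset + BLOCK]
        if header == b"\0" * BLOCK:  # an all-zero block marks the end
            break
        name = header[0:100].rstrip(b"\0").decode()
        size = int(header[124:136].rstrip(b"\0 ").decode(), 8)  # octal field
        members.append((name, size))
        # Skip this header plus its content, padded up to a whole block.
        offset += BLOCK + (size + BLOCK - 1) // BLOCK * BLOCK
    return members


archive = build_archive({
    "foo.txt": b"This is the foo file\n",
    "bar.txt": b"This is the bar file\n",
})
print(list_members(archive))
```

The offset of each header depends on the size stored in the previous one, so there is no way to find the last entry without visiting every entry before it.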

Using something like SQLite solves this particular problem because you can have a table with file names and a table with file contents that can both be inserted into in parallel (though that will mean the contents aren't guaranteed to be contiguous). Since SQLite is just a B-tree, it's easy (or at least well known) how to concurrently modify the contents of the tree.
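
Something like this, as a minimal sketch of the two-table layout. The table and column names (`names`, `contents`, `path`, `data`) and the `add_file` helper are invented for illustration, and it runs on a single connection, so it doesn't attempt the parallel inserts mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE names (
        id   INTEGER PRIMARY KEY,
        path TEXT UNIQUE NOT NULL
    );
    CREATE TABLE contents (
        id   INTEGER PRIMARY KEY REFERENCES names(id),
        data BLOB NOT NULL
    );
""")


def add_file(conn, path, data):
    """Store the name and the content in separate tables, linked by id."""
    cur = conn.execute("INSERT INTO names (path) VALUES (?)", (path,))
    conn.execute(
        "INSERT INTO contents (id, data) VALUES (?, ?)", (cur.lastrowid, data)
    )


with conn:  # commits both inserts for each file together
    add_file(conn, "/foo.txt", b"This is the foo file\n")
    add_file(conn, "/bar.txt", b"This is the bar file\n")

# Listing only touches the names table; no file content is read.
listing = [row[0] for row in conn.execute("SELECT path FROM names ORDER BY path")]

# Extracting a single file is one indexed lookup.
bar = conn.execute(
    "SELECT data FROM contents JOIN names USING (id) WHERE path = ?",
    ("/bar.txt",),
).fetchone()[0]

print(listing, bar)
```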




Funnily enough, tar is like 3 different formats (pax, tar, ustar). One of the annoying parts of the tar format is that even though you scan all the metadata upfront, you have to keep the directory metadata in RAM until the end and have to wait to apply it at the end.


Or just do what zip and every other format does and put all the metadata at the beginning: enough to list all files and extract any single one efficiently.


zip interestingly sticks the metadata at the end. That lets you add files to a zip without touching what's already been zipped. Just new metadata at the end.
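
A quick way to see this with Python's `zipfile`, as a sketch on an in-memory archive. The new entry lands where the old central directory used to start, and a fresh directory is written after it, so everything before that point should be byte-for-byte identical.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("foo.txt", "This is the foo file\n")
before = buf.getvalue()

# Append mode reads the existing directory, then writes the new entry
# and a new directory without rewriting the entries already stored.
with zipfile.ZipFile(buf, "a") as zf:
    zf.writestr("bar.txt", "This is the bar file\n")
after = buf.getvalue()

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    foo_offset = zf.getinfo("foo.txt").header_offset
    bar_offset = zf.getinfo("bar.txt").header_offset

# The bytes in front of the new entry are the old archive minus its directory.
unchanged = after[:bar_offset] == before[:bar_offset]
print(names, foo_offset, bar_offset, unchanged)
```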

Modern tape archives like LTFS do the same thing as well.


That sounds like you need to have fetched the whole zip before you can unzip it - which is not what one wants when making "virtual tarfiles" which only exist in a pipe. (i.e. you're packing files in at one end of the pipe and unpacking them at the other)
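
For reference, this is roughly what the pipe case looks like with tar, sketched with Python's `tarfile` in its streaming modes (`"w|"` and `"r|"`), which never seek. The `FILES` dict and the `pack` helper are made up for illustration, and a thread stands in for the process at the other end of the pipe.

```python
import io
import os
import tarfile
import threading

FILES = {
    "foo.txt": b"This is the foo file\n",
    "bar.txt": b"This is the bar file\n",
}


def pack(write_fd):
    """Write the archive into the pipe as a pure forward-only stream."""
    with os.fdopen(write_fd, "wb") as pipe_out:
        with tarfile.open(fileobj=pipe_out, mode="w|") as tar:
            for name, data in FILES.items():
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))


read_fd, write_fd = os.pipe()
packer = threading.Thread(target=pack, args=(write_fd,))
packer.start()

# The reader handles each member as it arrives; the archive as a whole
# never exists on disk and is never seeked.
unpacked = {}
with os.fdopen(read_fd, "rb") as pipe_in:
    with tarfile.open(fileobj=pipe_in, mode="r|") as tar:
        for member in tar:
            unpacked[member.name] = tar.extractfile(member).read()
packer.join()

print(unpacked)
```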


Just fseek to the end.

Zip format was not designed for piping.
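
To make "seek to the end" concrete, here is a sketch that reads the end-of-central-directory record straight off the tail of the file. It assumes the simplest case: no archive comment and no zip64, so the record is exactly the last 22 bytes. `read_eocd` is a hypothetical helper name.

```python
import io
import struct
import zipfile

# Fixed part of the end-of-central-directory record: signature, four
# 16-bit counters, directory size, directory offset, comment length.
EOCD = struct.Struct("<4s4H2LH")


def read_eocd(fp):
    """Return (entry_count, directory_size, directory_offset).

    Assumes no archive comment and no zip64 extensions.
    """
    fp.seek(-EOCD.size, io.SEEK_END)  # needs a seekable file, not a pipe
    sig, _, _, _, total, cd_size, cd_offset, _ = EOCD.unpack(fp.read(EOCD.size))
    if sig != b"PK\x05\x06":
        raise ValueError("end-of-central-directory record not at end of file")
    return total, cd_size, cd_offset


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("foo.txt", "This is the foo file\n")
    zf.writestr("bar.txt", "This is the bar file\n")

total, cd_size, cd_offset = read_eocd(buf)
print(total, cd_size, cd_offset)
```

From `cd_offset` you can read the whole file table in one go, which is exactly the step a pipe can't do.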


Tapes don't (? certainly didn't) operate this way. You need to read the entire tape to list the contents.

Since tar is a Tape ARchive, the way tar operates makes sense (as it was designed for both output to file and device, i.e. tape).


That point is always raised on every criticism of tar (that it's good at tape).

Yes! It is! But it's awful at archive files, which is what it's used for nowadays and what's being discussed right now.

Over the past 50 years some people did try to improve tar. People did develop ways to append a file table at the end of an archive file while maintaining compatibility with tapes, all tar utilities, and piping.

Similarly, driven people did extend (pk)zip to cover all the unix-y needs. In fact the current zip utility still supports permissions and symlinks to this day.

But despite those better methods, people keep pushing og tar. Because it's good at tape archival. Sigh.


Tapes currently don't really operate like tar anymore either. Filesystems like LTFS stick the metadata all in one blob somewhere.


It's been a long time since I've operated tape, so good to know things have changed for the better.



