The tar file format is REALLY bad. It's pretty much impossible to thread because it's just metadata followed by content, concatenated over and over.
I.e.
/foo.txt 21
This is the foo file
/bar.txt 21
This is the bar file
That makes it super hard to deal with, as you essentially need to walk the entire tar file before you can list what's in it. And to add a file you have to wait for the previous file to be added.
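To make the "walk the entire file" point concrete, here is a minimal sketch that lists a tar archive by hopping from one 512-byte header to the next. It assumes a plain ustar archive with short names (no pax or GNU extension records); `build_demo_tar` and `list_members` are hypothetical helper names, not part of any library.

```python
import io
import tarfile

def build_demo_tar() -> bytes:
    """Build a tiny in-memory ustar archive with two illustrative members."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, text in [("foo.txt", b"This is the foo file\n"),
                           ("bar.txt", b"This is the bar file\n")]:
            info = tarfile.TarInfo(name)
            info.size = len(text)
            tar.addfile(info, io.BytesIO(text))
    return buf.getvalue()

def list_members(data: bytes) -> list[tuple[str, int]]:
    """Return (name, size) pairs; every header must be visited in order."""
    members = []
    offset = 0
    while offset + 512 <= len(data):
        header = data[offset:offset + 512]
        if header == bytes(512):  # an all-zero block marks the end of the archive
            break
        name = header[0:100].rstrip(b"\0").decode()
        size = int(header[124:136].rstrip(b"\0 ").decode(), 8)  # octal text field
        members.append((name, size))
        # The next header sits after this one plus the content,
        # with the content rounded up to a multiple of 512 bytes.
        offset += 512 + (size + 511) // 512 * 512
    return members

members = list_members(build_demo_tar())
print(members)
```

There is no index to jump to: the position of each header is only known once the size in the previous header has been read.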
Using something like SQLite solves this particular problem because you can have a table with file names and a table with file contents that can both be inserted into in parallel (though that will mean the contents aren't guaranteed to be contiguous). Since SQLite is just a B-tree, how to concurrently modify the contents of the tree is a known (if not exactly easy) problem.
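A rough sketch of that two-table layout, using Python's built-in `sqlite3` module. The table and column names (`names`, `contents`, `path`, `data`) and the `add_file` helper are made up for illustration. The sketch is single-threaded and only shows the layout; it does not attempt parallel inserts, and stock SQLite lets only one writer commit at a time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE names (
        id   INTEGER PRIMARY KEY,
        path TEXT UNIQUE NOT NULL
    );
    CREATE TABLE contents (
        id   INTEGER PRIMARY KEY REFERENCES names(id),
        data BLOB NOT NULL
    );
""")

def add_file(path: str, data: bytes) -> None:
    """Insert the name row, then the content row keyed by the same id."""
    cur = conn.execute("INSERT INTO names (path) VALUES (?)", (path,))
    conn.execute("INSERT INTO contents (id, data) VALUES (?, ?)",
                 (cur.lastrowid, data))

add_file("/foo.txt", b"This is the foo file\n")
add_file("/bar.txt", b"This is the bar file\n")
conn.commit()

# Listing touches only the names table, never the file contents.
listing = [row[0] for row in conn.execute("SELECT path FROM names ORDER BY id")]

# Fetching one file is a keyed lookup, not a scan over earlier files.
bar = conn.execute(
    "SELECT data FROM contents JOIN names USING (id) WHERE path = ?",
    ("/bar.txt",),
).fetchone()[0]
```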
Funnily enough, tar is like 3 different formats (pax, tar, ustar). One of the annoying parts of the tar format is that even though you read a directory's metadata before its contents, you have to keep that metadata in RAM and wait until the end to apply it (writing files into the directory would otherwise clobber things like its mtime).
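Here is a small sketch of that "hold directory metadata until the end" dance, reading the archive as a stream. The helper names (`build_tar`, `extract_deferred`) and the `docs/` layout are invented for the example, only mtimes are handled, and member names are trusted as-is, so don't point it at untrusted archives.

```python
import io
import os
import tarfile
import tempfile

ARCHIVED_MTIME = 1_000_000_000  # illustrative timestamp stored in the archive

def build_tar() -> io.BytesIO:
    """Build an in-memory tar with one directory and one file inside it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        d = tarfile.TarInfo("docs")
        d.type = tarfile.DIRTYPE
        d.mode = 0o755
        d.mtime = ARCHIVED_MTIME
        tar.addfile(d)
        data = b"This is the foo file\n"
        f = tarfile.TarInfo("docs/foo.txt")
        f.size = len(data)
        f.mtime = ARCHIVED_MTIME
        tar.addfile(f, io.BytesIO(data))
    buf.seek(0)
    return buf

def extract_deferred(fileobj, dest: str) -> None:
    """Extract members in stream order, applying directory mtimes last."""
    pending_dirs = []  # directory metadata has to sit in RAM until the end
    with tarfile.open(fileobj=fileobj, mode="r|") as tar:
        for m in tar:
            target = os.path.join(dest, m.name)  # trusted input only
            if m.isdir():
                os.makedirs(target, exist_ok=True)
                pending_dirs.append((target, m.mtime))
            elif m.isfile():
                with open(target, "wb") as out:
                    out.write(tar.extractfile(m).read())
                os.utime(target, (m.mtime, m.mtime))
    # Creating files bumped each directory's mtime, so fix them up now,
    # deepest entries first.
    for path, mtime in reversed(pending_dirs):
        os.utime(path, (mtime, mtime))

with tempfile.TemporaryDirectory() as dest:
    extract_deferred(build_tar(), dest)
    dir_mtime = int(os.stat(os.path.join(dest, "docs")).st_mtime)
```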
Or just do what zip and every other format does: put all the metadata at the beginning - enough to list all files and extract any single one efficiently.
zip interestingly sticks the metadata at the end. That lets you add files to a zip without touching what's already been zipped. Just new metadata at the end.
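A quick way to see this with Python's `zipfile` module: append a second file and check that the bytes holding the first entry are untouched, and that the "end of central directory" record sits at the very end. This is a sketch that assumes a small archive with no zip64 records and no archive comment; the file names are illustrative.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w") as zf:
    zf.writestr("foo.txt", "This is the foo file\n")

before = buf.getvalue()
# b"PK\x01\x02" is the signature of a central directory entry; everything
# before its first occurrence is the already-zipped file data.
cd_offset = before.index(b"PK\x01\x02")

# Append mode overwrites the old central directory with the new file's
# data and then writes a fresh directory after it.
with zipfile.ZipFile(buf, mode="a") as zf:
    zf.writestr("bar.txt", "This is the bar file\n")

after = buf.getvalue()
untouched = after[:cd_offset] == before[:cd_offset]

# With no comment, the end-of-central-directory record is the final
# 22 bytes and starts with the signature b"PK\x05\x06".
eocd_at_end = after[-22:-18] == b"PK\x05\x06"

with zipfile.ZipFile(io.BytesIO(after)) as zf:
    names = zf.namelist()
```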
Modern tape archives like LTFS do the same thing as well.
That sounds like you need to have fetched the whole zip before you can unzip it - which is not what one wants when making "virtual tarfiles" which only exist in a pipe. (i.e. you're packing files in at one end of the pipe and unpacking them at the other)
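For reference, the pipe case looks roughly like this with Python's `tarfile` stream modes (`w|` and `r|`), which never seek. A thread stands in for the process at the packing end of the pipe; `pack` and `unpack` are hypothetical helper names and the file list is made up.

```python
import io
import os
import tarfile
import threading

def pack(write_fd: int, files: list[tuple[str, bytes]]) -> None:
    """Write members into the pipe one after another, never seeking."""
    with os.fdopen(write_fd, "wb") as pipe_out, \
         tarfile.open(fileobj=pipe_out, mode="w|") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

def unpack(read_fd: int) -> dict[str, bytes]:
    """Read members from the pipe in arrival order."""
    received = {}
    with os.fdopen(read_fd, "rb") as pipe_in, \
         tarfile.open(fileobj=pipe_in, mode="r|") as tar:
        for member in tar:
            # In stream mode each member must be read before moving on.
            received[member.name] = tar.extractfile(member).read()
    return received

files = [("foo.txt", b"This is the foo file\n"),
         ("bar.txt", b"This is the bar file\n")]

read_fd, write_fd = os.pipe()
writer = threading.Thread(target=pack, args=(write_fd, files))
writer.start()
received = unpack(read_fd)
writer.join()
```

The reader can hand back each file as soon as its header and content have arrived, which is the property a trailing index would not give you on its own.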
That point is always raised on every criticism of tar (that it's good at tape).
Yes! It is! But it's awful at archive files, which is what it's used for nowadays and what's being discussed right now.
Over the past 50 years some people did try to improve tar. People did develop ways to append a file table at the end of an archive file while maintaining compatibility with tapes, all tar utilities, and piping.
Similarly, driven people did extend (pk)zip to cover all the unix-y needs. In fact the current zip utility still supports permissions and symlinks to this day.
But despite those better methods, people keep pushing OG tar. Because it's good at tape archival. Sigh.