Pack: A new container format for compressed files (pack.ac)
204 points by todsacerdoti 9 months ago | 247 comments



I found squashfs to be a great archive format. It preserves Linux file ownership and permissions, lets you extract individual files without parsing the entire archive (unlike tar), and it's mountable. It's also openable in 7zip.

I wonder how Pack compares to it, but its home page and GitHub don't say much.


Yeah squashfs is one of the good ones right now.

For sosreports (archives with lots of diagnostic command output and logfiles from a Linux host), I wanted to find a file format that both uses zstd compression (or maybe something else that is about as fast and compresses about as well; currently they often use xz, which is very, very slow) -and- lets you unpack a single file fast, with an index, ideally so you can mount it loopback or with FUSE, or otherwise just quickly unpack a single file from a many-GB archive.

You'd be surprised that this basically doesn't exist right now. There's a bunch of half-solutions, but no really good, easily available one. Some things add indexes to tar; zstd does support partial/offset unpacking without reading the entire archive in the code, but basically no one uses that function, which is kind of silly. There are zip and rar tools with zstd support, but they are not all cross-compatible and mostly don't exist in the packaged Linux versions.

squashfs with zstd added mostly fits the bill.

I was really surprised not to find anything else, given that we had this in Zip and RAR files two decades ago. But nothing that would or could ship on a standard open source system has managed to modernise that featureset.

(If anyone has any pointers let me know :-)


You can do that with Pack:

`pack -i ./test.pack --include=/a/file.txt`

or a couple files and folders at once:

`pack -i ./test.pack --include=/a/file.txt --include=/a/folder/`

Use `--list` to get a list of all files:

`pack -i ./test.pack --list`

Such random access using `--include` is very fast. As an example, if I want to extract just a .c file from the whole codebase of Linux, it can be done (on my machine) in 30 ms, compared to near 500 ms for WinRAR or 2500 ms for tar.gz. And the gap only widens once you add encryption. For now, Pack encryption is not public, but when it is, you will be able to access a file in a locked Pack file in a matter of milliseconds rather than seconds.


I haven't had a chance to use it yet, but https://github.com/mhx/dwarfs claims to be many times faster than squashfs, to compress much better, and to have full FUSE support.


Are you able to seek and selectively extract from squashfs archives using range headers if stored in object storage systems like S3?

Example: https://alexwlchan.net/2019/working-with-large-s3-objects/


Certainly, squashfs is designed to be random-access.


But S3 isn't.


It has to be if you can "seek and selectively extract from" a zip file: the ability to do that relies on the ability to read the end of the archive for the central directory, then read the offset and size you get from that to get at the file you need.

squashfs may or may not be able to do it with as few roundtrips (I don't know the details of its layout), but S3 necessarily provides the capabilities for random access otherwise you'd have to download the entire content either way and the original query would be moot.
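To make that concrete, here is a minimal sketch of the ranged-read approach (essentially what the article linked above does): wrap S3 range GETs in a seekable file object and let Python's zipfile read only the central directory plus the one member it needs. Assumes boto3; the bucket, key, and member names are placeholders.

    import io
    import zipfile
    import boto3

    class S3RangeFile(io.RawIOBase):
        """Seekable read-only view of an S3 object, one ranged GET per read."""
        def __init__(self, client, bucket, key):
            self.client, self.bucket, self.key = client, bucket, key
            self.size = client.head_object(Bucket=bucket, Key=key)["ContentLength"]
            self.pos = 0

        def seekable(self):  return True
        def readable(self):  return True
        def tell(self):      return self.pos

        def seek(self, offset, whence=io.SEEK_SET):
            base = {io.SEEK_SET: 0, io.SEEK_CUR: self.pos, io.SEEK_END: self.size}[whence]
            self.pos = base + offset
            return self.pos

        def read(self, n=-1):
            if n < 0:
                n = self.size - self.pos
            if n == 0 or self.pos >= self.size:
                return b""
            rng = f"bytes={self.pos}-{self.pos + n - 1}"
            body = self.client.get_object(Bucket=self.bucket, Key=self.key,
                                          Range=rng)["Body"].read()
            self.pos += len(body)
            return body

    s3 = boto3.client("s3")
    with zipfile.ZipFile(S3RangeFile(s3, "my-bucket", "big-archive.zip")) as zf:
        data = zf.read("some/member.txt")   # a handful of ranged GETs, not the whole object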


You can read sequentially through a zip file and dynamically build up a central directory yourself and do whatever desired operations.

There's the caveat that the zipfile itself may contain entries that aren't mentioned in the actual central directory of the zipfile.


> You can read sequentially through a zip file and dynamically build up a central directory yourself and do whatever desired operations.

First, zip files already have a central directory so why would you do that?

Second, you seem to be missing the subject of this subthread entirely, the point is being able to selectively access S3 content without downloading the whole archive. If you sequentially read through the entire zip file, you are in fact downloading the whole archive.


Sorry, I wasn't clear before. You don't need the central directory to process a zipfile. You don't need random access to a zipfile to process it.

A zipfile can be treated as a stream of data and processed as each individual zip entry is seen in the download/read. NO random access is required.

Just enough memory for the local file header and buffers for the zipped data and unzipped data. The first two items should be covered by the downloaded zipfile buffer.

If you want to process the zipfile ASAP, or don't have the resources to download the entire zipfile before processing it, then this is a valid way to handle it. If your desired data occurs before the entire zipfile has been downloaded, you can stop the download.

A zipfile can also be treated as a randomly accessed file, as you mentioned. Some operations are faster that way - like browsing each zip entry's metadata.
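A rough sketch of that streaming approach, for illustration: walk the local file headers in order and never look at the central directory. It assumes the archive was written without data descriptors (general-purpose flag bit 3), since otherwise the compressed size isn't known up front.

    import struct
    import sys
    import zlib

    LOCAL_SIG = b"PK\x03\x04"

    def stream_entries(fp):
        while True:
            header = fp.read(30)
            if len(header) < 30 or not header.startswith(LOCAL_SIG):
                return    # central directory (PK\x01\x02) or EOF: no more local entries
            (_, _, flags, method, _, _, _, csize, usize,
             name_len, extra_len) = struct.unpack("<4s5H3I2H", header)
            name = fp.read(name_len).decode("utf-8", "replace")
            fp.read(extra_len)
            data = fp.read(csize)
            if method == 8:               # DEFLATE; method 0 is stored
                data = zlib.decompress(data, -15)
            yield name, data

    for name, data in stream_entries(sys.stdin.buffer):
        print(name, len(data))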


It is. S3 supports fetching arbitrary byte ranges from files.


> It's also openable in 7zip

If only 7zip could also create them on Windows (it apparently can create WIM, which seems to be a direct Windows-native counterpart, also mountable on Linux).


WIM is the closest thing Windows has to a full file-based capture, but I've noticed that even that doesn't capture everything, unfortunately. I forget exactly, but think it was extended attributes that DISM wouldn't capture, despite the /EA flag. Not sure if that was a file format limitation or just a DISM bug.


Very sad. Cross-platform extended attributes are the very thing I would love. I even imagine a new archive format which would be just a key-key-value store (I mean it - two keys: a set of key-value pairs for every top-level key - this is EAs / NTFS streams), with values compressed using a common dictionary (and possibly encrypted/signed with a common key). Needless to say, such a format would enable almost any use case, especially if the layout of the file itself is architected right. macOS wouldn't have to add their special folder (the one they add to every ZIP) anymore; tagging files and saving any metadata about them would be possible, as would saving multiple versions of the same file, or alternative names for the same file (e.g. what you received it as and what you renamed it to).

I even dream about the day when a file's main stream would be pure data and all the metadata would go to EAs. Imagine an MP3 file where the main stream only records the sound and no ID3; all the metadata like the artist and song names are handled as EAs and can be used in file operation commands.

This could also be made tape-friendly and eliminate the need for tar: just make sure streams/EAs are written contiguously, closely related streams go right next to each other, compression is optional, and the ToC + dictionary is replicated in a number of places, like the beginning, the middle, and the end.

As you might have guessed, I use all the major OSes, and single-OS solutions are of little use to me. Apparently I'd just use SquashFS, but its use is limited on Windows because you can hardly make or mount one there - only unpack with 7zip.
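To make that key-key-value idea concrete, a tiny sketch of the data model (names and layout are purely illustrative - no such format exists yet):

    # Every top-level key (a file) maps to a set of named streams; the raw data
    # is just one stream among the extended attributes.
    archive = {
        "song.mp3": {
            "data":        b"...raw audio frames only...",
            "user.artist": b"Some Band",
            "user.title":  b"Some Song",
        },
        "report.pdf": {
            "data":        b"%PDF-1.7 ...",
            "user.tags":   b"work,2024",
        },
    }

    def get_stream(name, stream="data"):
        return archive[name][stream]        # readers ignore streams they don't understand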


It's easy to forget about supporting EAs on Windows - they are extremely uncommon because you practically need to be in kernelspace to write them. Ntoskrnl.exe has one or two EAs, almost nothing else does.

(ADS are super commonplace and the closer analogue to posix xattrs.)


I didn't know this, thanks. I thought xattrs and ADS are synonymous. Do SquashFS, ext4, HFS+ and APFS have ADS then?

I am looking forward to writing my own cross-platform app which would rely on attaching additional data and metadata to my files.

"need to be in kernelspace" does not sound very scary because a user app probably doesn't need to do this very job itself - isn't there usually an OS-native command-line tool which can be invoked to do it?


But it is read-only?

I was trying to change a single file in a squashfs container recently and could not find a way to do that.


That's exactly what I'd like to avoid. I want to transfer a group of files (either to myself, friends, or website visitors), not make assumptions about the target system's permission set. For copies of my own data where permissions are relevant, I've got a restic backup.

Wake me up if a simple standard comes to pass that has neither user/group IDs, mode fields, nor bit-packed two-second-precision timestamps or similar silliness. Perhaps an executable bit for systems that insist on such things for being able to use the download in the intended way.

(I self-made this before: a simple length-prefixed concatenation of filename and contents fields. The problem is that people would have to download an unpacker. That's not broadly useful unless it is, as in that one case, a software distribution which they're going to run anyway)
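(For illustration, such a format really is only a dozen lines of Python - the field widths and encoding here are arbitrary choices, not necessarily what that implementation used:)

    import struct

    def pack(entries, fp):                  # entries: iterable of (filename, bytes)
        for name, data in entries:
            name_b = name.encode("utf-8")
            fp.write(struct.pack("<I", len(name_b)) + name_b)
            fp.write(struct.pack("<Q", len(data)) + data)

    def unpack(fp):
        while True:
            head = fp.read(4)
            if not head:
                return
            name = fp.read(struct.unpack("<I", head)[0]).decode("utf-8")
            size, = struct.unpack("<Q", fp.read(8))
            yield name, fp.read(size)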


No, too simple.

Sometimes you want to include data and sometimes you don't, for different reasons in different contexts. It's not a data handler's job to decide what data is or isn't included; it's the sender's job to decide what not to include and the receiver's job to decide what to ignore.

The simplest example is probably just the file path. tar or zip don't try to say whether or not a file in the container includes the full absolute path, a portion of the path, or no path.

The container should ideally be able to contain anything that any filesystem might have, or else it's not a generally useful tool, it's some annoying domain-specific specialized tool that one guy just luuuuuvs for his one use-case he thinks is obviously the most rational thing for anyone.

If you don't want to include something like a uid, say for security reasons not to disclose the internal workings of something on your end, then arrange not to include it when creating the archive, the same way you wouldn't necessarily include the full path to the same files. Or outside of a security concern like that, include all the data and let the recipient simply ignore any data that it doesn't support.


Good argument, I've mostly come around to your view. The little "but" that I still see is that the current file formats don't let you omit fields you don't want to pass on, and most decoders don't let you omit fields you don't want to interpret/use while unpacking.

Even if a given decoder could, though, most users wouldn't be able to use that, and so they'd get files from 1970 or 1980 if I don't want to pass that on and set it to zeroes. So it's better if the field can be omitted (as in, if the header weren't fixed length but extensible like an IP packet). So I'd still like a "better" archiving format than the ones we have today (though I'm not familiar with the internals of every one of them, like 7z or the mentioned squashfs, so tell me if this already exists), but I agree such a format should just support everything ~every filesystem supports.


Oh sure, I was talking in generalities about an imaginary archiver - what an archiver should have - not any particular existing one.

OS and filesystem features differ all over the place, and there will be totally new filesystems and totally new metadata tomorrow. There is practically no common denominator, not even basic ASCII for the filename, let alone any other metadata.

So there should just be metadata fields, where about the only thing actually part of the spec is the structure of a metadata field, not any particular keys or values or number or order of fields. The writer might or might not include a field for, say, creation time, and the reader might or might not care about that. If the reader doesn't recognize some strange new xattr field that only got invented yesterday, no problem, because it does know what a field is, and how to consume and discard fields it doesn't care about.

There would be a few fields that most readers and writers would all just recognize by convention, the usual basics like filename. Even the filename might not technically be a requirement, but maybe an RFC documents a short list of standard fields just to give everyone a reference point. But for instance it might be legal to have nothing but some uuids or something.

That's getting a bit weird but my main point was just that it's wrong to say an archiver shouldn't include timestamps or uids just because one use of archive files is to transfer files from a unix system to a windows system, or from a system with a "bob" user to a system with no "bob" user.
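Concretely, an entry could be nothing but a bag of self-describing fields. A sketch - the TLV-ish layout and key names are just illustrative, not any existing format:

    import struct

    def write_fields(fp, fields):                 # fields: dict[str, bytes]
        for key, value in fields.items():
            k = key.encode("utf-8")
            fp.write(struct.pack("<HI", len(k), len(value)) + k + value)
        fp.write(struct.pack("<HI", 0, 0))        # empty field terminates the entry

    def read_fields(fp):
        fields = {}
        while True:
            klen, vlen = struct.unpack("<HI", fp.read(6))
            if klen == 0 and vlen == 0:
                return fields
            key = fp.read(klen).decode("utf-8")
            fields[key] = fp.read(vlen)           # unknown keys are simply carried or ignored

    # e.g. write_fields(fp, {"name": b"hello.txt", "mtime": b"1711929600", "data": b"..."})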


The arguments for tar are --preserve-permissions and --touch (don't extract file modified time).

For unzip, -D skips restoration of timestamps.

For unrar, -ai ignores file attributes, -ts restores the modification time.

There are similar arguments for omitting these when creating the archive, they set the field to a default or specified value, or omit it entirely, depending on the format.


Those are user controls, to allow the user on one end to decide what to put into the container, and there are others to allow the user at the other side to decide what to take out of the container, not limits of the container.

The comment I'm replying to suggested that since one use case results in metadata that is meaningless or mis-matched between sender and receiver, the format itself should not even have the ability to record that metadata.


Is "absolute path" a coherent concept when you are talking about 2 systems?


Is this question a coherent concept when it doesn't change anything when you substitute any other term like "full path" or "as much path as exists" or "any path"?


D:\etc\your.conf would like a word, they seem lost and confused.


It can be if you make assumptions about the basic structure of both systems. Some people rely on this behavior. It can be a good idea or a bad idea, depending on what you're doing.


I agree very much with this. Something that annoys me is how much information tar files leak. Like, you don't need to know the username or groupname of the person that originally owned the files. You don't need to copy around any mode bit other than "executable". You definitely don't need "last modified" timestamps, which exist only to make builds that produce archives non-hermetic.

Frankly, I don't even want any of these things on my mounted filesystem either.

> The problem is that people would have to download an unpacker.

Your archive format just needs to be an executable that runs on every platform. https://github.com/jart/cosmopolitan is something that could help with that. ("Who would execute an archive? It could do anything," I hear you scream. Well, tell that to anyone who has run "curl | bash".)


  tar --create --owner=0 --group=0 --mtime='2000-01-01 00:00:00' \
    --mode='go-rwxst' --file test.tar /bin/dash /etc/hosts

  tar --list --verbose --file test.tar
  -rwx------ root/root    125688 2000-01-01 00:00 bin/dash
  -rw------- root/root      1408 2000-01-01 00:00 etc/hosts


I know it may not seem this way, but a lot of people don't ever run "curl | bash", or if they do, they do so in a throwaway VM (or a container, if the source is mostly trusted).


It's really a bad idea most of the time to have an archive that doubles as an executable. It's not possible to cover every possible platform, and in the distant future those self-extracting archives may be impossible to extract without the required host system.


In most common scenarios, curl | bash is no different from apt-add-repository && apt install. Running a completely non-curated executable is very different.


> Wake me up if a simple standard comes to pass that neither has user/group ID, mode fields, nor bit-packed two-second-precision timestamps or similar silliness. Perhaps an executable bit for systems that insist on such things for being able to use the download in the intended way

Other than having timestamps isn't this a ZIP file? No user id, no x bit, widely available implementations... Not very simple though I guess.


Zip is extremely simple, and well documented.

I wrote a ReadableStream to Zip encoder (with no compression) in 50 lines of Javascript.


Me too but in php. I couldn't find a streaming zip encoder that you can just require() and use without further hassle, so I wrote one (it's on github somewhere).

The problem is that zip is finicky and extremely poorly documented. I had to look at what other implementations do to figure out some of the fields. For at least one field, the spec (from the early 90s or late 80s I think) says it is up to you to figure out what you want to put there! After all that, I additionally wrote my own docs in case someone coming after me needs to understand the format as well, but some things are just assumptions and "everyone does it this way"s, leaving me with only moderate confidence that I've followed the spec correctly. I haven't found incompatibilities yet, but I'd also not be surprised if an old decoder refuses it or if a modern one made a different choice somewhere.

It's also not as if I haven't come across third party zip files that the Debian command line tool wouldn't open but the standard Debian/Cinnamon GUI utility was perfectly happy about. If it were so well-documented and standard, that shouldn't be a thing. (Similarly, customers on macOS can't open our encrypted 7z pentest report files. The Finder asks for the password and then flat-out tells them "incorrect password", whereas in reality it seems to be unable to handle filename encryption. Idk if that is per the spec but incompatibilities abound.)


The PKWare Zip file spec is reasonably detailed.

If you're not sure what the spec is trying to say, then either the PKZip binaries or the Info-ZIP zip/unzip source code is your usual source of truth.

When one unzip works but another unzip app doesn't, then you can usually point the finger at the last zip app that modified the zip file. There's some inconsistency in the zip file.

Running "unzip -t -v" on the zip file in question may yield more info about the problem.


The binaries you refer to as the source of truth are a paid product (not sure if the trial version, which requires filling out a form that's currently not loading, includes all options, or how honest it is to use that to make an alternative to their software, or whether the terms even allow it) and don't seem to run on my operating system. I guess I could buy myself a Windows license and read the pkzip EULA to see if you're allowed to use it for making a clone, but I figured the two decoders (that don't always agree with each other) I had on hand would do. If they agree about a field, it's good enough (and decoders can expect that unspecced fields are garbage).


Info-ZIP is open source. Have you never used unzip?


Isn't pkzip the original? I'm not sure I've heard of info-zip but unzip is a command I use regularly on Debian. I highly doubt that's the original commercial implementation though


Here's the link to the PKWARE APPNOTE.TXT

https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT


The only special thing about the Zip file format that springs to mind as causing ambiguity is the handling of the OS-specific extra field for a Zip archive entry.

You don't have to include an OS-specific extra field unless you want the information in that specific extra field to be available to the party trying to extract the contents of the zipfile.


Wait until you add support for encryption.


- As far as I know, squashfs is a file system and not an archive format; the "FS" in the name shows the focus.

- It is read-only; Pack is not. Update and delete are just not public yet, as I wanted people to get a taste first.

- It is clearly focused on archiving, whereas Pack wants to be a container option for people who want to pack some files/data and store or send them with no privacy dangers.

- Pack is designed to be user-friendly for most people; the CLI is very simple to work with, and future OS integration will make working with it a breeze. It is far different from a good file system focused on Linux.

- I did not compare to squashfs, but I will be happy to see any results from interested people.

My bet is on Pack, obviously, to be much faster.


- loop-mount is a thing

- being read-only is mostly a benefit for an archive. Back in the days when drives were small, I occasionally wanted to update a .rar, but in the last ~5 years I can't remember a case for it.

- it's fine, but don't think that others' use cases are invalid because of your vision

- mount is also a CLI interface


As a separate note, had I encountered the pack.ac link anywhere on the internet other than here with a description attached, I'd have left it immediately. It just lacks any info on what it is and why I should try it.


They state how squash is nice for archiving and then you go and ramble about specifically Not Archiving


I second this.


Interesting, I've recently spent an unhealthy amount of time researching archival formats to build the same setup of using SQLite with ZStd.

My use case is extremely redundant data (specific website dumps + logs) that I want decently quick random access into, and I was unhappy with either the access speed, quality/usability or even existence of libraries for several formats.

Glancing over the code this seems to use the following setup:

- Aggregate files

- Chunk into blocks

- Compress blocks of fixed size

- Store file to chunk and chunk to block associations

What I did not see is a deduplication step for the chunks, or an attempt to group files (and, by extension, blocks) by similarity to improve compression.

But I might have just missed that due to lack of familiarity with Pascal.

For anyone interested in this strategy, take a look at ZPAQ [1] by Matt Mahoney; you might know him from the Hutter Prize competition [2] / Large Text Compression Benchmark, where it takes 14th place with tuned parameters.

There's also a maintained fork called zpaqfranz, but I ran into some issues like incorrect disk size estimates with it. For me the code was also sometimes hard to read due to being a mix of English and Italian. So your mileage may vary.

[1]: http://mattmahoney.net/dc/zpaq.html [2]: http://prize.hutter1.net [3]: https://github.com/fcorbelli/zpaqfranz
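For reference, the kind of chunk-level dedup step I was hoping to see is cheap to sketch on top of that block layout (content-addressed fixed-size chunks). Purely illustrative, not Pack's code:

    import hashlib

    CHUNK_SIZE = 8 * 1024 * 1024            # fixed-size chunks, like the block size above

    def dedup_chunks(paths):
        store = {}                          # sha256 hex digest -> unique chunk bytes
        manifest = []                       # (path, [chunk digests]) to rebuild each file
        for path in paths:
            digests = []
            with open(path, "rb") as fp:
                while chunk := fp.read(CHUNK_SIZE):
                    digest = hashlib.sha256(chunk).hexdigest()
                    store.setdefault(digest, chunk)   # keep only the first copy
                    digests.append(digest)
            manifest.append((path, digests))
        return store, manifest              # compress store.values(), persist both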


Thank you for the detailed check. I should thank the syrup too :)

I'm happy to see a fellow enthusiast. Your deduction is on point. And also, Pack is smart; it skips non-compressible files like MP3 [1], so you do not need to choose the "Store" option just to be fast, and it speeds up decompression too. Pack is the first to achieve this, being faster than Store options. Yes, it was a surprise to me too.

ZPAQ is great, and I have studied the Hutter Prize competition. Pack is on another chart, which is why I proposed CompressedSpeed [2]. The speed of getting to the compressed result needs to be accounted for. You can store anything on an atom if you try hard enough, but hard work takes time. A deduplication step may get added, but in Hard Press [3].

I am curious to see the results of Pack on your data. You can find me here or o at pack.ac.

[1] It is based on content rather than extension; any data that is determined not to be worth compressing will be stored as is. And as a file can get chunked, some parts can get compressed and some cannot. Imagine that the subtitle part of an MKV file can get compressed while the video part gets skipped. These features will get more updates over time, as long as they don't cost time. Pack's focus is being seamless, not the most compressed; there are already great works in the field, such as the noted ZPAQ.

[2] CompressedSpeed = (InputSize / OutputSize) * (InputSize / Speed). Materialized compression speed.

[3] You can choose --press=hard to ask for better compression. Even with Hard Press, Pack does not try to eat your hardware just to get a little more; it goes the optimized way I described.


For your use case you might want to look at RocksDB.

It supports zstandard compression, random access, and it's very robust.


Try this one?

https://github.com/mhx/dwarfs

It has a ton of comparison with existing tools in the README -- zpaqfranz included -- and it seems to be the best there is.


When I read the title, I thought it was a new operating system-level containerization image format for filesystem layers and runtime config. But it looks like "container format" is a more general term for a collection of multiple files or streams into one. https://en.wikipedia.org/wiki/Container_format TIL.

OS containers could use an update too, though. They're often so big and tend to use multiple tar.gz files.


You can use Pack for those cases too. --press=hard creates a more compressed pack for cases of pack once, unpack many.


Similar. I thought it would be a media file format ... containing audio, video, tags, pics, etc.


With all due respect, I find it hard to believe the author stumbled upon a trivial method of improving tarballing performance by several orders of magnitude that nobody else had considered before.

If I understand correctly, they're suggesting Pack, which both archives and compresses, is 30x faster than creating a plain tar archive. That just sounds like you used multithreading and tar didn't.

Either way, it'd be nice to see [a version of] Pack support plain archival, rather than being forced to tack on Zstd.


That’s more because plain tar is actually a really dumb way of handling files that aren’t going to tape.

Being better than that is not a hard bar.


The tar file format is REALLY bad. It's pretty much impossible to thread because it's just doing metadata then content, repeatedly concatenated.

i.e.

    /foo.txt 21
    This is the foo file
    /bar.txt 21
    This is the bar file
That makes it super hard to deal with, as you essentially need to navigate the entire tar file before you can even list the directories in it. To add a file you have to wait for the previous file to be added.

Using something like sqlite solves this particular problem because you can have a table with file names and a table with file contents that can both be inserted into in parallel (though that will mean the contents aren't guaranteed to be contiguous.) Since SQLite is just a btree it's easy (well, known) how to concurrently modify the contents of the tree.
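A toy version of that two-table idea, just to illustrate the data layout (this is not Pack's actual schema - that shows up further down the thread): listing file names never has to touch the big content blobs.

    import sqlite3

    db = sqlite3.connect("toy-archive.db")
    db.executescript("""
        CREATE TABLE IF NOT EXISTS file(id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE IF NOT EXISTS content(file INTEGER, data BLOB);
    """)

    def add(name, data):
        cur = db.execute("INSERT INTO file(name) VALUES (?)", (name,))
        db.execute("INSERT INTO content(file, data) VALUES (?, ?)", (cur.lastrowid, data))
        db.commit()

    def list_files():
        # Listing only touches the small metadata table, never the blobs.
        return [row[0] for row in db.execute("SELECT name FROM file ORDER BY name")]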


Funnily enough, tar is like 3 different formats (PaX, tar, ustar). One of the annoying parts of the tar format is that even though you scan all the metadata upfront, you have to keep the directory metadata in RAM until the end and have to wait to apply it at the end.


Or just do what zip and every other format does and put all the metadata at the beginning - enough to list all files, and extract any single one efficiently.


zip interestingly sticks the metadata at the end. That lets you add files to a zip without touching what's already been zipped. Just new metadata at the end.

Modern tape archives like LTFS do the same thing as well.


That sounds like you need to have fetched the whole zip before you can unzip it - which is not what one wants when making "virtual tarfiles" which only exist in a pipe. (i.e. you're packing files in at one end of the pipe and unpacking them at the other)


Just fseek to the end.

Zip format was not designed for piping.


Tapes don't (? certainly didn't) operate this way. You need to read the entire tape to list the contents.

Since tar is a Tape ARchive, the way tar operates makes sense (as it was designed for both output to file and device, i.e. tape).


That point is always raised on every criticism of tar (that it's good at tape).

Yes! It is! But it's awful at archive files, which is what it's used for nowadays and what's being discussed right now.

Over the past 50 years some people did try to improve tar. People did develop ways to append a file table at the end of an archive file. Maintaining compatibility with tapes, all tar utilities, and piping.

Similarly, driven people did extend (pk)zip to cover all the unix-y needs. In fact the current zip utility still supports permissions and symlinks to this day.

But despite those better methods, people keep pushing og tar. Because it's good at tape archival. Sigh.


Tapes currently don't really operate like tar anymore either. Filesystems like LTFS stick the metadata all in one blob somewhere.


It's been a long time since I've operated tape, so good to know things have changed for the better.


It was hard to believe for me, too. And I didn't stumble upon it; I looked for it closely, and that was a point in the note. People did not look properly for nearly three decades. Many things have changed, but we computer people are still using the same tools. I am not saying old is not good; the current solutions are great, but what are we, if we don't look for the better?

Yes, it is that much faster, and a good part of it is because of the multi-threaded design, but as a reminder, WinRAR and 7-Zip are multi-threaded too, and you can see the difference. To satisfy your doubt, I suggest running Pack for yourself. I am looking for more data on its behaviour on different machines and data.

Can I ask why you need a version without ZSTD? If you are thinking that compression slows it down, I should say no. Pack is the first of its kind where "Store" actually slows it down. Because its compression is smart, it will skip any non-compressible content.

On the same machine and the same Linux source code test:

Pack: 194 MB, 1.3 s

Pack (With no Press): 1.25 GB, 1.8 s


My concern with Pack obliging me to compress is that compression becomes less pluggable; I'd much rather my archive format be agnostic of compression, as with tar, so that I can trivially move to a better compression format when one inevitably comes to be.


You've got a point. Although with that option comes a great cost: we will lose portability, speed, and even reliability.

Portability: The receiver (or future you) needs to know what you used, and even what version.

Speed: If you do the archive part first (tar) and then compress (gz), you will get a much lower speed (as shown in the note).

Reliability: Most people use tar with gz anyway, but if you use it with a less popular algorithm and tools, you risk having a file that may or may not work in the future.

Pack's plan is to use the best of its time (Zstandard), and if an update is needed in years to come, it will add support for the newer algorithm. All Pack clients must only write the latest version (and read all previous versions), and that makes sure almost all files use the best of their time.


Pure zstd (or .tar.zstd) vs pack vs patched 7z+zstd would be more interesting - how much overhead is introduced by the pack format itself, in size and speed.


I answered this question here: https://news.ycombinator.com/item?id=39801083 If that is not enough, let me know.


tar.zst vs pack is looking great, thanks! Also there is https://github.com/mcmilk/7-Zip-zstd

.pack vs zst-7z with the same compression settings would be interesting. That would be pure container overhead.


Also, 4.7 seconds to read 1345 MB in 81k files is suspiciously slow. On my six-year-old low/mid-range Intel 660p with Linux 6.8, tar -c /usr/lib >/dev/null with 2.4 GiB in 49k files takes about 1.25s cold and 0.32s warm. Of course, the sales pitch has no explanation of which hardware, software, parameters, or test procedures were used. I reckon tar was tested with cold cache and pack with warm cache, and both are basically benchmarking I/O speed.


The footnotes at the bottom say

> Development machine with a two-year-old CPU and NVMe disk, using Windows with the NTFS file system. The differences are even greater on Linux using ext4. Value holds on an old HDD and one-core CPU.

> All corresponding official programs were used in an out-of-the-box configuration at the time of writing in a warm state.


My apologies, the text color is barely legible on my machine. Those details are still minimal though; what versions of the software? How much RAM is installed? Why is 7-Zip set to maximum compression but zstd is not? Why is tar.zst not included for a fair comparison of the Pack-specific (SQLite) improvements on top of the standard solution?


Using 32GB of RAM, but it is far more than they need.

7-Zip was used like the others; I just gave it a folder to compress. No configuration.

As requested, here are some numbers on tar.zst of the Linux source code (the test subject in the note), using the out-of-the-box config and -T0 to let it use all the cores (without it, it would be 7570 ms):

tar.zst: 196 MB, 5420 ms

Pack: 194 MB, 1300 ms

Slightly smaller size, and more than 4X faster. (Again, it is on my machine; you need to try it for yourself.) Honestly, ZSTD is great. Tar is slowing it down (because of its old design and being one thread). And it is done in two steps: first creating tar and then compression. Pack does all the steps (read, check, compress, and write) together, and this weaving helped achieve this speed and random access.


This sounds like a Windows problem, plus compression settings. Your wlog is 24 instead of 21, meaning decompression will use more memory. After adjusting those for a fair comparison, pack still wins slightly but not massively:

  Benchmark 1: tar -c ./linux-6.8.2 | zstd -cT0 --zstd=strat=2,wlog=24,clog=16,hlog=17,slog=1,mml=5,tlen=0 > linux-6.8.2.tar.zst
    Time (mean ± σ):      2.573 s ±  0.091 s    [User: 8.611 s, System: 1.981 s]
    Range (min … max):    2.486 s …  2.783 s    10 runs
   
  Benchmark 2: bsdtar -c ./linux-6.8.2 | zstd -cT0 --zstd=strat=2,wlog=24,clog=16,hlog=17,slog=1,mml=5,tlen=0 > linux-6.8.2.tar.zst
    Time (mean ± σ):      3.400 s ±  0.250 s    [User: 8.436 s, System: 2.243 s]
    Range (min … max):    3.171 s …  4.050 s    10 runs
   
  Benchmark 3: busybox tar -c ./linux-6.8.2 | zstd -cT0 --zstd=strat=2,wlog=24,clog=16,hlog=17,slog=1,mml=5,tlen=0 > linux-6.8.2.tar.zst
    Time (mean ± σ):      2.535 s ±  0.125 s    [User: 8.611 s, System: 1.548 s]
    Range (min … max):    2.371 s …  2.814 s    10 runs
   
  Benchmark 4: ./pack -i ./linux-6.8.2 -w
    Time (mean ± σ):      1.998 s ±  0.105 s    [User: 5.972 s, System: 0.834 s]
    Range (min … max):    1.931 s …  2.250 s    10 runs
   
  Summary
    ./pack -i ./linux-6.8.2 -w ran
      1.27 ± 0.09 times faster than busybox tar -c ./linux-6.8.2 | zstd -cT0 --zstd=strat=2,wlog=24,clog=16,hlog=17,slog=1,mml=5,tlen=0 > linux-6.8.2.tar.zst
      1.29 ± 0.08 times faster than tar -c ./linux-6.8.2 | zstd -cT0 --zstd=strat=2,wlog=24,clog=16,hlog=17,slog=1,mml=5,tlen=0 > linux-6.8.2.tar.zst
      1.70 ± 0.15 times faster than bsdtar -c ./linux-6.8.2 | zstd -cT0 --zstd=strat=2,wlog=24,clog=16,hlog=17,slog=1,mml=5,tlen=0 > linux-6.8.2.tar.zst
Another machine has similar results. I'm inclined to say that the difference is probably mainly related to tar saving attributes like creation and modification time while pack doesn't.

> it is done in two steps: first creating tar and then compression

Pipes (originally Unix, subsequently copied by MS-DOS) operate in parallel, not sequentially. This allows them to process arbitrarily large files on small memory without slow buffering.


Thank you for the new numbers. Sure, it can be different on different machines, especially full systems. For me on Linux and ext4, Pack finishes the Linux code base at just 0.96 s.

Anyway, I do not expect an order of magnitude difference between tar.zst and Pack; after all, Pack is using Zstandard. What makes Pack fundamentally different from tar.zst is Random Access and other important factors like user experience. I shared some numbers on it here: https://news.ycombinator.com/item?id=39803968 and you are encouraged to try them for yourself. Also, by adding Encryption and Locking to Pack, Random Access will be even more beneficial.


HDD for testing is a pretty big caveat for modern tooling benchmarks. Maybe everything holds the same if done on a SSD, but that feels like a pretty big assumption given the wildly different performance characteristics between the two.


gzip is really, really, really slow, so it's pretty easy to make a thing that uses gzip fast by switching to Zstandard.


Eh, it's not that hard to imagine given how rare it is to zip 81k files of around 1kb each.


Not that rare at all. Take a full disk zip/tar of any Linux/Windows filesystem and you'll encounter a lot of small files.


Ok? How are you comparing these systems to the benchmark so they might be considered relevant? Compressing "lots of small files" describes an infinite variety of workloads. To achieve anything close to the benchmark you'd need to specifically compress only small files in a single directory, of a small average size. And even the contents of those files would have large implications for expected performance....


My comment is not making any claims about that. It's just a correction that filesystems with "81k 1KB files" are indeed common.


If that were true, surely it would make sense to demonstrate this directly rather than with a contrived benchmark? The issue is not the preponderance of small files but rather the distribution of data shapes.


Reading many files (81K in this test) is way slower than reading just one big file. For bigger files, Pack is much faster. Here is a link to some results from a kind contributor: https://forum.lazarus.freepascal.org/index.php/topic,66281.m...

(Too long to post here)


That's basically any large source repo.


Zipping up a project directory, even without git, can be a big file collection. A Python virtual environment or node_modules can quickly get into thousands of small files.


It's like 3x, not 30x, but yes, same skepticism.


Wow, Pascal! Haven't seen a project in Pascal in a while. https://github.com/PackOrganization/Pack


Yeah, I'll wait for the ALGOL 68 port.


https://en.wikipedia.org/wiki/Argumentum_ad_populum can be a mistake of the youth. Just sayin.


Pascal was my first programming language, but I appreciate the link ;)


That is the best joke I've heard all day. Thank you for the laugh :)


Indeed, this is the kind of thing I would have expected to see written in Go or Rust. I wonder what the motivation for this implementation choice was.


Let's say the tyranny of the C children has diverted our attention. I'll make a wild ass statement: If Modula-2 (Wirth family with Pascal) had caught on, you would have had whatever you wanted from Rust 20 years ago. But the C noise dominated the narrative.

Use the language that makes you money and encourages you to write code that addresses domains requiring more than just bolting together framework pieces. AI can do a measurable chunk of that work.


20 years ago was 2004, and Borland Delphi 7 was out. It was Pascal-based, but it wasn't all that different from C programs.

It had a nice unit system with separate interface & implementation sections. This was very nice. The unit files were not compatible with anything else, including previous versions of Delphi - this was not nice, especially since a lot of libraries were distributed in compiled form.

The compilation speed was amazingly fast. This is one thing that was unequivocally better than C at this time.

There were range types (type TExample = 1..1000), but they were more of a gimmick - it turns out there are very few use cases for build-time limits. There were some uses back in the DOS days when you'd have a hardcoded resolution of 640x480, but in the Windows era most variables were just Integer.

Arrays had optional range checks on access; that was also nice. We'd turn them off if we felt programs were too slow.

Otherwise, it was basically the same as C with a bit of classes - custom memory allocation/deallocation, dangling pointers, NULLs, threads you start and stop, mutexes and critical sections. When I finally switched from Pascal to C, I didn't see that much difference (except compilation got much slower).

Maybe you'd say that Borland did something wrong, and Wirth's Modula-2 would be much better than Borland's Pascal, but I doubt this.


You can still use RAD Studio today. Although it's expensive and it's primarily used to maintain old software these days.

Lazarus is the best IDE for Pascal, being completely free, open source and cross platform.


Yep, optional range checks and a variety of other compiler defines to accommodate programmers coming from a C background who preferred to disregard compile time checks in the name of speed of execution. So sure, you can still to this day make pascal act like C. You even get comment delimiters. Kind of adds credence to the influence of C that I'm suggesting.

Wirth languages are about constraints. For instance, when I started writing code in TSM2 and Stonybrook, my general impression was that they both surfaced 10-30% more bugs at compile time than BP did. If that's too much of a hassle for C programmers, well, ok.

Also to add, all the wordiness of Wirth languages, the block delimiters, yes I get it. But all this stuff is just another constraint for sake of correctness. M2, being case sensitive, is even worse about this than Pascal. But the point is to make you look at your code more than once, to proofread it and think about what's going on, because the syntax screams at you a little bit. Of course, with compiler defines, you can turn pascal into C and assume the responsibility for yourself. That's what runtime debuggers are for anyway, yes?

Ok, whatever, but we're missing the point that Wirth was trying to get across, which is to turn the language itself into implicit TDD, starting with the first line of code written. C/++ may give you speed, but for the average programmer, all that speed is taken back in the end, due to maintenance costs. IMO, M2 was even better than Pascal at shifting maintenance costs left in the value delivery stream, compared to the C tack.

Sure, mission critical code can be written even in C/++. Most of SpaceX's code is a C++ codebase. So how did they pull that off? IMO, what they did was write C++ in the spirit of what Wirth was trying to accomplish. For the sake of maintenance costs, speed is now less of a metric thanks to hardware advances, and correctness is far more of an issue. Which makes sense, because all business is mission critical now and all business runs on more and more software. Would you turn off bounds checks in the compiler now? How about for the programmer who you'll never meet who is writing autonomous driving code for the car you drive?

Way too much money was wasted on the near-sighted value of C. Time to move on, according to Rust developers, who undoubtedly have an impressive background as C/++ programmers. So yeah we all have to follow this C dominated narrative even today, and my charge is, this narrative has retarded the art of programming. So I stick to my original proposition: Whatever you think is great about Rust, like ownership and borrowing, would have been available in production M2 code 20 years ago if we had just given Wirth languages a chance to advance the art in the commercial world. But that narrative would have been too wordy and constraining.


In the "Source" section of the site:

> It is written in the Pascal language, a well-stabilized standard language with compatibility promises for decades. Using the FreePascal compiler and the Lazarus IDE for free and easy development. The code is written to be seen as pseudocode. In place of need, comments are written to help.


The last time I did some Pascal was 2006/7. I think I never saw production-grade code myself.

I wonder if this line is an array in-situ?

  Split(APath, [sfodpoWithoutPathDelimiter, sfodpoWithoutExtension], P, N)


No, it’s a set (bitmask) constant[1].

[1] https://www.freepascal.org/docs-html/current/ref/refse83.htm...


But yeah FPC supports ref counted dynamic arrays with "+" operator.


Yes, and I hope that is a good surprise. As you can see, you can create fast and readable code with it.


The whole thing makes sense to me and I can't see any major points of criticism in the design rationale. Some thoughts:

* There is already a "native" Sqlite3 container solution called Sqlar [0].

* Sqlite3 itself is certainly suitable as a base and I wouldn't worry about its future at all.

* Pascal is also an interesting choice: it is not the hippest language nor a new kid on the block, but it offers its own advantages in being "boring" and "normal". I am thinking especially of the Lindy effect [1].

All in all a nice surprise and I am curious to see the future of Pack. After all, it can only succeed if it gets a stable, critical mass of supporters, both from the user and maintainer spectrum.

[0]: https://sqlite.org/sqlar/doc/trunk/README.md

[1]: https://en.wikipedia.org/wiki/Lindy_effect


Thank you so much for the kind words, refreshing.

Here is the latest sqlar result on Linux source code on the same test machine in warm state:

sqlar: 268 MB, 30.5 s

Pack: 194 MB, 1.3 s

Very good result compared to tar.gz. And much better than ZIP, considering sqlar gives random access like ZIP and unlike tar.gz. I considered sqlar a proof of concept, and it inspired me to create Pack as a full solution. I have always agreed with the great drh (creator of SQLite) and his points about SQLite as a file format, and Pack is an attempt to demonstrate that.

I made Pack to give people a better life (at least behind their desks), and as you do, I hope people get to use it and find it useful.


Related note: Lindy's closed in 2018.

https://en.wikipedia.org/wiki/Lindy%27s


And that wasn’t even the original Lindy’s but an unrelated/“unauthorized” reboot since the trademark was declared abandoned. The original closed in 1969 (for which the law was named as it predates that): from 1964 https://web.archive.org/web/20210619015733/https://www.gwern...


Sqlite3 is universal, but now your spec is entirely reliant on Sqlite3's format and all the quirks required to support that.

If you actually care about the future, spec out your own database format and use that instead. It could even be mostly a copy of Sqlite3, but at least it would be part of the spec itself.


You're not "wrong" but Sqlite isn't your run-of-the-mill project. "The SQLite file format is stable, cross-platform, and backwards compatible and the developers pledge to keep it that way through the year 2050." [1]

[1] https://www.sqlite.org/


How many sqlite implementations are there?

Do you need to generate sqlite bindings for every language/runtime? E.g. Cloudflare Workers


A couple, but simple.

No, you will only need Pack, everything is built into it. Pack is built for Windows and Linux, and more will come. You will be able to run it on almost all CPUs.


I suppose you do need bindings for every language. But sqlite is in C and is extremely popular. If you can't get bindings as one of the first third party libraries your language supports, it's probably a shitty language anyway.


Brilliant


How different is this to any other run of the mill project with few active developers on a single implementation, with backwards compatibility based entirely on promises?

Hot take: SQLite has bugs and quirks.


On the other hand, by using Sqlite one can reimplement this format in another language with very little effort.


That's not the usual meaning of "reimplement".


It requires the sqlite3 library bindings, which might be a lot of effort.


Is there a mainstream language which does not have SQLite bindings?


Probably Go because of CGO.


Go has both. The CGO-bindings are preferable, generally. It’s very mature, fast and works great.

https://github.com/mattn/go-sqlite3


sqlite should be an implementation detail. The table format should be fully documented and use a sqlite virtual table module.


> Most popular solutions like Zip, gzip, tar, RAR, or 7-Zip are near or more than three decades old.

If I can't extract .pack archives 3 decades from now, the use of SQLite 3 will be the reason.


What makes you think that? It's a very widely-used and stable format, cited as a great format for archival use.

The Library of Congress has a page that goes into some depth with respect to their sustainability analysis for the format.

https://www.loc.gov/preservation/digital/formats/fdd/fdd0004...


Good point, thank you. Note that SQLite format is very simple https://www.sqlite.org/fileformat.html


I had to make an archiver once (commercial) so I did think about it a bit. I am not sure Pack would solve anything for me. It obviously solves the author's use cases, but tar has some tricks which I don't want to lose:

* Able to write to a pipe/socket - lets you not waste space or time by writing to disc something that you intend to transmit over a pipe or TCP socket anyhow. It's almost a "virtual" archive and it should be possible to make one that is far too big to fit into memory - because as you send each bit of it you deallocate that memory. At the receiver each bit can be written to disc or extracted and then that memory is reused for the next bit - so the archive never fully "exists" but it does the job of serialising some data. An example could be piping the output of tar to an ssh command which untars it on a remote machine.

* Metadata has to be with the file data - not stuck at the end of the file - because you need to be able to start work without waiting till the file is fully received through your pipe. You don't want to be forced to have space to store the archive and the extracted files (may be a huge archive).

* Choice of compression - lzop is super fast such that using it can sometimes give slightly better performance than writing uncompressed data. OTOH that might not be your concern and XZ might suit you by compressing much more thoroughly. Either way it's very nice to have compression that works across multiple files - which is especially helpful when compressing a lot of small files such as source code.

* Ability to encapsulate - should be able to put the packed data into any imaginable container like an encryption or data transmission protocol without insisting that the entire archive has to be fully read before members can start to be extracted/processed. This is essentially the same as the pipe/socket requirement.

I'm not saying that these things matter to everyone - I have just found them incredibly useful in a few critical situations. The world of ZIP users on Windows seems to be sort of blind to them - thinking firmly in that box.


Hey fellow enthusiast.

- Piping is really easy and it will get added to Pack. It is a matter of time until these features get added, as they will be added based on popularity, and piping is not that popular for most people. But I get you, and I will add it for you.

- Metadata is not stored in Pack. I don’t want the metadata of my machine attached to a file. It’s a never-ending nightmare to match source and destination OS metadata. There will always be something missing, and Pack tends to get everything perfect or nothing. Storing metadata adds extra weight that most users don’t care about and complicates the ability to store other types of data alongside files. It may get added as an option in the future if many people need it.

- Pack uses Zstandard under the hood. Great compression speed and ratio. In my opinion, it is the leading algorithm in the field and makes it a proper choice to use instead of DEFLATE, used in ZIP or GZIP.

- At this point you are describing tar features. tar is not random access; Pack is. As an example, if I want to extract just a .c file from the whole codebase of Linux, it can be done (on my machine) in 30 ms, compared to near 500 ms for WinRAR or 2500 ms for tar.gz. And the gap only widens once you add encryption. For now, Pack encryption is not public, but when it is, you will be able to access a file in a locked Pack file in a matter of milliseconds rather than seconds.


I think random access is a useful feature - e.g. if you want to compress code modules like java does so that you can still load individual modules quickly.

This is not something I've wanted yet personally but that's just random chance. When I do need it I will know which tool to use! Thanks.

It could be handy to be able to mount a pack like a filesystem.


Indeed. Benchmarking Pack as a file system was fun. It is nearly 10X faster at iterating over all the files compared to what I get from NTFS (warm, with all caching on for both solutions).

Someday, it can be used as a virtual drive. I leave it to future people.


> The world of ZIP users on Windows seems to be sort of blind to them - thinking firmly in that box.

Seems to me like this is you being blind to certain use cases, and so stuck in your streaming oriented box that you can not conceive of other use cases where a streaming format is actively detrimental.


I started on ZIP like most people and discovered what could be done with streaming so I don't think so.


sqlite is great as a "file format" for a particular application, but I think it's a bad interchange format.

As mediocre as zip and tar are, you can cobble together read/write support without even needing a library. With sqlite, your only real option is to bundle sqlite itself, and while it's relatively lightweight, it's far from trivial.

zip has support for zstd, and if you wanted to make it go faster, you could embed some index metadata.

I can't see any specs for their format, not even a description of the sqlite tables.


After overwriting the "Pack" magic bytes back to the SQLite default values, I was able to open it and see the following tables

    CREATE TABLE Content(ID INTEGER PRIMARY KEY, Value BLOB);
    CREATE TABLE Item(ID INTEGER PRIMARY KEY, Parent INTEGER, Kind INTEGER, Name TEXT);
    CREATE TABLE ItemContent(ID INTEGER PRIMARY KEY, Item INTEGER, ItemPosition INTEGER, Content INTEGER, ContentPosition INTEGER, Size INTEGER);
According to the `.indexes` directive, there are... no indexes. What's the point of sqlite if you're not going to index things?

All the data is stored in one big blob (the "Value" column of the "Content" table), with the metadata storing offsets into it. It looks like there's still the possibility of things being split over multiple blobs (to circumvent the 2GB blob size limit)
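Based on that schema (and after patching the magic bytes back, as described above), here's a rough sketch of pulling one file back out. Two assumptions on my part: each Content.Value is either a raw zstd frame or a stored (incompressible) chunk, and ContentPosition/Size index into the decompressed data. Needs the third-party zstandard package.

    import io
    import sqlite3
    import zstandard

    ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"   # zstd frame magic; absence = stored chunk (assumption)

    def read_item(db_path, item_id):
        db = sqlite3.connect(db_path)
        dctx = zstandard.ZstdDecompressor()
        pieces = db.execute(
            "SELECT Content, ContentPosition, Size FROM ItemContent "
            "WHERE Item = ? ORDER BY ItemPosition", (item_id,))
        out = bytearray()
        for content_id, pos, size in pieces:
            blob = db.execute("SELECT Value FROM Content WHERE ID = ?",
                              (content_id,)).fetchone()[0]
            if blob[:4] == ZSTD_MAGIC:
                chunk = dctx.stream_reader(io.BytesIO(blob)).read()
            else:
                chunk = bytes(blob)
            out += chunk[pos:pos + size]
        return bytes(out)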


I've reverse engineered the format and written up my findings here: https://github.com/DavidBuchanan314/pack-analysis

Summary:

- Custom sqlite magic bytes make the format incompatible with all existing sqlite tooling.

- No support for file metadata.

- There's no version field (afaict), making future format improvements difficult.

Edit: A previous version of this comment had a much longer list of complaints, but after taking a closer look, I retract them. I was looking at the MediaKit.pack file as an example, which, due to being relatively small, packed all its files into a single BLOB. I was under the mistaken impression that the same approach was taken for larger files, but after some further testing I see that they're split up into ~8MB chunks.

Though, if you have lots of small files (say, a couple of kilobytes each) then random access performance could suffer.


Hello David, and thank you for your comment, analysis, and the issues you opened. I will get to them all.

- SQLite tooling: You will not need it unless you are debugging something; then you can change the header, or just use the `--activate-other-options --transform-to-sqlite3` parameter to transform a Pack file to SQLite3, and `--activate-other-options --transform-to-pack` to go back. This way, you get a true SQLite3 database that you can browse as you wish. For most people, mixing Pack with SQLite would just invite problems for the SQLite team (imagine people coming to them and asking to fix their Pack files; that would not be fair) and make it harder for Pack to update in the future.

- Metadata is not stored in Pack. I don’t want the metadata of my machine attached to a file. It’s a never-ending nightmare to match source and destination OS metadata. There will always be something missing, and Pack tends to get everything perfect or nothing. Storing metadata adds extra weight that most users don’t care about and complicates the ability to store other types of data alongside files. It may get added as an option in the future if many people need it.

- There is a version field. It is currently in Draft 0, and it is written using a custom VFS. Look here for more information: https://github.com/PackOrganization/Pack/blob/main/Source/Dr...

- All future versions of Pack must handle previous versions and must only write the latest version. So any files created right now (Draft 0) will be read correctly for ever to come.

- Each Draft proposal will get its own version, and if it gets final, it will be set to final.

- The version lives in the two bytes after the 'Pack' header, little endian: the Draft flag is shifted left by 13 and added to the version number, so Draft 0 is (1 shl 13) + 0 = 8192. For Final versions the flag is 0, so the first Final version will be (0 shl 13) + 1 = 1 and the second will be 2. This is by design, so any Draft version gets a higher number, preventing future mix-ups (see the sketch at the end of this list).

- 8 MB chunks are the default; Pack may choose smaller or bigger (16 MB for many small files or 32 MB for Hard Press).

- Random access is efficient: unpacking only touches what you asked for and decompresses each Content just once, even when it holds many neighbouring files. But even for reading just one file, here is an example: if I want to extract just a .c file from the whole codebase of Linux, it can be done (on my machine) in 30 ms, compared to near 500 ms for WinRAR or 2500 ms for tar.gz. And the gap only widens once you add encryption. For now, Pack encryption is not public, but when it is, you will be able to access a file in a locked Pack file in a matter of milliseconds rather than seconds.
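Here is a small Python sketch of that version encoding (just an illustration of the scheme described in the list above, not reference code):

    # Decode the two bytes that follow the 'Pack' magic, assuming the Draft
    # flag sits at bit 13 and the version number in the low 13 bits.
    def decode_version(two_bytes: bytes) -> tuple[bool, int]:
        value = int.from_bytes(two_bytes, "little")
        return bool(value >> 13), value & ((1 << 13) - 1)

    assert decode_version((8192).to_bytes(2, "little")) == (True, 0)   # Draft 0
    assert decode_version((1).to_bytes(2, "little")) == (False, 1)     # first Final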


Thank you for the detailed response(s). I must admit I'm warming up to the idea of Pack, it does perform well in my testing (I didn't test at first because I'm on aarch64 linux, for which there are no compatible builds).

Not including metadata is an opinionated stance, but I can certainly get behind it, especially as a default. 99% of the time I do not care about metadata when producing a file archive.

Compatibility with existing SQLite tooling is not just useful for debugging, it is extremely useful for writing alternative implementations. If you want Pack to be successful as a format and not just as a piece of software, I think you should do everything you can to make this easier.

In my experimentation, I wrote a simple python script to extract files from a Pack archive. Conveniently, sqlite is part of the python standard library, but in order to make it work with that version (as opposed to compiling my own) I had to edit the file header first, which is inconvenient and not always possible to do (e.g. if write permissions are not available).

Despite that inconvenience, it took less code than a comparably basic ZIP extractor, which is cool!
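To give a sense of scale, here is roughly the shape of such an extractor (not the exact script; it assumes the 16-byte header has already been patched back to the SQLite magic, that ContentPosition is a zero-based offset into the decompressed blob, and that each compressed Content blob is a single Zstandard frame with its content size recorded):

    import sqlite3
    import zstandard  # third-party: pip install zstandard

    ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"

    def read_item(db_path, item_id):
        con = sqlite3.connect(db_path)
        out = bytearray()
        rows = con.execute(
            "SELECT Content, ContentPosition, Size FROM ItemContent "
            "WHERE Item = ? ORDER BY ItemPosition", (item_id,))
        for content_id, pos, size in rows:
            (blob,) = con.execute(
                "SELECT Value FROM Content WHERE ID = ?", (content_id,)).fetchone()
            if blob[:4] == ZSTD_MAGIC:
                blob = zstandard.ZstdDecompressor().decompress(blob)
            out += blob[pos:pos + size]
        con.close()
        return bytes(out)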

I worry that requiring a custom VFS will make it harder for people to produce compatible software implementations.

I think your concerns about people contacting SQLite for support are overblown. I assume you've heard the `etilqs_` story[0], but in this case, you need to use a hex-editor or a utility like `file` to even see the header bytes. I think anyone capable of discovering that it's an SQLite DB will be smart enough not to contact SQLite for support with it.

The `Application ID`[1] field in the SQLite header is designed with this exact purpose in mind

> The application_id PRAGMA is used to query or set the 32-bit signed big-endian "Application ID" integer located at offset 68 into the database header. Applications that use SQLite as their application file-format should set the Application ID integer to a unique integer so that utilities such as file(1) can determine the specific file type rather than just reporting "SQLite3 Database".

It's convenient that `Pack` is 32 bits long ;)
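For illustration, the whole mechanism is two pragmas (the value here is just 'Pack' read as a 32-bit ASCII integer, not something the format actually uses):

    PRAGMA application_id = 1348559723;  -- 0x5061636B, i.e. 'P','a','c','k'
    PRAGMA application_id;               -- read it back; file(1) can show it too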

[0] https://github.com/mackyle/sqlite/blob/18cf47156abe94255ae14...

[1] https://www.sqlite.org/pragma.html#pragma_application_id


I am happy to hear that, and I really appreciate your interest.

Did you compile it yourself? I would be happy to hear about any problems or the steps you used, at o at pack.ac or on GitHub, as it is hard to follow build issues here.

As a reminder, Pack Draft 0 is compatible with SQLite tools; the only step needed is to change the first 16 bytes. Again, you can use `--activate-other-options --transform-to-sqlite3` with the CLI tool, and you will get a perfectly working SQLite file.

A VFS is not required; an implementation can simply change the header after writing. Using a VFS was just cleaner to me.

My first version did use application_id, but after a while it did not feel right to me, so I changed it for good. A custom header allows easier future development, fewer problems with file type detection, a lower chance of accidental modification (you already saw many negative comments on using SQLite as a base), and there is the support reason: just yesterday I was reading a forum post where people asked a project for support simply because it was using SQLite. application_id seems like a great choice if you are doing a DB-related task or building a custom database for transfer on the wire between internal and semi-public tools. Using it for a format that could end up on countless machines seemed unwise.


> - All future versions of Pack must handle previous versions and must only write the latest version.

I believe you are making a mistake by preventing Pack from writing archives that are compatible with prior versions.


Thank you for the check.

There are no indexes because they take space, and I wrote the queries with SQLite's automatic indexes in mind; those are created on demand, at unpacking time. The whole unpacking process is designed to read and decompress each Content just once, so there is no worry about slowdowns.

I suggest trying Pack for yourself and seeing the speed. Or, to go deeper, use `--activate-other-options --transform-to-sqlite3` to transform a Pack file to SQLite3, create your own indexes, use `--activate-other-options --transform-to-pack` to convert it back to Pack, and then try unpacking. You will not see any meaningful difference.
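For example, something like the following after `--activate-other-options --transform-to-sqlite3` (my own suggestion of what to try, based on the table names; not something Pack ships):

    CREATE INDEX IF NOT EXISTS IxItemContentItem ON ItemContent(Item);
    CREATE INDEX IF NOT EXISTS IxItemParentName  ON Item(Parent, Name);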

Yes, a Content is a package of raw data holding a chunk of an item, a whole item, or many items (files or other data). It may be compressed if worthwhile (with Zstandard). The ItemContent table maps each Item to the Content parts it lives in.

The Content structure circumvents any BLOB limit, but it is also made to give better compression while keeping random access.


Fair point, I can see that indexes are not really necessary.


I think you are overestimating how easy it is to "cobble together read/write support without even needing a library."

Let's imagine you want to read a ZIP file. Will you write your own reader? I seriously doubt it; the work, the stabilising, and the security issues (unsafe memory access, for example) get in the way. But suppose we are courageous: we carefully parse a format that is not as simple as it looks. Now, will you also write your own DEFLATE and Huffman coding? That I doubt even more.

I would argue that for someone who cares enough to reimplement ZIP, writing a Pack reader from scratch, with no Zstandard or SQLite libraries, would at worst be twice as hard. And for those serious people, a format that stores data better and faster is a prize that is hard to say no to. But I get your point: if you are stranded in a desert and need to put something together before the water runs out, tar may be a good choice.


I have written my own zip, deflate, and huffman coding - although the latter two were "just for fun". But I would definitely consider writing ad-hoc zip logic in real software, if I couldn't pull in a library for whatever reason. This isn't just a hypothetical, it happens a lot - there are many independent ZIP implementations in the wild, for better or for worse.

You're right to call out security though, because the multiple implementations cause security issues where they disagree, my favorite example being https://bugzilla.mozilla.org/show_bug.cgi?id=1534483 . Although arguably this is a symptom of ZIP being a poorly thought out file format (too many ambiguous edge-cases), rather than a symptom of it being easy to implement.


You are one of the brave ones. And as you know, using SQLite as the base storage rules out many of the security problems we could otherwise face.

Anyone who needs to reimplement Pack can do it very easily, arguably more easily than implementing ZIP, if they use SQLite and Zstandard: maybe a day of work or less. If they also want to rewrite (the reading part of) those libraries, it becomes a couple of days of work.


It complains that all existing tools are old, but I'm looking at the documentation and what immediately catches my eye is that it doesn't use any of the modern conventions I've gotten used to.

Overwrite with "-w". I've never seen a tool not use "-f"

Not reserving "-h" for help text is also an interesting choice. Makes me think of the mantra "be conservative in what you send out but liberal in what you accept". Per that philosophy, both "--help" and "-h" should be accepted because neither gained a decisive majority in usage and so people might try either. It's not like you'd know what to use because it hasn't told you yet

Forcing use of a long option "--press=N" (for the zstd level setting) is also new/unique terminology for what is usually "-N" (like "-1" to "-9")

(Basic) drop-in compatibility with every other tool from gzip to zstd would also have been nice, but archiving and compression are different things and everything from zip to 7z to tar works in unique ways so this makes enough sense I guess. Still, could have been useful

It's still better than tar or ps, so if it catches on that's still a step forwards in terms of command line standards


Hmmm. I went to the doco hoping for something about the file format :( No doco for me. I guess that would be too old fashioned.

It seems to be "read the code" or nothing - which is fine until they update the code... It's great (probably should be mandatory) to have a reference program, but if they're promoting it as a container format, something along the lines of an RFC would be helpful.


More documentation will be published soon. For now and about CLI: https://pack.ac/cli-documentation


Hey,

The parameters were chosen solely to be clear, not to match what people are used to. -f meaning force is not clear; -w meaning overwrite seemed like the more logical choice to me.

Nice point on -h. Yes, I did not want to go crazy. After all, almost all (CLI) people use pack as `pack ./test/`. Options are for advanced people like you. Most people will use the OS integration that will be published later on.

--press=hard is the only option there is. There may be more later, but with Pack you do not need to choose a level (like 1..9 with ZIP). Just let Pack do its thing, and you will be happy. Hard Press is for people who pack once and unpack many times (like publishing), where it is worth spending extra time. Even then, Pack stays sane and does not eat your computer just to save a kilobyte or two.


> Just let Pack do its thing, and you will be happy

Well, no. Sometimes I want maximum compression and have plenty of CPU and wall-clock time to spare. And sometimes being fast is more important than compression ratio. Managing server utilization matters too. The level setting is there for a reason.


Then `--press=hard` would be the choice for you.


This is not a binary choice; an actual level-of-effort setting is required. I've seen people fine-tune compression levels in all kinds of automation scenarios many times.


Also, zstd covers a lot of ground between super fast compression and good compression ratio

https://raw.githubusercontent.com/facebook/zstd/master/doc/i...


Thank you for the notes. I am well aware of the levels, and Pack uses a custom configuration to match its internal design. Maybe more levels will come, maybe not. But to be clear, Pack supports any valid Zstandard content; the levels we are discussing are about the Pack CLI, which was tuned for a better user experience. Any other client can produce and store valid content at whatever level or configuration it chooses, and other clients can read it.


There are many different use cases, and each one has a different set of requirements. E.g.:

- for an end-user-facing CLI: support as many conventions as possible (-v/--version, -h/--help, other options similar to other compressors), sane defaults

- for automatic tasks like making backups via cron: piping, correct exit codes, level of effort configuration, silent modes for reduced logging.

The second one is likely to take off first; it can be used in isolation at a company level if the file format is stable.


Pascal!? My monocle nearly fell out.


It looks clean and pseudocode-like. It helps readers from around the world with different languages understand. FreePascal compiler is very good too.


Must be in honor of Wirth's passing!


I just do not want to follow Wirth's law.


Unfortunate name:

dnf info pack

    Summary      : Convert code into runnable images
    URL          : https://github.com/buildpacks/pack
    License      : Apache-2.0 and BSD-2-Clause and BSD-3-Clause and ISC and MIT
    Description  : pack is a CLI implementation of the Platform Interface Specification
                 : for Cloud Native Buildpacks.


The web site behaves strangely on mobile and folds the text as I try to scroll around.


Sorry. The site is very new and custom-made, and still needs work on mobile.


My suggestion?

If you’re not sure, grab a very basic template from either Hugo, MkDocs, or GitHub pages. They’re all pretty well tested and hard to break.


Yes, at least on mobile it's absolute madness.


Better, faster, stronger but I can't tell from the homepage what's different about it, except that it is based on SQLite and Zstd.


You may like to read https://pack.ac/note/pack and test it for yourself.


I went to do some testing in a sandbox system (as the compilation strategy is unclear, missing builds for some of the artifacts).

I was able to initially construct an archive of the linux tree (that failed in decompression), but subsequently went to rebuild it and the tool is repeatedly producing this output even in an otherwise cleaned up environment:

    Runtime error 203 at $0000000100009B72
    $0000000100009B72
    $00000001000225B6
    $0000000100023153
    $000000010002319A

The first run did work; sadly I deleted the output. It was over 400 MB, so approximately double the size of a zip or a tar.zst of the same files.

zstd natively compresses a tar of this set to 209 MB in 800 ms in multi-threaded mode, or 3.5 s in single-threaded mode.
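(For anyone reproducing that comparison, roughly the following; -T0 lets zstd use all cores, -T1 forces a single thread:)

    tar -cf - linux-master | zstd -T0 -o linux-mt.tar.zst    # multi-threaded
    tar -cf - linux-master | zstd -T1 -o linux-st.tar.zst    # single-threaded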

I suspect that sqlite is being held incorrectly (access from multiple threads, with multi-threading disabled), and the vfs lock forwarding is broken on Windows.


Did you compile it yourself? I would be happy to hear about any problems or the steps you used, at o at pack.ac or on GitHub, as it is hard to follow build issues here. I should prepare more documentation on how to build it. I suspect there is a problem with the custom build; errors and speed issues like these are not something you see with the official build.


That was the binary downloaded from the website. You have a build.sh for the Linux binary artifacts, but no equivalent for the Windows artifacts, so I did not bother preparing a Windows build.


The Pack binary? Can you tell me what machine and what steps?

Build.sh can be used for Windows too, using MSYS2 UCRT64.


Windows 11 sandbox, running atop Windows 11. Binary downloaded from your webpage.

The data being packed was a copy of linux-master.zip fetched from GitHub and unpacked with the built-in Windows zip support, selecting "skip" for the files whose names clash only by case.


What parameters did you give the CLI program? This issue is interesting, as packing these files on Windows 11 has been tested countless times.

To be clear, you can run Pack as: `pack.exe ./linux-master/`


I ran pack that way, and observed the error I posted


Zstandard is... standardized under RFC 8878.

Plus there's no discussion against zstd itself and its container format.


If you're looking for a debate against Zstandard, it's hard to argue against it.

Zstandard is Pareto optimal.

For the argument why, I really recommend this investigation.

https://insanity.industries/post/pareto-optimal-compression/


Thanks, superbly written and highly informative article!


If I need to compress stuff, it’s either to move a folder around (to places which may not have a niche compression tool, so ZIP wins), or to archive something long-term, where it can take a while to compress. I don’t see the advantages of this, since the compression output size seems quite mediocre even if it’s supposedly fast (compared to what implementations of the other formats?)


Hell yeah, some FPC stuff showing its moves. The devs even put together an .lpk to load up, bravo! Look for more of this in the future as companies look for alternatives to corporate commodity programming and databases tethered to major cloud resources. I have a major FPC effort going on right now that I hope to offer on four platforms: browser, Windows, Linux, and Mac.

I wrote a similar general-purpose packer way back in the early 90s in TopSpeed Modula-2, run from the command line. It needed to span multiple disks and self-launch. The algorithm was fast, but nowhere near the same compression ratios. I wore out Mark Nelson's classic, "The Data Compression Book", along the way.


Hell yeah indeed! You can find me on the forum too if you'd like to talk Pascal: https://forum.lazarus.freepascal.org/index.php/topic,66281.0...


How does tar take 4.7s in the benchmark, but Pack takes 1.3s (and 1/7th the size)?

That seems fishy...


Hello to all. I am the author, and I just saw this post and am happy to see this exciting discussion. Let me try to show my respect for it and answer as well as possible.


The website is completely unusable on iPhone and on iPad. It doesn't scroll, jerks back in place, parts are blank and empty... looks really strange.


Thanks for the note, and sorry for the inconvenience. I did not expect this many users from the iOS world. The site is very new and needs custom work; it will be updated soon.


I think generally it's a mobile layout issue. On Firefox for Android, scrolling the page still triggers a click/mouseup/focus event I guess - when you let go of your finger, it toggles the state of the "Note" -> "Pack" section, so it tends to hide itself as you're reading it!

Would just remove that "accordion" functionality completely or make it always expanded on mobile breakpoints or whatever. Or just move that entire "About pack" section to be on the main page "below the fold" as the first thing people are going to want to do is find out *what it is* :)


Welcome Otto!

I know many people including myself are curious on why you wrote this in Pascal?

Also, what are the main reasons Pack is faster than the tools you compare Pack with?


Thank you!

About Pascal: It makes me happy. It looks clean and pseudocode-like. It helps readers from around the world with different languages understand. I am happy that Pack made people curious about this old but great goodie.

About speed, here are some reasons:

- Pack does all the steps of packing or unpacking (read, check, compress/decompress, write) together, and this weaving is what enables the speed and the random access. It is by far the fastest way I have seen to read or write many files from a file system, as fast as or faster than asynchronous reads or OVERLAPPED I/O on Windows. At some point it is limited by the file system itself: on NTFS, Pack can pack the Linux code base in around 1.3 s; the same job on ext4 takes 0.96 s.

- It is built on a heavily optimized code base and standard library, and the FreePascal compiler, which produces great binaries.

- Multi-core design: even phones have multi-core CPUs these days. By choosing the thread count based on the content and the machine, Pack does not eat your machine.

- Speed-configured SQLite. SQLite is much faster than most people think it is.

- A tuned configuration of the already rapid Zstandard.

In summary, standing on the shoulders of giants while trying hard to improve reliability, speed and user experience is a sign of respect for them.


How about a pure zstd (or tar.zst) vs. Pack vs. patched 7z+zstd benchmark? Measure the container overhead, in both speed and bytes.


I answered this question here: https://news.ycombinator.com/item?id=39801083

If that is not enough, let me know.

7-Zip with the ZSTD patch is good too, but Pack is much faster at handling many files.

Testing packing the Linux code base (81K files and 1.25 GB) on Windows with NTFS:

7-Zip + Patched with ZSTD (-m0=Zstd): 6.453 s, 194.9 MB (Creating the header takes too much time)

Pack: 1.3 s, 194.5 MB
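(Roughly the two invocations compared, for anyone who wants to reproduce it; the first assumes the Zstandard-patched 7-Zip fork with its -m0 switch:)

    7z a -m0=zstd linux-master.7z ./linux-master/
    pack ./linux-master/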


Thanks for the update! Somehow the 7z container overhead is +0.4 MB AND it is slower by a lot? Huh. Great numbers; the Pack format needs more exposure. Also, I suggest adding a section to the website that shows these numbers:

1) regular archive formats vs pack

2) various containers with the same zstd inside.


Thank you!

Exposure comes from enthusiasts like you.

I did not want to focus on speed, or to say, "Look, the others are bad". They are great; my point was, "Look what we can do if we update our design and code". Pack's value comes from user experience, and speed is just one part of that. I was not chasing the best speed or compression; I wanted an instantaneous feeling for most files. I wanted a better API, an easier CLI, improved OS integration (soon), and more safety and reliability. Tech people (including me) care so much about speed.

I am happy about the results, but Pack offers much more that I would like others to see.


Who is behind this? It's a new Github org, the committer (https://github.com/OttoCoddo) has a totally private Github profile. There's no name in "Legal". Sure, one can be anonymous, but I won't download it, don't trust it.


They also mention depending on https://github.com/SCLOrganization, which appears to have a single, similarly anonymous maintainer.

In fact, SCL has exactly two libraries, SQLite and Zstandard, so presumably it's the same developer https://github.com/SCLOrganization/Libraries


Me.

That is the point: if you trust a project based on "who" made it, my friend, that is the start of the big problem we are facing in tech right now. Just look at the code, build it yourself, and check the license.

Pack is meant to be a private option; future locking and encryption features will solidify that. Trusting the author is not the correct way to verify the security and safety of such a tool.


What's the best way to store a bunch of .zip files that all share some data? Assume that I can't decompress or alter the zip files in any way.

Basically I want a shared encoding dictionary. Is there an easy solution for this?

The use case is maintaining an archive of .crx files.


Use precomp or antiz to losslessly pre-transform the zip, and then use solid compression (e.g. tar.xz or zpaq).


There's not an _easy_ way to do this right now.

Your best bet is a lossless transform that undoes the huffman coding in the zip files, converting the compressed streams from effectively uncorrelated bitstreams to largely similar byte streams, and then pass that through a large-window compression algorithm (zstd?).

Similar techniques are used in ChromeOS for delta updates.


Depending on implementation, zip supports Zstd compressed archives with dictionaries.

You could maintain a shared dictionary among all the archives.

If you have a way to post process the archives, you may be able to abuse multipart zip files to split them and reconstitute them later.


Not sure if I understand the question.

Having a shared encoding dictionary is called compression.

If you want one across many zip files that you cannot modify, then compress them into another archive.


Yes, I have a bunch of zips I cannot modify. If I could uncompress them all, and zip them into one big archive, then it would be much smaller than if I put all the compressed zips into one big archive. This is because there's a lot more shared strings between uncompressed files across different zips, than there are shared strings between the compressed zips.

(At least... this is my assumption based on how I understand the formats to work. I do need to measure and verify this.)

So basically I'm wondering if there is some way to tell the "outer" zip to reuse the encoding dictionaries of each smaller zip, or to somehow intelligently merge their encoding dictionaries rather than treating each inner zip like an opaque blob.


No, the ZIP format compresses each embedded file separately, so it doesn't matter how much commonality there is between different files in the same archive.


Wow, really? I always assumed that if I zipped a directory, it would use a shared dictionary to compress all the files. But I guess what you say makes sense, because otherwise it wouldn't be possible to extract just one file.

Are there compression formats that do share an encoding dictionary across multiple files? I guess tar + gzip might do that?


Yeah, if you compress a tar file with gzip (or bzip2 or zstd or whatever) then the compressor doesn't care about the file boundaries, so it will be able to take advantage of redundancies.

However, those compressors generally only have a small context window, so they'll only be able to take advantage of relatively nearby redundancies in the archive. They won't help you if the common substrings are separated by many megabytes of unrelated data. So the order in which you pack the files matters.

In theory, you could do a two-pass approach, where you first scan the entire set of files to create a (relatively small) shared dictionary that's as useful as possible, and then use that dictionary to compress each file independently. I don't know of any archive format that does that, but you could roll your own using the zstd command line compressor.
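For example, with the stock zstd CLI (paths are placeholders):

    zstd --train files/* -o shared.dict          # pass 1: build a shared dictionary
    zstd -D shared.dict files/*                  # pass 2: compress each file against it
    zstd -D shared.dict -d files/example.zst     # every file stays independently readable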


Site is broken on mobile. See logo and black screen. Reader mode doesn’t work.


Is anyone using this or sqlite archives for anything at scale? They always seemed like a good solution for certain scientific outputs. But data integrity obviously is a concern.


How does this compare to PAQ https://en.m.wikipedia.org/wiki/PAQ ?


As far as I know, PAQ strives for the best possible compression for archival purposes. Pack, on the other hand, tries to give good compression while keeping speed as close to instantaneous as possible.

You should try them for yourself.


Quite a hilarious joke, hope no one takes this seriously


I remember someone had made a package manager / index tool for SQLite extensions. But I haven't ever been able to find it again



I think it was sqlpkg! Thank you. That was bothering me so much!


Any recommendation for a format that supports editing and deleting files without rewriting the whole archive?


Most zip implementations allow this.


By rewriting the whole archive though


It’s written in Pascal.. Why not something like Rust or at least C++?


It makes me happy.

It looks clean and pseudocode-like. It helps readers from around the world with different languages understand.


no build instructions, binary artifacts in repo. yuck.


Easy to build using this document: https://pack.ac/source

Each binary has its own build script that you can use for yourself. Binaries are used for static builds and to ease future needs. https://github.com/PackOrganization/Pack/blob/main/Libraries...

I know you don't have a duty to look around for your answer, but you too don't have a duty to say yuck to a project that has been done with a lot of effort. I am ok with your comment, but maybe go easy on the next project.


beautiful website and nice overall aesthetic.


Thank you very much! I guess you are friend with colors ;)


> Pack format is based on the universal and arguably one of the most safe and stable file formats, SQLite3, and compression is based on Zstandard, the leading standard algorithm in the field.

yeah, no thanks. SQlite3 automatically means:

- Single implementation (yes, it's a nice one but still a single dependency)

- No way to write directly to pipe (SQlite requires real on-disk file)

- No standard way to read without getting the whole file first

- No guarantees in number of disk seeks required to open the file (relevant for NFS, sshfs or any other remote filesystem use)

- The archive file might be changed just by opening in read-only mode

- Damaged file recovery is very hard

- Writing is explicitly not protected against several common scenarios, like backup being taken in the middle of file write

- No parallel reads from multiple threads

Look, sqlite3 is great for its designed purpose (an embedded database). But trying to apply it to other purposes is often a bad idea.


Often new things are met with an excess of skepticism but I agree here.

I'd take this more seriously if the format was documented at all, but so far it appears to be "this implementation relies on sqlite and zstd therefore it's better", without even a specification of the sql schema, let alone anything else.

The github repo contains precompiled binaries of zstd and sqlite. The sqlite builds appear to have thread support disabled so not only will it be single writer it'll be single reader too.

The schema is missing strictly typed tables, and the implementation appears to lack explicit collation handling for names and content.

The described benchmark appears to involve files with an average size of 16KB. I suspect it was executed on Windows on top of NTFS with an AV package running, which is a pathological case for single threaded use of the POSIXy IO APIs that undoubtedly most of the selected implementations use.

It's slightly odd that it appears to perform better when SQLite is being built with thread safety disabled (https://github.com/PackOrganization/Pack/blob/main/Libraries...) and yet the implementation is inserting in a thread group: https://github.com/PackOrganization/Pack/blob/main/Source/Dr.... I suspect the answer here is that because the implementation is using a thread group to read files and compress chunks, it's amortizing the slow cost of file opens in this benchmark using threading, but is heavily constrained by the sqlite locking - and the compression ratio will take a substantial hit in some cases as a result of the limited range of each compression operation. I suspect that zstd(1) with -T0 would outperform this for speed and compression ratio, and it's already installed on a lot of systems - even Windows 11 gained native support for .zst files recently.

The premise that we could do with something more portable than TAR and with less baggage is somewhat reasonable - we probably could do with a simple, safe format. There are a lot more key considerations to making such a format good, such as many you outline, such as choices around seeking, syncing, incremental updates, compression efficiency, parallelism, etc. There is no single set of trade-offs to cover all cases but it would be possible to make a file format that can be shared among them, while constraining the design somewhat for safety and ease of portability.


Hello, and thank you for the notes. Unfortunately, your points seem to be mostly wrong, so let me clarify them a little. Do not worry; many people misunderstand SQLite and its abilities.

- Single implementation: Sure. Working with SQLite convinced me that nobody cared to reimplement it because it worked so well that nobody wanted or needed to. I may write an unpacker just to prove that it is not hard at all to read the SQLite file format. The complicated part is the SQL engine (and many other features that Pack does not use), and for Pack you can live without it.

- SQLite does not require a disk; it has an in-memory option. Pack can support piping and probably will. I did not implement it yet because, well, the project is very new, and I do what feels most needed first. You can subscribe to the newsletter on the site (https://pack.ac/notes) or follow GitHub.

- Of course you can read SQLite without reading the whole file. It is a database, not a tar file.

- SQLite is highly optimized to read the lowest amount of data, and it has layers of smart caching. There is a reason it is used on almost any device that has a computer on it, even smartwatches.

- Of course the archive is safe from changes during unpacking. It is opened in read-only mode, guarded by the OS and file system, and Pack also uses code isolation, which prevents any write call on the file.

- There are a lot of tools that help repair a damaged SQLite file. Pack is also guarded by transactions. The file will not get corrupted unless the disk itself does; the mentioned tools come in handy then. And in today's world of SSDs, that risk is shrinking rapidly.

- On unpack, Pack reads, decompresses, checks, and writes in a multithreaded fashion. So yes, parallel reading is possible and is what Pack does.

- I suggest trying Pack for yourself; that will give you the confidence you need to be sure.


I am very curious how you are going to get SQLite working with piping, especially on extract... It's pretty common to do stuff like "curl ... | tar xvf -", so that you can start extraction the moment the first kilobyte of data arrives. This really saves a lot of time, as disk and network work in parallel.

A less common tar feature is packing straight to a stream: stuff like "ssh remote tar cvf - ... > local-file.tar", which skips the temporary file on the remote machine and also saves a lot of time in transfer.

But for both of those, sqlite's in-memory mode won't help you: memory or not, you still need the entire file before you can read it. So if you just store file contents in the SQL database, you have to fetch everything, up to the last byte, before you can get any data out.

Maybe you could keep the index in sqlite and append the data as-is... but where would you put that index?

If you put it in front (like squashfs), you need to produce the entire metadata before writing the first data byte... and that has to include compressed sizes too (assuming you want to support random extraction), which means you cannot stream the file out until you have finished compressing all the data. Also, sometimes you will not be able to add files to the archive without rewriting the whole thing (if the index grows and you didn't leave enough padding). This might be OK, but it should definitely be mentioned.

If you put it at the end (like zip), you will be able to stream the file out during compression, but fast decompression would be impossible. Also, you'll forgo any sqlite transactional guarantees, since the database will be created in memory and only written at the very end, once all the files are written.

So frankly, I don't see how you can win on the streaming front, unless you really have a custom format and "sqlite3" is just a small part of it.

(Another problem is there is not even a short spec - how is sqlite3 used, what is your schema, and so on. And I am sorry, but I am not going to read the source code just to figure this stuff out).


> SQlite requires real on-disk file

You can run SQLite with an In Memory database, I use it quite a lot for unit tests.

https://www.sqlite.org/inmemorydb.html


I think that point is about it not supporting streaming compression where the output of the packer is immediately fed into something like a pipe or a TCP connection.

You can do this with both tar and ZIP. If all you have is SQLite, you need to fully create the local database (be it in a file or in memory) before you can transmit it somewhere else to be stored or unpacked.


Yes, but I think they meant something like this:

pack archive | ssh host2 pack extract

Which is a weird gripe - in this use case, `tar` makes most sense to use. It's not like this format claims to be a full replacement for tar or anything close to that.


That's exactly what they do - they explicitly call out this format as replacement to tar and zip at least:

> Most popular solutions like Zip, gzip, tar, RAR, or 7-Zip are near or more than three decades old. While holding value for such a long time (in the computer world) is testimony to their greatness, the work is far from done.

> Pack tries to continue this path as the next step and proposes the next universal choice in the field.


Oh, for some reason I thought they promoted it as an archival container for situations with a lot of files.


Yes, I am. Pack can hold millions of files with no problem. One area where it shines is that, aside from being fast at processing large amounts of data, it can handle many small files much faster than similar tools or even many popular file systems.

About piping, it can be done, and it is on my list. I will finish features based on their popularity and whether they make sense.

As Pack has random access support, you can choose a file in a big pack, and it can stream it out to the output. It is already able to unpack partially to your file system (using --include="file path in pack"); streaming/piping it would not be a problem.


> No parallel reads from multiple threads

Sqlite supports parallel reads from multiple threads.

It even supports parallel reads and writes from multiple threads.

What it doesn't really support is parallel reads and writes from multiple processes.


Not when threading is disabled, as it is in this project.


It is not disabled in the way you think; thread safety is handled by Pack's own internal code. Almost all parts of Pack are multithreaded.


It’s not safe to disable SQLite’s thread safety as you do here: https://github.com/PackOrganization/Pack/blob/main/Libraries... and then do your own locking. You attempt to pass the flag at open time to enable serialized mode; however, quoting the SQLite docs for the build flag you set:

  Note that when SQLite is compiled with SQLITE_THREADSAFE=0, the code to make SQLite threadsafe is omitted from the build. When this occurs, it is impossible to change the threading mode at start-time or run-time.

SQLite’s APIs are often hazardous in these ways; it really should error rather than silently ignore the fullmutex flag, but alas.


Sure, but single threading isn't an inherent part of sqlite as OP implies.

> SQlite3 automatically means:

...

> - No parallel reads from multiple threads


Ad 2: SQLite has an in-memory DB option.


Then you cannot transfer through a pipe something that's too big to fit in memory.


Pack may not be it, but it would be nice if Tar would go the way of the dodo. It has all the flaws that you mentioned (and more!).


It does not in fact have most of the mentioned flaws. It's pipeable, immutable, continuous, trivial to repair, safe for append-only writing.


It literally has none of the issues mentioned though. Not that it doesn’t have limitations but those listed aren’t them.


Website is quite annoying to use on mobile. Scrolling behaviors get interpreted as taps which close the container you're reading.


Also if you dare try to naturally scroll up after opening a container it's interpreted as a refresh as it redraws. Might be an awesome format but web design fail negates it entirely.


Had the same issue, plus I think one of my Safari extensions was hiding a cookie banner so half the page was just “dark”. A more mobile friendly view is absolutely needed.


Nah, there's no cookie banner. It's just a strange/innovative design.


Yes, it's horrendous - I thought I was intoxicated


tldr: `cp * sqlite3://output.db` (basically?)

Really seems to make sense! For another fun compression trick: https://github.com/mxmlnkn/ratarmount


is this as widely compatible as 7zip?


Of course not! There's not even a version for macOS as far as I can tell. No one is using it yet. This is a proposal.


ah, okay. the opening paragraphs don't read like an RFC



