Tangentially related - what's a good option for a cross-platform (portable to all platforms with a filesystem) read-only virtual filesystem today, like e.g. Quake's PK3 file format? E.g. let's say I want to access a few tens of thousands of small files fast, much faster than what e.g. NTFS allows, since I know that I'll likely have to read more-or-less all the files and I can mmap the whole thing. What are my options? My prime concern is having an api such as
ZIP is the closest thing to an "industry standard" portable filesystem. It's directly comparable to Quake's PK3 format because that's all PK3 was: a ZIP with a custom file extension.
It's also what "powers" a wide range of "portable filesystem in a single file" formats such as DOCX, ODT, and quite a few other modern Office and Office-adjacent file formats.
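As a rough sketch of how little code that takes (Python here purely for illustration; the archive name and the loop body are made up), reading every member of a ZIP in one pass looks something like:

    import zipfile

    # The central directory is read once up front, so per-file lookups
    # don't touch the OS filesystem again after the archive is open.
    with zipfile.ZipFile("assets.pk3") as pack:
        for info in pack.infolist():
            if info.is_dir():
                continue
            data = pack.read(info.filename)  # bytes of one packed file
            # ... hand `data` to whatever consumes it ...

If you store the members uncompressed (ZIP_STORED), you can also mmap the whole archive and slice member bytes straight out of it using the offsets from the central directory.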
Would a RAM disk fit the bill? Just read all the contents from the copy in non-volatile storage at boot. It's cross-platform by virtue of using whatever RAM-based filesystem or block-device option is commonly available on the target OS.
For "as fast as possible" you'll need to experiment and benchmark with your workload. Which filesystem is optimal may depend on how you lay out the data and where the latency/throughput sensitivities are, both in your use case and in the given filesystems.
> My prime concern is having an api such as...
Having a different API, rather than something that looks like a filesystem, would make cross-platform support more of a concern, as you then have a data access library rather than a general filesystem. It will likely be necessary for best performance though: any filesystem is going to have significant overhead (orders of magnitude) compared to being able to map chunks of the data directly into your process's address space.
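A minimal sketch of that idea, assuming a hypothetical pack format where the small files have been concatenated into one blob and you keep an index of (offset, length) pairs (the format and names here are made up, not any existing library):

    import mmap

    def open_pack(path):
        # Map the whole pack read-only; pages are faulted in on demand
        # and shared with the OS page cache.
        with open(path, "rb") as f:
            return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    def read_file(pack, index, name):
        # `index` maps name -> (offset, length), loaded from whatever
        # header/central directory the pack format defines.
        offset, length = index[name]
        return memoryview(pack)[offset:offset + length]  # a view, no copy

After the map is set up, a "read" is just slicing into the mapped region: no per-file open/close and no syscalls beyond the page faults.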
If abandoning a generic filesystem, perhaps something like SQLite with an in-memory table/DB (https://www.sqlite.org/inmemorydb.html)? Again, like the RAM disk option, just load up the content from permanent storage on first use.
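For example, with Python's stdlib sqlite3 you can clone an on-disk database into memory once at startup via the backup API (the file name and the files(name, data) table are placeholders):

    import sqlite3

    disk = sqlite3.connect("assets.db")   # packed data on permanent storage
    mem = sqlite3.connect(":memory:")
    disk.backup(mem)                      # one-shot copy into RAM
    disk.close()

    # Subsequent reads only touch RAM:
    row = mem.execute("SELECT data FROM files WHERE name = ?",
                      ("maps/e1m1.bsp",)).fetchone()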
The distinction I'd make is that a filesystem provides a common generic API that practically all processes on the OS understand and share access to. It's pretty much always implemented out-of-process (in the kernel, or in another userland process via kernel stubs/hooks like FUSE).
A data access library is usually much more specific to a particular data set or set of applications, and likely doesn't follow the filesystem abstraction (at least not in the same way).