Hacker News

The fastest file format is always the one you can just dump into memory and fix a few pointers. But these also tend to be the hardest formats to change or evolve...
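A minimal sketch of the "dump into memory and fix a few pointers" idea, using a hypothetical record layout (not any real format): the on-disk bytes are already the in-memory form, and the only "fixup" is interpreting base-relative offsets against the loaded buffer.

```python
import struct

# Toy "dump and fix pointers" format (hypothetical layout, not any real spec):
#   header: u32 record count
#   then count records of (u32 value, u32 name_offset), offsets relative
#   to the buffer start, followed by a NUL-terminated string pool.
# Loading requires no parse: readers index straight into the buffer, and the
# only fixup is resolving buffer-relative offsets, i.e. a slice in Python.

def dump(records):
    # records: list of (value, name) pairs
    body = bytearray(struct.pack("<I", len(records)))
    pool = bytearray()
    pool_base = 4 + 8 * len(records)
    for value, name in records:
        body += struct.pack("<II", value, pool_base + len(pool))
        pool += name.encode() + b"\x00"
    return bytes(body + pool)

def get_record(buf, i):
    # Zero-parse access: fixed-size offset math plus one dereference.
    value, off = struct.unpack_from("<II", buf, 4 + 8 * i)
    name = buf[off:buf.index(b"\x00", off)].decode()
    return value, name

buf = dump([(7, "alpha"), (42, "beta")])
print(get_record(buf, 1))  # (42, 'beta')
```

The evolution problem shows up immediately: adding a field to a record changes every offset, so old readers can't open new files without versioning machinery.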

Densely coded formats, such as msgpack, are neat for archival / long-term storage, but their coding means the unpackers are quite complex and not very fast (perhaps drawing some lessons from parallel-length x86 decoding could help implementations?); they tend to get slower as the data structure becomes more complex. Since the output of these unpackers is cumbersome, applications will copy everything yet another time. If we're not using C/C++/Rust, this also means a lot of allocator activity. Plus, data validation is separate from format validation, which means at least another pass over the structure.
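To make the decoding cost concrete, here is a toy decoder for a small subset of msgpack's actual wire tags (positive fixint, fixstr, fixarray, uint8). Each element's size depends on its tag byte, so decoding is an inherently serial, branchy loop, and every string and array allocates a fresh object:

```python
# Toy decoder for a subset of msgpack's tag space. The length of each
# element is only known after reading its tag, so the loop is serial and
# data-dependent -- the parallel-decode trick mentioned above would have to
# guess element boundaries speculatively, much like x86 length decoding.

def decode(buf, pos=0):
    tag = buf[pos]
    if tag <= 0x7F:                       # positive fixint: value is the tag
        return tag, pos + 1
    if 0x90 <= tag <= 0x9F:               # fixarray, length in low nibble
        out = []
        pos += 1
        for _ in range(tag & 0x0F):
            item, pos = decode(buf, pos)  # can't skip ahead past an element
            out.append(item)              # fresh allocation per container
        return out, pos
    if 0xA0 <= tag <= 0xBF:               # fixstr, length in low 5 bits
        n = tag & 0x1F
        return buf[pos + 1:pos + 1 + n].decode("utf-8"), pos + 1 + n  # copy
    if tag == 0xCC:                       # uint8
        return buf[pos + 1], pos + 2
    raise ValueError(f"unhandled tag {tag:#x}")

# [1, "hi", 200] encoded by hand using the same subset of tags:
wire = bytes([0x93, 0x01, 0xA2]) + b"hi" + bytes([0xCC, 200])
print(decode(wire))  # ([1, 'hi', 200], 7)
```

Note the copies: the string is sliced out of the buffer and re-decoded, and the list is built element by element, which is exactly the allocator traffic a zero-copy format avoids.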

So purely from a performance perspective these formats are not that good (despite all their claims). Designing purpose-built formats is obviously superior, but it also takes quite a bit of time and means more parsers to test (and fuzz). Something like Cap'n Proto might be an adequate middle ground; I'm not sure, since I've never worked with it and don't know much about how they realized it (I assume custom accessor code that mostly does nothing on x86/ARM).
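A sketch of what "accessor code that mostly does nothing" might look like, using a hypothetical fixed-offset struct layout (not Cap'n Proto's actual encoding): each field accessor is just an unaligned little-endian load at a known offset, which on x86/ARM compiles to a single mov/ldr. Here `struct.unpack_from` stands in for that load.

```python
import struct

# Hypothetical generated accessor class over a wire buffer. No
# deserialization step: the "file" already is the in-memory representation,
# and each accessor performs one fixed-offset read.

class PointView:
    __slots__ = ("_buf", "_base")

    def __init__(self, buf, base=0):
        self._buf, self._base = buf, base  # a view, not a copy

    @property
    def x(self):
        return struct.unpack_from("<i", self._buf, self._base)[0]

    @property
    def y(self):
        return struct.unpack_from("<i", self._buf, self._base + 4)[0]

wire = struct.pack("<ii", -3, 9)  # two little-endian i32 fields
p = PointView(wire)
print(p.x, p.y)  # -3 9
```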




> The fastest file format is always the one you can just dump into memory and fix a few pointers. But these also tend to be the hardest formats to change or evolve...

FlatBuffers is kind of like this. With untrusted data you just have to run the validation routine (that the code generator spits out) to check that all the internal offsets are contained within your buffer.
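A minimal sketch of the idea behind that generated verifier, using a hypothetical layout (not the real FlatBuffers encoding): walk the buffer's internal offsets once up front and check that each one lands inside the buffer, so that zero-copy accessors can then run without per-access bounds checks.

```python
import struct

# Hypothetical layout: u32 count, then count u32 offsets that must point
# into the tail of the buffer. verify() does one pass over the offset
# table; after it returns True, readers may dereference offsets freely.

def verify(buf):
    if len(buf) < 4:
        return False
    (count,) = struct.unpack_from("<I", buf, 0)
    table_end = 4 + 4 * count
    if table_end > len(buf):
        return False  # offset table itself overruns the buffer
    for i in range(count):
        (off,) = struct.unpack_from("<I", buf, 4 + 4 * i)
        if not (table_end <= off < len(buf)):
            return False  # offset escapes the buffer: reject before use
    return True

good = struct.pack("<III", 2, 12, 13) + b"ab"
bad = struct.pack("<III", 2, 12, 99) + b"ab"
print(verify(good), verify(bad))  # True False
```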

https://google.github.io/flatbuffers/



