
People will keep laughing at me when I tell them to just produce a binary, pre-parsed format for their data.

Parsing will always be slow no matter what you do, and you can hardly parallelize it. Of course it's fast on a desktop-class CPU, but it will always be slow on a smartphone, hence the need for such a format; it's also part of the reason smartphones use apps instead of HTML+JS.
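
To make that concrete, here's roughly what "pre-parsed" buys you (the Record layout below is invented for illustration, not any particular format): the consumer does a bounds check and a memcpy instead of tokenizing text and converting numbers.

    // Sketch only: both sides agree on a fixed binary layout up front.
    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    struct Record {            // fixed-width fields, fixed order
        uint32_t id;
        float    score;
    };

    // "Parsing" a pre-parsed record: no per-character work at all.
    bool read_record(const uint8_t* buf, size_t len, Record* out) {
        if (len < sizeof(Record)) return false;
        std::memcpy(out, buf, sizeof(Record));
        return true;
    }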

It's funny: computers are very fast, but batteries haven't kept up, so are we finally realizing that well-designed data formats have an effect on battery life?

I've seen several articles about how webpages are bloated, but I haven't seen any real proposal for how to fix it.




The fastest file format is always the one you can just dump into memory and fix a few pointers. But these also tend to be the hardest formats to change/evolve...
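
Something like this, in rough C++ (types invented for illustration): the file stores offsets relative to the start of the blob, and one pass after loading turns each offset into a real pointer. That pass is the whole "parse".

    #include <cstddef>
    #include <cstdint>

    struct Node {
        // On disk this holds an offset from the start of the blob;
        // fix_pointers() overwrites it in place with a real pointer
        // (assumes 64-bit pointers, as the classic trick does).
        union {
            uint64_t    name_off;
            const char* name;
        };
    };

    struct Blob {
        uint64_t node_count;
        Node     nodes[1];      // variable-length array in practice
    };

    void fix_pointers(char* base, size_t len) {
        Blob* b = reinterpret_cast<Blob*>(base);
        for (uint64_t i = 0; i < b->node_count; ++i) {
            // A real loader would check name_off < len before this.
            b->nodes[i].name = base + b->nodes[i].name_off;
        }
    }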

Densely coded formats, such as msgpack, are neat for archival / long-term storage, but their encoding means the unpackers are quite complex and not very fast (perhaps drawing some lessons from parallel-length x86 decoding could help implementations?); they tend to get slower as the data structure becomes more complex. Since the output format of these is cumbersome, applications will copy everything yet another time. If we're not using C/C++/Rust, this also means a lot of allocator activity. Plus, data validation is separate from format validation, which means at least another pass over the structure.
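
One reason those decoders stay serial, as a sketch (LEB128-style varint shown for brevity; msgpack's tag-byte scheme has the same dependency): the width of each field is only known after you've looked at its bytes, so element N+1 can't even be located until element N is decoded.

    #include <cstddef>
    #include <cstdint>

    // Decode one varint; returns bytes consumed, or 0 on error.
    size_t decode_varint(const uint8_t* p, size_t len, uint64_t* out) {
        uint64_t value = 0;
        for (size_t i = 0; i < len && i < 10; ++i) {
            value |= uint64_t(p[i] & 0x7f) << (7 * i);
            if ((p[i] & 0x80) == 0) {   // high bit clear: last byte
                *out = value;
                return i + 1;
            }
        }
        return 0;  // truncated or over-long input
    }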

So purely from a performance perspective these formats are not that good (despite all their claims). Designing purpose-built formats is obviously superior, but it also takes quite a bit of time and means more parsers to test (and fuzz). Stuff like Cap'n Proto might be an adequate middle ground; I'm not sure, I've never worked with it and don't know much about how they implemented it (I assume custom accessor code that mostly does nothing on x86/ARM).
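
What I imagine "accessor code that mostly does nothing" looks like (invented example, not Cap'n Proto's actual generated output): getters that do fixed-offset little-endian loads straight out of the received buffer, which on little-endian x86/ARM compile down to an ordinary mov/ldr.

    #include <cstdint>
    #include <cstring>

    struct PersonReader {
        const uint8_t* data;    // points into the received buffer

        uint32_t age() const {  // field at byte offset 0 (assumed layout)
            uint32_t v;
            std::memcpy(&v, data + 0, sizeof(v));
            return v;           // wire format is little-endian, so no swap
        }
        uint64_t id() const {   // field at byte offset 8 (assumed layout)
            uint64_t v;
            std::memcpy(&v, data + 8, sizeof(v));
            return v;
        }
    };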


> The fastest file format is always the one you can just dump into memory and fix a few pointers. But these also tend to be the hardest formats to change/evolve...

FlatBuffers is kind of like this. With untrusted data you just have to run the validation routine (that the code generator spits out) to check that all the internal offsets stay within your buffer.

https://google.github.io/flatbuffers/
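
Roughly like this in C++, using the Monster schema from the FlatBuffers tutorial (your generated names will differ): if the verifier passes, every offset has been bounds-checked and the accessors can be used without any further copying.

    #include "flatbuffers/flatbuffers.h"
    #include "monster_generated.h"   // produced by flatc from the schema

    const MyGame::Sample::Monster* load_untrusted(const uint8_t* buf,
                                                  size_t len) {
        flatbuffers::Verifier verifier(buf, len);
        if (!MyGame::Sample::VerifyMonsterBuffer(verifier))
            return nullptr;                      // reject malformed input
        return MyGame::Sample::GetMonster(buf);  // zero-copy view
    }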


> I've seen several articles about how webpages are bloated, but I haven't seen any real proposal for how to fix it.

AMP minus Google = fast webpages anyone can host.

Webpages are slow because of either fetch time (which can be addressed by not sending more data than needed) or processing time (which can be addressed by minimizing the time spent in JavaScript).

Pure HTML/CSS pages are always going to be fast; just look at HN for an example. Things start to slow down when you ask users to download huge images or videos, or when JavaScript is used to do all of the DOM creation and asset fetching.



I was about to suggest Protobuf, but it looks like it has evolved into this, and/or FlatBuffers.

https://developers.google.com/protocol-buffers



