Exactly, it's not too hard to implement in C. The one I made never copied data; instead it saved a pointer/length into the data.
The user only had to memory-map the file (or equivalent) and pass that buffer into the parser.
The only memory allocation was for the JSON nodes.
This way they only paid the parsing tax (decoding doubles, etc.) if the user actually used that data.
The first line of the article explains the context of the talk:
> This talk is a case study of designing an efficient Go package.
The target audience and context are clearly Go developers. Some of these comments are focusing too much on the headline without addressing the actual article.
Yup, and if your implementation uses a hashmap for object key -> value lookup, then I recommend allocating the hashmap after parsing the object, not during, to avoid continually resizing the hashmap. You can implement this by using an intrusive linked list to track your key/value JSON nodes until the time comes to allocate the hashmap. Basically, when parsing an object:

1. Use a counter 'N' to track the number of keys.
2. Link the JSON nodes representing key/value pairs into an intrusive linked list.
3. After parsing the object, use 'N' to allocate a perfectly sized hashmap in one go. You can then iterate over the linked list of JSON key/value pair nodes, adding them to the hashmap.

You can use this same trick when parsing JSON arrays to avoid continually resizing a backing array. Alternatively, never allocate a backing array and instead use the linked list to implement an iterator.
> The user only had to Memory Map the file (or equivalent)
Having done this myself, it's a massive cheat code because your bottleneck is almost always i/o and memory mapped i/o is orders of magnitude faster than sequential calls to read().
But that said it's not always appropriate. You can have gigabytes of JSON to parse, and the JSON might be available over the network, and your service might be running on a small node with limited memory. Memory mapping here adds quite a lot of latency and cost to the system. A very fast streaming JSON decoder is the move here.
> memory mapped i/o is orders of magnitude faster than sequential calls to read()
That’s not something I’ve generally seen. Any source for this claim?
> You can have gigabytes of JSON to parse, and the JSON might be available over the network, and your service might be running on a small node with limited memory. Memory mapping here adds quite a lot of latency and cost to the system
Why does mmap add latency? I would think that mmap adds more latency for small documents, because the cost of doing the mmap is high (cross-CPU TLB shootdown to modify the page table) and there's no chance to amortize it. Relatedly, there's minimal to no relation between SAX- vs DOM-style parsing and mmap - you can use either with mmap. If you're not aware, you do have some knobs with mmap to hint to the OS how it's going to be used, although it's very unwieldy to configure it to work well.
Experience? Last time I made that optimization it was 100x faster, ballpark. I don't feel like benchmarking it right now, try yourself.
The latency comes from the fact you need to have the whole file. The use case I'm talking about is a JSON document you need to pull off the network because it doesn't exist on disk, might not fit there, and might not fit in memory.
> Experience? Last time I made that optimization it was 100x faster, ballpark. I don't feel like benchmarking it right now, try yourself.
I have. Many times. There's definitely not a 100x difference given that normal file I/O can easily saturate NVMe throughput. I'm sure it's possible to build a repro showing a 100x difference, but you have to be doing something intentionally to cause that (e.g. using a very small read buffer so that you're doing enough syscalls that it shows up in a profile).
> The latency comes from the fact you need to have the whole file
That's a whole other matter. But again, if you're pulling it off the network, you usually can't mmap it anyway unless you're using a remote-mounted filesystem (which will add more overhead than mmap vs buffered I/O).
I also really like this paradigm. It's just that in old crusty null-terminated C style this is really awkward, because the input data must be copied or modified to insert the terminators. It's not an issue when using slices (a pointer and a length), but unfortunately most of the C standard library and many operating system APIs expect null-terminated strings.
> This way they only paid the parsing tax (decoding doubles, etc..) if the user used that data.
You hit the nail on the head