Exactly, it's not too hard to implement in C. The one I made never copied data; instead it saved a pointer/length into the data.
The user only had to memory-map the file (or equivalent) and pass that buffer into the parser.
The only memory allocation was for the JSON nodes.
This way they only paid the parsing tax (decoding doubles, etc.) if the user actually used that data.
The first line of the article explains the context of the talk:
> This talk is a case study of designing an efficient Go package.
The target audience and context are clearly Go developers. Some of these comments are focusing too much on the headline without addressing the actual article.
Yup, and if your implementation uses a hashmap for object key -> value lookup, then I recommend allocating the hashmap after parsing the object, not during, to avoid continually resizing the hashmap. You can implement this by using an intrusive linked list to track your key/value JSON nodes until the time comes to allocate the hashmap. Basically, when parsing an object:

1. Use a counter 'N' to track the number of keys.
2. Link the JSON nodes representing key/value pairs into an intrusive linked list.
3. After parsing the object, use 'N' to allocate a perfectly sized hashmap in one go. You can then iterate over the linked list of JSON key/value pair nodes, adding them to the hashmap.

You can use this same trick when parsing JSON arrays to avoid continually resizing a backing array. Alternatively, never allocate a backing array and instead use the linked list to implement an iterator.
> The user only had to Memory Map the file (or equivalent)
Having done this myself, it's a massive cheat code because your bottleneck is almost always i/o and memory mapped i/o is orders of magnitude faster than sequential calls to read().
But that said it's not always appropriate. You can have gigabytes of JSON to parse, and the JSON might be available over the network, and your service might be running on a small node with limited memory. Memory mapping here adds quite a lot of latency and cost to the system. A very fast streaming JSON decoder is the move here.
> memory mapped i/o is orders of magnitude faster than sequential calls to read()
That’s not something I’ve generally seen. Any source for this claim?
> You can have gigabytes of JSON to parse, and the JSON might be available over the network, and your service might be running on a small node with limited memory. Memory mapping here adds quite a lot of latency and cost to the system
Why does mmap add latency? I would think that mmap adds more latency for small documents, because the cost of doing the mmap is high (cross-CPU TLB shootdown to modify the page table) and there's no chance to amortize it. Relatedly, there's minimal to no relation between SAX- vs DOM-style parsing and mmap - you can use either with mmap. If you're not aware, you do have some knobs with mmap to hint to the OS how it's going to be used, although it's very unwieldy to configure it to work well.
Experience? Last time I made that optimization it was 100x faster, ballpark. I don't feel like benchmarking it right now, try yourself.
The latency comes from the fact you need to have the whole file. The use case I'm talking about is a JSON document you need to pull off the network because it doesn't exist on disk, might not fit there, and might not fit in memory.
> Experience? Last time I made that optimization it was 100x faster, ballpark. I don't feel like benchmarking it right now, try yourself.
I have. Many times. There's definitely not a 100x difference given that normal file I/O can easily saturate NVMe throughput. I'm sure it's possible to build a repro showing a 100x difference, but you have to be doing something intentionally to cause that (e.g. using a very small read buffer so that you're doing enough syscalls that it shows up in a profile).
> The latency comes from the fact you need to have the whole file
That's a whole other matter. But again, if you're pulling it off the network, you usually can't mmap it anyway unless you're using a remote-mounted filesystem (which will add more overhead than mmap vs buffered I/O).
I also really like this paradigm. It's just that in old crusty null-terminated C style this is really awkward, because the input data must be copied or modified to insert the terminators. It's not an issue when using slices (a pointer and a length), but unfortunately most of the C standard library and many operating system APIs expect null-terminated strings.
> This way they only paid the parsing tax (decoding doubles, etc..) if the user used that data.
You hit the nail on the head