The walkthrough is very nice, how to do this if you're going to do it. If you're...

vjerancrnjak · on Nov 5, 2023

If your JSON always looks the same you can also do better than general JSON parsers.

lylejantzi3rd · on Nov 5, 2023

Andreas Fredriksson demonstrates exactly that in this video: https://vimeo.com/644068002

kaladin_1 · on Nov 5, 2023

I really enjoyed this video even though he lost me with the SIMD code.

gnuvince · on Nov 6, 2023

I like this video because there's a lot of a good actionable advice before he gets into SIMD code.

ykonstant · on Nov 5, 2023

Very enjoyable video!

loeg · on Nov 5, 2023

You might also move to something other than JSON if parsing it is a significant part of your workload.

haswell · on Nov 5, 2023

Most of the times I’ve had to deal with JSON performance issues, it involved a 3rd party API and JSON was the only option.

If you’re building something net-new and know you’ll have these problems out the gate, something other than JSON might be feasible, but the moment some other system not in the closed loop needs to work with the data, you’re back to JSON and any associated perf issues.

diarrhea · on Nov 5, 2023

I wonder: can fast, special-case JSON parsers be dynamically autogenerated from JSON Schemas?

Perhaps some macro-ridden Rust monstrosity that spits out specialised parsers at compile time, dynamically…

minhazm · on Nov 5, 2023

For json schema specifically there are some tools like go-jsonschema[1] but I've never used them personally. But you can use something like ffjson[2] in go to generate a static serialize/deserialize function based on a struct definition.

[1] https://github.com/omissis/go-jsonschema [2] https://github.com/pquerna/ffjson

atombender · on Nov 5, 2023

Hey, go-jsonschema is my project. (Someone else just took over maintaining it, though.) It still relies on the standard Go parser; all it does it generate structs with the right types and tags.

marginalia_nu · on Nov 5, 2023

A fundamental problem with JSON parsing is that it has variable length fields that don't encode their length, in a streaming scenario you basically need to keep resizing your buffer until the data fits. If the data is on disk and not streaming you may get away with reading ahead to find the end of the field first, but that's also not particularly fast.

Schemas can't fix that.

Groxx · on Nov 7, 2023

Why couldn't they? Schemas can allow you to have that as part of your schema. E.g. JSON Schema lets you define max and min lengths on variable-sized things. You can avoid all dynamic resizing if you're careful enough.

I'll definitely agree that most things won't fully take advantage of that even if you provide that information, but it is definitely possible to do so.

marginalia_nu · on Nov 8, 2023

Unless you have fixed field lengths, you're still doing twice the work either scanning or resizing the buffer (or over-allocating memory I guess).

That said, JSON is designed for human readability above performance, so it's a design concession that makes sense. What doesn't make sense is using JSON anywhere performance matters.

robocat · on Nov 5, 2023

Only if you are using pointers/slices into the buffer as an optimisation.

Otherwise there is no need to keep a buffer of anything after it has been parsed.

marginalia_nu · on Nov 6, 2023

I'm talking about during parsing.

Let's assume I send you a JSON object that is one very long string and nothing else. It's e.g. 1 GB in size. To know you need to allocate a 1GB buffer, you need to first scan it, and then copy it; or keep reallocating the same buffer until it fits.

It's an absurd case, but shorter strings face similar overhead.

galangalalgol · on Nov 5, 2023

Doesn't the serde crate's json support do precisely this? It generates structs that have optional in all the right places and with all the right types anyway. Seems like the llvm optimiser can probably do something useful with that even if the serde feature isn't using apriori knowledge out of the schema.

dleeftink · on Nov 5, 2023

Somewhat tangentially related, Fabian Iwand posted this regex prefix tree visualiser/generator last week [0], which may offer some inspiration for prototyping auto generated schemas.

dleeftink · on Nov 6, 2023

Forgot to include the link:

[0]: https://observablehq.com/@mootari/regex-from-strings

atombender · on Nov 5, 2023

You forgot to include the link?

mhh__ · on Nov 5, 2023

It's relatively common in D application to use the compile time capabilities to generator a parser at compile time

PartiallyTyped · on Nov 5, 2023

Pydantic does that to some extend I think.

fooster · on Nov 5, 2023

Last time I compared the performance of various json parsers the simd one turned out to be disappointingly slow.

Thaxll · on Nov 5, 2023

The fastest json lib in Go is the one done by the company behind Tiktok.

rockinghigh · on Nov 5, 2023

https://github.com/bytedance/sonic

ken47 · on Nov 5, 2023

Fastest at what?

cannonpalms · on Nov 5, 2023

> For all sizes of json and all scenarios of usage, Sonic performs best.

The repository has benchmarks

mananaysiempre · on Nov 5, 2023

I’m not seeing simdjson in them though? I must be missing something because the Go port of it is explicitly mentioned in the motivation[1] (not the real thing, though).

[1] https://github.com/bytedance/sonic/blob/main/docs/INTRODUCTI...

pizzafeelsright · on Nov 5, 2023

Excellent treat vector.

pizzafeelsright · on Nov 5, 2023

Excellent treat vector.

lionkor · on Nov 5, 2023

simdjson has not been the fastest for a long long time

jzwinck · on Nov 5, 2023

What is faster? According to https://github.com/kostya/benchmarks#json nothing is.