Looks pretty good! Even though I've written far too many JSON parsers already in my career, it's really nice to have a reference for how to think about making a reasonable, fast JSON parser, going through each step individually.
That said, I will say one thing: you don't really need an explicit tokenizer for JSON. You can get rid of the concept of tokens and integrate parsing and tokenization entirely. That's what I usually do, since it makes everything simpler. It's a lot harder to do for the rest of ECMAScript, because there you wind up needing look-ahead (sometimes arbitrarily large look-ahead... consider arrow functions: their parameter list is mostly a subset of the grammar of a parenthesized expression. Comma is an operator, and for default values, equals is an operator. It isn't until the => does or does not appear that you know for sure which one you're parsing!)
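To make that concrete, here is a rough Go sketch of the integrated approach (illustrative only, not anybody's production parser: true/false/null, string escapes, and strict number validation are left out). Each function reads bytes directly and recurses, so no token stream ever exists:

    package main

    import (
        "fmt"
        "strconv"
        "strings"
    )

    // parser reads the input directly: each parse function inspects the next
    // byte itself, so there is no separate token type or token stream.
    type parser struct {
        s string
        i int
    }

    func (p *parser) ws() { // skip whitespace
        for p.i < len(p.s) && strings.ContainsRune(" \t\r\n", rune(p.s[p.i])) {
            p.i++
        }
    }

    func (p *parser) expect(c byte) { // consume one required byte or abort
        if p.i >= len(p.s) || p.s[p.i] != c {
            panic(fmt.Sprintf("expected %q at offset %d", c, p.i))
        }
        p.i++
    }

    // value dispatches on the next byte: tokenizing and parsing are one step.
    func (p *parser) value() any {
        p.ws()
        if p.i >= len(p.s) {
            panic("unexpected end of input")
        }
        switch c := p.s[p.i]; {
        case c == '{':
            return p.object()
        case c == '[':
            return p.array()
        case c == '"':
            return p.str()
        case c == '-' || (c >= '0' && c <= '9'):
            return p.number()
        default: // true/false/null omitted to keep the sketch short
            panic("unexpected character")
        }
    }

    func (p *parser) object() map[string]any {
        p.expect('{')
        m := map[string]any{}
        for p.ws(); p.i < len(p.s) && p.s[p.i] != '}'; p.ws() {
            if len(m) > 0 { // members after the first need a comma
                p.expect(',')
                p.ws()
            }
            k := p.str()
            p.ws()
            p.expect(':')
            m[k] = p.value()
        }
        p.expect('}')
        return m
    }

    func (p *parser) array() []any {
        p.expect('[')
        a := []any{}
        for p.ws(); p.i < len(p.s) && p.s[p.i] != ']'; p.ws() {
            if len(a) > 0 {
                p.expect(',')
            }
            a = append(a, p.value())
        }
        p.expect(']')
        return a
    }

    func (p *parser) str() string {
        p.expect('"')
        start := p.i
        for p.i < len(p.s) && p.s[p.i] != '"' { // escape sequences not handled
            p.i++
        }
        out := p.s[start:p.i]
        p.expect('"')
        return out
    }

    func (p *parser) number() float64 {
        start := p.i
        for p.i < len(p.s) && strings.ContainsRune("+-.eE0123456789", rune(p.s[p.i])) {
            p.i++
        }
        f, _ := strconv.ParseFloat(p.s[start:p.i], 64) // error handling elided
        return f
    }

    func main() { fmt.Println((&parser{s: `{"id": 7, "scores": [1, 2.5, -3], "name": "demo"}`}).value()) }

The same shape works in C or C++ with a pointer/length pair in place of the string and index.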
Reasons differ. C++ is a really hard place to be. It's gotten better, but if you can't tolerate exceptions, need code that is as obviously memory-safe as possible, or need to parse incrementally (think SAX style), off-the-shelf options like jsoncpp may not fit the bill.
Handling large documents is indeed another big one. It sort of fits in the same category as being able to parse incrementally. That said, Go does have a JSON scanner you can sort of use for incremental parsing, but in practice I've found it to be a lot slower, so it's a problem for large documents.
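For reference, the incremental interface Go actually exposes is encoding/json's Decoder (the underlying scanner itself isn't exported); a minimal token-at-a-time sketch looks like this:

    package main

    import (
        "encoding/json"
        "fmt"
        "strings"
    )

    func main() {
        // Read tokens one at a time instead of materializing the whole document;
        // the Decoder works on any io.Reader, so the input could be a huge file.
        dec := json.NewDecoder(strings.NewReader(`{"users": [{"id": 1}, {"id": 2}]}`))
        for {
            tok, err := dec.Token()
            if err != nil {
                break // io.EOF once the stream is exhausted
            }
            fmt.Printf("%T %v\n", tok, tok)
        }
    }

Each Token call yields a single delimiter, string, number, bool, or null, so memory stays bounded even for huge inputs, though the per-token overhead is presumably part of the slowdown mentioned above.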
I've done a couple in hobby projects too. One time I did a partial one in Win32-style C89 because I wanted one that didn't depend on libc.
Large documents are often handled by mmap/VirtualAlloc-ing the file, but Boost.JSON has a streaming mode, is reasonably fast, and its license is good for pulling into anything. It's not the fastest, but it's faster than RapidJSON and has an interface like nlohmann's JSON. For most tasks, it does seem that most of the libraries taking a JSON document approach are wasting a lot of time/memory getting to the point where we actually want to be: normal data structures, not a JSON document tree. If we pull that step out and parse straight into the data structures, there is a lot of win in performance and memory, with less (or no) code, just mappings. That's how I approached it, at least.
> that most of the libraries taking a JSON document approach are wasting a lot of time/memory
I agree. It's the same situation as with XML/HTML: in many cases you don't really need to build a DOM (or its JSON equivalent) in memory if your task is just deserializing some native structures.
For the interesting JSON of significant size, an iterator/range interface that parses to concrete types works really well. Usually those are large arrays or JSONL-like things.
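As a sketch of that pattern in Go (the Event type and its fields are made up for illustration), a large top-level array can be decoded element by element into concrete types:

    package main

    import (
        "encoding/json"
        "fmt"
        "io"
        "strings"
    )

    // Event stands in for whatever concrete type the array elements map to.
    type Event struct {
        ID   int    `json:"id"`
        Name string `json:"name"`
    }

    // forEachEvent streams a top-level JSON array, decoding one element at a
    // time into Event. For JSONL input, drop the two Token calls and just loop
    // on Decode until io.EOF.
    func forEachEvent(r io.Reader, fn func(Event) error) error {
        dec := json.NewDecoder(r)
        if _, err := dec.Token(); err != nil { // consume the opening '['
            return err
        }
        for dec.More() {
            var e Event
            if err := dec.Decode(&e); err != nil {
                return err
            }
            if err := fn(e); err != nil {
                return err
            }
        }
        _, err := dec.Token() // consume the closing ']'
        return err
    }

    func main() {
        data := `[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]`
        err := forEachEvent(strings.NewReader(data), func(e Event) error {
            fmt.Println(e.ID, e.Name)
            return nil
        })
        if err != nil {
            fmt.Println("parse error:", err)
        }
    }

Only one element is materialized at a time, and the values land directly in the concrete type instead of a generic document tree.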
If you have files that are large enough that JSON is a problem, why use JSON in the first place? Why not use a binary format that will be more compact and easier to memory-map?
I've written JSON parsers because in one instance we had to allow users to keep their formatting but also edit documents programmatically. At the time I couldn't find parsers that did that, but it was a while back.
In another instance, it was easier to parse into some application-specific structures, skipping the whole intermediate generic step (for performance reasons).
With JSON it's easier to convince your boss that you can actually write such a parser, because the language is relatively simple (if you overlook the botched definitions of basically every element...). So, for example, if the application that uses JSON is completely under your control, you may take advantage of the stupid decisions made by JSON's authors to simplify many things. More concretely, you can decide that there will never be more than X digits in numbers, that you will never use "null", that you will always put elements of the same type into "lists", or that you will never repeat keys in "hash tables".
I've seen "somebody doesn't agree with the standard and we must support it" way too many times, and I've written JSON parsers because of this. (And, of course, it's easy to get some difference with the JSON standard.)
I've had problems handling streams, like the OP describes, with basically every programming language and data-encoding pair I've tried. It looks like nobody ever thinks about it (I do use chunking any time I can, but sometimes you can't).
There are probably lots and lots of reasons to write your own parser.
There are several libraries that reach GB/s of parsing performance with various interfaces. Most are still trash for large documents and sit in the allocator far too long, but that isn't required either.
What on Earth are you storing in JSON for this sort of performance to become an issue?
How big is 'large' here?
I built a simple CRUD inventory program to keep track of my gaming backlog and progress, and the dumped JSON of all 500+ game statuses is under 60 kB and imports in under a second on decade-old hardware.
I'm having difficulty picturing a JSON dataset big enough to slow down modern hardware. Maybe Gentoo's portage tree if it were JSON encoded?
In my case, Sentry events that represent crash logs for Adobe Digital Video applications. I'm trying to remember off the top of my head, but I think it was in the gigabytes for a single event.
Not necessarily; Newtonsoft, for example, is fine with multiple hundreds of megabytes if you use it correctly. But of course it depends on how large we are talking about.