Even at 99%, it's still super frustrating when it fails, if you're using it in a consumer-facing app: you have to clean up the output to avoid getting an error. If it goes from 99% to 100% valid JSON, that's a big deal for me, much simpler.
If you're building an app on top of LLMs and you expect better than 99% correctness from them, you are bound to fail. Handling negative scenarios with workarounds and retries is mandatory.
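Roughly this pattern, where call_llm is a hypothetical stand-in for whatever client you actually use, and the repair prompt is just illustrative:

    import json

    def call_llm(prompt: str) -> str:
        # Hypothetical helper: wire this up to your actual LLM client.
        raise NotImplementedError

    def get_json(prompt: str, max_retries: int = 3) -> dict:
        output = ""
        for _ in range(max_retries):
            output = call_llm(prompt)
            try:
                return json.loads(output)
            except json.JSONDecodeError:
                # Negative scenario: feed the broken output back and retry.
                prompt = ("The following was supposed to be valid JSON but "
                          "isn't. Return only the corrected JSON:\n" + output)
        raise ValueError("no valid JSON after %d attempts" % max_retries)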
Honestly, I suspect asking GPT-4 to fix your JSON (in a new chat) makes a good drunken JSON parser. We are only scratching the surface of what's possible with LLMs. If token generation were free and instant, we could come up with a giant scheme of interacting model calls that generates 10 suggestions, iterates over them, ranks them, and picks the best one, as silly as it sounds.
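As silly as it sounds, the generate/rank/pick part is a few lines of glue. A toy sketch, reusing the hypothetical call_llm helper from above (a real version would validate the model's reply instead of trusting int() to work):

    def best_of_n(task: str, n: int = 10) -> str:
        # Generate n candidate answers, then ask the model to rank and pick.
        candidates = [call_llm(task) for _ in range(n)]
        numbered = "\n".join("%d: %s" % (i, c)
                             for i, c in enumerate(candidates))
        verdict = call_llm("Pick the best answer to the task below by "
                           "replying with its number only.\nTask: " + task +
                           "\nCandidates:\n" + numbered)
        return candidates[int(verdict.strip())]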
It shouldn't be surprising, though. If a human makes an error with JSON, what do you do? You have them look it over again. Unless their intelligence is the bottleneck, they may well be able to fix it.
I already do this today: I create prompts focused on domain-specific knowledge, have them iterate back and forth, and use a ‘moderator’ that chooses what goes in and what doesn’t.
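Something like this, very roughly; again call_llm is the same hypothetical helper, and the role prompts are invented for illustration:

    def moderated_exchange(question: str, rounds: int = 3) -> str:
        # Two domain-focused prompts iterate; a moderator filters each round.
        transcript = ""
        kept = ""
        for _ in range(rounds):
            a = call_llm("You are expert A. Answer or refine:\n"
                         + question + transcript)
            b = call_llm("You are expert B. Critique and improve:\n" + a)
            transcript += "\nA: " + a + "\nB: " + b
            kept = call_llm("You are the moderator. From the exchange below, "
                            "keep only what should go in:\n" + transcript)
        return kept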