I love byroot's work. I'm always surprised not only by the kind of contributions he makes but by the sheer volume of them, insane productivity.
Wish he would write more often, I've tried to get into ruby-core type of work more than once, but never found something that matched my skills so I could positively contribute and after a few weeks of no results the motivation would wear off, as it's really difficult to have the context he has shared in the article, for example.
If more Ruby C people would write more often, I bet there'd be more people with the skills that are needed to improve Ruby further.
The C profiler advice was great. Maybe I could just get a Ruby gem with C code and start playing again on optimizations :-)
He's insanely productive, but also insanely smart. I used to work in the same office as him at Shopify, and he's the kind of person whose level just seems unattainable.
Fully agree. He's also really patient and kind. He's the type of person who's really really smart and doesn't make you feel really really dumb. He takes the time to thoroughly explain things, without being condescending. I always enjoyed when our paths crossed because I knew I was going to learn something.
One thing that may be worth mentioning is the Rails default use of jbuilder. I know jbuilder isn't the JSON serialization bit, but if we're talking about things that make rendering JSON in Ruby/Rails slow, jbuilder would be top of my list. If you are rendering many partials with jbuilder, things really slow down.
Love the write-up on this topic, very easy to follow and makes me want to benchmark and optimize some of my ruby code now. Thanks for putting in the effort and also writing the post, byroot!
Possibly I've missed it, but is there anything saying how long the new version takes to parse/encode the Twitter JSON dump with all optimisations applied?
Oj has an extremely large API that I have no intent on emulating in the default json gem, things such as "SAJ" (SAX style parsing), various escaping schemes etc.
My goal is only to make it unnecessary for the 95% or so use case, so yes, Oj will remain useful to some people for a bunch of use cases.
Sax style parsing is a godsend when dealing with large files, regardless of json or xml. It's indeed what made me switch to a different json library in a Ruby project of mine (I'd have to look it up, but probably to oj).
I really thought this pure ruby implementation was neat, though I never actually used it in production. It's been long since abandoned.
I'm curious in general about the status of pure ruby implementations. It looks like they killed json_pure? If so, that's a bummer. Does anyone know the details? By far the most interesting part of the posts to me are the Ruby optimizations not so much the C ones.
Very fun read! I’m curious though, when it comes to non-Ruby-specific optimizations, like the lookup table for escape characters, why not instead leverage an existing library like simdjson that’s already doing this sort of thing?
In short, since `ruby/json` ships with Ruby, it has to be compatible with its constraints, which today means plain c99, and no c++. There would also probably be a licensing issue with simdjson (Apache 2), but not sure.
Overall there's a bunch of really nice c++ libraries I'd love to use, like dragonbox, but just can't.
Another thing is that last time I checked, simdjson only provided a parser, the ruby/json gem does both parsing and encoding so it would only help on half the problem space.
The benefit of a Ruby-specific JSON parser is that the parser can directly output Ruby objects. Generic C JSON parsers generally have their own data model, so instead of just parsing JSON text into Ruby objects you'd be parsing JSON text into an intermediate data structure and then walk that to generate Ruby objects. That'd necessarily use more memory, and it'd probably be slower too unless the parser is way faster.
Same applies to generating JSON: you'd have to first walk the Ruby object graph to build a yyjson JSON tree, then hand that over to yyjson.
All of that would be a big savings in code complexity and a win for reliability, compared to doing new untested optimizations. If memory usage is a concern, I’m sure there’s a fast C SAX parser out there (or maybe one within yyjson).
I don't understand what you're getting at. If performance is a concern, integrating a different parser written in C isn't desirable, as it would probably be slower than the existing parser for the reasons I mentioned (or at least be severely slowed down by the conversion step), so you need to optimize the Ruby-specific parser. If performance isn't a concern, keeping the old, battle-tested Ruby parser unmodified would surely be better for reliability than trying to integrate yyjson.
What I love about this article is it's actual engineering work on an existing code base. It doesn't seek to just replace things or swap libraries in an effort to be marginally faster. It digs into the actual code and seeks to genuinely improve it not only for speed but for efficiency. This simply does not get done enough in modern projects.
I wonder if it was done more regularly would we even end up with libraries like simdjson or oj in the first place? The problem domain simply isn't _that_ hard.
Bear in mind that: the author is part of the ruby core team; json is a standard lib gem; the repo from the json gem was in the original author namespace; the repo had no activity for more than a year, despite several quality MRs.
It took some time to track and get the original author to migrate it to the ruby team namespace.
While I'm glad they went to all this trouble, there are only a few who could pull this off. Everyone else would flock to or build a narrative.
The `json` gem is implemented in C, so it's a black box for YJIT (the reference implementation's JIT).
The TruffleRuby JIT used to interpret C extensions with sulong so it could JIT across languages barrier, but AFAIK they recently stopped doing that because of various compatibility issues.
Also on TruffleRuby the JSON parser is implemented in C, but the encoder is in pure Ruby [0]
Sorry about misuse of “intrinsics”. There is a simdjson library that uses SIMD instructions for speed. Would such an approach be feasible in the ruby json library?
TL;DR; it's possible, but lots of work, and not that huge of a gain in the context of a Ruby JSON parser.
`ruby/json` doesn't use explicit SIMD instructions, some routines are written in a way that somewhat expects compilers to be able to auto-vectorize, but it's never a given.
In theory using SIMD would be possible as proven by SIMDjson, but it's very (edit) UNlikely we'll do it because of multiple reasons.
First, for portability, we have to stick with raw C99, no C++ allowed, so that prevents using SIMDjson outright.
In theory, we could implement the same sort of logic ourselves, with support for processors that have various levels of SIMD support and runtime dispatch between them, but it would be terribly tedious. So it's not a reasonable amount of complexity for the amount of time I and other people are willing to spend on the library.
Then there's the fact that it wouldn't make as big a difference as you'd think. I do happen to have made some bindings for simdjson in https://github.com/Shopify/heap-profiler, because I had a use case for parsing gigabytes of JSON, and it helps quite a bit there.
But I'll hopefully touch on that in a future blog post, the actual JSON parsing part is entirely dwarfed by the work needed to build the resulting Ruby objects tree.
My naive/clueless mind always wonders if it wouldn't make sense to make a new class of Ruby objects that are much simpler and would yield both less memory consumption and GC optimizations that could be used for such cases.
Without a different object model it's hard to imagine optimizations that could greatly improve Ruby execution speed for CRuby, or make the GC much faster (huge issue for big applications), but maybe it's because I don't know much :-)
“Starting with the Redwood Cove microarchitecture, if the predictor has no stored information about a branch, the branch has the Intel SSE2 branch taken hint (i.e., instruction prefix 3EH), When the codec decodes the branch, it flips the branch’s prediction from not-taken to taken. It then flushes the pipeline in front of it and steers this pipeline to fetch the taken path of the branch.
...
The hint is only used when the predictor does not have stored information about the branch. To avoid code bloat and reducing the instruction fetch bandwidth, don’t add the hint to a branch in hot code—for example, a branch inside a loop with a high iteration count—because the predictor will likely have stored information about that branch. Ideally, the hint should only be added to infrequently executed branches that are mostly taken, but identifying those branches may be difficult. Compilers are advised to add the hints as part of profile-guided optimization, where the one-sided execution path cannot be laid out as a fall-through. The Redwood Cove microarchitecture introduces new performance monitoring events to guide hint placement.”
> Yet another patch in Mame’s PR was to use one of my favorite performance tricks, what’s called a “lookup table”.
One thing stood out to me here as a fellow lookup-table-liker that I would like to mention even though it is probably not relevant to a generic JSON generator/parser which has to handle arbitrary String Encodings.
The example optimized code uses `String#each_char` which incurs an extra object allocation for each iteration compared to `String#each_codepoint` which works with Immediates. If you are parsing/generating something where the Encoding is guaranteed, defining the LUT in terms of codepoints saves a bunch of GC pressure from the throwaway single-character String objects which have to be collected. One of the pieces of example code even uses `#each_char` and then compares its `#ord`, so it's already halfway there.
I don't feel like compiling 3.4-master to test it too, but I just verified this in Ruby 3.3.6:
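The original snippet isn't shown here. As a rough sketch of how one could check the claim on CRuby, the object counter in `GC.stat(:total_allocated_objects)` can be diffed around each loop. The helper name `allocations` and the sample string are made up for illustration, and exact counts will vary by Ruby version:

```ruby
# Hypothetical helper: counts objects allocated while the block runs (CRuby only).
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Sample input, 14 characters repeated 1,000 times.
text = ("hello \"world\"\n" * 1_000).freeze

# each_char yields a fresh single-character String on every iteration.
char_allocs = allocations { text.each_char { |c| c.ord } }

# each_codepoint yields Integers, which are immediates and need no heap object.
cp_allocs = allocations { text.each_codepoint { |cp| cp } }

puts "each_char:      #{char_allocs} objects"
puts "each_codepoint: #{cp_allocs} objects"
```

The `each_char` figure should scale with the length of the string, while the `each_codepoint` figure should stay close to zero.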
Apologies for linking my own hobby codebase, but here are two examples of my own simple LUT parsers/generators where I achieved even more performance by collecting the codepoints and then turning it into a `String` in a single shot with `Array#pack`:
- One from my filetype-guessing library that turns Media Type strings into key `Structs` both when parsing the shared-mime-info Type definitions and when taking user input to get the Type object for something like `image/jpeg`: https://github.com/okeeblow/DistorteD/blob/fbb987428ed14d710... (Comment contains some before-and-after allocation comparisons)
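To make the codepoint-keyed lookup table plus `Array#pack` idea concrete, here is a minimal sketch. It is not the code from the linked repository or from `ruby/json`: `ESCAPE_LUT` and `escape_with_codepoints` are hypothetical names, the input is assumed to be UTF-8, and only a handful of escapes are covered, so it is not a complete JSON escaper.

```ruby
# Hypothetical lookup table indexed by codepoint. Each entry is either nil
# (emit the codepoint unchanged) or the codepoints of its replacement.
ESCAPE_LUT = Array.new(128)
{
  '"'  => '\\"',
  '\\' => '\\\\',
  "\n" => '\\n',
  "\r" => '\\r',
  "\t" => '\\t',
}.each do |char, escaped|
  ESCAPE_LUT[char.ord] = escaped.codepoints.freeze
end
ESCAPE_LUT.freeze

# Assumes +str+ is valid UTF-8. Collects Integer codepoints and builds the
# result String once at the end, instead of one String per character.
def escape_with_codepoints(str)
  out = []
  str.each_codepoint do |cp|
    replacement = ESCAPE_LUT[cp] # nil for unlisted or non-ASCII codepoints
    if replacement
      out.concat(replacement)
    else
      out << cp
    end
  end
  out.pack("U*") # "U*" packs Integers as UTF-8 characters
end

puts escape_with_codepoints("say \"hi\"\n")
```

Indexing past the end of the table returns `nil`, so non-ASCII codepoints fall through to the pass-through branch without an extra bounds check.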
First, if the author is going to read this, let me thank you for your work. As a Rails developer, I find the premises very relatable.
Again, as a Rails developer, a pain point is different naming conventions regarding Ruby hash keys versus JS/JSON object keys. JavaScript/JSON typically uses camelCase, while Ruby uses snake_case. This forces me to perform tedious and often disliked transformations between these conventions in my Rails projects, requiring remapping for every JSON object. This process is both annoying and potentially performance-intensive. What alternative approaches exist, and are there ways to improve the performance of these transformations?
I don't have a solution for the performance problem. But for the camelCase to snake_case conversion, I can see potential solutions.
1. If you are using axios or other fetch based library, then you can use an interceptor that converts the camelCase JavaScript objects to 'snake_case' for request and vice versa for response.
2. If you want to control that on the app side, then you can use a helper method in ApplicationController, say `json_params`, that returns the JSON object with snake_case keys. Similarly wrap the `render json: json_object` into a helper method like `render_camel_case_json_response` and use that in all the controllers. You can write a custom RuboCop cop to make this behaviour consistent.
3. Handle the case transformation in a Rack middleware. This way you don't have to force developers to use those helper methods.
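Whichever layer ends up doing the work (controller helper or Rack middleware), the core of options 2 and 3 is a recursive key transformation. A minimal plain-Ruby sketch, with no Rails dependency, might look like the following. All method names are hypothetical, and the regexes are deliberately naive: an acronym key such as `userID` would become `user_i_d`.

```ruby
# Hypothetical helpers: convert a single key between naming conventions.
def camelize_key(key)
  key.to_s.gsub(/_([a-z0-9])/) { $1.upcase }
end

def snake_case_key(key)
  key.to_s.gsub(/([A-Z])/) { "_#{$1.downcase}" }
end

# Walks nested Hashes and Arrays, rebuilding every Hash with converted keys.
# The conversion itself is passed in as a block.
def deep_transform_keys(value, &converter)
  case value
  when Hash
    value.each_with_object({}) do |(key, inner), out|
      out[converter.call(key)] = deep_transform_keys(inner, &converter)
    end
  when Array
    value.map { |inner| deep_transform_keys(inner, &converter) }
  else
    value
  end
end

response = { first_name: "Ada", line_items: [{ unit_price: 10 }] }
p deep_transform_keys(response) { |k| camelize_key(k) }

params = { "firstName" => "Ada", "lineItems" => [{ "unitPrice" => 10 }] }
p deep_transform_keys(params) { |k| snake_case_key(k) }
```

Every call rebuilds every Hash in the payload, which is where the performance cost mentioned in the question comes from.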
I believe his point is that this transformation could be done maybe in C and therefore have better performance, it could be a flag to the JSON conversion.
I find the idea good, maybe it even already exists?
It could be done relatively efficiently in C indeed, but it would be yet another option, imposing yet another conditional, and as I mention in the post (and will keep hammering in the followups), conditionals are something you want to avoid for performance.
IMO that's the sort of conversion that would be better handled by the "presentation" layer (as in ActiveModel::Serializers et al.).
In these gems you usually define something like:
    class UserSerializer < AMS::Serializer
      attributes :first_name, :email
    end
It wouldn't be hard for these libraries to apply a transformation on the attribute name at almost zero cost.
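As an illustration of why the cost can be close to zero, here is a toy serializer that computes the camelCase names once, when `attributes` is declared, rather than on every render. `TinySerializer` is a hypothetical stand-in and not the API of ActiveModel::Serializers or any real gem.

```ruby
# Hypothetical base class, only meant to show the "convert once" idea.
class TinySerializer
  class << self
    attr_reader :attribute_map

    # Runs once per serializer class: maps each attribute name to its
    # pre-computed camelCase JSON key.
    def attributes(*names)
      @attribute_map = names.to_h do |name|
        [name, name.to_s.gsub(/_([a-z0-9])/) { $1.upcase }]
      end.freeze
    end
  end

  def initialize(object)
    @object = object
  end

  # Per-render work is a Hash lookup and a method call per attribute;
  # no string manipulation happens here.
  def as_json
    self.class.attribute_map.each_with_object({}) do |(attr, key), out|
      out[key] = @object.public_send(attr)
    end
  end
end

User = Struct.new(:first_name, :email)

class UserSerializer < TinySerializer
  attributes :first_name, :email
end

p UserSerializer.new(User.new("Ada", "ada@example.com")).as_json
```

Because the key strings are built at class-definition time, rendering a thousand users performs no more case conversions than rendering one.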
> Again, as a Rails developer, a pain point is different naming conventions regarding Ruby hash keys versus JS/JSON object keys. JavaScript/JSON typically uses camelCase, while Ruby uses snake_case.
Most APIs I've come across use snake_case for their keys in JSON requests and responses. I rarely come across camelCase in JSON keys. So I'm happy to just write snake_case keys and let my backend stay simple and easy, and let the API consumer handle any transformations.
I use the same approach another comment points out, using Axios transformers to convert back and forth as necessary.
> if I could go back in time and tell them to avoid one thing it would be their strict adherence to naming conventions
Monkey paw curls. Rails probably wouldn’t have reached popularity were it not for the strict adherence to naming conventions. The Rails value prop is productivity and the ethos of “convention over configuration” is what makes that possible.
There's been arguments over the years that push back to rails magic led to a new generation of explicitness. You're also correct in saying that it's part of the value prop for productivity. But there's an overhead to learning the conventions in the first place and these days there is no denying less and less people are coming to Rails and even less learning Ruby. That's just the lifecycle of software I suppose. I'm extremely grateful for the large companies with massive ruby codebases that continue to move the language and ecosystem forward.
I'm ok with most of the naming conventions, but the pluralization is one I loathe. The necessity of custom inflections should have been a strong smell, IMO.
If you look at the source [1], you'll see what it's doing is very simple (the file I linked is basically the whole library, everything else is Gem-specific things and tests). You can even skip the gem and implement it yourself, not a big dependency at all, so no need for constant maintenance in this case :p