In Ruby, as in NodeJS, the GIL pushes you to scale horizontally. The memory footprint of Hello World becomes a big problem, because the number of copies you run will be proportional to the number of cores you have, not the number of machines. You get no memory-sharing benefit from moving from an 8-core box to 16 or 20 cores.
I suspect that if they do manage to pull off more concurrency in Ruby 3, vertically scaling machines will make more sense. If 8 cores can benefit from a shared footprint, instead of one process per core, then the budget looks more attractive.
So now might not be the right time to cherry-pick some of these features, but it may not be far off.
FWIW the GIL has been the GVL since YARV was merged and Ruby became based on a virtual machine rather than being purely interpreted. I believe this was 1.9.
> because the number of copies you run will be proportional to the number of cores you have, not the number of machines
While this is true, Ruby is also very CoW-optimized, so while forks grow linearly in size (with count), usually the first fork is drastically smaller than the process it was forked from.
I work at Heroku and recommend perf settings to customers. 5 years ago people were mostly hitting memory limits. Now it's pretty common to see apps that are maxing out the CPU well before coming close to RAM limits.
Especially when compared to JavaScript, Ruby is extremely memory-efficient.
I agree with your larger statement but wanted to chime in and expand on those two points.
CRuby could still be much better at CoW. In theory, a forked process only needs a memory allocation similar to a pthread's. In practice the runtime writes to a bunch of these inherited pages and fucks up the sharing. malloc-ed memory is usually bigger than the "Ruby heap", so that kind of limits the impact you can have by trying not to write/re-write.
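To make the CoW point concrete, here's a rough preforking sketch (the helper names are made up, this isn't any particular server's API); the idea is to do all the heavy loading in the parent and tidy the heap before forking, so children dirty fewer inherited pages:

    # Rough, illustrative preforking sketch (made-up helpers, not a real server API).
    def load_app
      # require your framework, models, config here (placeholder)
    end

    def handle_requests(worker_id)
      # placeholder worker body; a real worker would loop accepting connections
      puts "worker #{worker_id} (pid #{Process.pid}) started"
    end

    load_app

    # GC.compact (Ruby 2.7+) defragments the heap so forked children
    # write to fewer inherited pages later on.
    GC.start
    GC.compact if GC.respond_to?(:compact)

    pids = 4.times.map { |i| fork { handle_requests(i) } }
    pids.each { |pid| Process.wait(pid) }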
The high memory usage of ruby still causes problems if the app is single-threaded. I scaled databases for ruby apps for a living for almost 8 years, and sadly single-threaded legacy ruby apps are still a thing.
Anyway, in the single-threaded scenario, the app may appear to be CPU-bound under steady state. However, when some hiccup happens in a database or in another microservice, all the ruby processes can soon be blocked waiting for network responses. In this case, ideally there should be plenty of idling ruby processes to absorb the load, but it will be rather costly to do so due to the high memory usage.
There are potential fixes of course, but with trade-offs:
- Aggressive timeout: May cause requests to fail under the steady state
- Circuit breaker: Difficult to tune the parameters; may not get triggered, or may prolong the degraded state longer than necessary. Also not a good fit when the process is single-threaded, as it can only see one data point at a time (a rough sketch follows this list).
- Burning money: Can only do this until we hit the CPU-to-memory ratio limits imposed by the cloud vendors.
- Multi-threading: Too late to do this with years of monkey-patching that expects the app to run single-threaded.
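To show what I mean by the circuit-breaker option, here's a very rough single-process sketch (my own illustration, not a library API; the thresholds are made up and would need tuning). Note that it only counts the failures this one process sees, which is exactly the single-threaded limitation above:

    class CircuitBreaker
      class OpenError < StandardError; end

      def initialize(failure_threshold: 5, reset_after: 30)
        @failure_threshold = failure_threshold
        @reset_after = reset_after
        @failures = 0
        @opened_at = nil
      end

      # Fail fast while the circuit is open; otherwise run the block and
      # count consecutive failures.
      def call
        raise OpenError, "circuit open, failing fast" if open?

        begin
          result = yield
          @failures = 0
          result
        rescue
          @failures += 1
          @opened_at = Time.now if @failures >= @failure_threshold
          raise
        end
      end

      private

      def open?
        return false unless @opened_at
        if Time.now - @opened_at > @reset_after
          # half-open: let the next call through and start counting again
          @opened_at = nil
          @failures = 0
          false
        else
          true
        end
      end
    end

    # breaker = CircuitBreaker.new
    # breaker.call { Net::HTTP.get(URI("https://flaky-service.internal/")) }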
Well, having more spare ruby processes / threads would make the app more resistant to latency variability, and could have turned some incidents into nonevents.
Also, while I don't disagree that it is indeed a hard problem, I do have very good experience with an async Java stack, where I didn't have to worry about things like this. As long as a sane queue limit is defined on, let's say, the Jetty HTTP client, if something bad happens at the other end, back pressure kicks in by immediately failing the requests that couldn't make it into the queue. Other parts of the app then continue to be functional.
So, I would contend that it has a lot to do with ruby's high memory usage, made much worse when single-threaded, and it looks like ruby 3.0 still won't have a complete async story yet?
EDIT: I checked the link again, and it looks like Jeff Dean was talking about latency at p999 or above? By "hiccup", I actually mean something that would increase avg latency by perhaps 5-10x, e.g. avg latency of 100ms under steady state + timeout of 1 second + the remote being down. Sorry for the confusion. Here, I am lucky if people start caring about p95.
That's not an inherent property of a particular language or concurrency model, though. That's having logic to track request queue depth for a particular service or endpoint and fail fast/load shed. You can do the same in Ruby! Some would probably say this is what a service mesh is for.
Maybe you're thinking of the new Actor-based model for compute parallelism? Async IO in production Ruby has been a thing for easily more than a decade.
Of course it is not an inherent property of a particular language or concurrency model, but it is a property of a particular language ecosystem. As a Turing-complete language, everything is doable in ruby, but at what cost? Now we are back to the trade-offs I listed above.
As for async IO in production, looking at the client library, https://github.com/socketry/async-http is barely 3 years old, and probably reached the production-ready state a few months ago, if we are being generous.
But good point about service mesh. Moving the circuit breaker responsibility to the service mesh would definitely help in my case, as the sidecar would have all the data points from the 10+ single-threaded ruby processes running in the same pod, and thus could make a much quicker decision.
If you're using Unicorn then you've already got Raindrops, which gives you a really simple way to share metrics across forked processes, like in-flight requests to another service or how many of your Unicorns are busy.
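Something along these lines (a rough sketch; the counter layout and threshold are my own, and the Raindrops object has to be created before forking so all workers share the same page):

    require 'raindrops'

    # Created in the master before forking, so every worker sees the same
    # shared-memory counters.
    IN_FLIGHT = 0
    COUNTERS = Raindrops.new(1)
    MAX_IN_FLIGHT = 8   # made-up threshold

    def call_remote_service
      # Shed load based on the whole fleet of workers, not just this process.
      raise "load shed: too many calls in flight" if COUNTERS[IN_FLIGHT] >= MAX_IN_FLIGHT

      COUNTERS.incr(IN_FLIGHT)
      begin
        yield   # the actual HTTP call goes here
      ensure
        COUNTERS.decr(IN_FLIGHT)
      end
    end

    # call_remote_service { Net::HTTP.get(URI("https://flaky-service.internal/")) }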
EventMachine has been losing steam for a while now, which is why I brought up Async as the new hotness. I don't think it is fair to classify async-http as "just a new client library". As of now, in the ruby ecosystem, the Async framework is the only player in town. From my perspective, it still looks pretty much unproven, but perhaps we just live inside different bubbles.
It kinda feels like we are talking past each other here. I would just like to clarify that I inherited all these different ruby apps, and I don't have the magical ability to go back in time and say "Hey, perhaps we should use an async framework from the beginning" or "Dude, enough with the monkey-patching". And even if I did, those could be bad advice, as the ruby apps are making money in production.
Anyway, thanks for the suggestion to share metrics across processes. That will definitely help with the circuit breaker decision making in my case.
CRuby forks using fork() and Copy-on-Write shares memory from parent to child.
JRuby doesn't have a GIL so you only need a single process. Same with TruffleRuby.
With CRuby, you're much better off running a bigger container with multiple processes than one process per container.
With either NodeJS or CRuby you're still better off running fewer containers on bigger hosts. Each host has to duplicate the host OS and container infrastructure. Each container of a real production app also duplicates a bunch of stuff despite Docker's best attempts at sharing.
Some major differences here are how they interface with I/O and the mechanisms around memory sharing.
Nodejs workers are more like web workers and mostly suitable for proper CPU-intensive parallelization, whereas in Ruby it's not uncommon to run e.g. a multithreaded web server in the same process and namespace.
That's rather vague. But yes, no matter which JIT, you always need some extra memory to run the JIT, and it creates a more optimized version of the code while also keeping the unoptimized version around, so it needs more memory.
Python3 has really been the norm since 3.4, which was released in 2014. After that it took another year until most major packages were updated to Python3, but that happened at some point in 2015. By 2016 there weren't many packages left that weren't Python3 compatible, or that didn't at least have Python3 replacements.
This is simply not true. From 2014 to 2017, I worked at a place where I kept starting new projects in Python 3. But people with less foresight wouldn't let their Python 2 habits die. It was a constant struggle to get anyone to realize that Python 3 was the future, and the majority of the code ended up using Python 2. In 2019 when the end-of-life for Python 2 was finally announced, I knew of companies scrambling to upgrade.
I can't help but contrast this with the upgrade from Ruby 1.8 to 1.9. It was also painful, but no one in the community was holding on to 1.8 years later.
> I can't help but contrast this with the upgrade from Ruby 1.8 to 1.9.
One difference is that upgrading Ruby from 1.8 to 1.9 brought a significant performance increase, whereas going from Python 2.x to early 3.x, performance actually got worse.
AFAICT this was mostly caused by the removal of the machine-word-sized integer type - in Python 3, even 1+1 is calculated using arbitrary-precision integers.
Python 2 => 3 had lots of other problems as well - ultimately they changed just enough to break everyone’s code, but not enough to make upgrading worthwhile.
Without that, I think many would lose trust in Python and just switch languages.
I mean it has only been 11 years since Python 3.1/2.7, and that's probably a common lifespan for maintenance-mode code projects? 3.5 is still supported and that one is 5 years old. Why the hurry?
Because some people will always leave it to the last moment or beyond. Meanwhile the Python team has had the overhead of supporting more code than necessary.
Python is not so technically superior to other languages that you can rhetorically ask that question. Its main advantage is the ecosystem and network effects. If a bunch of people, especially the people who work on numpy, scipy, etc., decide to work on developing libraries for other languages like R and Julia, the data science ecosystem would switch over in a few years. Similarly for other fields, people might switch to languages like Elixir, Haskell, OCaml, Go, Swift, Scala, Ruby, Kotlin, etc.
> Python3 has really been the norm since 3.4, which was released in 2014. After that it took another year until most major packages were updated to Python3, but that happened at some point in 2015.
So from the perspective of an application developer who uses package dependencies, using Python 2 was the norm until 2015 at the earliest? That sounds about right to me.
I beg to differ, as someone who's been forced to use python 2 at work until next year. As for the python ecosystem as a whole, it feels like the transition to 3 happened around 2017-2018 at the earliest, when most popular libraries finally got with the program. Then again, the numbers could prove me completely wrong; I just remember personal pain points well. By comparison, the Ruby ecosystem was much faster in transitioning.
Not true. Vanilla 18.04 LTS ships with no Python installed by default and the main repo's `python` package is 2.7. Same for Debian 10. Ubuntu 20.04 switched the `python` package to Python3.
Even conservative CentOS (where Python is a base dependency for the system, as opposed to the above) is on Python3 as of now though.
As far as I am aware, Debian 10 buster is the last mainstream LTS distro to default to python2. Should change next year with Debian 11.
That's just a measure for backwards compatibility, so python2-only scripts don't cause cryptic errors when they have a `python` shebang. Many distros ship without python2, but will probably still link `python -> python2` for the near future.
In 50 years, I bet they'll still be looking for python 2 developers to work on old python code...
Or not... it's not like you can't throw a python 3 dev at an old python 2 codebase and tell them to work on it. Even if that probably wouldn't make them very happy, they wouldn't be lost.
But I'd bet an arm that there will still be python 2 codebases running in production in the coming decades, with companies very unwilling to do the work of migrating them.
No, they made a big deal of giving developers a ten-year period to port their code before deprecating python 2.
But honestly, there are more breaking changes between two versions of ruby than between python 2 and 3. And during the time python 2 was still supported (until the last day of 2019), most features that made it into python 3 and could be backported were backported into python 2.
It will be a pain for those who aren't used to python 2 encoding errors and the other nice stuff that they got rid of in 3 to make it a nicer and more robust language. And they will miss the new and shiny features that make it into new python 3 versions after the python 2 EOL.
But apart from that, it's almost the same language. They just made the transition from 2 to 3 to be able to introduce some breaking changes in places where unfortunate design choices had made their way into the language and couldn't be rolled back, because people running python 2 in production depended on them. So they bumped the major version, introduced some breaking changes (but not that many really) and gave developers 10 years of support for the older language so they could port their codebases to the new version.
And porting a python 2 app to a python 3 app isn't such a hard task. But! If you've got a big app that's working now, even if the changes aren't that drastic, you can't be sure that the port won't introduce some hard-to-find bugs that will be a pain to debug. Hence plenty of companies are still running python 2 versions of their apps and will do so for the foreseeable future.
But throwing a python 3 dev at a python 2 codebase is totally doable; it's just that the guy or gal will miss the shiny stuff that didn't get backported into 2 and will break their teeth on some behavioral changes between the two versions.
It still isn't the default on Windows because they separated it from Windows so they could ship new versions of PowerShell faster than new versions of Windows[1]. The renumbering to major version 7 was intended to communicate compatibility (via loading 5.1 internally where necessary).
I don't know that it's certain it ever will be default on Windows, IIRC the last things I read were that it might become an optional package in Windows.
[1] Six-monthly, now moving to annual releases to align with new versions of .NET Core.
I know this is a joke, but we’re legitimately running v8 as a PHP extension in production on high-traffic sites. It’s actually quite good for server-side JS rendering when you’re working inside an existing PHP framework (WordPress, in our case).
I’m excited about performance improvements but thrilled at the idea of adding types. Has anyone here worked with Sorbet or a prerelease 3.0 in Rails and able to share some notes?
Our team has been using Sorbet at getcensus.com for almost a year now, and I'm generally very happy with it though it's not perfect.
Like most non-trivial Rails apps, our test suite takes a while to run, so I like having Sorbet to catch "dumb" issues without having to run the full suite. Running `srb tc` to check types is incredibly fast and seems to be scaling well as our codebase grows. It catches the obvious stuff, but has also found some subtle bugs in flow checking and is great for refactoring support. The false positive rate is extremely low - if Sorbet flags a regression in your type checking, it's very likely to be a real bug.
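For a sense of what it catches, here's a tiny made-up example (not from our app) that `srb tc` flags without running any tests:

    # typed: true
    require 'sorbet-runtime'

    class Invoice
      extend T::Sig

      sig { params(cents: Integer).returns(String) }
      def format_total(cents)
        "$%.2f" % (cents / 100.0)
      end
    end

    # srb tc reports the mismatch statically: expected Integer, got String.
    Invoice.new.format_total("100")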
The Slack community is helpful and responsive - if you're thinking of using sorbet, I'd strongly suggest joining.
The downsides are:
- Unclear workflows - it's hard to know when you need to "rescan" for new type definitions in gems, the stdlib, and in generated code in your own app
- Poor Rails integration - the sorbet-rails package is helpful and being actively developed, but it's clear that the maintainers don't use Rails and aren't going out of their way to support it.
- Upgrades are rough - the sorbet tools that scan your gems and code to find "hidden definitions" are seemingly unstable from release to release. There's a good chance that upgrading to a new version of sorbet will break your type checking for mysterious and hard-to-debug reasons. Lots of this is probably related to Rails as well.
- IDE integration isn't quite ready for prime-time yet. I've gotten it working in Emacs with lots of experimentation and poking around, and I think some folks have it working in VSCode too, but it's not officially "released" or supported and it crashes somewhat often. It's still stable enough to be useful and I'm glad I have it.
It's great and seems to be getting better, and it has absolutely made me more productive, but know that you're still adopting an alpha- or beta-quality tool and it's unlikely to "just work".
j/k. sorbet-rails maintainer here. I agree with the assessment that sorbet doesn't go out of its way to support some Rails features, e.g. method overloading or scoping blocks accurately. The Sorbet tool is opinionated about some design choices, which makes it hard to support Rails' extensive use of meta-programming. That said, Sorbet is still useful in checking the custom code we write on top of Rails and its interactions. It may be hard to type the model files themselves, but we can type-check the code making use of the models! Recently, I started a new project on Rails and it's quite fun building it with types from scratch :D
I find Sorbet a very helpful tool for development. I hope people will give it a try and contribute to tools around it (sorbet-rails included) so that we have great tools to use!
I did not mean to put down sorbet-rails - it has been really useful to us and we appreciate all the work you and your team have put into it!
But I do have the sense that building Rails apps using sorbet won't feel "first class" until we have some sorbet maintainers that use Rails or the Rails team starts adopting sorbet (or both!)
Yeah, I totally agree with this. In some ways, it's the difference in philosophical approach between the two systems that will be hard to reconcile. E.g. Rails favors convenience (methods that just work under various conditions), whereas Sorbet favors explicitness (no method overloading, typed struct classes instead of hashes).
I'm also very excited for Sorbet. Not because of types specifically (I don't use them and don't plan to), but because I hoped it would give me the same linting experience that ESLint gives me on JS files (unused variables, undefined methods, calling methods on nil, and so on).
The sorbet demo (https://sorbet.run/) is all I could wish for (you can remove the type signatures and see that it would still warn you about the `.barr` typo).
However, it is still a great deal of work to set it up on Rails (Sorbet is made by Stripe; they use Ruby but not Rails), and I couldn't finish the setup because of warnings from some gems that I couldn't update at the time.
@pqdbr: sorbet-rails maintainer here. We maintain the library to bridge the gap between sorbet & Rails. If you have any issues setting up sorbet and/or sorbet-rails, hit me up with issues on the repo and I'll try to help!
From Soutaro, who did the type system for Ruby 3:
We defined a new language called RBS for type signatures for Ruby 3. The signatures are written in .rbs files which is different from Ruby code. You can consider the .rbs files are similar to .d.ts files in TypeScript or .h files in C/C++/ObjC. The benefit of having different files is it doesn't require changing Ruby code to start type checking. You can opt-in type checking safely without changing any part of your workflow.[1]
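For a rough idea of the shape (a made-up class, not from the post), the Ruby code stays plain and the signature lives in a separate .rbs file:

    # person.rb -- plain Ruby, untouched
    class Person
      attr_reader :name

      def initialize(name)
        @name = name
      end

      def greeting
        "Hello, #{name}"
      end
    end

    # person.rbs -- the signature file, analogous to a .d.ts
    class Person
      attr_reader name: String

      def initialize: (String name) -> void
      def greeting: () -> String
    end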
So UX-wise I don't know. While it's nice to have them separated, you are just faster if you have the types in the code, in front of your face, when the IDE warns. Otherwise you will context-switch on every warning/error. This can add up, since you sometimes code just based on the LSP's output for hours. The IDE could help here by showing the definition alongside the warning, but still, if you want to change the definition or see more than just a snippet you will constantly jump between files.
Also, .d.ts files were introduced to type old code, as a fallback and a secondary option. At some point, Matz and Soutaro will need to integrate types into the language itself to get the same level of productivity that other typed languages offer, e.g. Rust/Go/TS.
Ruby 3 is expected to introduce new concurrency primitives that evade the global interpreter lock (guilds / isolates) and type definitions for the stdlib for optional typing support. This should be a big release for ruby!
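For reference, a minimal sketch of the Ractor API in the 3.0 previews (earlier drafts called these guilds, so details may still change before release):

    # Each Ractor runs in parallel, outside the GVL, with no shared mutable state.
    sums = 4.times.map do |i|
      Ractor.new(i) do |n|
        # Only the arguments passed in are visible here.
        (1..10_000_000).sum * (n + 1)
      end
    end

    # .take blocks until a Ractor's block returns and yields its value.
    puts sums.map(&:take).inspect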
There are some non-backwards-compat changes regarding keyword arguments.
They aren't that exciting, but they are necessary to eliminate some ambiguous and inconsistent cases, and will be a pain for some codebases. (2.7 already marks as deprecated behavior that will break in 3).
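The canonical example, as I understand it (simplified; the names are my own):

    def create_user(name, admin: false)
      [name, admin]
    end

    opts = { admin: true }

    create_user("alice", opts)    # 2.7: works, with a deprecation warning;
                                  # 3.0: opts stays a positional Hash -> ArgumentError
    create_user("alice", **opts)  # the explicit double-splat works on both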
I'm not sure if ruby actually commits to semver-style no-backwards-incompat-unless-major; they didn't use to. Either way though, recent ruby minor version releases have seen few if any (?) backwards-incompat changes of any note -- nothing of note I can think of since 1.9 in 2007 (which did have major changes; ironically 2.0 didn't have so much). The keyword arg changes will definitely affect more codebases more significantly than any we've seen in a while.
It might be a little of both. I did come across this change to keyword arguments[1] recently, but I'm not sure how impactful it is since I don't personally leverage keyword arguments right now.
Problem is if you're stuck on gems that aren't maintained any more. Seen a few with PRs containing fixes going stale. Can still switch to a fork, at least.
looks like they're adding an optional type system, which means you get the worst of both worlds - no guarantees AND no compile-time type checking (think: what happens if type-checked code calls non-typed code?), so I have no idea how they're going to make that fast, since all types will still have to be checked at runtime
Are you sure it’ll be awful? Sorbet (https://sorbet.org/) is pretty popular already. It can statically check your whole project and dynamically check it at runtime. It also doesn’t add that much overhead so I’m not sure what you’re on about...
Not sure what Ruby core will do, but Sorbet will type everything as T.untyped in the absence of type hints. Libraries can progressively provide rbi files to add first class type support though.
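e.g. a gem could ship something like this (a hypothetical module, just to show the shape of an .rbi file):

    # some_gem.rbi -- signature declarations only, no implementation, so Sorbet
    # stops treating the gem's methods as T.untyped.
    module SomeGem
      sig { params(payload: String).returns(T::Boolean) }
      def self.publish(payload); end
    end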
Sorbet was developed before the details of the new Ruby type system were locked in. They’re working on providing some convergence in upcoming releases.
A lot of focus has been on this benchmark https://github.com/mame/optcarrot/
It has seen huge performance improvements thanks to MJIT, but not quite 3x yet.
Performance is more of an implementation level change rather than a language change unless they are restricting the language to facilitate new optimisations.
I've been waiting for this for a long time. I hope it will have more features and be faster, more performance-oriented, more exciting, and more user-friendly than other languages.
Well, yes and no (although the question is a bit open).
It's more or less beta quality, and very primitive. Its use with Rails is discouraged, so I'd be inclined to say that "we didn't get it yet".
I'm also personally skeptical that the unusual approach (invoking a whole C compiler in a separate thread) will stand in the long term - but that's my own take.
The CRuby JIT is stable but whether it improves performance or not is workload dependent.
It's simple, not primitive. MJIT is designed to take advantage of a C compiler's optimizations.
"Compile to C" worked for Chicken Scheme for the past 20 years and continues to be a popular way for functional languages to compile. It's also how Nim works. It's all about different trade-offs.
Are there any benchmarks available? It seems I can find very little (OptCarrot, some microbenchmarks, and some general assessments about usage with Rails), and a broad(er) overview is crucial to assess the overall performance (I suppose that you can have workloads where the performance degrades).
JRuby has had a semi-optimizing JIT since JRuby 9000 was released in 2015. It's mostly non-speculative but still the fastest way to run Ruby in production. I've used it at several companies.
Graal/TruffleRuby has shown some massive perf increases [3]
[1] https://pragtob.wordpress.com/2017/01/24/benchmarking-a-go-a...
[2] https://pragtob.wordpress.com/2020/08/24/the-great-rubykon-b...
[3] https://www.reddit.com/r/ruby/comments/b4c2lx/truffleruby_be...