Ruby 3.3's YJIT Runs Shopify's Production Code 15% Faster (railsatscale.com)
175 points by nithinbekal 12 months ago | 154 comments



  > 1.27 million requests per second
  > 3TB/minute of traffic
"rails doesn't scale"


That's not on a per-host basis. Shopify's design is, quite fortunately, one that partitions really well, as each store is completely independent of the others.

Each store can be assigned to one pod, each pod can have as many hosts as it takes to optimize the use of a database instance, and then you can add more pods as the need arises.

Edit: to be clear, that's not to say Rails can't scale. It can. It's just that it doesn't need to: you can scale anything with enough partitioning.
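To make the pod model concrete, a minimal sketch (all names and numbers are hypothetical, not Shopify's actual scheme):

    # each shop lives in exactly one pod; each pod owns its own
    # database instance plus as many app hosts as that DB can keep busy
    POD_COUNT = 100

    def pod_for(shop_id)
      shop_id % POD_COUNT
    end

    pod_for(1042)  # => 42; adding capacity means more hosts in a hot pod, or more pods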


Sure... but isn't that a database problem and not a Rails problem?


Well, Shopify is investing in optimizing Ruby, so they believe Ruby was the component with the most opportunity for improvement. And the results show there were indeed lots of places where improvements could be made.


That's like saying the leading F1 team is investing in optimizing side mirrors, therefore they believe side mirrors are the component with the most opportunity for improvement.

The fact is that performance oriented organizations optimize everything unless they have math telling them it isn't worth optimizing.

The "weakest link" belief is pure conjecture


> The fact is that performance oriented organizations optimize everything unless they have math telling them it isn't worth optimizing.

Most companies don't have unlimited budgets. Performance-oriented organizations profile and then spend money where profiling tells them to. Shopify isn't hiring people to contribute to MySQL or Redis internals. They hired a full team to work on Ruby internals: not just creating YJIT, but also working on CRuby's memory layout, hiring the lead of TruffleRuby, and funding academic programming language research on Ruby.

No company has an infinite budget to “optimize everything”. It is clear where internal performance testing pointed Shopify (at Ruby) and with double digit gains being extracted year after year, their profiling didn’t lie. And other Ruby on Rails shops are seeing similar double digit performance wins, not on fake benchmarks, but on actual page load times and traffic that can be handled by a server.


Disclaimer: I'm a member of the Ruby & Rails infrastructure team at Shopify and Ruby committer.

> Shopify isn’t hiring people to contribute to MySQL or Redis internals

You are not wrong, but the main reason is that, unlike Ruby, MySQL and Redis are used by a lot of huge companies and are themselves owned by companies with full-time people on them.

In comparison, Ruby is still a mostly volunteer-run project that receives little funding and effort relative to its importance.

As for why we try to make Ruby faster at all in the first place: it's not that it was too slow, it's mostly that at our scale, the engineering time spent on optimizing the runtime pretty much pays for itself.

But that's nothing specific to Ruby; in most very large software companies you will find similar efforts, e.g. I remember Twitter had a team working on the JVM, etc.


Thank you for that insight! If I had to guess, early on Shopify had a choice to, say, jump to Java or a JVM-based language and scrap Ruby altogether, but they decided that the benefits of Ruby outweighed the benefits of Java's performance. So now, given that Shopify is large enough, spending $1MM to win $5MM on infrastructure yearly is easily worth it.

In any case, I am very grateful to Shopify, because I think if they had decided to switch over to Java back in the day, Ruby might actually be a mostly dead language by now.


> had a choice to say jump to Java or a JVM-based language and scrap Ruby altogether

As someone who joined Shopify 10 years ago, my perception is that it wasn't an option.

At the time competition was fierce and the priority was to get new features to grow.

If you are a free service like Twitter, with a strong network effect, it makes sense to retool to reduce your costs, as they're a primary factor in profitability.

If you are a B2B paying service like Shopify, with little to no network effect to keep your leading position, halting features for years while you re-tool, re-train, etc. is a death sentence.

My (hot) take is that if Shopify had decided to switch over to Java back in the day, Shopify may actually be a mostly dead company by now.


Thank you for that insight. I think that is really great to surface. If your product relies on the ability to spin up new features, then inefficiencies in operating costs can be absorbed by the fact that the company keeps functioning. If you don't spin up features that quickly but operational cost is king, then you make different decisions.

In any case, it has been a pleasure hearing insights from the inside. Thank you.


> Most companies don’t have unlimited budgets. Performance-oriented organizations profile and then spend money where profiling tells them to.

The bizarre part here is that anyone who paints themselves into a corner with Ruby then decides that they need to implement an entire fancy new VM instead of just finding targeted bottlenecks and rewriting them in C++. They're writing complicated native programs to make a VM that is only 15% faster when they could rewrite the specific sections that are slow and speed those up by 200x or more.


I understand that it's a metaphor, but F1 teams really did optimize side mirrors more than once, leading to controversies:

Mercedes 2022 https://www.the-race.com/formula-1/mercedes-2022-f1-car-make...

Williams 2019 https://www.autosport.com/f1/news/williams-modifying-front-s...

Ferrari 2018 https://www.autosport.com/f1/news/how-ferraris-formula-1-mir...


Of course, most teams do. Some teams have had 2 revisions to side mirrors this season... and of course it didn't decide the WC, which is my point. Organizations don't merely optimize "the bottleneck"; they optimize everything worth optimizing.


It's easy to believe (and probably true) that there is some low-hanging fruit in places nobody looked at before.


No one disputes that Ruby is slow.

But your picking “mirrors” in the analogy makes it sound like a premature optimization.

The reason perf isn’t typically an issue with Rails is that the standard design pattern is to leverage heavy caching.

That heavy caching exists precisely to address the slowness of Ruby.


The two major areas where caching is used in Rails are:

- database queries

- template rendering

The first, because of the round trip to the DB, plus running the query, plus allocating ORM result objects that otherwise get thrown away.

The second, because of allocating a ton of strings that get join'd and thrown away.
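For illustration, a minimal sketch of the first pattern, assuming a typical Rails app (all names here are made up):

    # low-level query caching: reuse the ORM results instead of
    # re-querying and re-allocating them on every request
    top_products = Rails.cache.fetch("shop/#{shop.id}/top_products", expires_in: 1.hour) do
      shop.products.order(sales_count: :desc).limit(10).to_a
    end

And the second, as a fragment cache in an ERB view, which skips both the re-rendering and the string allocation when the product hasn't changed:

    <% cache product do %>
      <%= render product %>
    <% end %>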

> No one disputes that Ruby is slow.

I am, because the blanket statement doesn't make sense without context. Also that view is largely biased by the typical assumption that Ruby == Rails.


Well objectively, one could say both Ruby and RoR are slow. The former via The Computer Language Benchmarks Game[0], and the latter via the TechEmpower Web Framework Benchmarks[1].

In practice, though, the database is doing most of the work most of the time, so it is typically slower than both.

[0] https://en.wikipedia.org/wiki/The_Computer_Language_Benchmar...

[1] https://www.techempower.com/benchmarks



> "Also that view is largely biased by the typical assumption that Ruby == Rails."

Given that the predominant use case for Ruby is web development, and Rails is the dominant Ruby web framework, it's a fair assumption and characterization for people to make that Ruby == Rails.


No, they both matter. The database can handle X shops per partition, and the rails host can handle Y shops per partition.

If rails were half as fast, you'd need twice as many rails hosts (but no more databases).


> but no more databases

Sort of. Twice as many Rails hosts means more DB connections, which generally means more load/memory on the DB or more load/memory on the external connection pooler.

It's only a bit of incremental load, but it's easy to overlook how many other systems need to run to make Rails scale.
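Back-of-envelope (all numbers hypothetical):

    # connections held open against the DB (or pooler) scale with hosts,
    # even if requests per second stay flat
    hosts     = 10
    workers   = 16   # app server processes per host
    pool_size = 5    # ActiveRecord connection pool per process
    hosts * workers * pool_size   # => 800; double the hosts and it's 1,600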


No, in his example he's talking about 50 stores over 10 machines vs 50 stores over 5 machines. Both would require the same DB count, but the second would save on server costs for stores.


I think the above means (if I do get the point) that databases scale in terms of both requests per second and number of active connections.

Having 1000 connections doing 1 request per second isn't the same as having 1 connection doing 1000 requests per second.

But, and I could be wrong, the larger factor is the number of requests per second.


This entirely depends on the database you are using, and if you are using a connection proxy in front of your database.


There are of course a lot of things that cross store partitions, e.g. ShopPay.


>That's not on a per-host basis.

I'm sorry, but this is one of the silliest nitpicks I've ever seen on this site. Of course 1.27 million rps and 3TB/minute isn't coming from a single host. 3TB/minute is far beyond the throughput of any network I've ever seen, considered or thought of, short of maybe a data center (and I don't work in that domain). 1.27 million requests per second is far beyond the capacity of pretty much any hardware available right now.


The second paragraph explains the argument. Seems spot on and relevant to me.


The point is, it's not Rails that is scaling here.


Ok, now what is the cost of all the instances? And how many instances would be required if Go/Rust had been used?

Also, what is the cost in man-hours spent on optimization and profiling?


Name one success on the scale of Shopify using Rust


Async Rust has not been around long enough for lots of examples. Go has plenty of them. Rust is powering big parts of CloudFlare along with Go, which is way bigger scale. AWS also has big parts done in Rust.


> And how many instances would be required if Go/Rust would have been used?

Zero. Because Shopify would have waited until Rust came out in 2015 instead of launching in 2006, and they would never have gotten off the ground; they'd have been another failed techbro startup that, instead of getting shit done, bikeshedded over languages.

PHP and Ruby apps have generated far more revenues than all the Rust and Golang code combined.


Rails was a great choice for Shopify, GitHub and Stripe, no doubt. No one is questioning that.

GP’s sarcastic “rails doesn’t scale” implies that it would also be a great choice for people starting afresh in 2023. The reply asks for a comparison with other languages popular in 2023, especially ones that are known for being more performant (lower memory and CPU consumption, lower latency).

And that’s when you’re dragging the conversation back to 2006. It’s not 2006 anymore.


Rails was initially too slow for GitHub, so they forked it and didn't use "Rails" for a while [1], lost literal engineering years to upgrading it (same for Shopify), and now GitHub has an engineering department dedicated to working off the Rails master branch directly, which is huge engineering overhead and a problem-and-solution that shouldn't exist. GitHub co-founder Tom lamented using Rails at GitHub and has stopped using it [2].

If you're GitHub or Shopify and can throw (waste?) engineering years at solving framework-specific ecosystem nightmare problems, and have the clout and runway to hire core Ruby and Rails maintainers, then you're probably in a highly unique situation and could use any framework you want.

The rest of us don't see Rails as a great choice for GitHub. Doubt and questioning.

[1] https://videos.itrevolution.com/watch/550704376/ [2] https://youtu.be/GfhPeOiXDLA?t=725


> Rails was initially too slow for GitHub, so they forked it, didn't use "Rails" for a while

That's a weird way of framing it.

They stuck with a fork of Rails 2.3 for a long time because the upgrade was deemed too costly, not because their fork was faster.

In the end their performance patches were either outdated or contributed upstream, and they are now on Rails main branch.

And while it was a fork, it was still largely "Rails".


Please listen to the Github engineers in the provided links.

> We forked rails and _practically wrote our own._ We fought against the framework. We deviated from the framework, and we even wondered if rails was right for us at all.

and

> Rails 3 was found to be five times slower than Rails 2


> Please listen to the GitHub engineers

I regularly talk with engineers that worked on that project at GitHub, some are now my coworkers. I know more about this effort than what was said publicly.

> Rails 3 was found to be five times slower than Rails 2

This is a bogus claim. It might have been 5 times slower on some pathological cases, it absolutely wasn't 5 times slower overall.


The presentation and "bogus claim" are from a principal GitHub engineer and core Rails team member. I will trust them rather than anonymous anecdotes.


I'm a Rails core member too, and Eileen is my colleague...

You are interpreting both links you gave in terrible ways.


I’m quoting, not interpreting. What am I missing?

It sounds like you’re seeing the pain GitHub suffered through rose colored glasses. The talks about GitHub Rails upgrades say it took years and caused burnout.


You are quoting out of context, losing the meaning of the quotes.

The conclusion of Eileen's talk is that by not keeping up with upgrades and essentially forking Rails 2.3, they painted themselves into a corner. They took short-term gains and produced long-term losses. It's a self-induced problem.

In the end they upgraded and are now tracking the main branch, so the problem wasn't Rails.


Yes, not ideal. But we can’t know what would have happened if they had chosen a different language in 2007. Did they have the option of an efficient, well supported language that continues to be used in 2023? Maybe Java, although Java languished for years before development picked up again. C# is also a candidate.

But one thing we can’t measure - how many candidates chose to join Shopify and GitHub because they were keen to work on Ruby? Java had a reputation for being boring, while Ruby was fun and exciting. Their success was possibly tied to this, but we’ll never know for sure.

In 2023 the calculus of what language to choose is different. But these companies are just glad they succeeded while others didn’t.


The problem is that at the time, if you wanted things done fast, Rails was the right choice. Almost no startup would touch Spring as an app server at the time. Django had not reached 1.0 yet, and it's not faster anyway. So for a startup, Rails was the only realistic choice.


What I find ironic is that in 2006, Rails was the shiny new kid on the block. These companies picked the "new" way of doing web dev compared to the stodgy Java/C# types. And yet, by recommending Rails for a new startup in 2023, they're actually more like the stodgy old-school Java/C# camp that the Rails startups avoided! A startup that would've used Rails in 2006 is more like a startup that is using things like NextJS in 2023. We see that in the stacks of new YC companies.


The NextJS folks and Go folks are in that sweet spot. You use the framework du jour, you pat each other on the back, marveling at the folly of your ancestors for using such inferior tools, thinking "this is a golden age of NextJS (or Go) that will surely never end."

What comes next might not even be better, but that won't matter, because what came before won't be cool anymore.

You die a hero or live long enough to become the villain.


NextJS is not comparable. I'm not sure it's even required to get some web pages out; I don't think frontend is that important in the startup space anyway, if you are able to fulfill the requirements.

For app servers nowadays you have many choices. But 15 years ago, Rails was really the only better choice for a few-person startup shop. C# was for Windows shops; that's a no for many.


You're implying that those languages made actually building products easier. I think we know by now they didn't. Go is a language which preaches building your own boilerplate rather than reusing it. It produces very little of the economies of scale required to build compelling products. It has other advantages, and the single-binary thing captured a niche in infrastructure software, but that's it outside of Google. Rust is still young, the jury is out, but it's already considered a big and hard-to-learn language, and that's not going to change soon. It'll eventually find a niche.

Ruby is still a great way to start it up. Consider that in 2006-2008, its deployment story was horrible. Since then, the Ruby ecosystem bootstrapped lockfiles, the 12-factor app manifesto, and a lot of the conventions we all take for granted nowadays. And while there are certainly enough arguments to bikeshed on, it's still a rock-solid ecosystem.


> Because Shopify would have waited until Rust came out in 2015, instead of launching in 2006,

> PHP and Ruby apps have generated far more revenues than all the Rust and Golang code combined.

You already stated the obvious: PHP and Ruby apps generated far more revenues simply by existing longer.


I also question the revenue claim.

Google has a lot of revenue. Pinterest, Hashicorp, Uber, Twitch, Dropbox, etc. all have a good amount of golang and collectively have a lot of revenue. It might need a few more years to tip the scale, but it's closer than suggested here.


Twitch was initially built with Ruby on Rails as well.


YouTube was initially built with php, so the revenue argument is true.


Was it? What I was told was that YouTube was built in Python and was gradually migrated to C++ after being bought by Google.


and Dropbox with Python


Or they could have used something available in 2006, like C++, Java, .NET/C#, OCaml, Haskell, D.


> Or they could have used something available in 2006, like C++, Java, .NET/C#, OCaml, Haskell, D.

Going for .NET/C# would have likely limited anyone to using mostly Windows Server for their infrastructure. Not that it's a bad OS, but .NET Core was released only in 2016 and although Mono came out in 2004, sadly it never got the love it deserved and was rather unreliable (otherwise we would have seen way more cross platform development before .NET Core). Oh, also, turns out that LINQ (which is pretty cool) was only released in 2007, though that still puts them a bit ahead of Java I guess, although I can't comment on when it landed in Mono.

Going with Java would have meant using something like Java 6, whereas the first truly decent version (in my eyes) was Java 8, which came out in 2014. Of course, the older language version and runtime wouldn't be a huge issue, however projects like Spring Boot only came out in 2014 and before then most people would either use Spring, Java EE (now Jakarta EE) or a similar framework from back then. I've worked with both and it wasn't pleasant - essentially the XML configuration hell with layers of indirection that people lament.

I mean, either would have probably been doable, but it's not like other stacks are without fault (even the ones I cannot really comment on).


Stackoverflow is doing just fine with Windows Server.

Java 6 would still blow Ruby's slow interpreter out of the water.

Being pleasant isn't relevant for performance.


> Stackoverflow is doing just fine with Windows Server.

Good for them! I guess it mostly depends on what you want to build your platform around, what the constraints are and what developer skillsets are popular in your market.

> Java 6 would still blow Ruby's slow interpreter out of the water.

Probably! I do recall major GC improvements starting from JDK 8 onwards, though when compared to Ruby even the older versions would probably be decent: https://blogs.oracle.com/javamagazine/post/java-garbage-coll...

It would actually be fun if someone pulled out the old versions from back then and did some benchmarks, though maybe asking someone to build a full stack application in such a dated tech would be a tough ask, unless they're passionate about it!

> Being pleasant isn't relevant for performance.

If the discussion is just about performance, then that's true.

If we look at things realistically, then there's more to it - like using a tech stack that allows you to iterate reasonably quickly, as opposed to making your developers want to quit their jobs every time they have to debug some obscure Servlet related bug or to work with brittle configuration in XML (been there dozens of times), to the point where not as much could even get built in a given amount of time with a particular stack due to its challenges.

I do hate when people say that additional nodes are way cheaper than developer salaries, but they're also correct most of the time. Of course, there's also the humanitarian take to just not forget about the developer experience, otherwise we'd have written all of our web software in C++ even back then. It'd work really fast, but we'd have way less software in general.


Yet, Ruby hasn't necessarily taken over the enterprise, beyond those stuck with Rails apps.


It's amazing to me that so many people make "stuck with Rails" arguments in the enterprise. It's extraordinarily clear to me, having worked in 3 Fortune 250's, that the single, most-attractive-to-management feature of alternative stacks like Java and Javascript is... dun dun dun!... MASSIVE project bloat! Justifying huge teams and years of development time, leading to huge budgets and personal power within the company.

As a single, full-stack guy, I've out-coded entire teams of Java programmers TWICE using Rails. And none of the projects inside even a Fortune-sized company come anywhere near the concerns about "scaling" we're discussing here.

So my takeaway after decades of doing full-stack development (also with PHP and .NET) is that Rails absolutely murders every other stack for time-to-market or MVP or whatever time-based metric you want to use, and has no effective liability in performance. The only places where we are even discussing this kind of scalability are some of the highest-trafficked web sites in the world, and even then I'd bet real money that team size and time to develop features are still killing it over other stacks that would "scale" better.


So where are those two projects now for all of us to amaze ourselves with the power of Rails?


Locked under company copyright, and not shareable, of course.


Pity, the world is deprived of such wonder.


Well, something tells me that someone "mostly busy with Java, C#, and JS/TS" is going to be a tough sell anyway.


C# in 2006 was a joke, probably worse than Rails in performance. This was the WebForms era and old EF, meant for enterprise customers with a couple of hundred active users max... ASP.NET being a competitive/performant framework is a very recent development (since Core, basically, which became usable past 2.0).

Haskell, OCaml and D are niche languages, and probably aren't mature enough even now to use for a production system that needs to scale (in terms of org growth and building complex systems).

Java web frameworks were also terrible in 2006 (this is the Java era that gave Java its reputation), and the only thing worse for productivity I can think of is C++, hahaha...


The joke is this comment.

All of them were faster and used fewer resources than a very slow interpreted language, by having JIT and AOT compilers, state-of-the-art GCs and great IDE offerings; even the niche ones had better tooling (Leksah and Merlin, versus nothing).


Nobody cares about performance if you build a business application with a couple of users, a common use case in 2005. The reason a lot of Java people jumped on the Rails bandwagon was that an application that would take a month to build in Java with Spring/Hibernate would take a day in Rails. See also: https://www.oreilly.com/library/view/beyond-java/0596100949/


Some Java people did; there is a reason why Ruby is hardly used outside Rails, while Java rules most backend workloads, a mobile OS, and plenty of embedded workloads.


There's also a reason Kotlin has become the language of choice for the Android development industry, Scala became a thing, and ThoughtWorks recommended against using JavaServerFaces.


Because the Android team had some Kotlin shills that pushed for it with management's blessing, and they are in bed with JetBrains for the Android IDE; that is why. And even they had to accept updating Java support, otherwise Android/Kotlin would lose the ecosystem of Java-written libraries: hence Java 11 LTS support last year, and Java 17 LTS this year, going back to Android 12 with APEX archives.

Scala became a thing indeed, where it is now besides Spark?

ThoughtWorks is a consultancy that recommends whatever brings new projects.


So where are the Shopifys of that era built on Struts and JSF?


Amazon.


Amazon was founded in 1994; that's not the same era.

Ruby didn't even exist back then and was released a year later. Rails was released in 2004. Shopify was founded in 2006, 12 years after Amazon.


Yet another reason proving the point that Ruby isn't something worth using when performance matters.


No they weren't. ASP.NET WebForms and old EF were such a pile of shit it didn't matter how fast C# was (and back then it really wasn't; granted, an order of magnitude better than Ruby/Python, but way behind the JVM). The applications built with it were dog slow and buggy; they couldn't even scale in an enterprise setting.

Haskell, OCaml, D with great IDE support in 2006? Do they have that even today?

I mean, you're suggesting people use C++ for writing web apps (and C++98/03, no less!), which has got to be facetious.

The real contenders back then were PHP and Java; RoR really addressed a lot of issues with both. They've both adopted the improvements it brought since, but it took years.


Stackoverflow and plenty of Microsoft shops are enterprise enough.

> Haskell, OCaml, D with great IDE support in 2006 ? Do they have that even today ?

I mentioned Leksah and Merlin for a reason; way better than Ruby with TextMate and Sublime.

Yes, plenty of people were using C++ for web applications in 2000-2006, via Apache, nginx and IIS plugins. Microsoft had ATL Server, and Borland/Embarcadero still ship their web server to this day.

I can assert that plenty of Nokia Networks WebUIs were powered by C++/CORBA and Perl back in 2006. The transition to Java started in 2005.

As did several CRM systems, like the original Altitude Software application server.

RoR is for people that don't care about performance to start with.


That was then and this is now. If you are building under endless VC money, go ahead, burn it. Most of us, however, do not have endless stacks of money to burn running our code.


If you think time-to-market and overall cost are going to be improved by building your vanilla website in Rust or Go vs Rails, then I think you may be surprised.


Yeah, it's arguably the main benefit of Ruby/Rails that you can spin up an MVP of your company in like a week, and have a decent feature set within a couple of months.

The trap is when you start growing and it is hard to change. Because the features that took 1-2 months in RoR might take 3-4 months (or more!) to port to another language, and do you really want to stop your working business when it isn’t a problem?

Because Rails performs totally fine at small-mid startup scale. It’s only when you start getting a couple years old with lots of users that it starts to bite you. But at that point you already have gotten further than 90% of startups ever even make it. And at that point, honestly there are solutions for that too, like gradually pulling the poor-performing bits out into faster languages.

Writing this as someone who works for a startup that uses RoR, and I’ve seen it blow up over several years. I curse RoR daily because it pisses me off, but I don’t think this company would’ve gotten this far if it didn’t have the RoR speed at the beginning.

So are you better off starting your company on Go/Rust/Java? Maybe. But if getting to market fast will help you win, it’s hard to beat RoR.


It is not 2006 anymore. Rails was a trailblazer. The productivity difference is not as large as it was compared to alternatives.


There are no comparably productive alternatives in Rust right now :(


Rust isn’t really a real language for web development though. Just in the bubble here.


> Most of us however do not have endless stacks of money to burn runing our code.

This is such an absurd take. Do you really think startups lose runway because of the runtime performance of their code, and not failing to achieve PMF, overhiring, or spending too much on stupid techbro bullshit?


These two points are entirely unrelated. Scalability in that meme is not about horizontal scalability, which approaches infinity for literally any language/framework. It only makes sense in the context of vertical scalability, and gross req/sec offers no insight into whether or not that's true.


I can't see how vertical scalability even matters for Rails. You can just start more instances. It's not like two active web requests need to interact with each other.


Rails scales fine. People use that as an escape hatch when they don't know how to write performant code.


You don't even have to write performant code. You can just start as many instances of the rails app as you want. The bottleneck is usually the database.


Each request gets its own server! Wait isn’t that lambda?


Rails has always been ahead of the curve.


That's not what he said.


Are there companies in the last 7 years that have started with rails and have become big like spotify?


Depends on your definition of big.

Partly, you need mobile now, so any Rails stuff is likely to be back-end and hidden. Plus big investors like to go for the exciting stuff.

But if you're looking at companies that aren't household names (taking smaller amounts of investment), there are lots out there.

Syft (recruitment) were founded in 2016 and have revenues of over $100m per year - although that's partly due to acquisition by a larger competitor, so when I just looked, separating their valuation from the group wasn't immediately obvious.

I've freelanced and contracted across a few niche industries (construction, print, airport signage management!) where I was building something against competitor software that I discovered was at least partially built with Rails. Those big players in each niche would have revenue in the tens of millions and from what I could see, very small technical teams.

But anyone outside those industries would never have heard of these companies.


(I assume you mean Shopify not Spotify)

Not exactly a fair question, because Shopify is much older and is valued at 70B. For a company to have done it in half the time would have been impressive regardless of tech, whereas on average it takes 7 years to become a unicorn.

I do know that Aircall is relatively young, on a good trajectory and runs Rails.


People in this thread don't know the difference between performance and scale.


You need the number of servers and server specs to answer that question.

Even if a piece of software could only handle 1 request per second, you could handle 1.27M requests per second if you just ran 1.27M servers.


What is the hardware spec here?


15% faster is great. But at what cost?

> Since Ruby 3.3.0-preview2 YJIT generates more code than Ruby 3.2.2 YJIT, this can result in YJIT having a higher memory overhead. We put a lot of effort into making metadata more space-efficient, but it still uses more memory than Ruby 3.2.2 YJIT.

I'm hoping/assuming the increased memory usage is trivial compared to the CPU-efficiency gains, but it would be nice to see some memory-overhead numbers as part of this analysis.
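One way to eyeball the overhead yourself, assuming Ruby 3.2+ (the exact stat keys vary by version):

    # reports how much memory YJIT's generated code is using; the code
    # budget can also be capped at boot with --yjit-exec-mem-size
    if defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?
      p RubyVM::YJIT.runtime_stats[:code_region_size]  # bytes of generated code
    end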


This is a particularly valid concern given Ruby+Rails seems quite memory-inefficient to begin with. I've sometimes had smallish apps on 500MB Heroku dynos crashing due to memory slowly climbing, eventually slowing things down as the dyno uses swap, and eventually using 500MB of swap. IME Ruby+Rails doesn't seem to free up memory after it uses it, and that causes problems as the hours go by until the pod/dyno crashes or is restarted.


Ruby processes don't return memory to the system; they reuse memory already allocated. This is for efficiency: allocating and freeing system memory isn't free. Even if they did return it, your peak memory usage would be the same. Ruby doesn't allocate memory it doesn't need.

If your memory usage doesn't plateau you have a memory leak which would be caused by a bug in your code or a dependency.

But 500MB to 1GB of memory required for a production Rails app isn't unusual. Heroku knows this, which explains their bonkers pricing for 2GB of memory. They know where to stick the knife.


> Ruby processes don't return the memory to the system

That is not correct. Ruby does unmap pages when it has too many free pages, and it obviously calls `free` on memory it allocated once it no longer uses it.

What happens sometimes, though, is that because of fragmentation you have many free slots but no free whole pages. That is one of the reasons why GC compaction was implemented, but it's not enabled by default.
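For reference, compaction is exposed directly (GC.compact since Ruby 2.7, auto_compact since 3.0):

    # one-off compaction, e.g. after boot/fork, to turn fragmented
    # heap pages back into whole free pages
    GC.compact

    # or opt into compacting as part of major GC (off by default)
    GC.auto_compact = true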

But in most cases I've seen, the memory bloat of Ruby applications was caused by glibc malloc, and the solution was either to set MALLOC_ARENA_MAX or to switch to jemalloc.


I'm correct in practice. There are scenarios where Ruby might free memory, but Ruby is mostly used for Rails, and you won't ever see that under a standard Rails workload. It will plateau and stay there until a restart. When people see this they think it's a "bug" or a "leak", but it isn't.

On the last fairly large Rails app I tried to use jemalloc on, there was no change in memory usage. I believe that advice is a bit outdated. Also note that using jemalloc doesn't cause memory to be freed to the system; it reduces fragmentation, at the cost of CPU cycles. There's no free lunch.


> It will plateau

Yes, because extra empty pages are released at the end of major GC, which is occasional, and most web applications will cyclically use enough memory that they will stabilize / plateau at some point.

> I believe that advice is a bit outdated.

It absolutely isn't; your anecdote doesn't mean much compared to the countless reports you can find out there.

> Also note using jemalloc doesn't cause memory to be freed to the system.

Yes it does; it has a decay mechanism, as most allocators do. https://jemalloc.net/jemalloc.3.html

> It reduces fragmentation

Yes, and that allows it to have more free pages that it can release.

> at the cost of cpu cycles

Compared to glibc, not so much.


That kind of thinking is a bit flawed unfortunately. You might hit your peak for 20 minutes a day but you’ve provisioned your system for that temporary worst case for the entire day and other services are paying that penalty. If it’s the only thing you’re running, maybe. But in practice there are other things you want to run on the machine to improve utilization rate (since services are not all hitting their peak simultaneous generally)

That’s why good modern allocators like mimalloc and tcmalloc return memory when they notice it’s going unused, so that other services running on the machine can access those resources. And this is in C++ land, where things are even more perf-sensitive.


Theoretically virtual memory and swap solve this problem really well. The OS is free to write the unused pages to disc to let other programs use the real memory.


Swap is horribly expensive, and most hyperscalers run their servers without swap and set per-process memory limits, automatically killing workloads that go above their threshold.


Swap is only expensive if you are using the swapped-out memory. If you are in a case where a program is just holding on to pages it isn't using, swap is basically free. For most users, turning off swap is just losing performance, since the OS can always use all of your RAM to cache disk access.


Swap is expensive compared to releasing unused memory back to the OS. The reason is that you spend memory and disk bandwidth writing “unused” data to disk. And that data could very well be unused RAM just sitting around in a memory allocator, which is effectively useless memory that you’re swapping because the allocator didn’t release it.

Zswap is always performance increasing. Swap to disk can be performance degrading (good implementations generally are not unless your working set is larger than your memory and you’re in thrashing) and certainly expensive $$ wise in that it wears out your SSD faster.

You seem to be thinking I’m arguing in absolute terms where all I’m saying is that swapping is a more expensive technique to try to reclaim that unused RAM vs the memory allocator doing it. It can be a useful low-effort technique, but it’s very coarse and more of a stop gap to recover inefficiencies in the system at a global level. Global inefficiency mitigation is generally not as optimally effective as more localized approaches.

Consider also that before the OS starts swapping, the OS is going to start purging disk caches, since those are free for it to reload (executable code backed by file, page caches, etc). These are second-order system effects that are hard to reason about abstractly when you have a greedy view of “my application is the only one that matters”. This means that before you even hit swap, your large dark matter of dirty memory sitting in your allocator is making your disk accesses slower. And the kernel’s swap doesn’t distinguish working-set memory from allocator memory, so you’re hoping the inherent temporal and spatial locality patterns interplay well enough that you’re not trying to hand out an allocation from a swapped-out block too frequently.


What if the other thing you're trying to run runs at the same time that your rails app is using peak memory? You have no choice but to have enough memory for peak load.

But if you really do need to cheap out, you can generally configure your app server to kill idle worker processes, or bounce them on a schedule to return memory to the system, and hope.
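A sketch of the "bounce them on a schedule" option using the puma_worker_killer gem (the gem choice is mine, not the commenter's; usage per its README):

    # config/puma.rb
    before_fork do
      require 'puma_worker_killer'
      # restart each worker roughly every 12 hours, returning its memory
      PumaWorkerKiller.enable_rolling_restart(12 * 3600)
    end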


So that’s generally not very likely. You’re going to have some time of day effects that are shared but true “peak” tends to be service dependent rather than something all your services experience simultaneously from what I’ve seen (YMMV).

Killing “idle” processes is also extremely expensive because you have to restart the process, reload all state, and doing graceful handoff is tricky.

It’s good to have graceful handoff for zero downtime upgrades, but I still say having your allocator return RAM is the cheapest and easiest option and something good modern allocators do for you automatically.


There is no one size fits all memory management technique. There are always tradeoffs. The scenario you are describing is not common for ruby apps. Ruby uses a memory management style that is suitable for most ruby workloads.

All the production-quality app servers handle killing and starting new worker processes gracefully and efficiently by forking a running process. Certainly there is some overhead, but that's why you don't underprovision memory, so you don't need to resort to that.


> If your memory usage doesn't plateau you have a memory leak which would be caused by a bug in your code or a dependency.

Extremely bold claim for a framework the size of Ruby on Rails. I would trot out my own evidence, but the receipts are lost to time.

Also: why isn't the allocation behavior tweakable at runtime? It seems pretty trivial with no downsides. It's not difficult to think of a scenario where a non-monotonically-increasing heap size is desirable.


This person is incorrect, but even if they were correct, that wouldn't be a framework thing.

Memory management is handled by the language.


Many types of memory leaks are simply because you're holding on to data you don't need to hold onto anymore. Languages cannot prevent this, at least not that I've seen.


Sure, but the person I responded to was suggesting that Rails was deliberately holding onto memory to re-use it.

That's absolutely not something Rails does, but it is something that some managed languages and some (most?) allocators do.


I’ve observed the same, and every time, switching to jemalloc fixed the issue.


Was it difficult to switch? What were the downsides / tradeoffs? (I read about jemalloc recently but don't know enough about it to confidently pursue it, but may try it on a small app if it's straight forward).


Super easy, and I have not had an issue with it in over 10 years of using it. There is an example here of how to do it with a Docker image: https://mailsnag.com/blog/optimized-ruby-dockerfile/


Going to try this right now! Will report back.

OOC, why isn't this a Ruby default? Isn't it always better to be more memory efficient? (I'm trying to understand what the trade-offs are, if any)

EDIT: well, exactly 6 minutes later, I'm done. I followed these instructions: https://elements.heroku.com/buildpacks/gaffneyc/heroku-build...

The app seems to work like usual, I'll just have to wait and see what happens to memory use.

I will reply here in 12 hours with a screen shot showing the results (before/after memory use), whatever they may be.

Also, for reference, here's the metrics for the past 24 hours (LOTS of memory problems): https://imgur.com/a/M8IHd5z


For anyone interested, here's the result: https://imgur.com/a/c62gjKQ (the red vertical line is the point from which jemalloc was used).

It looks like memory usage did indeed go down, and critical errors fell by about 84%.


For completeness, here are the metrics a full 24 hours after the change: https://imgur.com/a/lbdzFvN


Yeah, I had the same reaction last week. It's not a Ruby default because some versions of Linux can't use it. ¯\_(ツ)_/¯


Have you compared it against newer allocators like mimalloc or the rewritten tcmalloc (not the one in gperftools)? Jemalloc is a bit long in the tooth now.


Time spent profiling and optimizing inherently inefficient technologies is an undervalued factor when deciding what stack to use.


Am I really going to have to get out the premature optimization quote?

Most businesses fail. Those that don't fail, usually don't have interesting scaling issues. (You can go a really long way on a boring monolith stack.)

So in most cases, whatever gets things out into the world and able to see if the business can be validated makes sense, and then you optimize later. A nonscalable stack that you can iterate on 50% faster is more likely to produce a viable company than a more scalable stack that's slower to work with.

If you're a hired employee, it's easy to forget that the place you're working for is already a big exception just by the virtue of it grew large enough to hire you.


This hints at a false dichotomy, one that Ruby and Rails especially keep afloat.

Productivity and scalability (in the performance sense) aren't opposites.

Take Bash. It performs badly and guarantees terrible productivity in a large category of software, but it's perfect for a niche. Take Java. It performs better than many and allows for good productivity (if you avoid the enterprise architectures, but that goes for any language). Or take Rust. Productivity much higher than most C/C++, in my case higher than with Ruby/Rails, and also much more performant.


It's a false dichotomy in theory. It's mostly not in practice. And that was far truer in 2006 when Shopify got started. Then there really weren't any modern web frameworks in performant languages.

Primarily it's not the language that makes people more or less productive, though it does have some influence. It's mostly the frameworks in those languages. And traditionally the most modern / full-featured web frameworks haven't been in systems languages. The major counterexample at the moment (while still obviously not a systems language) is that modern JS VMs are actually really fast, so while I don't love JS, it does hit that sweet spot at the moment of performance and mature frameworks.

Also, I've never worked in Rust, but am mostly a systems programmer, and while I understand that Rust is supposed to be easier than C or C++, I'm skeptical that it's as easy to work with as higher level languages, or that you could throw most web developers into Rust without some serious additional learning.


> serious learning.

That's another problem I have with this narrative. Productivity isn't measured by throwing an inexperienced developer at something and then watching how fast they get stuff done. That's learnability.

I'm an experienced Rails developer (some 15 years in) and my productivity has plateaued for years now. I've been doing Java and Rust work for years too now, web and application dev. It took years, but my productivity in both Java and Rust, on anything that lives longer than 6 months, has vastly surpassed that of my Rails.

Productivity of a senior, or experienced dev, of a (large) team, of a team with high turnover, of a project over decades, all that is productivity too. And in all those categories, Rails isn't great.


We're talking past each other because we're arguing different things. If I understand you, you're saying that you can avoid technical debt by using tools that are intrinsically more performant, and that skilled developers are more productive with more advanced tooling.

That's all correct.

But the point I'm making is that if an MVP isn't accruing technical debt, it's over-engineered. Most of them will be thrown away, or rescoped, and so taking on technical debt is an advantageous strategy: you only have to pay the technical debt on the few survivors.

Shopify at its outset was a CRUD app (fun fact: it started as a snowboarding shop), and in 2006, Rails was a great choice for that.

Your notions are fine for an established company building a piece of infrastructure they're certain they'll need. But that's not what Shopify was, and it's not the spot most startups picking a framework are at.

Your thing about developer quality is kind of meh. Building the first versions of a shopping platform isn't rocket surgery. You don't need Anthony Bourdain to make a sandwich. Particularly if you're not sure anybody wants a sandwich.


I agree entirely about the tech debt. I've built, sold, grown and failed several startups myself, many with Rails at the center. I've been building with Rails for way over a decade, so I'm well aware of the "options" back in the day.

What I'm arguing, however, isn't to take on debt¹; I'm saying that productivity and performance aren't always opposites.

Sure, Rails trades performance for productivity. But, I've learned, this is mostly just Rails. There are many languages and frameworks that are just as productive (for a certain definition, see previous comment) as Rails, but also performant. And I'm arguing that performance affects productivity: performant, scalable software is easier to work on, because it gives faster feedback (tests, CI, manual testing), wastes less time waiting for stuff (hundreds of single seconds add up over days and weeks), and decreases friction (I'll postpone running the full test suite if I know it'll hog my machine for the next half hour; I'll gladly run it if it takes a few minutes or less).

Edit: and, if what you say about tech debt is true (I think it is), wouldn't Shopify be at a position now to pay it back? Many startups that used Rails paid it back by migrating elsewhere. So maybe Rails in its entirety is Tech Debt?

¹A cautionary sidenote, which I've learned the hard way, is that taking on tech debt is an art in itself. Not all debt is alike. Many kinds will cripple my project: where, at the unlikely moment that I do need to scale, it's impossible; or where, when I do need to pivot for the umpteenth time, we cannot, without that Giant Refactoring.


> Productivity and Scalability(in performance sense) aren't opposites.

They often clash with each other. Rust for example is a lot less pleasant to debug than interpreted languages and that is a loss of productivity.


Not in my case. Rust, for me, is much better for productivity than my other major languages, Ruby and JavaScript. The main reason is type enforcement, which is why (for me) TypeScript is much more productive than JavaScript. A large category of bugs simply won't exist (they're caught at compile time). With Ruby, I'd have to write hundreds of edge-case unit tests just to cover stuff that, with Rust, is enforced at compile time for me.

The other reason is runtime speed. A typical Ruby test-suite takes me minutes to run. A typical Rails test suite tens of minutes. A typical Rust test-suite takes < a minute to compile and seconds to run. I run my tests hundreds of times per day. With a typical Rails project, I'm waiting for tests upwards of an hour per day (yes, I know guard, fancy runners with pattern matching etc).

The last reason, for me, is editor/IDE integration: the Rust (and TS) type systems make discovery, autocomplete and even Copilot so much more useful that my productivity tanks the moment I'm "forced" to use my IDE with only Solargraph to help.

And debugging: sure! I've had reasonable success with gdb and Ruby debuggers in the past. Rust's gdb isn't much better. But stepping through a stack in a Rails project is a nightmare: the stack is often so ridiculously deep (but it does show how elegantly and neatly it's all composed!) that it's all noise and no signal. Leaving a binding.pry or even `throw "why don't we get here?!"` also works, but to call that "productive" debugging is a stretch, IMO.


I like strong typing as well, and worked with a strongly typed language for years before Ruby.

Then I did Ruby+Rails fulltime for 9 years. Just recently moved on.

    With Ruby, I'd have to write hundreds of 
    edge-case unit-tests just to cover stuff that, 
    with Rust is enforced compile-time for me.
Never a problem for me.

It was one of my major concerns about Ruby, prior to starting out. But like... it just wasn't a problem.

It turns out that we just don't pass the wrong kind of thing to the other thing very often, or at least I and my teams did not. It certainly helps if you follow some sane programming practices. Using well-named keyword arguments and identifiers, for example.

    # bad. wtf is input?
    def grant_admin_privileges(input)        
    end

    # you would have to be a psychopath to pass this
    # anything but a User object
    def grant_admin_privileges(user:)
    end
   
Of course, this can be a major problem if you're dealing with unfamiliar or poorly written code. In which case, yeah, that sucks. I know that many will scoff at the old-timey practice of "use good names" in lieu of actual language-level typing enforcement, and that "just use a little discipline!" has long been the excuse of people defending bad languages and tools. But a little discipline in Ruby goes such a long way, moreso than in any language I have ever used.

    With Ruby, I'd have to write hundreds of edge-case unit-tests 
    just to cover stuff that, with Rust is enforced compile-time for me.
Well, you do need test coverage with Ruby. But you do anyway in any language for "real" work, soooooo.

I strongly dispute that you need extra tests for "edge cases" because of dynamic typing. Something is deeply wrong if we are coding defensive methods that handle lots of different types of inputs and do lots of duck typing checks or etc. to defend themselves against type-related edge cases.

     (yes, I know guard, fancy runners with pattern matching etc).
Yeaaaaaah. Rails tests hit the database by default, which is good and bad, but it is inarguably slowwww. I don't find pure Ruby code to be slow to test.

     The last reason, for me, is editor/IDE integration
Yes. I still miss feeling like some kind of god-level being with C#, Visual Studio, and Resharper. I liked the Ruby REPL which offset that largely in terms of coding productivity but was certainly not a direct replacement.

    But stepping through a stack in a rails project is a nightmare
Yeah. I always wanted a version of the pry 'next' method that was basically like, "step to the next line of code but skip all framework and Ruby core code"


I dare you to have a look at your Rollbar, Sentry or other exception logging for a Rails project. And I'll put money on it that the top 5 exceptions have several 'undefined method x' (probably on nil) errors.

Those warrant unit tests. Those will regress. Those would never exist in a strongly typed language (though Java still has null...ugh)


Yeah that's the usual argument and I don't agree.

It's true that 99.9% of production log errors are NoMethodError exceptions.

annnnnd 99.9% of those NoMethodErrors are just code not handling nils/nulls correctly

annnnnd 99.9% of those unhandled runtime nils/nulls are from external data (user inputs, database data, etc)

So strong typing doesn't help you there at runtime, it just blows up differently.


> So strong typing doesn't help you there at runtime, it just blows up differently.

It really does, though. Not with the Java-type of strong typing (still allows null) but with the Rust type of strong typing. Simply because it moves all this to the edge. At the point where you read the CSV/database/HTTP-response/user-input.

Everything inside of this edge (a strong boundary) doesn't need to deal with "can this be nil" because it can't. Your `the_outlier(items: Vec<Measurement>)` will simply not compile if the type-checker sees that `items` can be nil, `items` can contain a nil, or, internal to that function, an items[].measured_at might be nil, or maybe items[].measured_at is a Date instead of a DateTime.

You don't need a bazillion tests to deal with this situation around `the_outlier()`. That doesn't mean that the part that reads a Vec<Measurement> from a CSV (or json, or database or whatever) is covered by this typechecker. But it means this layer, the edge, the boundary, is where you put the protection. Validation, whatnot.


> # you would have to be a psychopath to pass this
> # anything but a User object
> def grant_admin_privileges(user:)

I had an app once where we used user objects, and later switched to IDs to save DB calls. Now you have some functions that can accept both, and some that accept one of them, and without type hints (that was a long time ago) you can easily make a mistake.


That sounds like some malpractice.

Ruby has had keyword arguments since Ruby 2.0, released in 2013, and earlier code could certainly use option hashes.

So I don't see a reason for any confusion there.

    # probably malpractice in a system where many methods
    # take IDs and some take Users
    def do_something_with_user(user)
    end

    # how hard is this? trivially easy and unambiguous.
    def do_something_with_user(user_id:)
    end
In the second example, you would really have to be asleep at the wheel to make a mistake like:

    # this is obviously wrong
    do_something_with_user(user_id: User.first)
I don't see the problem. It would be nice if a compiler/IDE could catch that, but on the other hand, it just looks blatantly wrong as you type it and will certainly blow up the first time you call it.


They used this stack because it was productive for them.

They don't need to do any of this. The product is fast enough. They make money. It's purely to fatten the bottom line.


TruffleRuby

What's the current state of Shopify running TruffleRuby, given the tragic loss of Chris Seaton?


Is TruffleRuby compatible with Rails? If so, I wonder how much TruffleRuby would improve the performance and memory footprint.

Especially with native images, I wonder how that would turn out.


> Is TruffleRuby compatible with Rails?

Rails proper, yes. Small Rails apps are generally drop-in compatible, but sizeable applications are likely to run into a few compatibility issues, as most gems aren't tested against TruffleRuby.

> I wonder how much TruffleRuby would improve the performance and memory footprint.

Generally speaking, Truffle is much faster at "peak" performance, but takes very long to get there, which makes it challenging to deploy.

It also uses way more memory, but that's partially offset by the fact that it doesn't have a GVL, so you get parallel execution with threads.


Thank you for the informative reply.

Ruby atm is working towards implementing true parallel execution with Ractors, for example, and now with YJIT, the performance might increase some more.



PHP went through some crazy performance improvements from PHP 5.6 to 7.0, in some cases running twice as fast.

It's good to see Ruby doing the same. There is something neat about the same code running faster, solely by being on an upgraded platform.


Yep. And PHP is possibly getting an optimizing compiler:

https://www.reddit.com/r/PHP/comments/16hu7dq/php_is_getting...


I'm probably misinterpreting the numbers, but it sounds like the 3.3 interpreter also got some significant performance improvements - if 3.3 YJIT got a 13% speedup compared to 3.2 YJIT and a 15% speedup compared to 3.3 interpreter, that sounds like the 3.2 YJIT has only slightly better performance than the 3.3 interpreter. Is that interpretation correct? If so, what were the improvements in the 3.3 interpreter, or was 3.2 YJIT just not much of a speedup?
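A quick sanity check of that reading, assuming both figures were measured on the same workload:

    # 3.3 YJIT is ~1.15x the 3.3 interpreter and ~1.13x 3.2 YJIT,
    # so 3.2 YJIT relative to the 3.3 interpreter is roughly:
    p 1.15 / 1.13   # => ~1.018, i.e. only ~2% faster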


> Overall YJIT is 61.1% faster than interpreted CRuby!

> On Railsbench specifically, YJIT is 68.7% faster than CRuby!

https://speed.yjit.org/

For 3.2 there also was an improvement of the interpreter:

> We now speed up railsbench by about 38% over the interpreter, but this is on top of the Ruby 3.2 interpreter, which is already faster than the interpreter from Ruby 3.1. According to the numbers gathered by Takashi, the cumulative improvement makes YJIT 57% faster than the Ruby 3.1.3 interpreter.

https://shopify.engineering/ruby-yjit-is-production-ready


That is exactly my question as well. Why would I want YJIT if it is only 15% faster than normal Ruby, given the memory overhead?


The 15% is for the total request time including waiting for blocked IO.

> All that work allowed us to speedup our storefront total web request time by 10% on average, which is including all the time the web server is blocked on IO, for example, waiting for data from the DB, which YJIT obviously can't make any faster.

https://twitter.com/paracycle/status/1605706245955997697
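That dilution is easy to quantify with a hypothetical split (numbers made up):

    # suppose 40% of request time is blocked on IO and 60% is spent in Ruby
    io, ruby  = 0.40, 0.60
    total_cut = 0.15      # 15% faster total request time
    p total_cut / ruby    # => 0.25: the Ruby portion itself got ~25% faster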


Not to be pessimistic, but does this matter? Rails apps take 2-3x more resources to run than most other language stacks, including other dynamic languages, (including Perl!).


15% faster - and how much faster would Java, Rust, or Python be?


That’s not easy to answer, because it’s not quite an apples-to-apples comparison if you start factoring in libraries, frameworks and the specific workload.

My rules of thumb:

Python has similar performance characteristics to Ruby.

With Java/C#/Go you’d expect about an order of magnitude of improvement.

With naive Rust/C++ you would likely be at the same average speed as Java for web applications, but with less memory usage. Well, until you make an effort to produce faster code.


Node.js is about 20x faster than RoR for instance


At what? At running calculations? How do you compare a runtime with a framework? I'm genuinely curious...


Next should be github then, I hope



