I had to deal with a lot of FFI to enable a Java Constraint Solver (Timefold) to call functions defined in CPython. In my experience, most of the performance problems from FFI come from using proxies to communicate between the host and foreign language.
A direct FFI call using JNI or the new foreign interface is fast, and has roughly the same speed as calling a Java method directly. Alas, the CPython and Java garbage collectors do not play nice, and require black magic in order to keep them in sync.
On the other hand, using proxies (such as in JPype or GraalPy) causes significant performance overhead, since parameters and return values need to be converted, and conversion might trigger additional FFI calls (in the other direction). The fun thing is, if you pass a CPython object to Java, Java holds a proxy to that CPython object. And if you pass that proxy back to CPython, a proxy to that proxy is created instead of unwrapping it. The result: JPype proxies are 1402% slower than calling CPython directly using FFI, and GraalPy proxies are 453% slower than calling CPython directly using FFI.
What I ultimately ended up doing was translating CPython bytecode into Java bytecode, and generating Java data structures corresponding to the CPython classes used. As a result, I got a 100x speedup compared to using proxies. (Side note: if you are thinking about translating/reading CPython bytecode, don't; it is highly unstable, poorly documented, and its VM has several quirks that make it hard to map directly to other bytecodes.) For more details, see my blog post on the subject: https://timefold.ai/blog/java-vs-python-speed
Speaking from zero experience, the FFI stories of both Python and Java to C seem much better. Wouldn't connecting them via a little C bridge be a general solution?
JNI/the new Foreign FFI communicate with CPython via CPython's C API. The primary issue is getting the garbage collectors to work with each other. The Java solver works by repeatedly calling user-defined functions when calculating the score. As a result:
- The Java side needs to store opaque Python pointers which may have no references on the CPython side.
- The CPython side needs to store generated proxies for some Java objects (the results of constraint collectors, which are basically aggregations of a solution's data).
Solving runs a long time, typically at least an hour (although you can configure how long it runs). If we don't free memory (by releasing the opaque Python pointer return values), we run out of memory within a couple of minutes. The only way to free memory on the Java side is to close the arena holding the opaque Python pointer. However, when that arena is closed, its memory is zeroed out to prevent use-after-free. As a result, if CPython hasn't garbage collected that pointer yet, the next CPython garbage collection cycle will cause a segmentation fault.
JPype (a CPython -> Java bridge) does dark magic to link the JVM's and CPython's garbage collectors, but has performance issues when calling a CPython function inside a Java function, since its proxies have to do a lot of work. Even GraalPy, where Python is run inside the JVM, has performance issues when Python calls Java code which calls Python code.
IPC methods were actually used when constructing the foreign API prototype, since if you do not use JPype, the JVM must be launched in its own process. The IPC happened at the API level: the JVM started its own CPython interpreter, and CPython and Java used `cloudpickle` to send each other functions/objects.
Using IPC for all internal calls would probably add significant overhead; the user functions are typically small (think `lambda shift: shift.date in employee.unavailable_dates` or `lambda lesson: lesson.teacher`). Depending on how many constraints you have and how complicated your domain model is, there could be potentially hundreds of context switches for a single score calculation. It might be worth prototyping though.
Go code and C code have to agree on how resources like address space, signal handlers, and thread TLS slots are to be shared — and when I say agree, I actually mean Go has to work around the C code's assumptions: C code may assume it always runs on one thread, or may blithely be unprepared to work in a multithreaded environment at all.
C doesn't know anything about Go's calling convention or growable stacks, so a call down to C code must record all the details of the goroutine stack, switch to the C stack, and run C code which has no knowledge of how it was invoked, or the larger Go runtime in charge of the program.
It doesn't matter which language you're writing bindings or wrapping C code with: Python, Java with JNI, some language using libffi, or Go via cgo. It is C's world; you're just living in it.
Between Rails At Scale and byroot's blogs, it's currently a fantastic time to be interested in in-depth discussions around Ruby internals and performance! And with all the recent improvements in Ruby and Rails, it's a great time to be a Rubyist in general!
Is it? To me it seems like Ruby is declining [1]. It's still popular for a specific niche of applications, but to me it seems like it's well past its days of glory. Recent improvements are nice, but is a JIT really that exciting technologically in 2025?
Ruby will probably never again be the most popular language in the world, and it doesn't need to be for the people who enjoy it to be excited about the recent improvements in performance, documentation, tooling, ecosystem, and community.
I think Ruby can get popular again with the sort of contrarian things Rails is doing, like helping developers exit the cloud.
There isn’t really a much more productive web dev setup than Rails + your favorite LLM tool. It will take time to win Gen Z back to Rails, though, and away from Python/TS or Go/Rust.
My impression is that a Rails app is an unmaintainable dynamically-typed ball of mud that might give you the fast upfront development to get to a market or get funded but will quickly fall apart at scale, e.g. Twitter fail whale. And Ruby is too full of "magic" that quickly makes it too hard to tell what's going on or accidentally make something grossly inefficient if you don't understand the magic, which defeats the point of the convenience. Is this perception outdated, and if so what changed?
If the Twitter fail whale is your concern, then your perception is outdated. Twitter started moving off Ruby in 2009. Both the CRuby VM and Rails have seen extensive development in the decade and a half since.
I never worked at Twitter, but based on the timeline it seems very likely they were running on the old Ruby 1.8.x line, which was a pure AST interpreter. The VM is now a bytecode interpreter that has been optimized over the intervening years. The GC is considerably more robust. There's a very fast JIT compiler included. Many libraries have been optimized and bugs squashed.
If your concern is Rails, please note that it also has seen ongoing development and is more performant, more robust, and I'd say better architected. I'm not even sure it was thread-safe when Twitter was running on it.
You don't have to like Ruby or Rails, but you're really working off old data. I'm sure there's a breaking point in there somewhere, but I very much doubt most apps will hit it before going bust.
The CRuby interpreter alone is at least 2-3x faster than in Fail Whale times, and the JIT doubles that to 4-6x. Rails itself has also gotten 1.5x to 2x faster.
And then you have CPUs that are 20-30x faster than in 2009, SSDs that are 100x-1000x faster, and databases that are much more battle-tested and far easier to scale.
Sometimes I wonder, maybe we could remake Twitter with Rails again to see how well it goes.
My issue with Ruby (and Rails) has always been the "ball of mud" problem that I feel originates from its extensive use of syntactical sugar and automagic.
Rails can become a ball of mud as much as any other framework can.
It's not the fastest language, but it's faster than a lot of dynamic languages. Other than the lack of native types, you can manage pretty large rails apps easily. Chime, Stripe, and Shopify all use RoR and they all have very complex, high-scale financial systems.
The strength of a tool is limited by the person who uses it.
Python? Ruby with YJIT, JRuby or Truffle Ruby usually beats python code in benchmarks.
I haven’t seen a direct comparison, but I wouldn’t be surprised if TruffleRuby was already faster than Elixir, Erlang, or PHP for single-threaded CPU-bound tasks too.
Of course that’s still way behind other languages but it’s still surprisingly good.
In my work I’ve seen that TruffleRuby codebases merging Ruby and Java libraries can easily keep pace with Go in terms of requests per second. Of course, the JVM uses more memory to do it. I mostly write Go code these days but Ruby is not necessarily slow. And it’s delightful to code in.
> Python? Ruby with YJIT, JRuby or Truffle Ruby usually beats python code in benchmarks.
Isn't that moving the goal post a lot?
We went from 'faster than a lot of others' to 'competing for worst in class'.
I'm not trying to be facetious, I'm curious as I often read "X is really fast" where X is a functional/OOP language that nearly always ends up being some combination of slow and with huge memory overhead. Even then, most Schemes (or Lisps in general) are faster.
Being faster single threaded against runtimes that are built specifically for multithreaded, distributed workloads is also perhaps not a fair comparison, esp. when both runtimes are heavily used to write webservers. And again, Erlang (et al) come out faster even in those benchmarks.
Is TruffleRuby production (eg. Rails) ready? If so, is it that much faster?
I remember when the infamous "Truffle beats all Ruby implementations"-article came out that a lot of Rubyists were shooting it down, however this was several years ago by now.
Moving the goal posts? Perhaps I misunderstand what you are asking.
Python is not the worst-in-class scripting language. For example, Perl and Tcl are both slower than Python.
Originally you just asked, "such as" [which dynamic language ruby is faster than?]
Implying ruby is slower than every other dynamic language, which is not the case.
JRuby is faster than MRI Ruby for some Rails workloads and very much production ready.
Truffle Ruby is said to be about 97% compatible with MRI on the rubyspec but IMHO isn't production ready for Rails yet. It does work well enough for many stand alone non-rails tasks though and could potentially be used for running Sidekiq jobs.
The reason to mention the alternative ruby runtimes is to show that there's nothing about the language that means it can't improve in performance (within limits).
Whilst it's true that Ruby is slower than Common Lisp or Scheme, Ruby is still improving and the gap is going to shrink considerably, which is good news for those of us who enjoy using it.
Thank you for a great answer; I did not mean any ill will and apologize if that was how it came across.
Perl, Tcl, Smalltalk etc are basically non-existent where I'm from, so they didn't occur to me.
Perhaps I'm projecting a lot here. I have worked a lot in high performance systems and am often triggered by claims of performance, eg. 'X is faster than C', which is 99.9% of the time false by two orders of magnitude. That didn't happen here.
Java's HotSpot was originally designed for Smalltalk and Self.
Two very dynamic systems, designed to power complete graphical workstations. Perl, Tcl, Python, and Ruby, as originally implemented, did not come even close to the original Smalltalk JIT work described in Peter Deutsch's 1984 paper "Efficient Implementation of the Smalltalk-80 System"!
The "Ruby is faster than C" claim is because of YJIT. They are moving a lot of the CRuby standard library and core language out of C and into Ruby code so YJIT can optimize it better, akin to Java, where the bytecode can be optimized on the fly instead of just once at compile time.
No one uses Ruby because it is fast. They use it because it is an ergonomic language with a battle-tested package for every webserver based activity you can code up.
Crystal is an ergonomic language, too, looking a lot like Ruby even beyond a cursory glance. What Ruby has, like any longstanding language, is a large number of packages to help development along, so languages like Crystal have to do a lot of catching up. Looking at the large number of abandoned gems though, I'm not sure it's that big a difference, the most important ones could be targeted.
I'm not sure that has any relevance when compared with Python or JS or Go though, they seem to have thriving ecosystems too - is Rails really that much better than the alternatives? I wouldn't know but I highly doubt it.
I am still hoping Crystal stabilises on Windows (currently it still feels very much beta). Then they could work on making compile speed faster and on incremental compilation.
> is Rails really that much better than the alternatives?
I really think so. I've _looked_. I've tried all sorts of other web frameworks. And, admittedly, I am most familiar with Rails, so I'm maybe a bit biased. But it's hard to find anything that comes particularly close to the productivity of using Rails. The tooling's great, the ecosystem is great, it's organized well, the documentation is good. It's just... really a pleasant experience to use.
Elixir's Phoenix comes pretty close, as does PHP's Laravel, imo. Special shout out for Rust's Loco, too, which is relatively new, but looking potentially promising.
I recommend giving Rails an open-minded tire kicking. I think you'll be surprised by how quickly you can get going.
I've used Rails (I've possibly committed to it, though I've forgotten if I have), my point is that I don't know those other languages' frameworks well enough to judge the difference, but I don't see any complaints.
You even seem to admit as much while being most familiar with Rails. Do you know anyone who'd love to switch over? Or would you choose it ahead of a competitor if you were green? There'd have to be a large competitive advantage.
I’d jump ship if there was a mature, stable competitor in the Typescript ecosystem.
Unfortunately I think language differences mean it’s going to be a long time before anyone catches up. Ruby just makes for some really interesting wizardry that as far as I can tell isn’t possible (or perhaps not as ergonomic?) in Typescript.
Furthermore there seems to be a cultural difference. I haven’t met many JS devs who came to the Ruby side and were like, “Aw shit this is better.” (I’m one such dev, but I hated Ruby and Rails for a really long time before I changed my opinion and embraced it.)
But at this point in my career I value stable boring technology way more than my personal taste du-jour so I code in Ruby and really love Rails.
Unfortunately it is, because too many folks still reach for purely interpreted languages for full-blown applications, instead of plain OS and application scripting tasks.
There was a little drama that played out as Java was getting a proper JIT.
In one major release, there was a bunch of Java code responsible for handling some UI element activities. It was found to be a bottleneck, and rewritten in C code for the next major release.
Then the JIT became properly useful, and the FFI overhead was more than the difference between the hand-tuned C code and what the JIT would spit out on its own. So in the next major release, they rolled back to the all-Java implementation.
Java had a reasonably fast FFI for that generation of programming language, but they swapped it for a better one a few releases after that. And by then I wasn't doing a lot of Java UI code so I had stopped paying attention. But around the same time they were also making a cleaner interface between the platform-specific and the general Java code for UI, so I'm not entirely sure how that played out.
But that's exactly the sort of see-sawing you need to at least keep an eye out for when doing this sort of work. Would you be better off waiting a couple milestones and saving yourself a bunch of hand-tuning work, or do you need it right now for political or technical reasons?
That is where a JIT enters the picture, ideally a JIT can re-optimize to an ideal state.
While this is suboptimal for one-shot execution, when an application is long-lived, as with most desktop or server workloads, this work pays off over the life of the application.
For example, Dalvik had a pretty lame JIT, thus it was faster calling into C for math functions, eventually with ART this was no longer needed, JIT could outperform the cost of calling into C.
Depending on the math (this is a hedge) you need, FORTRAN is probably faster still. Every time I put together a test and compare Python, Fortran, and C, Fortran always wins by a margin, Fortran:C:Python 1:1.2:1.9 or so. I don't count startup; I only time the return from the function call.
Most recently I did hand-looped matrix math and this ratio bore out.
Sure, but that doesn't fit the desktop or server workloads I mentioned; I guess we need to exempt stuff like HPC from those server workloads.
I would also add that modern Fortran looks quite sweet, the punched card FORTRAN is long gone, and folks should spend more time learning it instead of reaching out to Python.
When dealing with a managed language that has a JIT or AOT compiler it's often ideal to write lots of stuff in the managed language, because that enables inlining and other optimizations that aren't possible when calling into C.
This is sometimes referred to as "self-hosting", and browsers do it a lot by moving things into privileged JavaScript that might normally have been written in C/C++. Surprisingly large amounts of the standard library end up being not written in native code.
Ruby has realized this as well. When running in YJIT mode, some standard library methods switch to a pure Ruby implementation instead of the C implementation, because the YJIT-optimized Ruby performs better.
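A quick way to see that trade-off for yourself; the `count_byte` routine below is a made-up stand-in for something the stdlib implements in C (here `String#count`). Run the script with and without `--yjit` and compare:

```ruby
require "benchmark"

# Pure-Ruby stand-in for a routine the stdlib implements in C.
# Hypothetical example, just to make the trade-off measurable.
def count_byte(str, byte)
  n = 0
  i = 0
  while i < str.bytesize
    n += 1 if str.getbyte(i) == byte
    i += 1
  end
  n
end

s = "hello world " * 100_000

Benchmark.bm(12) do |x|
  x.report("pure Ruby:") { count_byte(s, "l".ord) }
  x.report("C builtin:") { s.count("l") }
end
```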
Well, most all of the compiler, runtime, allocator, garbage collector, object model, etc, are indeed written in C++
And so are many special operations (eg crypto functions, array sorts, walking the stack)
But specifically with regards to library functions, like the other commenter said, losing out on inlining sucks, and crossing between JS and native code can be pretty expensive, so even with things like sorting an array it can be better to do it in JS to avoid the overhead, especially in cases where you can provide a callback as your comparator, which is JS, and thus you have to cross back into JS for every element.
So it's a balancing game, and the various engines have gone back and forth on which functions are implemented in which language over time
FFI presents an opaque, unoptimizable boundary of code. Having chatty code like this is going to cost a lot. To the point where this is even a factor in much faster languages with zero-cost-ish interop like C# - you still have to make a call, sometimes paying the cost of modifying state flags for VM (GC transition).
If Ruby YJIT is starting to become a measurable factor (after all, it was slower than other, purely interpreted, languages until recently), then the same rule as above will become more relevant.
If FFI calls are slow (even slower than Ruby -> Ruby calls), then that informs the way you use native code. You look for workflows whereby frequent calls to an FFI function are avoided: e.g. a large number of calls in some inner loop. Suppose such a situation cannot be avoided. Then you may have no recourse but to move that loop out of Ruby into C: create a custom FFI for that use case which you can call once and have it execute the loop, calling the function you really wanted to call many times over, as sketched below.
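A minimal sketch of that batching pattern with the ffi gem. `libsum` and `sum_f64` are hypothetical names, invented purely for illustration:

```ruby
require "ffi"

# Hypothetical native library, purely for illustration:
#   libsum.so exporting `double sum_f64(const double *xs, size_t n)`
module LibSum
  extend FFI::Library
  ffi_lib "sum"   # resolves to libsum.so (hypothetical)
  attach_function :sum_f64, [:pointer, :size_t], :double
end

values = Array.new(1_000_000) { rand }

# One FFI crossing total: copy the data once, run the loop on the C side,
# instead of making a million tiny FFI calls from a Ruby loop.
buf = FFI::MemoryPointer.new(:double, values.size)
buf.write_array_of_double(values)
total = LibSum.sum_f64(buf, values.size)
```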
If the FFI call can be made faster, maybe you can keep the loop in Ruby.
Of course that is attractive to people writing an application in Ruby.
That's how I interpret keeping as much code Ruby as possible.
Nobody in their right mind wants to add additional application-specific jigs written in C just to use some C piece.
Once you start doing that, why even have FFI; you can just create a module.
One attractive point about FFI is that you can take some C library and use it in a higher level language without writing a line of C.
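For example, with the ffi gem you can bind a libc function from plain Ruby, assuming a typical Unix-ish platform:

```ruby
require "ffi"

# Bind a libc function directly: no C glue code, no compiler needed.
module LibC
  extend FFI::Library
  ffi_lib FFI::Library::LIBC
  attach_function :strlen, [:string], :size_t
end

LibC.strlen("hello, world")  # => 12
```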
When we optimize Ruby for performance we debate how to eliminate X thousand heap allocations. When people in Rust optimize for performance, they're talking about how to hint to the compiler that the loop would benefit from SIMD.
Two different communities, two wildly different bars for "fast." Ruby is plenty performant. I had a Python developer tell me they were excited for the JIT work in Ruby, as they hoped that Python could adopt something similar. For us the one to beat (or come closer to) would be Node.js. We are still slower than them (lots of browser companies spent a LOT of time optimizing JavaScript JITs), but I feel that for the relative size of the communities Ruby punches above its weight. I also feel that we should be celebrating tides that raise all ships. Not everything can (or should) be written in C.
I personally celebrate any language getting faster, especially when the people doing it share as widely and are as good of a communicator as Aaron.
Node uses V8 which has a very advanced JIT compiler for the hot code, which does a lot of optimizations for reducing the impact of JS's highly dynamic type system.
The claim that Ruby YJIT beats this is not supported by the data, to put it mildly.
Not at all saying Ruby's compiler is more capable, more that typical Ruby code is easier to optimize by their JIT design than typical JS, largely because Ruby's type system is more sane.
The whitepapers that inspired Ruby's JIT were first tested against a saner subset of JS, and shown to have some promising performance improvements. The better language/JIT fit is why the current Ruby JIT actually shows performance improvements over the previous, more traditionally designed JIT attempts.
JS can get insanely fast when it's written like low level code that can take advantage of its much more advanced compiling abilities; like when it's used as a WASM target with machine generated code. But humans tend to not write JS that way.
Agreed about Go as well; it tends to be on the slow side for compiled languages. I called it out not as an example of a fast language, but because its typical performance is well known and is approximately the upper bound of how fast Ruby can get.
I did a quick search for the white papers and couldn't find them. Would you be kind enough to leave a link or a title? It sounds interesting, I'd like to read more.
For Ruby, it's code where variables and method input/return types can be inferred to remain static, and variable scope/lifetime is finite. From my understanding, much of the performance gain was from removing the need for a lot of type checking and dynamic lookup/dispatch when types were unknown.
So basically, writing code similarly to a statically typed compiled language.
I’ve only seen a handful of people use it for local development, but I’ve seen plenty of people use it for servers in production. Looks like the latest CPython has a JIT built in. It would be cool if it saw the same gains that Ruby did.
Looks like the one at that link does, unless there's some newer versioning thing I'm not aware of. The top results seem to be comparing PyPy 3.10.14 and Ruby/YJIT 3.4.1.
Not only those; you are missing Vignette, and our own Safelayer (yes, I know it isn't public).
However, exactly because of the experience writing Tcl extensions all the time for performance, since 2003 I no longer use programming languages without JIT/AOT other than for scripting tasks, or when the decision is external.
The founders at our startup went on to create OutSystems, with many of the learnings, but using .NET instead, after we were given access to .NET during its "Only for MSFT partners eyes" early state.
The totally safe and sane approach is to write C code that gets passed data via the command line during execution, then vomits results to the command line or just into a memory page.
Then just execute the C program with your flags or data in the terminal using Ruby and voilà, Ruby can run C code.
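In Ruby that amounts to something like this, using `wc -c` as a stand-in for your C program:

```ruby
require "open3"

# One-shot: pass data via argv and/or stdin, read results from stdout.
out, status = Open3.capture2("wc", "-c", stdin_data: "hello")
puts out.strip                      # => "5"
raise "wc failed" unless status.success?
```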
Can you please elaborate on this because I'm struggling to follow your suggestion. Shelling out to psql every time I want to run an SQL query is going to be prohibitively slow. It seems to me you'd need bindings in almost the exact same cases you'd use a shared library if you were writing in C and that's really all bindings are anyway -- a bridge between the VM and a native library.
Spawning a process isn't the right tool for ALL X-language communication. But sometimes it is - and the bias tends to be to overlook these opportunities. When you are comfortable using libraries, you make more libraries. When you know how to use programs, you more often make programs.
> Shelling out to psql
I would recommend using a Postgres connection library, because that's how Postgres is designed.
Note that ongoing communication can still work with a multi-process stdin/stdout design. This is how email protocols work. So someone could design a SQL client that works this way.
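A rough sketch of that long-lived, line-oriented design, assuming `bc` is installed:

```ruby
require "open3"

# One long-lived child; newline-delimited request/response over
# stdin/stdout, the same conversational shape as SMTP or POP.
stdin, stdout, wait_thr = Open3.popen2("bc", "-l")
stdin.puts "2^10"
puts stdout.gets       # => 1024
stdin.puts "4 * a(1)"  # with bc -l, a() is arctan, so this prints pi
puts stdout.gets
stdin.close
wait_thr.value         # reap the child process
```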
I have absolutely written batch import scripts which simply spawn psql for every query, with great results.
> in almost the exact same cases you'd use a shared library
That's the thing. Libraries are an entangling relationship (literally in your binary). Programs in contrast have a clean, modular interface.
So for example you can choose to load the imagemagick library, or you can spawn imagemagick. Which one is better depends, but most often you don't need the library.
Here is a list of examples I have seen solved with a library that were completely unnecessary:
- identify the format of an image.
- zip some data
- post usage analytics to a web server at startup
- diff two pieces of data
- convert math symbols to images
- convert x format to y
I have even seen discourse online that suggests that if you are serious about AI your web stack needs to be in python - as if you can't spawn a process for an AI job.
> Spawning a process isn't the right tool for ALL X-language communication. But sometimes it is
I'm with you here.
> ...many people do not understand Unix processes and don’t realize how rare it is to need bindings, ffi, and many libraries
But, this is a much stronger claim.
I can't tell if you're making a meta point or addressing something in the Ruby ecosystem. I mentioned database library bindings because that's far and away the most common usage in the Ruby ecosystem, particularly because of its frequent adoption for web applications.
The author is advocating for not using native code at all if you can avoid it. Keep as much code in Ruby as you can and let the JIT optimize it. But, if you do need bindings, it'd be great if you didn't have to write a native extension. There are a lot of ways to shoot yourself in the foot and they complicate the build pipeline. However, historically, FFI has been much slower than writing a native extension. The point of this post is to explore a way to speed up FFI in the cases where you need it.
It needs to be taken on faith that the author is doing this work because he either has performance sensitive code or needs to work with a library. Spawning a new process in that case is going to be slower than any of the options explored in the post. Writing and distributing a C/C++/Zig/Rust/Go application as part of a Rails app is a big hammer to swing and complicates deployments (a big part of the reason to move away from native extensions). It's possible the author is just complicating things unnecessarily, but he's demonstrated a clear mastery of the technologies involved so I'm willing to give him the benefit of the doubt.
A frequent critique of Ruby is that it's slow. Spawning processes for operations on the hot path isn't going to help that perception. I agree there are cases where shelling out makes sense. E.g., spawning out to use ImageMagick has proven to be a better approach than using bindings when I want to make image thumbnails. But, those are typically handled in asynchronous jobs. I'm all for anything we can do to speed up the hot path and it's remarkable how much performance was easily picked up.
You interpreted my comment as an attack on the author. If they have a special case they want to optimize FFI for this seems like a great way to do it.
> Spawning processes for operations on the hot path isn't going to help that perception.
Yes it will. Because then you do the number crunching in C/Java/Rust etc instead of Ruby. And the OS will do multi-core scheduling for free. This is playing to Ruby’s strength as a high level orchestrator.
I think you’re vastly overestimating the fork/exec overhead on Linux. If your hot path is this hot you better not be calling Ruby functions.
> Writing and distributing a C/C++/Zig/Rust/Go application as part of a Rails app is a big hammer to swing
You’re already doing it. Do you have a database? Nginx? Memcached? Imagemagick? Do you use containers?
Consider the alternative: is the ideal world the one where all software is available as a ruby package?
> You interpreted my comment as an attack on the author. If they have a special case they want to optimize FFI for this seems like a great way to do it.
My mistake. I assumed the conversation was relevant to the post. I hadn't realized the topic had diverged.
> I think you’re vastly overestimating the fork/exec overhead on Linux. If your hot path is this hot you better not be calling Ruby functions.
We're talking about running a Ruby application. Let's use Rails to make it more concrete. That's going to be a lot of Ruby on the hot path. Avoiding that is tantamount to rewriting the application in another language. But, Ruby on the hot path can be fast, particularly when JIT compiled. And that's going to be faster than spawning a new process. The article even demonstrates this, showing how `String#bytesize` is faster than all the other options. That's not to say writing a program in C is going to be slower than Ruby, but rather that integrating Ruby and C, which will involve translation of data types, is going to favor being written in Ruby. And, of course, the implementation of `String#bytesize` is in C anyway, but we don't have to deal with the overhead of context switching.
> You’re already doing it. Do you have a database? Nginx? Me cached? Imagemagick? Do you use containers?
All of those things ship as packages and are trivial to install. I don't need to compile anything. Packaging this theoretical application in a container out of band isn't going to help. Now, I need to `docker exec` on my hot path? No, that won't work either. So, instead, I need to add an explicit compilation step to what is otherwise a case of SCPing files to the target machine. I need to add a new task whether in the Dockerfile or in Rake to build an application to shell out to. At least with a native extension packaged as a gem there's a standard mechanism for building C applications.
There's no getting around that this complicates the build. I'm not sure why that's debatable: not having to do all of that is easier than doing it.
> Consider the alternative: is the ideal world the one where all software is available as a ruby package?
I'm not saying we should rewrite everything in the world in Ruby. But, yes, for a Ruby application the ideal world is one in which all code is written in Ruby. It's the simplest option and gives a JIT compiler the most context for making optimization decisions. I'm not interested in using Ruby as a glue language for executing C applications in response to a routed Rails request. At that point I may as well use something other than Ruby and Rails.
I’m not trying to be rude. You extracted that one sentence out of a paragraph where I tried to explain why my reply was oriented the way it was. We were evidently talking about two entirely different situations. You removed the context to make it look harsher than it actually is.
But, I agree that this conversation is no longer productive. If my mind seems made up it’s because I’ve done extensive work on three Ruby VMs, two Ruby JIT compilers, and run large Rails applications in production. My assertions on Ruby performance are pretty well-informed. It feels to me like you’re talking past most of my points to tell me I’m wrong. Without addressing any of my points in more than a superficial, seemingly flippant, manner it’s hard to have a productive discussion.
Leading off with “I think many people do not understand Unix processes and don’t realize how rare it is to need bindings, ffi, and many libraries” is a pretty bold claim. It’s of course possible the entire Ruby community is doing things wrong. It’s also possible there are aspects of the problem that haven’t been considered. But, I was hopeful a bold claim would have data to support it.
I care a lot about Ruby performance and if you’ve got novel ideas that can work in practice I’d love to hear them. But, this works best with concrete implementations. If you have benchmarks that show I’m wrong I’ll be happy to concede the entire thread.
somewhat related, this library uses the JVMCI (JVM Compiler Interface) to generate arm64/amd64 code on the fly to call native libraries without JNI https://github.com/apangin/nalim
If what could be written in C? The FFI library allows for dynamic binding of library methods for execution from Ruby without the need to write a native extension. That's a huge productivity boost and makes for code that can be shared across CRuby, JRuby, and TruffleRuby.
I suppose if you could statically determine all of the bindings at boot up you could write a stub and insert into the method table. But, that still would happen at runtime, making it JIT. And it wouldn't be able to adapt to the types flowing through the system, so it'd have to be conservative in what it accepts or what it optimizes, which is what libffi already does today. The AOT approach is to write a native extension.
> you should write a native extension with a very very limited API where most work is done in Ruby. Any native code would be a very thin wrapper around the function we actually want to call that just converts Ruby types in to the types required by the native function.
I think our main disagreement is your assertion that any compilation at runtime qualifies as JIT. I consider JIT to be dynamic compilation (and possibly recompilation) of a running program, not merely anything that generates machine code at runtime.
> Now, usually I steer clear of FFI, and to be honest the reason is simply that it doesn’t provide the same performance as a native extension.
I usually avoid it, or in particular, gems that use it, because compilation can be such a pain. I've found it easier to build it myself and cut out the middleman of Rubygems/bundler.
In libffi you build up descriptor objects for functions. These are run-time data structures which indicate the argument and return value types.
When making a FFI call, you must pass in an array of pointers to the values you want to pass, and the descriptor.
Inside libffi there is likely a loop which walks the list of values while traversing the descriptor, and places those values onto the stack in the right way according to the type indicated in the descriptor. When the function is done, it pulls out the return value according to its type. It's probably switching on type for all of these pieces.
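Ruby's bundled Fiddle exposes exactly this libffi model, if you want to see the descriptor approach from the Ruby side. A minimal sketch, assuming libc's `strlen` is resolvable from the default handle:

```ruby
require "fiddle"

# The descriptor: argument and return types as runtime data, mirroring
# libffi's call interface (CIF) structure underneath.
strlen = Fiddle::Function.new(
  Fiddle::Handle::DEFAULT["strlen"],  # symbol address from the process image
  [Fiddle::TYPE_VOIDP],               # argument types
  Fiddle::TYPE_SIZE_T                 # return type
)

strlen.call("hello, world")  # => 12
```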
Even if the libffi call mechanism were JITted, the preparation of the argument array would still be slow. It's less direct than an FFI JIT that accesses the arguments without going through an intermediate array.
FFI JIT code will directly take the argument values, convert them from the Ruby (or whatever) type to the C type, and stick them into the right place on the stack or in registers, with inline code for each value. Then it calls the function and converts the return value to the Ruby type.
Basically, as if you had written the extension code by hand.
If there is type inference, the conversion code can skip type checks. If we have assurance that arg1 is a Ruby string, we can use an unsafe, faster version of the RubyToCString function.
The JIT code doesn't have to reflect over anything other than, at worst, the Ruby types. It doesn't have to have any array or list related to the arguments. It knows which C types are being converted to and from, and that is hard-coded: there is no data structure describing the C side that has to be walked at run-time.
I am surprised many don't know how libffi works. Yes, it does generate native machine code to handle your call. Look it up.
Yes, it's probably worse than doing the JIT in the Ruby interpreter, since there you can also inline the type conversion calls, but the principles are the same.
It certainly uses native machine code, but I don't think it generates any at runtime outside of the reverse-FFI closures (at least on Linux)? PROT_EXEC at least isn't used outside of them, which'd be a minimum requirement for Linux JITting.
Running in a debugger an ffi_call to a "int add(int a, int b)" leads me to https://github.com/libffi/libffi/blob/1716f81e9a115d34042950... as the assembly directly before the function is invoked, and, besides clearly not being JITted from me being able to link to it, it is clearly inefficient and unnecessarily general for the given call, loading 7 arguments instead of just the two necessary ones.
Oops, you are right. I think because the other direction of libffi - ffi_closure - has a jitted trampoline, I mistakenly thought both directions are jitted. Thanks for the correction.
And the JITting in closures amounts to a total of three instructions; certainly not for speed, rather just as the bare minimum to generate distinct function pointers at runtime.
libffi can't know how to unwrap Ruby types (since it doesn't know what Ruby is). The advantage presented in this post is that the code for type unboxing is basically "cached" in the generated machine code based on the information the user passes when calling `attach_function`.
libffi doesn't JIT for FFI calls; and it still requires you to lay out argument values yourself, i.e. for a string argument you'd still need to write code that converts a Ruby string object to a C string pointer. And libffi is rather slow.
(the tramp.c linked in a sibling comment is for "reverse-FFI", i.e. exposing some dynamic custom operation as a function pointer; and its JITting there amounts to a total of 3 instructions to call into precompiled code)
> Even in those cases, I encourage people to write as much Ruby as possible, especially because YJIT can optimize Ruby code but not C code.
But the C code is still going to be waaay faster than the Ruby code even with YJIT. That seems like an odd reason to avoid C. (I think there are other good reasons though.)
> the C code is still going to be waaay faster than the Ruby code even with YJIT.
I can't find it, but I remember seeing a talk where they showed examples of Ruby + YJIT hitting the same speed as C, and in some cases a bit more. The downside, though, was that it required some warmup time.
I find that hard to believe. I've heard claims JIT can beat C for years, but they usually involve highly artificial microbenchmarks (like Fibonacci) and even for a high performance JITed language like Java it ends up not beating C. There's no way YJIT will.
The YJIT website itself only claims it is around twice as fast as Ruby, which means it is still slower than a flat earth snail.
The benchmarks game has YJIT and it's somewhere between PHP and Python. Very slow.
Does ruby have its equivalent to typescript, with type annotations? The language sounds interesting but I tend not to give dynamically typed languages the time of day
I continue to think it was a big mistake not to add syntactic support for type annotations into the base language. python did this right; annotations are not enforced by the interpreter, but are accessible both by external tools as part of the AST and bytecode, and by the running program via introspection, so tools and libraries can do all sorts of interesting things with them.
having to add annotations in a separate header file is simply too high friction to get widespread adoption.
IMHO (and I don't expect most people to agree but please be tolerant of my opinion!) annotations are annoying busywork that clutter my code and exist just to make people feel smart for “““doing correctness”””. The only check I find useful is nil or not-nil, and any halfway-well-designed interface should make it impossible for some unexpected object type to end up in the wrong place anyway. For anything less than halfway-well-defined, you have bigger issues than a lack of type annotation.
edit: I am quite fond of `case ::Ractor::receive; when SomeClass then …; when SomeOtherClass then …; end` as the main pattern for my Ractors though :)
as your codebase and number of collaborators get larger, it's super useful to have the type checker be able to tell you "hey, you said your function arg could be a time or an int, but you are calling time-specific methods on it" or conversely "the function you are calling says it accepts time objects but you are passing it an int"
also once you get into jit compilation you can do some nice optimisations if you can treat a variable type as statically known rather than dynamic.
and finally, even if you're not writing python at scale it can be very nice to use the type annotations to document your function parameters.
Sorbet is the most mature option. RBS barely has any tooling, while Sorbet works well.
It definitely isn't at the level of Typescript adoption, even relatively speaking. And it's more clunky than Typescript. But it works well enough to be valuable.
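For anyone curious what Sorbet signatures look like inline (class and method names here are made up; no separate header file required, though Sorbet can also consume RBI files):

```ruby
# typed: true
require "sorbet-runtime"

class Invoice
  extend T::Sig

  # Checked statically by `srb tc`, and again at runtime by sorbet-runtime.
  sig { params(amount_cents: Integer, currency: String).returns(String) }
  def display(amount_cents, currency)
    "%.2f %s" % [amount_cents / 100.0, currency]
  end
end

Invoice.new.display(1999, "USD")    # => "19.99 USD"
# Invoice.new.display("20", "USD")  # raises TypeError at runtime
```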
> Why does it remain relatively unpopular and what can be done so that more people get to use it?
Because Ruby-ish syntax without Ruby’s semantics or ecosystem isn’t actually all that big of a selling point, and if people want a statically typed language, there are plenty of options with stronger ecosystems, some of which have some Ruby-ish syntactic features.
If you're looking for static typing, a dynamic language is going to be a poor fit. I find a place for both. I love Rust, but trying to write a tool that consumed a GraphQL API with it was a brutal exercise in frustration. I'd say that goes for typing of JSON or YAML or whatever structured format in general. It's refreshing being able to just work with data in the form I already know it's in. Ruby can be an incredibly productive language to work with.
If you're looking for static analysis in general, please note that there are mature tools available. Rubocop¹ is probably the most popular and allows for linting and code formatting. Brakeman² is a vulnerability scanner for Rails. Sorbet³ is a static type checker.
The tooling is there if you want to try things out. But, if you want a statically typed language then that's a debate that's been going since the dawn of programming language design. I doubt it's going to get resolved in this thread.
I’ve used rubocop and sorbet. But now that I’ve used TypeScript it’s clear there’s no comparison. TS will even analyze your regex patterns. Every update gets better. I’m eagerly waiting for the day they add analysis for array length and ranged numbers.
Rails has more mindshare, it's easier to hire for, there are more tutorials etc to help you when you get stuck, and Ruby has a more mature ecosystem of libraries/plugins than Elixir has.
I'd still pick Phoenix over Rails any day of the week, but if I had to make the case for Rails, that would be it.
I can't think of anything specific that was a huge problem. But when you're integrating with a third-party tool (e.g. for analytics, error reporting, email delivery, anything really), it's very common to see that they provide a Ruby SDK but not an Elixir SDK. Or if there's an Elixir SDK, it's something unofficial (and possibly unmaintained) from a third party, while the Ruby SDK is something in-house that has official support.
Most of the time these SDKs are just thin wrappers around their JSON API, so it's easy to build your own thing in Elixir anyway. And LLMs make it really easy. But it's something I've experienced pretty often.