This is a fantastic talk, thanks so much for posting it. The thesis was both novel and interesting to me as a heavy Ruby user. I'll buy you a drink if you're ever in SF.
I don't know Ruby, so I had never paid JRuby any attention or heard of any of this, despite writing VMs for fun and following PyPy development. This talk was amazing. The fact that you include a C parser and interpreter (CTruffle) [1], can parse (some) MRI Ruby extensions written in C, treat them just the same as Ruby, and inline/partially evaluate/etc. everything together, and hence run faster than compiled C extensions, is just mind-blowing. And it's all JIT-compiled to JVM bytecode, which then gets JIT-compiled by HotSpot; what an incredibly complex stack.
It makes me wonder about writing a JIT compiler to produce LuaJIT bytecode, since I'm currently working on AOT compilation of a dynamic language to LuaJIT bytecode.
The thesis was that you can't make Ruby fast by implementing, for example, a fast sort routine as an extension in C or Rust, because that extension is then a big black box to any optimisation that the VM might want to do.
You might be able to make a standalone sort fast, but you really need to make a sort followed by an index fast (or followed by a reverse, or endless other combinations you can't plan for), and if your sort is an external routine, your VM can't simplify it with the knowledge that it's going to be followed by an index operation.
That's particularly important in idiomatic Ruby code, because so much of it is just calls to the core library. This is what makes Ruby slow, and is what needs to be addressed to make it fast.
I use JRuby as an example, but it also applies to Rubinius and MRI.
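To make that concrete, here's a toy illustration (my own example, not one from the talk) of the kind of cross-call rewrite that's only possible when the optimiser can see through the core-library call rather than hitting a native black box:

```ruby
# Idiomatic Ruby chains core-library calls; the win comes from
# optimising ACROSS the calls, not inside any one of them.
data = [5, 3, 8, 1]

# Naive chain: a full O(n log n) sort just to read one element.
smallest = data.sort.first

# What an optimiser that can trace through #sort and #first could,
# in principle, rewrite the chain to: a single O(n) min scan.
smallest_fused = data.min

raise "mismatch" unless smallest == smallest_fused  # both are 1
```

If `sort` were an opaque C routine, the VM has no way to know the result is immediately consumed by `first`, so the fusion above is off the table.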
I guess that was the key takeaway from your talk:
Ignore local optimisations in favour of more global ones.
The fact that you parse C extensions was really unexpected.
Given something like Object#blank? using the Ruby implementation vs Sam Saffron's "fast_blank",
after JRuby+Truffle has done its work, which implementation ends up more performant?
And would a future JRuby+Truffle (given both the Ruby and C implementation) make an intelligent decision of which one to use?
Truffle wouldn't make #blank? fast at the moment, because it uses a regex, and regexes are still a black box to our optimiser (which is a shame).
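For anyone following along, the two implementations being compared look roughly like this (a simplified transliteration from memory, not the exact Rails or fast_blank source):

```ruby
# Rails-style String#blank?: delegates the scan to the regex engine,
# which is the black box mentioned above.
BLANK_RE = /\A[[:space:]]*\z/

def regex_blank?(str)
  str.empty? || BLANK_RE.match?(str)
end

# fast_blank's C loop, transliterated into plain Ruby: an explicit
# character scan that an inlining optimiser can trace end to end.
WHITESPACE = [" ", "\t", "\n", "\r", "\f", "\v"].freeze

def scan_blank?(str)
  str.each_char.all? { |c| WHITESPACE.include?(c) }
end

p regex_blank?("  \t")  # => true
p scan_blank?("a b")    # => false
```

The interesting question is exactly the one asked: once the whole chain is visible to Truffle, the plain-Ruby character scan can be inlined and specialised, while the regex version stalls at the regex-engine boundary.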
Do you develop with JRuby, or only use it for running production code? I gave it a try in 2015 and found it almost unbearably slow to develop with: the JVM startup time, plus the impossibility of making it optimize code that runs only once. This was despite the recommended optimizations (disabling the JIT etc.). Running tests is hell, starting irb/rails c is hell, etc. I've read https://github.com/jruby/jruby/wiki/Improving-startup-time now and I'm not sure all of this was available when I last used JRuby. I'd love to give it another try and find it fast, so if you develop with it, what's your setup?
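As far as I can tell, the wiki's advice mostly boils down to something like this (flag recalled from memory, so verify against the page itself):

```shell
# Put --dev in JRUBY_OPTS so every jruby invocation favours fast
# startup over peak speed (it disables JRuby's optimising JIT and
# caps the JVM's tiered compilation).
export JRUBY_OPTS="--dev"
# then start things as usual, e.g.:
#   jruby -S irb
#   bundle exec rails c
echo "$JRUBY_OPTS"
```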
I use MRI ruby locally, JRuby on production servers.
I have had no real issues with it, to be honest. Never any incompatibilities. Some slight care is needed when picking gems to make sure there are no C extensions, but it's rare that I find one.
I use JRuby on the CI box to make sure nothing untoward goes to prod.