Static Typing for Ruby: Adopting Sorbet at Scale (shopify.engineering)
155 points by ufuk on Nov 20, 2020 | hide | past | favorite | 70 comments



I've been using Sorbet for a while integrated in my SublimeText. I have never added a type signature, not even once. I just use it to prevent me from silly mistakes (like changing a variable name somewhere and breaking code downstream). The immediate red underline, which we are used to having in JS with ESLint, is such a great productivity booster.


This is a great way to use Sorbet and get its benefits without having to invest 100% into typing. Great example, thanks for sharing.


It is the same with TypeScript. The VSCode TypeScript plugin gives you type signatures of DOM-related functions that you would otherwise never get.


This looks great in general, but does anyone else find Sorbet's signature formulation offputting? Feels cluttered and not at-a-glance readable.

> sig {params(name: String, id: String).returns(Integer)}

My ruby isn't quite good enough to parse the language constructs that make up that line, but it's not pretty. I guess....sig is a class method which takes a block....and params is a class method (Added to BasicObject or something, perhaps?) to which you can pass any number of keyword params, and which returns an object that has a return method, to which you can pass any object type. I guess I don't know why you need to call sig at all, as opposed to just params( args ).returns( type ). I also have no idea how the sig call gets associated with the method that follows it.

I just wonder if there was a cleaner way to phrase this that's still syntactically viable.


The block passed to sig is evaluated in a different context, one where the local object has those methods. The methods aren't added globally, which is why you need the first method to switch the context. It's generally a good policy to avoid adding those class methods at the top level, which Sorbet does assiduously.

Sig also turns into a no-op if you have runtime verification turned off, which is another good reason not to call params right away, because (in ruby) you can ignore everything in the block if the block is not called, sort of like a debug macro in C, but you can't do that with a method that is called - it must evaluate its parameters.
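To make the two points above concrete, here is a minimal sketch of the technique, not Sorbet's actual implementation. `SigBuilder`, `MiniSig`, and `runtime_checks` are hypothetical names invented for illustration. The block runs with a builder object as `self`, so `params` and `returns` never exist at the top level, and the block is simply never called when checks are off.

```ruby
# Hypothetical names throughout: SigBuilder and MiniSig are illustrations,
# not Sorbet's real classes.
class SigBuilder
  attr_reader :param_types, :return_type

  def params(**types)
    @param_types = types
    self # return self so .returns can be chained
  end

  def returns(type)
    @return_type = type
    self
  end
end

module MiniSig
  class << self
    attr_accessor :runtime_checks
  end
  self.runtime_checks = true

  # Accepts a block but only evaluates it when checks are enabled.
  def sig(&blk)
    return nil unless MiniSig.runtime_checks

    builder = SigBuilder.new
    builder.instance_exec(&blk) # params/returns resolve against the builder
    builder
  end
end

class Example
  extend MiniSig
end

built = Example.sig { params(name: String, id: String).returns(Integer) }
```

With a non-block form such as `sig(params(...))`, Ruby would have to evaluate the argument before `sig` could decide to skip it, which is the cost the block avoids.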


That's super informative, thank you! Can you also add a word of explanation as to how the `sig` call gets associated with the method that follows it? What ties them together, is there some static parser?


Sorbet has its own written-in-C++ parser that does the actual parsing. At runtime, the sig call basically sets a flag for the next method that is defined which hooks into it to validate that the parameters passed in are as defined, and that the return value is as expected. I believe they're delving into the dark magic in the interpreter directly, the docs are here: https://sorbet.org/docs/runtime


For the runtime, there is not much dark magic. Each type that has an `extend T::Sig` has `method_added` hooks registered, which notifies Sorbet runtime whenever a method is defined on the type. When that `method_added` hook is called, Sorbet runtime uses the sig flag that you mention to associate the `sig` with the method definition that follows it.

It is, more or less, an implementation of this idea: https://yehudakatz.com/2009/07/11/python-decorators-in-ruby/


Yes indeedy, I'd consider `method_added` to be pretty dark magic though :)


The actual signature checking is complicated, but associating the sig call with the method is pretty simple in Ruby — you just hook `method_added`. (Of course the actual implementation in something like Sorbet is a lot more sophisticated than just `def method_added`, but that's the basics of how you make a method call alter the next method definition.)
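A toy version of the `method_added` trick, under loud assumptions: `TinySig` and `sig_returns` are made-up names, only return types are checked, and only positional arguments are forwarded. Sorbet's real runtime is far more sophisticated.

```ruby
# Hypothetical module; a toy version of the idea, not Sorbet's code.
module TinySig
  # Remember a return type for whichever method is defined next.
  def sig_returns(type)
    @pending_return_type = type
  end

  # Ruby calls this hook on the class after every `def`.
  def method_added(name)
    super
    type = @pending_return_type
    return if type.nil?

    # Clear the flag first: define_method below fires method_added again.
    @pending_return_type = nil

    original = instance_method(name)
    define_method(name) do |*args, &blk|
      result = original.bind(self).call(*args, &blk)
      unless result.is_a?(type)
        raise TypeError, "#{name} returned #{result.class}, expected #{type}"
      end
      result
    end
  end
end

class Account
  extend TinySig

  sig_returns Integer
  def balance
    42
  end

  sig_returns Integer
  def broken
    "oops"
  end

  # No sig_returns before this one, so it is left untouched.
  def unchecked
    "anything"
  end
end
```

The "flag" is just an instance variable on the class, set by the sig call and consumed by the next `method_added` notification.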


I imagine they could (in theory) simplify it to

  sig(name: String, id: String).returns Integer

Losing the block maybe has some undesirable performance implications, but it looks a little nicer without the curly braces.


I think they did this originally but the performance implications were a deal breaker, and had to move to blocks. Iirc, which is questionable.


params in rails is pretty standard for controllers. wrapping it in something like sig makes the namespacing not collide.

https://github.com/search?l=Ruby&q=params&type=code


I wish this had been available years ago!

I think Ruby was an excellent replacement for Lua to do ML, once Torch ran out of steam due to LuaJIT memory limitations. However, historically, MRI Ruby was slow and problematic.

Personally, I prefer Ruby to Python. Ruby is very close to Smalltalk, with some ideas from Perl and Lisp.


God do I love Ruby. It's just such a beautiful language. Feels more expressive and imaginative than any other language I've touched by a mile.

But while I personally enjoy it more than Python, Python will always be the one I would recommend to anyone for work or learning-- it's so much more straightforward and predictable. The "one right way to do things" mentality that pervades python makes things so consistent and intuitive. Ruby, on the other hand, gives you 10+ ways to form a loop. The paradox of choice and all.


In the words of David Heinemeier Hansson:

> In fact, Ruby, to me, so much of the enjoyment in Ruby is these incredible subtleties, of how many different ways you can structure a conditional. Like, Ruby has, I don't know even the count, there's gotta be 60 different ways you can say `if something`, right? And it is in those 60 different ways that I find half the enjoyment of writing Ruby. Like, it was one of those things where I knew, very early on, that Python was not a language for me because it said, right in the manifesto, there should be preferably one and only one way to do things. Ruby has the exact opposite approach, there should be preferably ten thousand subtle different ways of doing things, that will allow you to write that particular conditional, with just the right emphasis, do you write it in the front, do you put it at the back, is it multi-line, is it single line? Like, there's so much variety and it's in that variety that I find poetry. And it is the poetry of writing Ruby code, of making those subtle distinctions where, at the end, you can like, "Ehh, should we move it around" like, where I just go like, giggles, right? Like, this where like we talked about that big smile, right? So much of that big smile comes from, not just like solving the problem, but solving it in a poetic way.


I respect that perspective. I both love and often avoid Ruby precisely because of this. But even if I don't use Ruby much day to day, I really appreciate what it taught me (Rails, too) and the fun I had writing it.


And I love Ruby for exactly that reason. Well said, totally agreed.



Hm interesting, for anyone interested, LuaJIT has a 2 GiB limit, which is unsurprisingly problematic for machine learning:

https://kvitajakub.github.io/2016/03/08/luajit-memory-limita...

https://github.com/karpathy/char-rnn/issues/80


The 2 GB limit has been fixed for many years now by GC64 mode, which became the default build mode last year.


> MRI Ruby was slow and problematic.

Why problematic?

On the slowness, I would agree that Ruby is slower than many languages, but as an interpreted language this is (up to a certain point) by design, or at least an accepted part of the trade-off one makes when choosing a programming language. Moreover, Ruby's natural competitor is Python, not, say, Clojure, and I wouldn't say that Ruby is significantly slower than Python.


It had lots of memory leaks. I love Ruby, and I used it anyway, but this bothered me a lot back in the mid-to-late 2000s.


The ever present memory bloat with Ruby has been discovered to be a glibc issue. Jemalloc makes MRI memory usage much more predictable and stable.


It's gotten a lot better since then, but so have all the other languages! I still love Ruby but I would use Python for ML in a heartbeat.


I work with Ruby, Python, and R daily.

I use Ruby for everything I can, Python for ML, and R for data analysis. I just find it easier to not swim upstream and force ruby onto all of those contexts when the other 2 languages have loads of inertia behind them in those spaces. It's unfortunate.


I keep looking at Sorbet, and I have since it was in private beta. It just seems to be so, so much work to get anything out of it. The last time I tried it was last year at RubyConf, in my spare time, on a microservice.

It's frankly easier for me to rewrite a small service in Rust, and have native typing (plus thread safety, no race conditions, easy parallelism, etc).

While I'm best at Ruby, and have used it professionally for well over a decade - it's wearing on me.


Somewhat intentionally, there is no doc page on "Starting a new project with Sorbet" only "Adopting Sorbet in an existing codebase"—Sorbet was built for and delivers the most value to large, existing Ruby codebases.

You'll also notice the three selling points on the home page:

- It's fast (it has to be, else it wouldn't work in large existing codebases)

- It's IDE ready (so that a dev tooling team can go to their organization and say "if we adopt this, the IDE features will make our engineers more productive")

- It's gradual (so that typed and untyped code lives side by side)

If you have the luxury of starting from scratch, there are far better ways to ensure 100% type coverage from day 1. Unfortunately, companies like Shopify and Stripe don't have that luxury, and are contending with hundreds of developers who maintain millions of lines of Ruby code.


The second part of this two-part blogpost is here: https://shopify.engineering/adopting-sorbet


Happy to answer any questions here about our process, results or tools!


Having used Sorbet, one thing immediately noticeable is that it's incredibly verbose due to needing to be valid ruby syntax. Was there any developer pushback on that?

Also, maybe I missed it, but Sorbet has a very fleshed out plugin for VSCode but that's about it. How was the experience getting people who don't use VSCode to integrate it into their workflow?


There was definitely some general suspicion when we first introduced Sorbet. However, there were also some early adopters who didn't mind the syntax but really wanted the benefits. That's why we didn't push for types adoption, we allowed teams to adopt (or not) at their own pace. Once people saw the benefits, they wanted more of it. Syntax also stopped being a huge concern.

Quoting from the blogpost:

> - Developers get used to Sorbet syntax over time

> ...

> Our main observation is that developers enjoy Sorbet more as the typing coverage increases.

On the editor integration, the funny thing is that VSCode is the only editor that needs a special extension to integrate with Sorbet. The way Sorbet supports editors is via the built-in Language Server Protocol (LSP) mode which can be used with any LSP plugin for any editor. I know people who are using it with Vim, Sublime Text, etc. Integration is, in a nutshell, running `srb tc --lsp` as a subcommand and piping data in/out via stdin/out. Our tool Spoom actually uses the same LSP mode to provide extra developer tools and analysis on top of Sorbet.
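For anyone wiring this up by hand, here is a rough sketch of the message framing that LSP uses over stdin/stdout: a `Content-Length` header in bytes, a blank line, then a JSON-RPC body. The helper names `lsp_frame` and `lsp_read` are made up for this example, and the round trip goes through a `StringIO` so it runs without Sorbet installed.

```ruby
require "json"
require "stringio"

# Frame a JSON-RPC message the way LSP expects: a Content-Length header
# (counted in bytes), a blank line, then the JSON body.
def lsp_frame(payload)
  body = JSON.generate(payload)
  "Content-Length: #{body.bytesize}\r\n\r\n#{body}"
end

# Read one framed message back from an IO-like object.
def lsp_read(io)
  length = nil
  while (line = io.gets("\r\n"))
    line = line.chomp("\r\n")
    break if line.empty? # blank line ends the header section

    name, value = line.split(": ", 2)
    length = value.to_i if name.casecmp?("Content-Length")
  end
  return nil if length.nil?

  JSON.parse(io.read(length))
end

initialize_request = {
  jsonrpc: "2.0",
  id: 1,
  method: "initialize",
  params: { processId: nil, rootUri: nil, capabilities: {} }
}

framed = lsp_frame(initialize_request)
decoded = lsp_read(StringIO.new(framed))

# Against a real server you would write `framed` to the stdin of a
# subprocess started with `srb tc --lsp` (for example via Open3.popen2)
# and call lsp_read on its stdout. That part is left out here because it
# needs Sorbet and a project directory.
```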


Dmitry here, from Stripe. One of the founding members of Sorbet (though I no longer work on it).

The story that Ufuk shares is common.

We saw this process repeat at many companies (Stripe, Shopify, and others): folks are initially bothered by the verbosity of the `sig` syntax, but once the company starts using it in practice they stop being bothered pretty quickly. It happens even faster now that the IDE integration can auto-complete an entire signature and can, in most cases, correctly guess the argument and result types. You rarely type `sig`s yourself.

The common internal asks at Stripe are for even better IDE support, new features, etc., and that's where most of the investment goes. Well, and obviously performance. Sorbet was already very fast and we intend to keep it fast as our huge codebase grows.


Do you have any tips or tricks for migrating a Rails app from Sorbet `srb rbi` generated files to one using Tapioca?

I was unsure exactly which bits of Sorbet rbi were still required with tapioca https://github.com/Shopify/tapioca/issues/114

We are currently using `sorbet-rails` but it appears Tapioca would be a replacement for that as well, is that correct?


Ultimately, the goal is for Tapioca to replace all `srb rbi` tooling and `sorbet-rails`. Right now, we exclusively use Tapioca for gem RBIs and we don't use any `srb rbi` tooling at Shopify. That works perfectly fine, but you don't get any `sorbet-typed` RBIs, which, right now is not a concern for us.

The DSL generators are not 100% complete to fully replace `sorbet-rails` right now, but we are preparing a 1.0 release of the gem that should be able to do that.


One of the challenges sorbet-rails faced [1] was with the usage of method_missing in certain places. The one that bit me was Rails automatically piping a class method from Model into the Model's CollectionProxy, effectively making class methods into scopes. We use this pretty extensively at work for complex scopes, so this is one of the reasons I've not been able to get complete buy-in for sorbet. Is that better in tapioca?

(Also: Thank you for sorbet! It's the biggest reason I continue to use Ruby for my personal projects.)

1: https://github.com/chanzuckerberg/sorbet-rails/issues/104
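A stripped-down illustration of why this pattern is hard for a static checker. `ToyRelation` and `Post` below are hypothetical stand-ins, not Rails code, and passing the records in as an argument is a simplification of what Rails actually does with scoping.

```ruby
# Hypothetical stand-ins for a model and its collection proxy.
class ToyRelation
  attr_reader :records

  def initialize(model, records)
    @model = model
    @records = records
  end

  # Any method the relation lacks is retried as a class method on the
  # model. There is no `def published` in this class, so a checker that
  # only reads definitions has nothing to resolve the call against.
  def method_missing(name, *args, &blk)
    if @model.singleton_methods(false).include?(name)
      ToyRelation.new(@model, @model.public_send(name, @records, *args, &blk))
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    @model.singleton_methods(false).include?(name) || super
  end
end

class Post
  attr_reader :title, :published

  def initialize(title, published)
    @title = title
    @published = published
  end

  # A class method that ends up behaving like a scope on the relation.
  def self.published(records)
    records.select(&:published)
  end
end

relation = ToyRelation.new(Post, [Post.new("a", true), Post.new("b", false)])
published_titles = relation.published.records.map(&:title)
```

A hand-written declaration of `published` on the relation type is what closes this gap for the checker; the runtime behavior is unchanged.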


I am afraid that is a shared concern for what Tapioca is doing as well. There are ways of mitigating that concern, by, for example, lifting all static methods on a model to be methods on the collection proxy in RBI files. This is certainly doable, and we are less interested in the correct signatures for such methods than having the method definitions in place, in the first place. On the other hand, we also have some Sorbet feature ideas that we want to experiment with where we might be able to annotate that a certain type delegates all missing methods to another type, for example.

This is indeed a problem in our codebase as well, and so far our team has been suggesting that people add shim (i.e. manual) RBI definitions for the methods that they find are missing from the types they've expected them on. This is a good stop-gap measure to solve a problem that is not very common with an easy solution to implement.


Thanks for the reply!

> but you don't get any `sorbet-typed` RBIs, which, right now is not a concern for us.

Ahh ok! I think this might have been a bit I missed, cause I was commonly trying to do both which likely caused issues for me.

> The DSL generators are not 100% complete to fully replace `sorbet-rails` right now, but we are preparing a 1.0 release of the gem that should be able to do that.

Will keep my eyes open, thanks!


I'm not sure about Sorbet. Shopify are definitely talking the talk but the issues list on GitHub is getting pretty large and the much-trumpeted VSCode plugin has disappeared. My first foray was cut short by a bug (https://github.com/sorbet/sorbet/issues/3603). With the up and coming typing support in Ruby 3 there's competition, which is good, but given the current state of play I'm going to wait for things to settle before trying to adopt again.


I made some contributions to make Sorbet work with JRuby, but it's still definitely a 3rd or 4th class citizen.

That's surprising since I would have figured most enterprise shops are on JRuby


> Finally, we track how many times our developers ran the command dev tc to typecheck a project with Sorbet on their development machine.

How? Do they track all of their devs commands?


`dev tc` is a command at Shopify that runs Sorbet. I'm assuming they instrument all subcommands of their `dev` command and send aggregate statistics to an internal statsd service.

We do the same thing where I work. If you're working on a developer productivity team, the service whose SLA you're responsible for is the dev tools, so it's critical to know how long they take, which commands are run most frequently, etc.

In fact, Sorbet has this built in: you can give it an assortment of `--metrics-...` [1] and `--statsd-...` [2] flags that configure Sorbet to talk to a statsd service directly:

[1] https://sorbet.org/docs/metrics

[2] https://sorbet.org/docs/metrics#reporting-metrics-directly-t...


Thanks!


Semi related: what kind of process do you have that makes you feel confident pushing 40x a day? Do you run some kind of automatic analysis that helps you automate the push completely? What if you know that HEAD has a regression that shouldn't be pushed?


We have a lot of automation and infrastructure in place to give us that confidence. We have an extensive test suite of hundreds of thousands of tests, linting and static checks that run on CI, a deploy automation tool that automatically merges and deploys on green (https://shopify.engineering/successfully-merging-work-1000-d...) and a canary deploy system where we quickly detect regressions that might have escaped our systems to that point. We have a lot of posts about those systems written up in our engineering blog, as well.


Isn't crystal lang trying to do the same thing? Why not join forces?


> Isn't crystal lang trying to do the same thing?

Nope, Crystal is a statically typed language with Ruby-ish syntax but semantics distinctly different from Ruby's.

Sorbet is an optional static type system for Ruby.

They aren't trying to do the same thing. One is trying to appeal to the aesthetic preferences of developers who like Ruby with static typing, the other is trying to enhance the Ruby ecosystem with static typing. There's only a superficial similarity between these things.


Beyond the syntax looking a bit similar, Crystal is a very different language. Porting over a production application such as Shopify isn't something you do in an evening.


It goes past syntax. Yes, they're different languages, but syntax, behaviour, stdlib are so close you can port non-trivial libraries between them in a relatively short time.


The key word there is non trivial. Something with as many moving parts as Rails is non trivial, and Crystal doesn't have the run-time metaprogramming that Rails makes extensive use of. A trivial example is ActiveRecord generating attribute methods from your `select` operation.
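A toy sketch of that kind of runtime method generation, with `ToyRecord` as a hypothetical stand-in (real ActiveRecord is far more involved). Which reader methods exist depends on the columns in each result, so it is only known at run time.

```ruby
# Hypothetical ToyRecord; not ActiveRecord's implementation.
class ToyRecord
  def initialize(row)
    row.each do |column, value|
      # One reader per column present in this particular result row.
      define_singleton_method(column) { value }
    end
  end
end

full = ToyRecord.new(id: 1, name: "Widget", price: 5)
partial = ToyRecord.new(name: "Widget") # as if only `name` had been selected
```

Here `full.price` works while `partial.price` raises `NoMethodError`, even though both objects are instances of the same class.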


I don't think rails is a fair example. Rails loves its runtime metaprogramming and that is not where crystal offers much, but I don't think it invalidates the overall similarity. There's trivial (scripts with simple flow), non-trivial (libraries with heavy oop design and a little bit of trickery), and there's almost everything else before we get to rails itself.

Just like in C, you have trivial, nontrivial, complex apps, then there's still a long way before production OS kernels at the extreme.

Many advanced libraries don't use metaprogramming much, and even if they do, it can be often made either more explicit or shifted to compile time.

Edit: The downvotes are interesting since the portability claim comes from my experience doing just that.


Shopify uses Rails. To your original question, which you've answered yourself: that's why they haven't joined forces. They were very clear in the article about the size and activity of their codebase, and their commitment to Rails in particular.


That wasn't my question :-) I was only addressing the syntax similarity bit since it's repeated quite a lot.


Like others have stated in the thread, Crystal is a different language that shares some of its syntax with Ruby. Our aim is to keep our Ruby codebase but adopt static types gradually.


Is there any performance gain to be had in the interpreter at runtime when adding types? I understand the benefits to your workflow, but is there an opportunity to allow the interpreter to do less work when types are used?


Currently, no. On the contrary, since Sorbet syntax is pure Ruby, it needs a runtime component, so there is actually a small negative impact in performance at runtime. The reason why we cannot use types to make the interpreter do optimizations is because the interpreter is not aware of types.

Having said that, such optimizations are indeed possible and we are already thinking about how we can use the type information we have to that effect. There are still many things that need to line up for that to be a thing, so we are not expecting any new developments on this soon. But this is a very exciting direction for our Ruby Infrastructure team at Shopify.


It's one thing to adopt a layer of static typing on top of your massive production codebase, it's another to switch languages entirely.


Holy hell, I didn't know there was such animosity between the two languages! I really need to refrain from asking questions on HN!


There's little animosity there really. The answers you got are pretty simple/factual. Nobody's really judging either language here.


At this point Ruby (which is still slow and dynamic compared to most other faster languages) looks like a sunk cost.

The only reason it's still alive is Rails. I don't see much hope for it longer term; Crystal and Elixir have more, IMO.

Using Ruby/Sorbet screams using JS/Flow to me, instead of using a language with types built in.


Obviously programming language preferences are like favorite flavors of ice cream, but Ruby is still the best scripting language there is IMHO.


> At this point Ruby (which is still slow and dynamic compared to most other faster languages) looks like a sunk cost.

Can you expand on why you think this is?


What makes Ruby so irreplaceable? Why struggle to make it something it's not when you could pick an existing statically typed language and build your system? Also, how is Ruby a safe language for financial transactions when an engineer can hijack a running process and change memory without leaving a trace?


What makes millions of lines of code and decades of knowledge irreplaceable?

Is there any language or ecosystem that can protect against a rogue employee?


> hijack a running process and change memory without leaving a trace

Doesn't sound like a Ruby problem.


No but, Ruby makes it trivial. No decompilations, no assembly, no debuggers necessary. Drop into an irb in a running process, change stuff and get out in seconds


Compilation is not a security control. Also if you're handling transactions at a significant rate, PCI (with all its problems) makes sure there's a trace.


> drop into an irb in a running process

This is a thing?


It also is a thing in most other languages, including C, Python, Java, Erlang, ...


I'm not so sure. You can certainly attach a debugger to any running process. But that's not what parent was suggesting. He was saying anyone could attach a REPL. That's a totally different animal.

It's your own dumb fault if you expose the web-console or similar on production.



