Surely the actual work is mostly being handled in the database, with the Ruby code mostly acting as glue code? I'd expect to run into database scaling issues (or data model issues) long before running into issues with the Ruby code, which is relatively easy to scale horizontally.
Nope. The biggest bottleneck is in the async Sidekiq workers. I have 7 Sidekiq processes using a total of 5GB of RAM and as much CPU as they can get, but my database server is at around 5% CPU and IO.
If the Rails part scaled as well as my DB, I believe I could trivially handle 10x the number of users I have today.
You might want to double-check your DB pooling numbers. Running out of DB connections can end up looking a lot like heavy CPU usage on Ruby.
This bottleneck probably relates to the delivery of new messages to feeds. That's the busiest part of the backend and requires 5-7 DB requests and a bunch of Redis requests per recipient from a post on your server.
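To put a rough number on that, here is a back-of-the-envelope sketch in Ruby. The 5-7 requests per recipient come from the comment above; the helper name `fanout_db_requests` and the 10,000-follower figure are made up for illustration, and Redis traffic is left out because no figure was given for it.

```ruby
# Back-of-the-envelope fan-out cost. Only the 5-7 figure comes from the thread;
# everything else is a hypothetical illustration.
DB_REQUESTS_PER_RECIPIENT = (5..7)

# Returns the [low, high] estimate of DB requests needed to deliver one post.
def fanout_db_requests(recipients)
  [
    DB_REQUESTS_PER_RECIPIENT.begin * recipients,
    DB_REQUESTS_PER_RECIPIENT.end * recipients
  ]
end

# Hypothetical post delivered to 10,000 local recipients.
low, high = fanout_db_requests(10_000)
puts "DB requests for one post: #{low} to #{high}"
```

The cost grows linearly with the number of recipients, so any request shaved off the per-recipient path is multiplied across every delivery.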
I'm sure that spending time just on making this part of the platform more efficient would have a massive impact on performance across a Mastodon instance.
Yes, but are the connection pools in Rails & Sidekiq configured for a very large number of connections? You can probably squeeze 100+ out of a single process.
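A quick sanity check on the pooling numbers could look something like the sketch below. It relies on the general rule that the ActiveRecord pool is per process, so a Sidekiq process with more worker threads than pooled connections can leave threads waiting. The 7 processes come from the thread; the thread count, pool size, and server limit are assumed values, and `PoolCheck` is a hypothetical helper, not part of Rails or Sidekiq.

```ruby
# Sketch only: apart from the 7 processes mentioned in the thread,
# every number here is an assumption, not a measurement.
PoolCheck = Struct.new(:processes, :threads_per_process, :pool_size,
                       :max_connections, keyword_init: true) do
  # Threads in one process that may have to wait for a free connection.
  def starved_threads
    [threads_per_process - pool_size, 0].max
  end

  # Upper bound on connections all processes could hold open together.
  def peak_connections
    processes * pool_size
  end

  # True if the pools together could exceed what the DB server accepts.
  def exceeds_server_limit?
    peak_connections > max_connections
  end
end

check = PoolCheck.new(processes: 7, threads_per_process: 25,
                      pool_size: 10, max_connections: 100)
puts "threads that may wait per process: #{check.starved_threads}"
puts "peak connections across processes: #{check.peak_connections}"
puts "over server limit: #{check.exceeds_server_limit?}"
```

If the first number is above zero, the workers may be blocking on the pool rather than on the database itself, which would fit a database that looks mostly idle.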
Yes on the "big user" side (you're just burning a little extra money every month on slow code), but no overall.
Having fast code means that a random $10-a-month VPS can now support a much bigger community. Hosting providers can also offer cheaper or better service to people who just want to pay someone to run it for them.