These are great tips for load testing any web service. Don't forget that you can also saturate memory, disk, or network, so watch those graphs as well. I've seen load tests that couldn't saturate the CPU because another resource was the limit.
Also don't forget that you are load testing all the dependencies of your service. Database, caching tier, external services, etc. Make sure other teams are aware!
Also, nothing beats real-world traffic. Users' connections will stay open longer than a synthetic tool would hold them (because of limited bandwidth), and their requests are far more random and sporadic. Your service will behave very differently under large amounts of real-world traffic than under synthetic load.
Another option, if you are running multiple web servers, is to shift traffic around to increase load on one host and see where it fails. That is usually a very reliable signal for peak load.
And don't forget to do this on a schedule as your codebase (and your dependencies' codebases) changes!
If you're looking to run programmable load test scenarios as described in this blog post, consider checking out https://k6.io/, which I've found to be extremely easy to use and powerful.
I've never done load testing before, but would it be hard to write a script in pure ruby (maybe with a few libraries) that makes a lot of concurrent requests to whatever endpoints and using whatever params you like?
Don't write your own load testing tool other than as a fun little exercise. At least not without understanding coordinated omission and thinking about workload modeling (open? closed? hybrid? all of the above?) [1]. Get this wrong and the results produced by your tool will be worthless.
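To make that concrete, here's a minimal sketch of an open-model generator in plain Ruby; the endpoint, rate, and duration are made-up values for illustration. The key is measuring latency from each request's *scheduled* start time, which is what keeps coordinated omission out of your numbers:

    require "net/http"
    require "uri"

    TARGET_RPS = 20   # made-up rate
    DURATION   = 10   # seconds
    uri = URI("http://localhost:8080/api/widgets")  # hypothetical endpoint

    latencies = Queue.new
    interval  = 1.0 / TARGET_RPS
    start     = Process.clock_gettime(Process::CLOCK_MONOTONIC)

    threads = (0...(TARGET_RPS * DURATION)).map do |i|
      intended = start + i * interval
      Thread.new do
        # Sleep until this request's scheduled start time...
        now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
        sleep(intended - now) if intended > now
        Net::HTTP.get_response(uri)
        done = Process.clock_gettime(Process::CLOCK_MONOTONIC)
        # ...and measure from the intended start, not the actual send
        # time, so a slow server can't skew the arrival schedule and
        # hide its own latency (that's coordinated omission).
        latencies << (done - intended)
      end
    end
    threads.each(&:join)

    sorted = Array.new(latencies.size) { latencies.pop }.sort
    puts "p50=#{sorted[sorted.size / 2]}s p99=#{sorted[(sorted.size * 0.99).floor]}s"

(One thread per request keeps the sketch short; a real tool would use a pool and proper histograms.) A closed-model tool, by contrast, waits for each response before sending the next request, so a stalled server quietly throttles your own load generator.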
Once you've got that out of the way, don't forget that you'll want a distribution story. It does not matter how efficient your tool might be on a single machine - you'll want to distribute your tests across multiple clients for real-world testing.
"Sure it's easy" you might say, "I know UNIX. Give me pssh and a few VMs on EC2". Well, now you've got 2 problems: aggregating metrics from multiple hosts and merging them accurately (especially those pesky percentiles. Your tool IS reporting percentiles rather than averages already, right?!), and a developer experience problem - no one wants to wrangle infra just to run a load test, how are you going to make it easier?
And, this developer experience problem is much bigger than just sorting out infra... you'll probably want to send the metrics produced by your tool to external observability systems. So now you've got some plugins to write (along with a plugin API). The list goes on.
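Back to those pesky percentiles: you have to merge the raw samples (or mergeable histograms, HDR-style), because averaging per-host percentiles gives you a number that means nothing. A toy illustration with fabricated latencies:

    host_a = [10, 12, 15, 14, 11, 13, 12, 10, 11, 500]  # one slow outlier (ms)
    host_b = [10, 11, 12, 13, 11, 10, 12, 11, 13, 12]

    p99 = ->(samples) { samples.sort[(samples.size * 0.99).ceil - 1] }

    p99.call(host_a)                             # => 500
    p99.call(host_b)                             # => 13
    (p99.call(host_a) + p99.call(host_b)) / 2.0  # => 256.5 -- meaningless
    p99.call(host_a + host_b)                    # => 500, the real tail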
I think mentioning you are a founder in the original comment might have prevented that particular reply from happening. I think one of the things I, personally, am becoming less okay with as a reader here is seeing recommendations without properly disclosing a connection.
Not saying everyone has nefarious reasons for doing it, but it's just... everywhere.
I also play guitar, and there is a popular store in Europe with a pretty dang popular YouTube channel that I sometimes watch when the topic seems interesting. There was a whole kerfuffle a few months ago because one of the brands getting a lot of air time on their channel was financially backed by the owner of the store, who is also a host of the channel. It took a ton of research by another YouTuber to uncover this, and only after it was found out did the owner finally disclose his relationship with the brand he was promoting.
I feel like that was my most eye-opening moment: tons of people out here, on all variety of services, are recommending their own products but not disclosing their relationship clearly.
Now, you are saying so in your profile, but how many people are going to click into your profile?
I'm not saying you _have_ to do this, just suspecting that more and more people are giving every recommendation the side eye these days because of the lack of disclosure. Disclosure isn't a bad thing; it just puts the bias in the open, and people can gauge the recommendation more easily with that bias in mind.
None of this is probably new to you, but I'm trying to add something to the conversation rather than just call someone out, which is the easy and far more violent thing to do.
There are a lot of subtleties. It's really easy to accidentally load test the wrong part of your web application due to differences in compression, cache hit ratios, HTTP settings, etc.
Shameless self promotion but I wrote up a bunch of these issues in a post describing all the mistakes I have made so you can learn from them: https://shane.ai/posts/load-testing-tips/
This is a great blog post! Just taking the opportunity here to comment on this:
> Finally for full scale high fidelity load tests there are relatively few tools out there for browser based load testing.
It exists as of a few months ago and it's fully open source: https://github.com/artilleryio/artillery (I'm the lead dev). You write a Playwright script, then run it in your own AWS account on serverless Fargate and scale it out horizontally as you see fit. Artillery takes care of spinning up and tearing down all of the infra. It will also automatically grab and report Core Web Vitals from all those browser sessions, and we just released support for tracing so you can dig into the details of each session if you want to (it's OpenTelemetry-based, so it works with most vendors: Datadog APM, New Relic, etc).
Load testing can be harder on the client side than the server side at high loads, which might be surprising, but consider how many servers need to handle large numbers of clients connecting, and how few clients need to connect to large numbers of servers. If you do it yourself, you're going to have to relearn all the techniques that make large numbers of connections possible.
If your desired load for testing is small, it's not a big deal, of course.
Depends on needs and familiarity. I used Bash and curl, then PHP, then settled on JMeter for a tricky bug: a memory leak caused by a race under load. All three could reproduce the problem. JMeter had quite a learning curve, and preparing data for it was a pain. Ultimately, though, it was more reliable, can do distributed testing, and has lots of nice built-in features and analytics.
It is simple enough to write a script doing that in Ruby, but there are also enough off-the-shelf tools built specifically for load testing, encoding a lot of experience and avoiding a lot of obvious and not-so-obvious mistakes, that it's usually not worth writing your own.
If you'd like to write your load tests in Ruby (plain Ruby or browser-based) and use your own libraries (internal libraries, specs, etc), browserup does that.
Having your test database mirror production as closely as possible is also an important habit; however, I'm biased, since that's part of the offering I'm building.
You might want to try and maintain a synthetic dataset for testing and staging that has the same "shape" as your production data - to avoid exposing sensitive data.
We're currently trying to have each Rails model implement a #new_example method that builds a valid subgraph filled in by Faker, ready to save. I.e.,

    user = User.new_example

will come with a Company.new_example if every user needs a company relationship.
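A simplified sketch of the pattern (the attributes here are illustrative, not our actual schema):

    require "faker"

    class Company < ApplicationRecord
      def self.new_example
        new(name: Faker::Company.name)
      end
    end

    class User < ApplicationRecord
      belongs_to :company

      # Builds a valid, unsaved User together with any required
      # associations, so User.new_example.save! just works.
      def self.new_example
        new(
          name:    Faker::Name.name,
          email:   Faker::Internet.email,
          company: Company.new_example
        )
      end
    end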
We're doing the same, but for the TypeScript world with "Snaplet Seed." We use generative AI to generate deterministic values + the required relational data:
https://www.snaplet.dev/seed
We generate data based off of your database schema and your production data (if you give us access.)
Since you've kinda already built something like this I would be curious to hear what you think!
I can understand that. I prefer keeping my models clean without environment-specific implementation details, which is why I've settled on the FactoryBot approach for testing, seeding, etc.
> If the application can’t saturate the CPU, there’s a fundamental problem. It’s a shame because it makes adding more servers less efficient. Money is being wasted on hosting costs, and this should be a priority to address.
Talking about efficiency being a priority, but using RoR. I guess that is one way of saturating the CPU.
Doesn't Rails lack multithreading? How do you gather requests to batch your database calls?
A coworker made similar claims to me about Laravel, but the framework really encourages you to do half a dozen database queries in even a pretty minimal request, and, for example, it implemented bulk inserts as a for loop of single inserts. If you didn't know better, with an access pattern like that you might think the database is the bottleneck long before it actually should be. Is Rails different? My sense was that they are very similar.
Interesting. My understanding is that part of why Mastodon is so slow/resource-hungry is that it serializes background tasks to Redis for a sidecar process to pick up, and that that's the normal way to do things. If Rails has a concurrent runtime, why not just run background work directly?
Rails isn't super opinionated about database writes; it's mostly left up to developers to discover that for relational DBs you do not want to be doing a bunch of small writes all at once.
The way my team handles it is to stick Kafka in between what's generating the records (for us, a bunch of web-scraping workers) and a consumer that pulls off the Kafka queue and runs an insert when its internal buffer reaches around 50k rows.
Rails is also looking to add more direct background work with https://github.com/basecamp/solid_queue, but this is still very new; most larger Rails shops are going to be running a second system and a gem called Sidekiq that pulls jobs out of Redis.
In terms of read queries, again I think that comes down to the individual team realizing (hopefully very early in their careers) that's something that needs to be considered. Rails isn't going to stop you from doing N+1 type mistakes or hammering your DB with 30 separate queries for each page load. But it has plenty of tools and documentation on how to do that better.
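The canonical N+1 example, for anyone who hasn't hit it yet (Post/author are illustrative names):

    # N+1: one query for the posts, then one more query per post.
    Post.limit(30).each { |post| puts post.author.name }

    # Eager loading: two queries total, regardless of row count.
    Post.includes(:author).limit(30).each { |post| puts post.author.name }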
`insert_all` seems to be an example of what I mean about how the framework encourages you to do the wrong thing. There is a lower-level escape hatch to do a bulk insert, but it doesn't run your callbacks/validations. So if you're using "good" design (or using libraries that work by hooking into that functionality), you can't use it. Laravel was the same way.
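Roughly the trade-off (illustrative model):

    # One round trip, but bypasses the model layer entirely:
    User.insert_all([{ email: "a@example.com" }, { email: "b@example.com" }])
    # => no validations, no before_save/after_create callbacks run

    # The "idiomatic" path runs all of that -- and one INSERT per record:
    emails = ["a@example.com", "b@example.com"]
    emails.each { |email| User.create!(email: email) }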
The new queue you linked is database backed, but the whole point is that you want to just run a job without needing to serialize anything outside of your process. It should just schedule it onto the thread pool and give you a promise for when it's done.
The Kafka thing also seems to be an example of what I mean: in Scala I'd just make a `new Queue` with a thread safe library, and have a worker pull off and do an insert every hundred rows or so, or after e.g. 5 ms have passed, whichever is first. No extra infrastructure needed, minimal RAM used, your queueing delay is in the single digit ms, and you get the scaling benefits. Takes maybe 10-20 lines of code.
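In Ruby you could do the same with the stdlib's thread-safe Queue; a rough sketch (Queue#pop with a timeout needs Ruby 3.2+, and Record plus the thresholds are stand-ins, not anything from the thread above):

    BATCH_SIZE = 100
    MAX_WAIT   = 0.005  # 5 ms

    queue = Queue.new   # thread-safe, stdlib

    writer = Thread.new do
      loop do
        batch    = [queue.pop]  # block until at least one row arrives
        deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + MAX_WAIT
        while batch.size < BATCH_SIZE
          remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
          break if remaining <= 0
          row = queue.pop(timeout: remaining)  # Ruby 3.2+; returns nil on timeout
          break if row.nil?
          batch << row
        end
        Record.insert_all(batch)  # flush: at most 100 rows or 5 ms of delay
      end
    end

    # Producers anywhere in the process just push row hashes:
    queue << { name: "example", value: 42 }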
You can then take that and abstract it into a repository pattern so that you could have an ORM that does batching for you with single item interfaces (for non-transactional workflows), but none of them seem to do this.
I suppose I've just been in Rails land for a while, so I can't make an apples-to-apples comparison to how other frameworks approach things, but I don't think insert_all is encouraging anything wrong: by the time a Rails team is reaching for it, I can almost guarantee they understand the implications of it.
And again, maybe I'm just not understanding, but I really like having our background processes handled completely separately from our main web application. Maybe it's just the peace of mind of knowing that I can scale them independently of each other.
It's not that insert_all is encouraging anything wrong; it's that the normal way to use ActiveRecord does. insert_all is the right way to do things performance-wise, so you'd want to use it when possible, but if you were using other features of the framework like callbacks/validations for create/update, then you can't. The happy-path of an ORM tends to push you in a direction where bad performance all over the place is the default, and it does it in a way where if you didn't have properly calibrated performance expectations, you might think that the bottleneck is because IO is slow, but actually it could easily handle 10x the workload with better access patterns.
Having a Redis job queue is extremely standard, especially for web app development, regardless of language. For one thing, if the web server crashes for any reason, the jobs still continue processing; you also have a log of jobs in case they fail, etc.
Are people using it for reliability though? Are they running redis in a mode where it persists a journal? If not, then if redis crashes for any reason, you're in the same situation.
And, like, Mastodon apparently uses a queue to do things like send new user registration emails. Why not just send the email from the new user request handler? Then if there's an error, you can tell the user in the response instead of saying "okay you should get an email" and then having it go into the ether. I was under the impression this had something to do with not wanting to tie up the HTTP worker because you want it to quickly get back to doing HTTP requests, but if it can concurrently process requests, there's no issue.
Similarly they have an ingest queue for other federated servers sending them updates. But if things are fast, why wouldn't they just process the updates in the HTTP handler? You don't need a reliable queue because if e.g. you crash, the other side will not get their HTTP response, and they'll know to retry.
It may just be out of habit and not any underlying language reasoning. Things like sending emails, or doing anything but simple database operations, make sense to do in a queue. For instance, I've worked at multiple places where we did this using Celery and Python, or BullMQ and JavaScript. At some of them we had a log that persisted for a certain amount of time so we could rerun, e.g., emails that never got sent.
I'm biased as I usually come on to the scene because a company has a successful product and is experiencing engineering scaling pains.
That said, in my experience, CPU is often the ultimate bottleneck with PHP, Ruby, Python, and... well, everything. Over the years, serializers have often been a pain point: XML in PHP and RoR, and the Rails "serializers" currently. Any sort of mapping or hydration (which is a LOT of what happens in web apps) is comparatively slow, often an order of magnitude or more, compared to something like nodejs, C#, golang, etc.
> Most bottlenecks are either database choices or poor code/design choices by developers
Perhaps in sheer quantity, but with experience those are often low-hanging fruit. After those are addressed, you are left with the pain of the language and framework inefficiencies.
The author means efficiency in terms of the software that is running, not efficiency in terms of all possible optimizations such as switching to a different lang.
While you missed the point of the saturation comment, this is why we love AWS Lambda over ECS+Fargate.
Rails has really poor startup time due to loading all codepaths. We switched to Django, and it runs beautifully on AWS Lambda, where our CI is more expensive than our actual server costs. We're a B2B application, so traffic is quite low, and we REALLY don't saturate the CPU in a normal Fargate setup.
I'm surprised to see a mention of Django when talking about fast startup times! This is one of my main issues with Django at the moment. How big is your project?
Our ~500k-line app takes multiple seconds to start, which is why I'm not really investigating a Lambda-style setup... Do you have specific strategies to make startup fast?
FWIW, I know running tests from a different computer is the right thing to do. But it's more annoying, and I hereby tell you that 90% of the time it probably doesn't matter.
There are definitely times when that isn't true. But if you're skipping a load test because it's a pain, do it locally. Most of the time I've wanted to do this, all the action was inside the app. Just be careful to acknowledge that there could be limitations / surprises.