I'm pretty unclear on the "how" here - but from what I can understand in the article, the search resilience team injected properly tagged synthetic traffic into their system to do testing? That does seem like the kind of practice that could be part of a healthy, holistic approach - but the article elides a ton of details. I suppose the idea is that it promotes AWS services (with the implication that this kind of resiliency comes easier on their platform) - but this is a great example of how good writing strips things down to the barest details. I would love to take lessons from it, but I think the details actually aren't here.
Oh... "Chaos systems"=="event-driven development". For those confused about an analogy to chaos theory... as far as i can tell, there isn't one. In physics a chaos system has small perturbations that lead to an instability. I would argue this is just a large perturbation that leads to certain instabilities. I would also classify this as "network systems fault tolerance" engineering.
As a stand-alone article it is fine, but it is likely to trip the "more fluff than stuff" alarm on many people's BS detectors.
Who's drawing parallels with chaos theory? The origin is Netflix injecting "chaos" like taking servers down randomly. It's not tied to any scientific theory: https://en.m.wikipedia.org/wiki/Chaos_engineering
"Stress testing" might be more intuitive but that's already established as simply testing under high traffic
To add to that, I've read enough of my own company's blog posts to know that Y was an incorrect solution to Z, was half-implemented at best, barely improved METRIC, and has already made a lot of people's lives harder for simply existing. So now I take every single one of these posts with a massive grain of salt.
I would love to know what software stack, hardware, and uplink connections in total they utilize to accomplish a real-world 80k request per second throughput. How many instances do you guys think Amazon runs for its primary e-commerce front-end stack? In total, and per region? Assuming they have a multi-region rollout.
If it's the real deal, and not like people saying "Bun.js can serve 65k req/s+ (cough cough, to localhost)," that's impressive.
But I never see anyone talk about real-world numbers. Just synthetic poopoo.
I think I read the article correctly, but it only talks about how they introduced "chaos engineering"; I don't recall them talking about how they actually handle a volume of traffic like 80k req/s.
> If it's the real deal, and not like people saying "Bun.js can serve 65k req/s+ (cough cough, to localhost)," that's impressive.
Not all req/s are made the same.
Amazon search is made of 100s of services, and Amazon's search page loads 20 products per page, so 80k search req/s translates to 1.6 MM product API req/s, for example.
FWIW, a search request at Amazon hits roughly 100 unique search clusters (think of these as ElasticSearch clusters - but it's not ES), with different product groupings in each cluster. Each cluster is made up of 1000s of nodes running Lucene (think something similar to ES shards). This is just for the "match set", i.e. which products to return.
Then there are services to re-sort those matched products based on popularity, likelihood of purchase, etc. (think giant ML models). Then there are product lookups. Before all of this, there is query analysis to simplify/improve the query (again, giant ML models), e.g. classifying "Apple" into electronics vs. groceries based on the other keywords and your current context.
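A minimal sketch of that match-set scatter-gather shape, purely to illustrate the fan-out (the ~100-cluster count comes from the comment above; every name, the timeout, and the Go framing are my own illustrative assumptions, not Amazon's actual code):

    // Hypothetical scatter-gather over N search clusters.
    package main

    import (
        "context"
        "fmt"
        "sync"
        "time"
    )

    type MatchSet struct {
        Cluster string
        ASINs   []string
    }

    // queryCluster stands in for an RPC to one search cluster; stubbed here.
    func queryCluster(ctx context.Context, cluster, query string) (MatchSet, error) {
        return MatchSet{Cluster: cluster, ASINs: []string{query + "-hit-from-" + cluster}}, nil
    }

    // scatterGather fans one query out to every cluster and merges the partial match sets.
    func scatterGather(ctx context.Context, clusters []string, query string) []MatchSet {
        ctx, cancel := context.WithTimeout(ctx, 200*time.Millisecond)
        defer cancel()

        var (
            mu     sync.Mutex
            merged []MatchSet
            wg     sync.WaitGroup
        )
        for _, c := range clusters {
            wg.Add(1)
            go func(c string) {
                defer wg.Done()
                ms, err := queryCluster(ctx, c, query)
                if err != nil {
                    return // tolerate partial failure; rank whatever arrived in time
                }
                mu.Lock()
                merged = append(merged, ms)
                mu.Unlock()
            }(c)
        }
        wg.Wait()
        return merged // handed to the re-ranking stage next
    }

    func main() {
        clusters := make([]string, 100) // roughly 100 clusters, per the comment above
        for i := range clusters {
            clusters[i] = fmt.Sprintf("cluster-%02d", i)
        }
        fmt.Println("partial match sets:", len(scatterGather(context.Background(), clusters, "apple")))
    }

The partial-failure handling in the collection step is exactly what chaos-style experiments exercise: what the merged result looks like when some of those clusters are slow or missing.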
Meanwhile, Bun.js is talking about 65k "hello world"-type req/s. The compute per req is orders of magnitude different.
> Not all req/s are made the same.
> Amazon search is made of 100s of services, and Amazon's search page loads 20 products per page, so 80k search req/s translates to 1.6 MM product API req/s, for example.
That's their problem, no? Nobody's forcing them to have an architecture where a request propagates to hundreds of services.
Why is it anyone's "problem"? Nobody said they're being forced to do it this way - just that they are. And I guarantee that there are thousands of other companies out there that have an API-fanout model as well, and might be interested in how Amazon does it.
I don't get the hostility around this article. Nobody is forcing you to read it or to do it this way. If your system is architected in a different way where you can run your whole system on a single instance, then good for you! But Amazon presumably doesn't have that luxury, and others may not either.
> That's their problem, no? Nobody's forcing them to have an architecture where a request propagates to hundreds of services.
I mean, physics kinda is.
Just the volume of data that needs to be hosted requires multiple nodes. ElasticSearch has some good general documentation on search engines if you want to learn more. In Amazon's case, a general query hits a fan-out of about 10,000 nodes - I wasn't even counting that fan-out above because it is all technically one service.
Meanwhile, Bun.js (no offense to Bun) is built for I/O-bound workloads, and 65k req/s is great for that. However, executing the required natural language processing (i.e. multiple ML models) would exhaust the CPU (even if Bun could distribute the compute across cores or run the ML inference on a GPU). I'd be willing to bet that even on a great CPU it gets throttled to 10 req/s (at most 100 req/s).
This really isn't a "they should have just done it all in Postgres; Bun.js can just use a nice ORM and stay I/O-bound" type of situation - which is a philosophy I very much agree with for 99.9999% of use cases.
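To put a rough number on the CPU-bound point, here is a hedged back-of-envelope sketch (the core count and the 50 ms of CPU per request are invented assumptions for illustration, not measured Amazon or Bun numbers): once every request needs real model inference, per-node throughput is capped by cores divided by CPU time per request, regardless of how fast the I/O layer is.

    // Hypothetical capacity estimate for a CPU-bound inference service.
    package main

    import "fmt"

    func main() {
        cores := 16.0         // assumed cores on the node
        cpuSecPerReq := 0.050 // assumed 50 ms of CPU for query analysis + ranking models
        fmt.Printf("upper bound: ~%.0f req/s per node\n", cores/cpuSecPerReq) // ~320 req/s, nowhere near 65k
    }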
My favorite kind of HN answer. 100% snark, 100% certain, 100% wrong.
So, instead of that, what would you have? A single binary that makes multiple DB calls? Hmm, let's see, some people have tried that and written extensively about the problems they faced doing it. Wait, one of them is actually a small e-commerce firm named after a large river. Wonder what the issues were...
Amazon doesn't really have a "primary e-commerce front-end stack" in any concrete sense. They have hundreds/thousands of teams that deploy bits and pieces to a massive pipeline that ultimately makes up what you see on Amazon.com, but each team can have their own infrastructure backing things. Some teams might run everything off a dozen low-end EC2 instances while another sibling team has 3k+ instances; it's really all over the place, and that's ignoring specific events like Black Friday or Prime Day, etc. where teams need to prescale things in advance.
> I would love to know what software stack, hardware, and uplink connections in total they utilize to accomplish a real-world 80k request per second throughput. How many instances do you guys think Amazon runs for its primary e-commerce front-end stack? In total, and per region? Assuming they have a multi-region rollout.
> But I never see anyone talk about real-world numbers. Just synthetic poopoo.
The number probably changes all the time based on load. They'll never release these details because it's a competitive advantage to have the "how popular are they in $place at $time-of-day" data private.
When they do share numbers, it'll always be the most flattering and devoid of any context beyond the "wow" factor.
While I can understand the cynicism, the real answer is much more boring: most people just don't care about the actual numbers. If they were released, they'd be interesting to a small few, but generally no one would actually care.
There's also a common misconception that Amazon.com is somehow just this one giant app running on a set of servers, which isn't remotely how it's actually deployed, and that's before we spend time arguing over whether a given team's instances even count as "primary e-commerce front-end stack" or not. :P
Also their "stack" includes various degrees of AWS (maws, naws, and I'm sure a bunch of snowflake situations innumerable here, I mean do you count corpinfra? Controls?)
I'm not sure it's the biggest deal in the world, plus real-world market presence data is regularly detailed in shareholder reports for various companies.
You could take low-end instance specifications and standard industry stacks and extrapolate forward how many instances they might need to maintain at a maximum, but those numbers are going to be off.
Are they running 400 low-end front-end instances across the globe? Probably (plus the 40 or so other services they claim to need, multiplied by region count at a minimum), and that would actually be well below what's realistic and reasonable for a company like Amazon. You can take a bunch of regional instances that each handle roughly 200 req/s and make that work.
I used to work on a system that did about 55k req/s at peak. The service was internal, only handling gRPC calls coming from inside our VPC, and it was written in Go. Its main job was reading and writing to a SQL DB that was sharded across 3 or 4 of the biggest instances AWS offered at the time (2017-ish).
Everything was Dockerized, and I think we were using Docker Swarm for container orchestration. I don't remember the specs for each box, but we had autoscaling set up, so at peak we'd hit a little over 200 containers.
Looking back now, I'm sure we could have gotten much better performance out of that service, but the team was young and inexperienced and throwing money at the problem was an easier solution.
Having led API teams at a big tech company where we handled similar (slightly lower) request numbers at the edge, I can tell you that the entire stack was engineered to be defensive and to handle a certain amount of load. As other commenters have said, fan-out means that 80k qps at the edge probably becomes 10-100x that on the most heavily hit backend systems. A lot of the work we did was very aggressive caching and sharding, plus autoscaling to handle load spikes.
Observability was our secret sauce. We monitored everything: our caches, NICs, our load balancers, etc. Cache hotspotting and DB problems were what kept us up at night, though my teams didn't deal with much stateful data.
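For anyone curious what "very aggressive caching" tends to mean in practice, here is a minimal cache-aside sketch (the names, the TTL, and the in-process map are all illustrative assumptions; a real deployment would usually sit in front of something like Redis or Memcached):

    // Hypothetical cache-aside lookup: serve from cache, fall back to the backend, repopulate.
    package cacheaside

    import (
        "sync"
        "time"
    )

    type entry struct {
        val     string
        expires time.Time
    }

    type Cache struct {
        mu   sync.RWMutex
        data map[string]entry
        ttl  time.Duration
    }

    func New(ttl time.Duration) *Cache {
        return &Cache{data: make(map[string]entry), ttl: ttl}
    }

    // Get returns a cached value, or loads it via loadFn and caches the result.
    func (c *Cache) Get(key string, loadFn func(string) (string, error)) (string, error) {
        c.mu.RLock()
        e, ok := c.data[key]
        c.mu.RUnlock()
        if ok && time.Now().Before(e.expires) {
            return e.val, nil // cache hit: the backend fan-out never sees this request
        }
        val, err := loadFn(key) // cache miss: one backend call absorbs the fan-out cost
        if err != nil {
            return "", err
        }
        c.mu.Lock()
        c.data[key] = entry{val: val, expires: time.Now().Add(c.ttl)}
        c.mu.Unlock()
        return val, nil
    }

Every hit is a request the heavily fanned-out backends never see, which is how an edge tier survives the 10-100x amplification; the flip side is the cache hotspotting mentioned above, where a few hot keys concentrate load on a handful of nodes.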
Well, to get actual impact you'd need infrastructure-wide tracing, and that's hard.
Like, one request could hit a cache and serve 95% of the page from it, while another hits some long path that burns half a second across 20 servers in the backend to serve some big query.
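The per-hop mechanics are at least well-trodden; here is a hedged sketch using OpenTelemetry as a stand-in (my assumption, not necessarily what any given shop runs) of how one outbound call forwards the trace context so that half a second burned across 20 backend servers can be attributed to a single request:

    // Hypothetical outbound call that forwards the trace context (OpenTelemetry assumed).
    package tracedemo

    import (
        "context"
        "net/http"

        "go.opentelemetry.io/otel"
        "go.opentelemetry.io/otel/propagation"
    )

    // callBackend starts a child span and injects the trace context into the
    // outgoing request headers, so the downstream service can attach its own
    // spans to the same trace.
    func callBackend(ctx context.Context, url string) (*http.Response, error) {
        ctx, span := otel.Tracer("search-frontend").Start(ctx, "callBackend")
        defer span.End()

        req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
        if err != nil {
            return nil, err
        }
        // Propagate the trace ID and parent span ID as W3C trace-context headers.
        otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))
        return http.DefaultClient.Do(req)
    }

The hard part is less this code and more getting hundreds of services to propagate and report the context consistently, and then storing and querying the resulting volume of spans.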
I mean, coralmetrics+pmet has been doing that infrastructure-wide tracing for decades (albeit being slowly replaced in spots now).
Back in 2014 they were still sharing a detail-level service call graph (already a few years old at that time) that had so many nodes and lines it looked like string art.
And a hundred different identical products at vastly different prices from brand-new sellers like `z-qq-yadonk-8771` that somehow have 4.7k reviews at 4.5 stars.
I feel like my eyes were just assaulted trying to read that white-on-gold mobile page. I was unable to read the article because the accessibility-hostile theme disables reader mode on mobile Safari.
It dawned on me that in web software, people talk about req/s from two entirely different perspectives and it's borderline fraud:
req/s from localhost to localhost, and req/s from the Internet to any user.
The latter is actually interesting. People saying you can get 10k req/s from Node.js is stupid. You're not actually getting that on say, a single low-end instance over the Internet, which is what most developers are actually going to do.
Instead, you'll get two orders of magnitude fewer requests per second.
What Amazon is talking about here is most likely non-synthetic, real-world 80k requests per second. Which is actually a decent job.
> People saying you can get 10k req/s from Node.js is stupid.
No, it's not, for exactly the reason you state:
> You're not actually getting that on say, a single low-end instance over the Internet
Some languages are, of course, more efficient, but it doesn't matter - you can get very good performance out of any language/runtime - it's all about your architecture and infrastructure.
Where are you saying the difference would exist? I haven't seen local network tests perform worse than localhost (usually it's better, since the client itself uses a lot of CPU). Why would Internet latency matter? TCP ACKs should be handled by the load-balancing appliance, so they'll be low-latency for the application. TLS handshakes should also be offloaded to the appliance.
From what I've measured, code I've written performs around the same in production as it ran locally given similar hardware. If you're deploying to a VM with 3000 IOPS and 1/2 a CPU core, obviously it's going to run like garbage. If you wouldn't run your business on a raspberry pi 3, you probably shouldn't be running it on an AWS xlarge instance either.
These aren't requests for TCP ACKs to establish a session, nor even requests for a simple static resource. They're requests for the live status of an inventory of physical goods spread across thousands of distribution centers on six continents that are themselves gaining and losing thousands of products per second. A system that can return a reasonably accurate view of that state 80k times per second is not the same thing as a system that can send 80k http responses with "Hello ${NAME}" per second.
I'm not talking about just establishing a session. My question there was just why Internet vs. local would be different. On a local network, I've gotten 70k JSON CRUD requests per second out of a Netty-based service + PostgreSQL with 4 cores and a single SSD.
I imagine search is more complex and expensive than CRUD, but 80k isn't something you can only do with a "hello world" tier application.
It's not the best metric. Just responding to 80k req/s with static in-memory content is easy nowadays. If there are complex database queries you have to finagle into serving 80k/s, then that's the interesting part.
About a decade ago, Opera Mini did 150k transcoded full pageloads/s (times about 30 inlines per pageload, which was the average back then, so about 4.5 million requested/loaded/processed/compressed HTTP resources/s).
(All of the public Google Search numbers I've seen have seemed one or two orders of magnitude too small. Or maybe most people don't use their search engine/browser as much as I do, so my perspective is skewed...)
From my experience with that scale of traffic (with Opera Mini at the time about 250M MAUs and 150k full pageloads/s):
There is surprisingly little seasonal variance. You have your weekly/daily traffic rhythms based on when your users are awake/active based on their geographical distribution and that's mostly it.
"World events" also have very little impact - they tend to barely make a dent in that massive background noise.
Before we had large volumes of traffic I thought we'd be seeing all sorts of unusual peaks; after a few years I realized growth at scale tends to become boring (but in a good way).
For example, it would be just as true for this title to have said "How Amazon uses ... to handle 1.6 MM requests per second, from just the search page."
Each search page load is 1 request to the search backend, but a 20x request fan-out to the products' key-value store to render the images, titles, etc.
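Spelled out with the numbers from this thread (the multiplication is the only claim here; the rest is just scaffolding):

    // Back-of-envelope fan-out math from the comment above.
    package main

    import "fmt"

    func main() {
        searchReqPerSec := 80_000.0 // edge search requests, per the article
        productsPerPage := 20.0     // items rendered per search results page
        fmt.Printf("product key-value lookups: %.1f MM req/s\n",
            searchReqPerSec*productsPerPage/1e6) // 1.6 MM req/s
    }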
I feel like Amazon search is one of the worst products I've ever used. It is a clusterf/ck of paid advertisements and obviously gamed results. I don't care how many requests/sec you get. If the results are horrible, what does it matter?
Funny story: internally, Amazon Search doesn't consider the ads products to be part of the "search results". They are tracked and accounted to Ads.
The way ads are handled on Amazon is really poorly done. The Ads teams claim to make a lot of money (based on the internal accounting tricks they do), and as such have been pushing Amazon's leadership to go deeper into ads, even though every person I've met who worked at Amazon also hated the prevalence of ads.
Literally, directors and VPs at Amazon are afraid to step on the toes of Ads' leadership team because of how well they have told the story that "Ads is excessively profitable".
Meanwhile, all of us in this thread can easily say that even if it is profitable in the short term, it most certainly is not profitable for Amazon in the long term.
Both internally and externally, it has been very disappointing to watch.
> But that’s almost barely a problem compared to the gamed reviews
Which pales in comparison to the problem of counterfeit goods, IMO.
I can at least somewhat comb through the reviews to look for the well-written outliers. Getting something that's obviously a fake (which has happened to me multiple times) is completely unacceptable.
Newegg has this issue too, I got a knock-off Intel CPU there once, I was furious.
I'm sick of it being impossible to distinguish cheaply made products from high-quality, durable products on Amazon. The rating system is flat-out broken, and there's an entire industry built around gaming those ratings.
I'm at the point that I rarely ever buy products on Amazon anymore. It's a total disgrace. On an ethical level, I wish I had the ability to say "I only want to be presented with results that weren't made in China or other slave societies".
Contrary to popular belief, Amazon actually does put energy into making sure products are responsibly sourced. Products are de-listed if they're found to come from unethical sources.
To take that even further, take a look at Climate Pledge Friendly. Those are products with (at least one) third-party certification. These certifications don't just further climate goals; social responsibility is also considered, including worker conditions and product durability. You can filter search results by this attribute, though admittedly it can be hard to filter for specific certifications.
A giant portion of Amazon's products are made in China by a populace that's enslaved by their totalitarian dictator. Please allow me to identify which products are made in China, and other communist/totalitarian states, so that I can exclude them from my searches.
Is that Amazon's fault? So many once-reputable brands have been MBAed to death and are now indistinguishable from the bottom-tier garbage. It is near impossible to do real product research on anything, anywhere.
Surely a company with the resources of Amazon can distinguish fake reviews from real ones, especially the bait-and-switch listings where the product has changed and half the reviews aren't even for the current item.
These days it should be assumed that the quality of anything bought on Amazon is dogshit. Which, even though I canceled Prime years ago for ethical reasons, is honestly the biggest reason I won't even click an Amazon link.
technically they said 'chaos engineering'... which obviously means they use monkeys slapping their hands on keyboards to write the code that returns the results.
This would be awesome... When I'm down a rabbit hole trying to find a product and I see three or four of roughly the same design, I know not to bother any further unless I can find a reliable manufacturer website.
Your article shows you put in a lot of work, and it was written well. But... a few thoughts... Service Owners usually tend to design and build their systems in a way which allows them to understand and know how the system will behave under the worst conditions. It is not difficult to test these conditions either. A bash script utilizing curl will suffice.
If you have to use "Chaos Engineering" to experiment your way into innovation, that is a sign you built your service wrong. What will Amazon Re-Invent next!? My guess is the wheel. Well-written article, though.
I might be misunderstanding what you consider one, but in my experience a bash script driving curl is great for some types of API or front-end load testing.
However, it won't necessarily help you know how your system will behave if S3 kicks the bucket in us-east-1 (again), your image host for that super-cool Kubernetes cluster suddenly throttles you during a critical restart, or your other service of choice went down due to an expired certificate.
If you however mean to use it to perform a denial of service on an endpoint you don't own, you're more hard-core than I thought.
> Service Owners usually tend to design and build their systems in a way which allows them to understand and know how the system will behave under the worst conditions.
With you so far
> A bash script utilizing curl will suffice.
Lol, hell no. Yes, AWS/Amazon does require a "GameDay" before launching a service, which executes an MCM (managed change management) that's basically a runbook (a way more in-depth and comprehensive way to test your service than a single bash script with curl), but chaos engineering is a great additive, an additional verification mechanism that really helps with service outages.
How are you going to test a thundering herd with a bash script executing curl?
How many machines are you running this simple bash+curl script on anyway? Using a single node to generate requests isn't going to do much in testing a service's reliability.
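For contrast with a curl loop, here is roughly the shape of what chaos tooling adds: a hedged sketch (all names and the Go/HTTP framing are my own invention, not Amazon's GameDay tooling) that injects latency and errors into a dependency call path, the kind of failure a load test against a healthy system never produces.

    // Hypothetical fault injection: the failure mode a curl loop can't produce.
    package chaosdemo

    import (
        "errors"
        "math/rand"
        "net/http"
        "time"
    )

    // FaultConfig describes the injected failure, e.g. "add 2s of latency to 5%
    // of calls to a dependency" or "fail 1% of them outright".
    type FaultConfig struct {
        LatencyProb float64
        Latency     time.Duration
        ErrorProb   float64
    }

    // FaultyTransport wraps an http.RoundTripper and injects faults per the config.
    type FaultyTransport struct {
        Base http.RoundTripper
        Cfg  FaultConfig
    }

    func (t *FaultyTransport) RoundTrip(req *http.Request) (*http.Response, error) {
        if rand.Float64() < t.Cfg.LatencyProb {
            time.Sleep(t.Cfg.Latency) // simulate a browning-out dependency
        }
        if rand.Float64() < t.Cfg.ErrorProb {
            return nil, errors.New("injected fault: dependency unavailable")
        }
        return t.Base.RoundTrip(req)
    }

Wrapping a dependency client with something like this exercises the caller's timeouts, retries, and fallbacks under partial failure, which is the part a thundering herd of curl requests against a healthy endpoint can't tell you about.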
I am sorry, do you think building a reverse index for a billion products fits in one machine? Is that seriously the amount of thought you put into this comment?
Can anyone calculate the dollars-per-request revenue and profit? Would be interesting to see how much it costs Amazon to make money and make some connection to the request rate.
Back in The Day, rumor had it the detail page hosting/rendering would easily max out a single machine after only a handful of queries per second. I regrettably can't verify whether that was true, or whether it still holds, but there is a LOT going on with a given /dp page. "How many of those requests translate to actual purchases?" is the next question.
I do recall the simplestack version of dpx had some obscene issues with garbage generation, followed by some obscene issues with throughput due to aggressive locking (to avoid the obscene amount of garbage generated).
I am not surprised. That platform was a neat concept, but wow, the nesting and resource consumption were atrocious. There was potential there, yet the implementation in Java made, IMHO, some fundamentally flawed assertions about small objects. I unfortunately know of a great director who had to find new opportunities outside the company over it.
IIRC there was some memory-leak issue in the custom rendering engine/language being used, and for a time the solution was to reboot any VM that had taken more than $x requests. $x was a very small number.