Can't imagine a change like this would be made without some analysis; I would love an internal view into a decision like this. I wonder if they already have the log data to compute the financial loss from the change, or if their sampling instrumentation is fancy enough to write and deploy custom reports like this quickly.
In any case, 2 weeks seems like an impressive turnaround for such a large service, unless they'd been internally preparing to acknowledge the problem for longer.
> 2 weeks seems like an impressive turnaround for such a large service
I assume they were lucky in that whatever system counts billable requests also has access to the response code, and therefore it's pretty easy to just say "if response == 403: return 0".
The fact that that's the case suggests they may do the work to fulfill the request before they know the response code and do the billing, so there might be some loophole to get them to do lots of useful work for free...
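To make that concrete, here is a minimal sketch (in Python) of what such a late-stage filter might look like. The event shape and field names are entirely made up for illustration, not AWS's actual metering pipeline:

    # Hypothetical sketch: zero out unauthorized requests at metering time.
    # Field names and the event shape are invented; the point is that the
    # filter can only run *after* the request has already been served.

    def billable_units(event: dict) -> int:
        """Billable request units for a single access-log event."""
        if event["status"] == 403:   # AccessDenied: no longer billed to the bucket owner
            return 0
        return 1                     # everything else still counts as one request

    access_log = [
        {"op": "GET", "key": "secret.txt",  "status": 403},
        {"op": "GET", "key": "public.txt",  "status": 200},
        {"op": "GET", "key": "missing.txt", "status": 404},
    ]

    print(sum(billable_units(e) for e in access_log))  # -> 2

Which is exactly the loophole being described: by the time a status code exists, S3 has already authenticated, routed, and evaluated the request, so the unbilled work has already been done.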
I've often wondered about this in terms of some of their control-plane APIs. A read-only IAM key used as part of C&C infrastructure for a botnet might be interesting: you get a DNS/ClientHello signature to a legitimate, reputable service for free, while stuffing e.g. "DDoS this blog" into the tags of a free resource. Even better if the AWS account belonged to someone else.
But certainly, the ability to serve an unlimited URL space from an account where only positive hits are billed seems ripe for abuse. I'd guess there's already a ticket for a "top 404ers" internal report or similar.
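A "top 404ers" style report is easy to imagine; here's a minimal sketch over hypothetical access-log records (field names invented), just counting unbilled error responses per source:

    # Hypothetical "top 404ers" report: count error responses per requester
    # from access-log-like records. Field names are invented for illustration.
    from collections import Counter

    records = [
        {"requester": "acct-A", "status": 404},
        {"requester": "acct-A", "status": 404},
        {"requester": "acct-B", "status": 403},
        {"requester": "acct-A", "status": 200},
    ]

    errors = Counter(r["requester"] for r in records if r["status"] in (403, 404))
    for requester, count in errors.most_common(10):
        print(requester, count)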
Metering feeds into billing, and we're talking some truly epic levels of data volume. You can kind of see the granularity they're working with if you turn on CloudTrail.
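If you want to poke at that yourself, something like this (a sketch, assuming default boto3 credentials) shows the per-call detail CloudTrail keeps for S3 management events. Per-object data events like GetObject require data event logging to be switched on and are delivered to an S3 bucket or CloudWatch Logs rather than this API, which is itself a hint at the volume:

    # Sketch: peek at per-call CloudTrail granularity for recent S3 management events.
    import json
    import boto3

    ct = boto3.client("cloudtrail")
    resp = ct.lookup_events(
        LookupAttributes=[{"AttributeKey": "EventSource",
                           "AttributeValue": "s3.amazonaws.com"}],
        MaxResults=5,
    )

    for ev in resp["Events"]:
        detail = json.loads(ev["CloudTrailEvent"])  # full per-request record as JSON
        print(ev["EventTime"], ev["EventName"],
              detail.get("userIdentity", {}).get("type"))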
> Can't imagine a change like this would be made without some analysis; I would love an internal view into a decision like this
Sure, here you go: there's some buzz and negative press, so it gets picked up by the social media managers, who forward it to executive escalations, who loop in legal. Legal realizes that what they're doing is borderline fraud and sends it to the VP who oversees billing as a P0. It then gets handed down to a senior director who is responsible for fixing it within a week. Comms gets looped in to soft-announce it.
At no point does anyone look at log data or give a shit about any instrumentation. It is a business decision to limit exposure to a lawsuit or BCP investigation. For a publicly traded company it is also extremely risky to book revenue that comes from fraudulent billing.
As someone who has been involved in high-level crisis management like this multiple times across various companies, I can tell you that in a competent organization it looks nothing like your day-to-day decision making as an engineer or PM. Better yet, as few "rank and file" employees as possible are involved, to avoid dangerous situations like the one you just described.
I don't want to debate the merits of what happened, but a prosecutor is going to open with "AWS billed people for things they never asked for or consented to." You're already fighting an uphill battle arguing that it is not fraud.
Now what is going to save you is intent. If your defense is "yeah, we identified the problem and corrected it," you're good to go. If, on the other hand, someone decides to run a fucking metrics report on how much you could lose by stopping the fraud, and god forbid it is ever seen or mentioned in front of anyone in the decision-making path, you now have to deal with mens rea.
If you have material knowledge that someone took "a look at the metrics", shoot me an email. I can help put you in touch with programs that offer financial rewards for whistleblowers.
Are you for real? Legitimately baffled by your comment.
How about the financial losses of customers that could be DDoS-ed into bankruptcy through no fault of their own? Keeping S3 bucket names secret is not always easy.
I prefer your version: Barr replies to a tweet before gatecrashing the next S3 planning session. "A customer is hurting, folks!" The call immediately falls silent, with only occasional gasps heard from stunned engineers and the gentle weeping of a PM. I wonder if Amazon offers free therapy following an incident like this.
I was thinking this too. You're giving AWS a lot of credit if you think they're not going to do some kind of analysis of how much they were making (albeit illegitimately) from invalid responses. I'm just surprised either that they didn't do the analysis beforehand, or, if they did (as the parent commenter suspected), that they were able to get the report out so quickly.