The fact that cloud providers don't have a simple "This is how much I can afford, don't ever bill me more than that!" box on their platforms makes development a lot scarier than it really needs to be.
This is my worst nightmare. Lol. I guess now is a great time to give Azure a shoutout for sitting on their hands for 8 years without so much as a response to the community for half a decade [1].
At least AWS allows using a prepaid credit card so they’ll need to call me if things go haywire. I bet if that $72k charge went through it would have been much harder to get out of. “Sorry, we don’t have the money” is a much better negotiating position than “can we please have our money back?”
But then consider the following hypothetical but possible scenario:
Sorry, until you pay, no more Amazon services for your company.
Now you must move to a new cloud provider (or start a new company).
Oh wait, they now exchange (bad) customer information to better detect fraud, and you just got flagged as "owing a lot to Amazon", so no cloud for you anymore at any provider.
Now you want to buy your own hardware. So you need a loan from the bank, but dang, you owe too much to a big company, and now the bank won't extend credit to your company either.
While part of the above scenario is luckily not how reality currently works, who knows when (part of) such a horror scenario becomes reality.
In the end, relying on forcibly not paying back money you contractually owe is just not a very viable strategy in my view.
I see no reason why such an arrangement couldn't be optional. Different projects, teams and people have different needs; cloud computing services are marketed specifically on this point. It makes no sense that there isn't even an option in Firebase or AWS to immediately stop services over a certain amount. The current situation is ripe for lawsuits IMO.
> “Sorry, we don’t have the money” is a much better negotiating position than “can we please have our money back?”
I agree but why would you like to be in either position anyway? The so-called cloud services are terribly overpriced when compared to traditional servers.
Not really. Computing done correctly is about avoiding the pitfalls and finding ways to get zero-cost benefits, free computation out of necessary redundancy, etc. Selling cloud computing is about creating options around every pitfall and finding ways to charge for every mitigation that will be necessary, and to charge for redundancy in the mitigation strategy for the mitigation strategy.
Even if you pay for all the redundant managed blah they offer so that no single point of technical failure in their network loses your business, their billing and IAM are still your single points of failure. If you diversify to multiple clouds, all the guarantees either cloud offers become pointless redundancy, so you are paying 10X pricing for an inadequate redundancy layer.
If you look at Google's own model for computing, they didn't fall for this themselves: the computers they used were intentionally unreliable, so as not to recursively pay for reliability and redundancy at any layer that can't provide the needed guarantee.
You can basically go all in with one of these clouds and become a franchise add-on with roughly the same rights as your average McDonald's store owner, or you end up managing a strategy that, because of the complexity of these offerings, is far more complex than just using bare metal and free software.
They're very useful if you're testing a concept, need agile scaling of computational power, or are just starting a service and don't want to / can't invest the capital in dedicated hardware. I agree with you on your last point though, making your service entirely dependent on these services makes you little more than a franchise and is a potential vulnerability if you ever compete with any important existing service. It probably isn't a good idea for a mature or rapidly maturing business to rely heavily on these services.
It is baffling why cloud providers don't have that option.
I might want to have an app because I don't mind spending 50 dollars on my pet project as a hobby, but I don't ever want to spend more than that. Not if I write a bad query that suddenly becomes very expensive, not when I get attacked, and not even when I have legit users.
By the way, the same goes for some companies too; just the threshold would be different.
It's not complicated to add configurable hard limits for these companies but they don't allow it because the current situation is more interesting for them.
They want to suck the maximum money from consumers before they realize.
For every one person who complains loudly and gets a goodwill gesture, there are hundreds of other companies that will not notice or will just pay without recourse.
> They want to suck the maximum money from consumers before they realize.
This is a naive understanding of how corporations like Google and Amazon work. Bad will and using gym membership tactics aren't how they scale or make money. Getting you to confidently try things knowing you won't get charged (the reason they have those free tiers) so you'll get your company, your start-up, your next side project on it is much better for business.
It's a miss that things like this aren't implemented and widespread, not by design.
> It's not complicated to add configurable hard limits for these companies but they don't allow it because the current situation is more interesting for them.
I'm not in this space, but from my observations:
- Each service has a different billing model and metering model. Most likely this data is held by the service. I'm familiar with AWS so I'll use them as an example. I'd wager only DynamoDB or only Lambda (the service owners) know how much of those services you've consumed
- Billing is most likely reconciled asynchronously after collecting all data from all services by an entirely different department with knowledge of payments and accounting
- GCP, AWS, Azure launch 50+ services a year
- Each large customer most likely has a special rate. I bet Samsung or Snap pay an entirely different set of rates than the normal customer. There are thousands of these exceptions
- Cutting your service off when you're over the limit is an incredibly complex set of edge conditions. Is your long-running instance hosting your critical service shut off because of experimenting on a new ML workflow?
Even with only the above, I can see the difficulty in enforcing a global spending limit at an accurate level. I know both AWS and GCP have features for this, and they try.
It's easy to stand on the sidelines and handwave away technical complexity at scale, but I'd encourage you to give all of these providers a more charitable view, at least on this topic.
>Bad will and using gym membership tactics aren't how they scale or make money.
Except that is exactly what they do through their actions.
>Cutting your service off when you're over the limit is an incredibly complex set of edge conditions.
Sure! But if they cared about customers as you claim, they'd let users set hard limits; when one of these mishaps happened, they'd stop the services once their system eventually knows the quota has been exceeded, and charge the user at most the hard limit. If this kept happening, warn the user that their account will be terminated... and that's that. But they'll never do that.
Most of their clients pay for these mistakes because they don't have the reach or skills to turn it into a viral social media post, get people's attention, and thereby get the costs forgiven.
I'm sure they know how much they make in revenue because of these mistakes and they deliberately don't do anything about it.
I work in this space and you're absolutely correct. Your last paragraph hits the nail on the head for pretty much every complaint people have about the public clouds.
Right, so let's say Congress passes a bill that requires cloud providers to enable hard spending limits by start of February 2021, and eat any extra usage costs that exceeded a set limit.
What is your educated guess for when this feature would be essentially correctly implemented in AWS and GCP (essentially = negligible costs to the providers from either false negatives (bills they eat) or false positives (PR fallout when SomeSite gets shut down despite not being over the limit))?
The fact that the dashboards and alerts have a delay sounds like there might be difficult consistency stuff going on. Many nodes need to coordinate their usage and billing. It may be a difficult problem, but solving billing problems might not really motivate anyone at the company. It's not a "cool" problem for engineers and not profitable for product.
>> The fact that the dashboards and alerts have a delay sounds like there might be difficult consistency stuff going on.
I think that's true. It's easier to measure usage and aggregate that data after the fact than to meter it in real time and stop at a limit. Those are very different things. What happens if you hit the cap while running multiple processes spread across a cloud?
One improvement might be to throttle things as the cap approaches, but that doesn't really change the problem at all. Doing that and having the provider eat any overages should solve it from the user's point of view.
There's an easy solution: you set a limit, and every time a service needs to spend some money it allocates a small portion of the budget, and after some threshold it puts the unused money back into the budget. The only downside is that your spending limit will be treated as reached somewhat early (while chunks are still reserved), but I prefer that to paying thousands more than I wanted to. Knowing how the system works, maybe a lower and a higher threshold for the budget could be set.
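To make that concrete, here is a rough sketch of that allocation scheme (entirely hypothetical: the pool, chunk sizes and costs are made up for illustration):

```python
import threading

class BudgetPool:
    """Hypothetical central budget from which services reserve small chunks
    before spending, returning whatever they didn't use afterwards. The pool
    can refuse work before the limit is exceeded, at the cost of treating the
    limit as reached a little early while chunks are still outstanding."""

    def __init__(self, limit_usd: float):
        self._remaining = limit_usd
        self._lock = threading.Lock()

    def reserve(self, chunk_usd: float) -> bool:
        """Set aside chunk_usd; return False if the budget can't cover it."""
        with self._lock:
            if self._remaining < chunk_usd:
                return False
            self._remaining -= chunk_usd
            return True

    def release(self, unused_usd: float) -> None:
        """Give back the unspent part of an earlier reservation."""
        with self._lock:
            self._remaining += unused_usd

pool = BudgetPool(limit_usd=100.0)
if pool.reserve(0.50):               # reserve 50 cents before a billable operation
    actual_cost = 0.12               # measured after the operation completes
    pool.release(0.50 - actual_cost) # return what wasn't spent
else:
    raise RuntimeError("budget exhausted - refuse the request")
```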
Every time a GCloud rep would ask us what we need, we would say: fix the billing interface. As far as I know, it never got fixed. The feelings I would get when looking at cloud billing interfaces can be summed up as: obfuscated, like a pawnshop, and caveat emptor. I kind of came to the conclusion that if the cloud giants are not fixing their billing interfaces, then, just like Amazon not emailing you the details of the items you ordered (thus pushing you into the app to deal with your primenesia), there is a 'business' reason why the billing interfaces are generally incomprehensible.
> They want to suck the maximum money from consumers before they realize.
I have very little money so I just don't use their services because a mistake would be disastrous. They might be losing out on me making a unicorn app on their platform. It's unlikely, but while the possibility of catastrophe exists I'll stick to not using them. That extends to not recommending anyone uses them either in case the worst happens.
Then the harsh reality is: companies don't care. Yeah, your app might turn out to be a unicorn, but the overwhelming odds are that it won't. And no one cares that you'll tell your other broke friends to avoid the service.
We'd all like to think it to be different, that a company might care about appeasing my broke ass. But as already pointed out, they want the whales. I also wonder, despite the number of years "cloud services" have been around, if companies aren't still trying to figure out a gazillion other things and limiting customer spend might be a bit low on the priority list.
The highly price sensitive customer will force you to compete only on price. That's just forcing yourself into a commodity market. It's bad business. I would never try to cater to that market. Very dangerous. Competition will drive margins down to near zero.
For hobby projects you probably don't need auto-scaling, and should use a provider that charges a fixed monthly rate. You'll "waste" a little bit of money on unused uptime, but for a hobby project it will be a minuscule amount.
> It is baffling why cloud providers don't have that option.
...is it? If a lazy dev leaves their corporate account open and you can bill it for their negligence, protected by the contract you already signed, you earn a lot of money. From a purely business perspective, it is stupid(!) to provide a safeguard against that.
Edit: to be clear I am not advocating one way or the other. But it is surprising that people are "baffled" by this obvious profit optimization.
Google is around a trillion dollar company, your $75,000 is a completely immaterial amount to them. Not to mention it would be a one time payment that would drive away customers and lead to bad PR like this post.
As a former victim to the same issue as OP, I am furious every time I see a Googler promote that as a solution.
In our case, we racked up a $10000 bill on BigQuery in ~6 hours, when a job was failing and auto-retrying.
We had set up every alert correctly and our reaction time was about 5 minutes (about $100 of usage, no big deal). So how did we get a $5000 bill? Google's alert was 6 hours late (according to them, this was root-caused to us, because we were submitting jobs continuously). They pointed to their TOS and said they don't guarantee on-time delivery of the alert.
I had to write up a blog post with fancy graphs and prepare it for social media before they finally agreed to eat the bill.
You misunderstand the intent of this - you basically set this up, and even if it fails (because messages are delayed), Google will refund.
This has happened to us before - they do a refund, since you had set the limits correctly. In general, they are not super assholes. I actually don't know of a case where they have refused to refund.
AWS is better here, since GCP doesn't have a support dashboard, so the "chasing them" experience is much worse.
> There is a delay of up to a few days between incurring costs and receiving budget notifications. Due to usage latency from the time that a resource is used to the time that the activity is billed, you might incur additional costs for usage that hasn't arrived at the time that all services are stopped.
> Following the steps in this capping example is not a guarantee that you will not spend more than your budget.
This looks like it has the same problems as the post, because it also relies on those budget alerts that can happen a long while after you've exceeded them.
Very late to the post, but this seems like "eventually consistent billing". Distributed systems seem to rely on "eventual consistency", but "eventual consistency" is not what most people want in billing threshold scenarios...
"Following the steps in this capping example is not a guarantee that you will not spend more than your budget."
"Resources [...] might be irretrievably deleted."
Also it's not automatic, you have to manually write code to do it, and test it, and make sure not to break it.
A reasonable implementation of this feature would be built into the console, guarantee a maximum spend, not require writing your own fallible code, and provide an option to preserve storage (at normal cost) so that all your data isn't deleted when your compute/API stuff is shut down.
As a former App Engine PM who spent a lot of time with billing/quotas (though, not the one who deprecated this feature), it's likely due to some combination of:
- hard limits caused downtime more often than they prevented these blog posts
- hard limits were inconsistently enforced, even within GAE
- platform wide quota notifications were implemented (reached "GA"), leaving the question of "how a developer wants to handle this" to the developer, not the platform
- maintenance burden
The "I bankrupted my startup by running tests in an infinite loop" blog posts happen ~once a year, while the number of customers (including internal teams!) who inadvertently went down because of this quota was staggering. I feel like I used to see one a week, at least. Most often someone on the team was like "oh I'm going to turn this down to zero because we don't want to spend any money during development", never told anyone, and then they go live and they forgot to turn the knob back up (or didn't properly estimate traffic/costs and set it too low).
I can tell you it hurts a lot more (both in terms of revenue and customer credibility) when a large customer goes down for 15 minutes due to quota issues and their usage drops to zero than when a tiny developer accidentally blows through $10k in a month and we refund it (since, obviously, the provider's cost is a lot less than that).
Personally, I don't think this is a good enough reason. Worst case, if I experience an unplanned shut down, I will increase my spending limit. Removing the feature entirely because of this just doesn't make sense.
The fact that Google also requires a credit card for almost every single transaction, even free ones, gives the impression that this is for financial purposes (i.e., a way to get more out of developers, or out of those who might be freeloading on App Engine's free tier).
I gotta say that seems like a bad reason to remove the feature. If someone intentionally set a hard spend limit - hit it - and their service went down because of it that's not Google's fault. The simple solution for that customer is to just turn off or increase the limit.
This is a reasonable way of achieving the balance needed. My company would freak out if we had even a short outage that affected all our customers because we set a billing quota too low. And I'd feel a lot more comfortable experimenting with serverless on my own projects if I knew Google would have my back if I made one of those once-in-a-year mistakes.
OP claims that the budgets are not real-time; they are eventually accurate, but if you spend too fast you may end up with a sum larger than your budget before anything triggers.
It's surprisingly complex to do that. Let's take a simple example and say your cloud account is doing 2 things - compute & storage.
Compute is an active resource, when you exceed your budget it can be automatically shutdown.
Storage is a passive resource, when you exceed your budget it can be automatically....deleted? That's almost always the wrong action.
Providing fine-grained cost limits helps some, as passive resources usually don't have massive cost spikes while active resources do, so you can better "protect" your passive resources by setting more aggressive cost limits on the active resources.
This quickly gets more complicated. Another example is most monitoring services are a combination of active (actual metric monitoring) and passive (metric history) resources. A cost limit on that monitoring service likely won't provide sub-service granularity, mostly depending on whether the service even has different charges for monitoring vs history.
Oh, also, even for a passive resource like storage, you also have active resource charges whenever you upload/download your data.
Ugh, what a mess. The best thing to do is pay attention to your spending, just like you do with your personal & corporate budget.
S3 costs money to keep your files in, even if you're not touching them, so just preventing further uploads wouldn't do much to prevent your AWS bill from increasing.
It would let you set an upper limit on the price you pay though. Better than accidentally misconfiguring a logging service and writing gigabytes of unneeded data.
>>> But we've had disk quotas before that mostly worked?
AWS has quotas on everything, including quotas on EBS storage per region.
You will realize that after you spin up some instances with disks and it fails because you've hit 10 TB of EBS storage. You have to raise a ticket to raise the limit.
> Storage is a passive resource, when you exceed your budget it can be automatically....deleted? That's almost always the wrong action.
A better option would be to automatically reduce the budget by the amount it would cost to keep the storage forever. If doing that would reduce the budget to zero, do not allow increasing the amount of storage. That is: assume the storage will not be deleted, and budget according to that.
How does this actually work? It clearly can't be forever, since any non-zero dollar amount * infinity months is infinity dollars, which is going to reduce the budget below zero since any non-infinite number minus infinity is less than zero... thus locking it immediately.
Even if we say "you get N months of storage before we delete it" and subtract N * current storage cost/month, what happens after you're locked out of all actions because you added an extra GB? Storage APIs cost money to use, so you would get locked out of those too (note that if you're not, people would set arbitrarily low limits and get storage access for free) and couldn't retrieve anything. The only remaining actions are delete (which is free) or raise the quota and do the whole rodeo over again.
Abuse is impossible to ignore at public cloud scale, so "free storage forever" (or even, storage at a one time fixed price) as the fallback isn't a viable option.
Lastly, from an optics perspective, which blog post would you rather see on the front page of HN: "I did something dumb and spent too much money on Cloud" or "Google is holding our data hostage" (or "Google deleted all my data")?
Source: I launched Firebase Storage, which has a GCS bucket that has a hard limit.
For it to work, obviously the budget has to be per month (for instance, $100/month), instead of an absolute limit. Most of the time, that's what you'd want: if you calculated that what you use will cost $50 each month, setting a budget of $100 per month would give some room for growth while preventing billing disasters (and you can always increase it a bit if necessary).
Off the top of my head, I'd say that if you're budgeting for storage, one way to calculate it is the maximum you can afford for the time period you'd need to recover data in the event of a budget overrun, taking into consideration the notification delay. And that sounds like something that is reasonable to put on the customer to calculate.
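As a hedged back-of-the-envelope (the data size, storage rate and windows below are invented, not real provider prices), that reserve might be computed like this:

```python
stored_gb = 500                      # assumed amount of data kept in object storage
price_per_gb_month = 0.02            # assumed storage rate in USD
recovery_window_months = 1.0         # how long you'd need to notice and pull the data out
notification_delay_months = 3 / 30   # e.g. up to ~3 days of billing/alert lag

# Portion of the budget to hold back so storage survives a budget overrun.
storage_reserve = stored_gb * price_per_gb_month * (
    recovery_window_months + notification_delay_months
)
print(f"Keep at least ${storage_reserve:.2f} of the budget aside for storage")
```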
You've explained why it's hard for Google to not give me resources I can't pay for, but that's not what I care about, or what I'm asking for. What I'm asking for is a feature where I set a hard limit of $100 and that's the most I get billed - if my account accidentally uses $5000 of resources before Google reconciles the usage with my budget, then Google automatically waives the additional $4900 and then limits my account in some way until the problem is rectified.
Practically every time these blog posts come up they end with the provider refunding the costs. I just want that refund to be a feature.
So...you're saying that Google should give away $4900 of usage?
Yes. But they should also develop mechanisms to warn users that they've made a mistake before it happens, and improve the speed they can detect mistakes to lower the cost, and invent some way to detect someone intentionally abusing the feature.
But mostly they should make the fact they do give away $4900 when a mistake happens explicit. That isn't actually a change. They just need to make it clear that's what happens.
It's not really that complex. All compute should shut down. All API calls should fail. Storage should be (optionally) preserved at normal cost.
Your examples are simple given this framework. Uploading/downloading data to storage is an API call. Monitoring is compute. Metric history storage is storage.
Storage costs are predictable and slow to accumulate. They are rarely the problem people are trying to address when they set a budget. As I said, storage would optionally continue to be charged at the normal rate, the other option being immediate deletion if you really need a super hard budget cap.
Once you get the alert that your budget is tripped you can go and see what's in storage via the console and delete it, only paying for a few hours of storage for things you don't want.
Moreover, once API calls are locked, what next? You can't delete files, and even if you can delete them, you aren't able to retrieve them before deletion... If a platform allows you to do those actions, then it's ripe for abuse, and at public cloud scale that ends up being a far, far bigger problem than the occasional blog post that ends up as a refund (because the other blog post is "I got free storage forever with this one weird trick").
It's really not a simple problem because the next action depends on the choice the developer wants to make: do they increase the budget or decrease usage, and no cloud provider wants to make this choice because no matter what the choice is it will be viewed as wrong. The best they can do is provide developers the best insight and tooling to make this choice themselves.
Once API calls are locked you can open the console, disable all the things that caused you to hit your budget, and then raise the budget a bit to get access to the storage APIs again and manage your storage. Or, the console's storage browser should let you browse and delete files as well. And again, there should be an option to delete all storage immediately for a hard cap on your budget if you really want that.
If the answer is "you have a dollar limit set of GCS GETs, GCS PUTs, etc." I guess I could see this working, but hot damn that'll be a horrific interface.
The other issue is that many large customers pay different prices, so billing and quota aren't really tied to each other, and it wouldn't be easy to reconcile this.
As for the button... having been on the product side of building this button, there is no right answer: people will say they never got the email (or it went to the wrong inbox, or their dog ate their phone...) or that they never checked the box to "shut down the site" ("I didn't think it would do X that made my app not work").
I'd probably want it grouped by category with a drill down interface for the specifics.
Probably arranged so you can type in a figure at the bottom for monthly expenditure and it would balance out the requirements based on typical use cases.
So enter $50 in the monthly cap figure and it allocates, say, $20 to compute, $20 to transfer operations and API calls, $10 to storage
which you could then fiddle with of course.
I can't offer much on the second point other than to say that unexpected bills annoy me much more than services that stop working.
I've also never worked anywhere with unlimited budgets. (alas)
I can see that there are probably cases where uptime is more important so they would be more annoyed the other way around.
Not only development, but also running in production. You can configure alerts but you can't configure a hard limit. That's just insane. It makes working with GCP like playing with fire.
Probably because it's not so simple on the backend.
I'm guessing there's a good chance a lot of systems are only eventually consistent, which could explain why billing takes a long time to update.
Aggregation of service usage for billing could also be an expensive operation, so it's only updated irregularly instead of being near real-time.
It would be a great feature, but I can imagine it being very complex. It's also probably cheaper for them to just wave away excess usage like this instead of building out a solution.
This is a billing question, not a technical question, and looked at through that lens it's easy to put a hard limit on a monthly bill: just don't ever issue bills greater than that amount.
If I say I only want to pay a maximum of $1000 a month, and I hit that limit but it takes a bit for the provider to shut everything down so really $1100 of resources were consumed, then the provider eats the $100 overrun and I get a bill for $1000.
With an actual hard limit you create a financial incentive for the provider to minimize this overrun. Yes it might be difficult to fix but I assure you, if hard limits existed, the technical issues would be solved soon enough because now there's a reason to invest in a solution.
It's also a mostly solved problem because advertisers have budgets and it's common to implement globally distributed budget servers to avoid showing more ads than the advertiser paid for, despite tens of thousands of individual web servers needing to know which ads in their inventory have budget left.
It's a fun exercise similar to global rate-limiting/load-balancing.
I think the simplest is a tree of servers (which can be sharded by user if necessary for load balancing). The root has the total budget and offers short-term small leases of ad views to child nodes, who may also have child nodes doing the same thing with even smaller leases.
Web servers check with the leaf nodes for every ad they want to show. If that leaf has a budget greater than zero it decrements its own budget and returns success. If the web server gets a success it shows the ad, if not it checks with another budget server or two. Web servers frequently log how many ads were served per client.
Whenever leases are up the intermediate nodes inform the parents of how much was spent and get a new lease. If nodes crash or otherwise don't return their lease then their parents have to assume the whole budget was spent, but leases are kept small to avoid big discrepancies.
If the root crashes then there are problems so the root can be a slow ACID replicated database as long as its immediate children are mostly reliable and take large enough leases to minimize load on the root.
Periodically web server logs are aggregated to adjust the root budgets to account for crashed intermediate nodes and web servers.
The tree approach allows global low-latency operation, guaranteeing no overspending and minimizing underserving. Nodes are provisioned from the leaves on up to handle the necessary amount of traffic and to ask for leases large enough for 99.X% of child requests to succeed.
Any cloud provider could use the same technology on individual hosts to grab leases of CPU, RAM, disk, etc. by the minute per user and terminate services with no budget. Leases could be a lot longer because most budgets are monthly to cover all service needs and not pathological ad campaigns with low budget, high bid, and huge audience.
It's up to cloud (or ad server) providers to decide whether to stop services if the budget system is broken. In most cases it makes sense to fail open and keep serving and eat the loss because shutting everything down will incur even bigger losses.
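As a toy illustration of the lease mechanics (the class, numbers and lease sizes here are made up, and the crash handling, replication and log reconciliation described above are left out):

```python
class BudgetNode:
    """Node in the lease tree sketched above. The root holds the full budget;
    children hold only a small leased allowance, so leaves can answer spend
    requests locally without a round-trip to the root."""

    def __init__(self, total_budget=0.0, parent=None, lease_size=0.0):
        self.parent = parent
        self.lease_size = lease_size
        self.local = total_budget   # root starts with everything, children with nothing

    def grant_lease(self, amount):
        """Parent side: hand out up to `amount` from this node's own allowance."""
        granted = min(amount, self.local)
        self.local -= granted
        return granted

    def renew_lease(self):
        """Child side: ask the parent for another small allowance."""
        if self.parent is not None:
            self.local += self.parent.grant_lease(self.lease_size)

    def try_spend(self, cost):
        """Leaf side: serve one billable unit if the local lease still covers it."""
        if self.local >= cost:
            self.local -= cost
            return True
        return False

# Root with a $1,000 budget; one leaf that leases $10 at a time.
root = BudgetNode(total_budget=1000.0)
leaf = BudgetNode(parent=root, lease_size=10.0)
leaf.renew_lease()

served = 0
for _ in range(20_000):             # 20k ad impressions at $0.002 each
    if not leaf.try_spend(0.002):
        leaf.renew_lease()          # local lease exhausted: refill from the root
        if not leaf.try_spend(0.002):
            break                   # the root itself is out of budget
    served += 1
```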
I think that's not really an issue though is it? If you say "never charge me more than $100" they can a) ensure they never charge you more than $100 and b) work to optimize their own systems so that they cut you off as close to $100 as humanly possible. In the beginning they might eat some costs since it takes them a day to catch it, but they could work over time to bring that down. And it's not like it's costing GCP/AWS/Azure "sticker price" to provide their services.
CloudFront is a CDN. What the poster you've replied to is talking about is a competitor setting up a server that repeatedly downloads your content to rack up a huge CDN bill. OVH is not a suitable replacement for a CDN so you can't migrate from Cloudfront to an OVH server because a server is not a viable replacement for a CDN.
There is an easy explanation: It's hard to build this feature, there is no pressing demand from upper management, it's easier to get promoted doing other simpler projects. Think about what a real time snapshot means: you need to know how much of all the services are being used, project that in the future and compute the costs.
Really, it is a bit disappointing to see a bunch of engineers in this thread talking like this is some monumental, borderline unsolvable problem. The solution is pretty easy to figure out, even taking into consideration the different needs of different customers. The implementation might not be trivial, and legal liability questions might have to be considered beforehand, but the problem is not that hard.
There are some cloud services where it's not quite this simple.
S3 -- you can't just delete customer data because they hit a billing limit
RDS -- not going to drop databases on the 27th of the month
Anything with persistent data is going to have to stay alive and accumulate costs. Admittedly these services aren't where the crazy bills come from, but it does make a simple kill switch a bit more complex.
You don't have to immediately delete customer data.
Most services that have a hard cap have a "grace period" of a couple of days during which the service does not work but the data is not deleted. That gives you some time to get notified of the issue and fix the problem/increase the limit.
This is a solved problem for every other service out there. You don't just delete the data, you give the customer a few days, weeks, or a month to pay their bill and if they don't, then you delete their data.
The problem with this though is it opens a vector for exploitation: users could just use the grace period to store data for free for a period of time. This can quickly become a heavy financial burden if enough people do it.
You could factor that into the price, but then you're potentially making the price point even more unattractive to users than it already is, and users that are responsible with their budgets would be subsidizing those that aren't. Not a very workable solution.
I'd say a good solution is giving customers the option to stop accruing more storage capacity, and to have a max deadline accounted for in their budget to store data (basically each customer decides whether or not to pay for a grace period).
I've accidentally let my OVH subscription go unpaid, and they gave me a 7-day window to pay my invoice before they deleted my data. That seems pretty fair to me, and they seem to have wide enough margins to eat the cost and still offer some of the cheapest prices out there right now.
I wouldn't be too scared.
For AWS you get about $0.20 per 1 million requests on Lambda.
You can do quite a lot with a single Lambda function.
And a million of anything is a lot for a dev. Put an HTTP API Gateway in front of that with a CDN and you're hitting ~ a few dollars.
If you skip one coffee, or put a 20 dollar note in a book one month, then you're fine. And if you have to use EC2, just use a t2.micro or a Raspberry Pi on your desk.
But really the first lesson you should learn in any cloud setup is Billing Alarms :)
If you're doing ML or CV work then it's probably cheaper to build on the desktop and port to cloud once you understand what the workloads are.
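As a rough back-of-the-envelope using the ballpark rates above (the API Gateway and GB-second rates are assumptions, free tiers and CDN/egress are ignored, and real prices vary by region and over time):

```python
requests_per_month = 1_000_000

lambda_per_million = 0.20                       # request charge quoted above, USD
gb_seconds = requests_per_month * 0.1 * 0.128   # ~100 ms at 128 MB per invocation
lambda_compute = gb_seconds * 0.0000166667      # assumed GB-second rate, USD

api_gateway_per_million = 1.00                  # assumed HTTP API rate, USD

total = (requests_per_month / 1e6) * (lambda_per_million + api_gateway_per_million) + lambda_compute
print(f"~${total:.2f}/month before CDN and data transfer")  # roughly $1.41
```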
> For AWS you get about $0.20 per 1 million requests on Lambda.
If you get it right, great. If you get it wrong then you end up doing billions of operations by mistake, which could cost a huge amount. That's what happened to the author of the article.
> But really the first lesson you should learn in any cloud setup is Billing Alarms
Alarms only tell you that something is going wrong. They don't stop it. If your mistake is costing $1000/minute and you're an hour away from a computer you have a very expensive problem.
That's not a bad idea. You could set it up to delete all Lambdas (assuming you've got a CI/CD system capable of redeploying them quickly later) if the billing goes over. Of course, this may hurt you more because of the outage it would cause. Up to you really.
So you're taking code that you haven't validated locally to see what resources it uses, you're putting this up on the cloud to test it, then you are immediately going to the middle of nowhere without your laptop/phone/etc, and you can't arrange for a coworker or friend to pull the plug for you if something goes wrong?
> and you can't arrange for a coworker or friend to pull the plug for you if something goes wrong?
This is HN, many of us are solo founders with no coworkers or employees. Also how could a "friend" pull the plug? If it was a physical server running in your house maybe, otherwise you can't really give them access to your AWS account with all your private clients data in there.
If I'm the only developer on a project and I really need to get to market I might do just that. I sometimes do day hikes on weeknights so this is actually a likely scenario for me.
Do you go hiking alone without your phone? That seems dangerous.
And why would you start a test if you won't be there to see the results of the test? Seems more sensible to either leave after you've run the test or wait to do so until you get back.
Just to expand on this. You can have a hard limit.
For AWS, create a role/user that has essentially root-like access. Make a Lambda function that's triggered by a billing alert at your threshold to just turn off things from most expensive to least. So turn off the DB servers, so the apps error out and the users go away.
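A minimal sketch of that Lambda in Python/boto3, assuming it's subscribed to an SNS topic that a CloudWatch billing alarm publishes to and that its role may stop EC2 and RDS; the region is a placeholder, and a real version would cover far more services:

```python
import boto3

# Placeholders: adjust the region and extend to whatever services you actually run.
ec2 = boto3.client("ec2", region_name="us-east-1")
rds = boto3.client("rds", region_name="us-east-1")

def handler(event, context):
    # Stop the most expensive things first: databases, then running instances.
    for db in rds.describe_db_instances()["DBInstances"]:
        if db["DBInstanceStatus"] == "available":
            rds.stop_db_instance(DBInstanceIdentifier=db["DBInstanceIdentifier"])

    running = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    instance_ids = [
        i["InstanceId"]
        for r in running["Reservations"]
        for i in r["Instances"]
    ]
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
```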
As an ex-Googler who worked in a customer-facing role in Cloud, I can say you did very well to get a $72k bill written off! It's definitely possible but requires a lot of approvals and pulling in a few favours. I went through the process to write off a ~$50k bill for one of my customers and it required action every day for 3 months of my life.
Whoever helped you inside Google will have gone to a LOT of trouble, opened a bunch of tickets and attended many, many meetings to make this happen.
I know there's no reason for Google or AWS to do this, but man do I wish there was a way to put down a spending limit and simply disable anything that goes over that limit.
It's a little bit nuts that there are no guardrails to prevent you from incurring such huge bills (especially as a solo developer that might just be trying out their services).
In my opinion, and maybe I'm an absolutist about this, the fact that there aren't these guardrails is opportunistic and predatory. Agile, iterative design and testing will inevitably lead to failures, that's the whole point. Marketing a cloud service to developers who need scalable and changing access to computing during that process should take that into consideration.
I do not think the intention here is to be opportunistic and predatory, but rather an inability to empathize with small developers. A large customer will very likely just pay off a few hundred thousand dollars of extra expenses. It is only individual developers who are at risk here, and cloud operators do not have much interest in them.
I don't know about that. Large customers can almost always find the capital to run their own infrastructure and save on cost. That's not to say that there aren't big customers of these types of services or that certain business models make using them more attractive than maintaining infrastructure yourself, but I would guess that revenue from these sorts of services are largely built on appealing to smaller customers, so their needs would be taken into consideration. Not taking potential cost overruns into consideration to me seems a bit deliberate.
To me it looks very similar to the personal checking overdraft schemes banks were using up until a few years ago.
When I was a young and naive student, I thought my debit card could not be charged below $0. I got down to -$3 and had to pay a $40-something fee when I was already out of money.
The downside of disabling active resources is huge. It would mean a catastrophic interruption to the customer's application exactly when it's the most popular/active. And there's no practical way to determine whether the customer is “trying it out” or running a key part of their business on any particular resource.
On the other hand, retroactively forgiving the cost of unexpected/unintentional usage doesn't impact the customer's users. And with billing alerts the customer is able to make the choice of whether the cost is worth it as it happens.
Note: Principal at AWS. Have worked to issue many credits/bill amendments, but don't work in that area nor do I speak for AWS.
> And there's no practical way to determine whether the customer is “trying it out” or running a key part of their business on any particular resource.
What? Why wouldn't this just be an opt-in thing? It could even be tied to the account being used. It's not like AWS accounts are expensive or hard to set up.
If a user opts in to "kill if the bill goes too high" and they kill a critical portion of their business, then that's on them. Similar to how a user who chooses spot instances accepts that their instances may be destroyed. You've already got that "I can kill your stuff if you opt into it" capability.
> On the other hand, retroactively forgiving the cost of unexpected/unintentional usage doesn't impact the customer's users.
Yeah, and what happens if someone isn't big enough to justify AWS's forgiveness? What if they get a rep that blows off their request or is having a bad day? You are at the mercy of your cloud provider to forgive a debt, which is a real shitty place to be for anyone.
> And with billing alerts the customer is able to make the choice of whether the cost is worth it as it happens.
And what do they do if they miss the alert? You can rack up a huge bill in very little time with the right AWS services.
The point of the kill-switch cap is to guard against risk. The fact is that while $72k isn't too big for some companies, it means bankruptcy for others. It's that you might want to give your devs a training account to play with AWS services to gain expertise, but you don't want them to blow $1 million screwing around with Amazon satellite services.
> What? Why wouldn't this just be an opt in thing?
"Oh cool, I'll set a $1k cap, never gonna spend that on this little side proj." Fast forward a year, the side proj has turned in to a critical piece of the business but the person who set it up has left and no one remembered to turn of the spending cap. Busy christmas shopping period comes along, AWS shuts down the whole account because they go over the spending cap, 6hr outage during peak hours, $20k sales down the pan.
Of course it is technically the customers fault but it's a shit experience. Accidentally spending $72k is also technically the customers fault and also a shit experience. I don't think there is an easy solution to this problem.
"Oh cool, I'll use spot instances, never gonna need reliability for this little side proj."
"Oh cool, I'll only scale to 1 server, never gonna see high load for this little side proj."
"Oh cool, I'll deploy only to US West 1, outages are never going to matter for this little side proj."
There are a million ways to be out of money as a company. Why should this be any different? Why is this the one particular instance where it is simply intolerable to accept that users can screw things up?
There are lots of things that are "shit experiences" that are the consumer's fault.
There is an easy solution. Give consumers the option and let them deal with the consequences. There are enough valid reasons to want hard caps on spending that it's crazy not to make them available because "someone MIGHT accidentally set the limit too low, which will cause them an outage in production that MIGHT mean they lose money".
There totally exists a solution. It is also user-hostile enough that it might actually get adopted.
$cloud_vendor just has to (and probably will) constantly nudge people to loosen the limit.
Have a red banner that says, "you have already spent 3% of your monthly budget, think about increasing it".
Also routinely send out emails to remind people: "Black Friday is coming up, think about increasing your quota", even when your service has nothing to do with e-commerce.
> The downside of disabling active resources is huge. It would mean a catastrophic interruption to the customer's application exactly when it's the most popular/active. And there's no practical way to determine whether the customer is “trying it out” or running a key part of their business on any particular resource.
This is simply wrong.
Depending on your use case, disabling active resources is the reasonable solution with fewer downsides.
E.g. most (smaller) companies would prefer their miscellaneous (i.e. non-core-product) website/app/service to be temporarily unavailable rather than face a massive unexpected cost they might not be able to afford, which might literally force them to fire people because they can't pay them...
I mean, think about it: what is it worth that my app doesn't go temporarily unavailable during its free trial phase, if it means I go bankrupt overnight and in turn can't benefit from it at all?
Sure, huge companies can always throw more money at it and will likely prefer uninterrupted service. But for every huge company there are hundreds of smaller companies with different priorities.
In the end it should be the user's choice, a configuration setting you can set (preferably per project).
And sure, limits should probably be resource limits (like accumulated compute time) rather than billing limits, as prices might be in flux or dependent on your total resource usage, so computing the cost in real time is non-trivial or even impossible.
I often have the feeling that huge companies like Amazon or Google get so detached from how things work for literally everyone else (who is not a huge company) that they don't realize that solutions appropriate for huge companies might not only be sub-optimal but literally cripplingly bad for medium and small companies.
The upside for the noob trying out/learning is huge.
I'm no longer that person, but I think GCP/AWS are just being lazy about this - perhaps because they earn a lot of money from engineer mistakes. Of course it's possible to create an actual limit. There'll be some engineering cost, like 0.5%-1% extra per service?
Edit: Being European I think legislation might be the fix, since both Amazon and Google have demonstrated an unwillingness to fix this issue, for a very long time.
"The downside of disabling active resources is huge. It would mean a catastrophic interruption to the customers application exactly when its the most popular/active."
Lol what ... this is exactly what happens any time you hit a rate limit on any AWS service. The customers application is "catastrophically interrupted" during its most popular/active period.
The only difference is in that case, it suits AWS to do that whereas in the case of respecting a billing limit, it doesn't.
If you hit a rate limit, the marginal portion of requests exceeding that limit is dropped: if you plot the requests, the graph gets clipped. Bad, but not catastrophic.
If you hit a billing limit, everything beyond that point is dropped, and the graph of requests plunges to zero. You're effectively hard down in prod.
And for some companies/individuals, if you keep charging then THEY will plunge to a large negative debt. It's not even zero, it's a lot worse than that.
I was creating a side project and had already incurred around $100 in fees. I imagine that with a looping/recursion bug I could've easily incurred a cost of $10,000 or, frankly, an unbounded cost. How easy would it have been for me to get this pardoned? And at the moment I discovered I had just lost $100,000, would I know in advance that they were definitely going to forgive it? Because I'd be in full panic mode. It was very scary for me to use the cloud in this case.
Why not alert thresholds, configurable by the user?
Email me when we cross $X amount in one day, Text when we cross $Y, and Call when we cross $Z. Additionally, allow the user to configure a hard cut-off limit if they desire.
Just provide the mechanisms and allow users to make the call. Google et al would have a much stronger leg to stand on when enforcing delinquent account collections if they provided these mechanisms and the user chose to ignore them.
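On AWS at least, the alerting half of this can already be self-served; here is a hedged boto3 sketch (billing metrics live in us-east-1; the thresholds, email address and phone number are placeholders, and a real phone-call tier would need something beyond plain SNS):

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
sns = boto3.client("sns", region_name="us-east-1")

# One topic fans out to email and SMS subscribers (both endpoints are placeholders).
topic = sns.create_topic(Name="billing-alerts")["TopicArn"]
sns.subscribe(TopicArn=topic, Protocol="email", Endpoint="alerts@example.com")
sns.subscribe(TopicArn=topic, Protocol="sms", Endpoint="+15555550123")

# Tiered alarms on the estimated-charges metric: X, Y, Z in dollars.
for threshold in (50, 200, 500):
    cloudwatch.put_metric_alarm(
        AlarmName=f"estimated-charges-over-{threshold}",
        Namespace="AWS/Billing",
        MetricName="EstimatedCharges",
        Dimensions=[{"Name": "Currency", "Value": "USD"}],
        Statistic="Maximum",
        Period=21600,              # billing metrics update slowly; check every 6 hours
        EvaluationPeriods=1,
        Threshold=float(threshold),
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[topic],
    )
```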
Additionally, Google et al should protect _themselves_ by tracking usage patterns and reaching out to customers that grossly surpass their average billable amount - just like OP with their near-$100k bill in one day. There is zero vetting to provide even a reasonable guarantee that the individual or company is capable of paying such a large bill.
And then what? Sue a company that doesn't have $100k for $100k? This makes zero sense.
Google has alert thresholds (you set it up under your Budget). But practically speaking, an alert is not enough - what if you are unavailable to get the alert, it comes in the middle of the night, etc?
A better solution would have been 'limits' which they used to have (at least for Google App Engine) but which has been deprecated.
We had to spend some time researching whether there was a workaround because, just like the author of the article, we were quite worried about suddenly consuming a huge amount of resources, getting a spike in our bill, and our accounts being cut off/suspended because we hadn't paid it. We've documented our solution here
Doesn't look like there's any cutoff mechanism there, and it's a separate, optional step instead of part of the setup flow with a mandatory opt-out warning.
Nor does that address the other complaint - Google (and possibly others) seem to be willing to extend an unlimited credit line to all customers without any prior vetting for ability to pay. That's crazy.
> The downside of disabling active resources is huge. It would mean a catastrophic interruption to the customer's application exactly when it's the most popular/active. And there's no practical way to determine whether the customer is “trying it out” or running a key part of their business on any particular resource.
Well, this is true, but this is also true of a lot of limits, like limits.conf. Sometimes you really want to spawn loads of processes or open many files, but a lot of the time you don't, so a barrier to limit the damage makes sense.
There is no one solution that will fit everyone: people should be able to choose: "scale to the max", "spend at most $100", etc. If my average bill is $100, then a limit of $500 would probably make sense, just as a proverbial seat belt. This should never be reached and prevents things going out of control (which is also the reason for limits.conf).
> It would mean a catastrophic interruption to the customer's application exactly when it's the most popular/active. And there's no practical way to determine whether the customer is “trying it out” or running a key part of their business on any particular resource.
This could be ameliorated by using namespacing techniques to separate prod from dev resources. For example, GCP uses projects to namespace your resources. And you can delete everything in a project in one operation that is impossible to fail by just shutting down the project (no "you can't delete x, because y references it" messages).
Aggressive billing alerts and events, that delete services when thresholds are met, could be used only in the development namespace. That way, fun little projects can be shut down and prod traffic can be free to use a bit more billing when it needs to.
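GCP documents roughly this pattern for non-critical projects: a budget publishes notifications to Pub/Sub and a Cloud Function detaches billing from the project, which stops (and can eventually delete) its resources. A hedged sketch, with "my-dev-project" as a placeholder and assuming the function's service account has the required billing permissions:

```python
import base64
import json

from googleapiclient import discovery

# Only point this at a dev/sandbox project you are happy to lose:
# detaching billing can shut down and eventually delete its resources.
PROJECT_NAME = "projects/my-dev-project"

def stop_billing(event, context):
    """Pub/Sub-triggered function: disable billing once the budget is exceeded."""
    data = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    if data["costAmount"] <= data["budgetAmount"]:
        return  # still under budget, nothing to do

    billing = discovery.build("cloudbilling", "v1", cache_discovery=False)
    projects = billing.projects()
    info = projects.getBillingInfo(name=PROJECT_NAME).execute()
    if info.get("billingEnabled"):
        # Detach the billing account so the dev project stops accruing charges.
        projects.updateBillingInfo(
            name=PROJECT_NAME, body={"billingAccountName": ""}
        ).execute()
```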
> "And theres no practical way to determine whether the customer is “trying it out” or running a key part of their business on any particular resource."
Well there's a very easy way, adding a checkbox and an input:
[ ] I am just trying things out, don't charge me more than [ ] USD
There are ways it could be done relatively benignly, such as defaulting to paranoid and explicitly opting out.
And for those that are heading into that financial barrier it should be a straightforward problem to look at trending to anticipate the shutdown and send out an alert.
This. App Engine used to offer hard spending limits, and they were removed precisely because so many users set them up to shoot themselves in the foot at the worst possible moment.
^^ this. Hard spending limits seem great until your app/service gets super popular and you have to explain to the CEO why you were down during the exact window you needed to be serving the demand.
> Note: There is a delay of up to a few days between incurring costs and receiving budget notifications. Due to usage latency from the time that a resource is used to the time that the activity is billed, you might incur additional costs for usage that hasn't arrived at the time that all services are stopped. Following the steps in this capping example is not a guarantee that you will not spend more than your budget. Recommendation: If you have a hard funds limit, set your maximum budget below your available funds to account for billing delays.
(Source link in parent post, emphasis mine).
In this case they had an additional cost, due to the delay, of $72k. Which, let's be honest, makes this feature kind of useless for anything but the relatively harmless cases.
Only by combining this with resource limits in load balancers, instance and concurrency limits, and the like can the maximal worst-case cost be bounded. But tbh this partially cripples auto-scaling functionality, and it's really hard to find a good setting which doesn't allow too much "over" cost and at the same time doesn't hinder the intended auto-scaling use case.
> it's really hard to find a good setting which doesn't allow too much "over" cost and at the same time doesn't hinder the intended auto-scaling use case
> I created a new GCP project ANC-AI Dev, set up $7 Cloud Billing budget, kept Firebase Project on the Free (Spark) plan.
There's a lot of middle ground between $7 and $72k. Your quote explains it perfectly though. They flat out can't because the accounting and reporting is badly designed and incapable of providing (near) real-time data.
IMHO the easiest solution to this is government regulation. If you set a budget for a pay-what-you-use online service there should be legislation forbidding companies from charging you more than that.
I also find it (sort of) hilarious they can magically lock the whole thing down once payment fails, but not before the CC is (presumably) maxed out. Lol. Talk about a good deal for Google.
There's something uncanny about understanding the situation enough to turn on the budget alerts, while at the same time not realizing it's not going to help in time if your system runs amok.
I'm not sure if you meant it this way, but your tone makes it seem like the parent just needs to "read the docs".
Unfortunately for all of us, your solution doesn't work, per the huge disclaimer on the page that says those alerts can be days late. You can rack up an almost unlimited $ bill in hours.
That's not the best thing you can do. The best thing you can do is put excessive time into quotas.
AWS has way better quotas for starters than GCP has, sadly.
They are broken, unreliable guard rails that are hard to set up correctly.
I mean, like the article mentioned, they could have set the instance and concurrency settings to lower values, which in this case would have worked.
But finding the right settings to balance intentional auto-scaling against limiting how fast unexpected costs can rise is hard and easy to get wrong.
Let's be honest: in the end it's a very flawed workaround which might help (if you know about it and did it right).
There are already such features, but a lot of indie developers are too lazy to configure their infra properly. A default low limit does not make sense, as it would piss off large customers.
I run so many websites on Google Cloud Run that sometimes I feel I might be abusing them, but I have ensured each of my sites has a max limit of 2 hosts.
I have no idea what they did internally, but something like this was my guess. I only communicated through customer support channel and replied to emails, and shared my doc (which cited all the loopholes) with them.
It took them 10-15 days to get back and make a one-time goodwill contribution. The contribution didn't cover the logging cost, so we did pay a few hundred dollars.
I went through this very scary experience recently as well (although in our case it was $17K, not $72K). One of our devs accidentally caused an infinite loop with one of the Google Maps paid APIs on our dev account and within hours both our prod and dev accounts were suspended (pro tip: don't link your prod account to the billing account of your dev account). The worst part was that after removing the suspension, our cloud functions were broken and had to be investigated and fixed by Google engineers themselves resulting in our prod app being down for 24 hours... be very careful.
Luckily we were able to get $11K refunded on our card and received $6K credits after spending all night with Google support.
By contrast I hear stories of AWS doing this quite often for one-off mistakes (crediting thousands of dollars). It doesn't make much sense to me not to consider well-intentioned requests for this sort of thing.
Especially if you consider the dollar value of all those approvals and the business you might lose to some other platform and/or hesitance people will have to use those platforms for such things in the future.
> If you owe the bank $100 that's your problem. If you owe the bank $100 million, that's the bank's problem.
Crappy situation for OP and his startup, but I find the part about reading up on bankruptcy to be a bit premature.
Perhaps not the most ethical choice, but what stops OP from just not paying the bill, and finding a different cloud provider? Obviously they'll want to not repeat the "experiment", but seriously... there's no mechanism at Google to stop a new client from running up a near-$100k bill in a single day?
That's absurd, and should be a learning lesson for Google more than this startup. Some malicious actor could apparently consume hundreds of thousands of dollars of Google resources and "get away" with it.
Wait and see what happens, then deal with it; that would be sane advice.
The bankruptcy fear was real at the time. Google has at least a few thousand lawyers on payroll, and they probably also have a process for handling delinquencies and sending out notices. A quick look at the lawyer fees just to manage the case, let alone fight it, is enough for a bootstrapped company to throw up its hands.
+1 to the bad-actor possibility. I shared this with the Google team; I'm not sure what they have done since.
We are out of that situation, and I wrote the post so that others who are relatively new to the cloud don't make the same mistakes.
However, Google's army of lawyers costs them real money, whereas your bill is largely made-up numbers.
Perhaps the true cost is still enough to warrant siccing their lawyers on your company.
Even in that situation, a wait-and-see approach is still pretty advisable. The worst case scenario was already known to you - bankruptcy.
Nothing Google or their lawyers do would change that worst-case outcome, and if Google were aware that you literally don't have $72k and might just declare bankruptcy and walk away, they'd be much more eager to negotiate a more reasonable bill and settle your account. It's exactly as J. Paul Getty said...
Very glad it's being worked out and you will not have to go down that path.
> Even in that situation, a wait-and-see approach is still pretty advisable. The worst case scenario was already known to you - bankruptcy.
You could even go scorched earth, represent yourself, and drag it out as long as possible. "Your honor, I'm a free man on the land and all I was doing was travelling the information super highway. I'm not bound by your laws!" Haha.
This is why you create a shell company to use cloud services with while your real company leases the servers from that company. As soon as you run up a bill you can’t pay you shut down the whole shell company and reopen a new one.
One of my favorite quotes of all time. J. Paul Getty was quite the weirdo. His Wikipedia article is worth a look, especially the section on his frugality.
Lol. I love it. I moved to a state I'd never considered because it had the largest, cheapest building in the US.
It's 220,000 square feet, but I've lived in a tent out back for the last 6 months because I can't get an occupancy permit, it's not zoned residential, and I refuse to pay rent on an apartment.
It's the old headquarters of Varco Pruden. They manufactured steel buildings, and there are long, wide manufacturing bays with overhead cranes. You can see much more on my YouTube channel. I've got a few videos of different areas.
As an interesting coincidence, a large part of the Google Cloud organization resides in a building that was formerly the headquarters of Getty Images, a company founded by Mark Getty, a grandson of J. Paul Getty.
> That sort of crap is the reason we host all our stuff on root servers.
Having just started my own journey into building products for myself, pretty much the first thing I realised with my tech was I need to get dedicated servers instead of cloud just because it costs 100x less.
> Just grab a dedicated server for a few bucks and put a bunch of docker containers on those.
Exactly. If you really want Kubernetes coolness to act cloud-like, install Kubernetes yourself; it's free and easy to set up.
And with the cost savings you can literally buy multiple spare servers; Kubernetes can use them all while keeping utilization low, letting you scale onto new nodes if needed.
AWS pricing is not obscure, it's just not for you. So in that sense, you are correct to not see a reason to move to the cloud, but your advice does not apply to everyone.
And I don't believe they make "more money" that way at all. AWS margins are either very low or very high, and the higher margins and prices tend to be the "simpler" ones: packaged, managed products such as Redshift that are billed on fewer tiers and flatter prices.
When you design your application with AWS, pricing has to enter your design considerations. For example, if you are designing something that will interact a lot with S3, you want to minimize PUTs. You want to minimize RAM usage on Lambda by streaming rather than buffering. Etc.
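For instance, a minimal sketch of the "stream rather than buffer" idea with boto3 (the bucket, key, and output path are placeholders, not from anyone's actual setup):

    import boto3

    s3 = boto3.client("s3")

    def copy_object_streaming(bucket, key, out_path):
        """Pull a large S3 object down in chunks, keeping Lambda memory usage flat."""
        body = s3.get_object(Bucket=bucket, Key=key)["Body"]  # botocore StreamingBody
        with open(out_path, "wb") as f:
            # iter_chunks() yields pieces of the response as they arrive,
            # instead of body.read(), which would buffer the whole object in RAM.
            for chunk in body.iter_chunks(chunk_size=1024 * 1024):
                f.write(chunk)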
AWS is not a suitable product for playground stuff. The only reason it gets used as such is that it's easier if you're already using AWS for other things (or you're already very familiar with it).
> There is a massive secondary consulting market because of AWS's price obscurities.
While that's true, there is a consulting market for most things that are complicated. That doesn't mean they are shady. It's simply not for you. You are welcome either to dive in or to get a consultant.
I promise you, though, that AWS pricing isn't difficult once you understand a few concepts and know your way around the Cost Explorer. With proper tagging, it's easy to drill down into which resource is consuming how much. I don't believe there is a way to have simple billing for complicated products.
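As a rough illustration of that tag-based drill-down (the "project" tag key and the dates are placeholders, and this assumes cost allocation tags have been activated on the account), the Cost Explorer API can group spend by tag:

    import boto3

    ce = boto3.client("ce")  # Cost Explorer

    response = ce.get_cost_and_usage(
        TimePeriod={"Start": "2020-11-01", "End": "2020-12-01"},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        # Group spend by a cost-allocation tag, e.g. "project"
        GroupBy=[{"Type": "TAG", "Key": "project"}],
    )

    for group in response["ResultsByTime"][0]["Groups"]:
        print(group["Keys"][0], group["Metrics"]["UnblendedCost"]["Amount"])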
Obscure and complex are different concepts. I'm part of that "secondary consulting market" FWIW, so I'd like to think I know a thing or two about it.
Does AWS have high-margin prices? In aggregate, somewhat, but this is mostly driven by the big ticket managed enterprise items: Aurora, Redshift, Quicksight, probably Fargate, etc. A lot of their more popular stuff (S3, Lambda, …) offer incredible value for very little money. EC2 is the exception I believe, because I understand it to be high margin for how popular it is. But EC2 pricing is one of their simplest ones.
Could AWS simplify some of their pricing? Yes, probably. There's always room for optimization. Personally for example I'd like to see their pricing be global rather than different by region (with understandable exceptions for govcloud and china).
Is AWS making its pricing complicated for nefarious purposes? No, there is no evidence to support that.
AWS pricing absolutely is not simple. It's a part of the AWS stack. You need to study AWS's events/signals system to be able to write apps that make the best use of AWS's interconnected stack. You need to study their APIs / SDKs to really understand what you're able to implement. And you need to study their billing systems to understand how to implement apps that run cheaply, and be able to predict potential runaway costs.
It has to be a part of the design. That's why you may want to hire consultants for it: People who understand it better than you do, and will be able to assist you in reducing your costs.
It's just another kind of optimization. Maybe some software engineers don't like it because it hits them where it hurts (the wallet) when they don't do it right, rather than be able to brush it off as they usually do.
It's much easier to ignore the waste produced by, say, the 3,000 JavaScript dependencies shipped with the fat, unoptimized Electron app they push to their users' desktops, doing a ton of unnecessary, expensive computing, because all that crap is client-side and it's the downstream user's electricity bill and CPU time being burned.
> There is a massive secondary consulting market because of AWS’s price obscurities.
There is a massive secondary consulting market because the enterprise market is addicted to secondary consulting. This secondary consulting market includes AWS pricing because it includes pretty much any IT service the target market might be interested in.
A rational need for decomplexification isn’t necessary to explain the existence or coverage of enterprise secondary consulting, IT or otherwise.
The margin is absolutely not the same across all products.
> There is a massive secondary consulting market because of AWS's price obscurities.
Its. Not. For. You.
AWS pricing is a part of your design. With some exceptions (that you aren't talking about), they charge you more for using more resources. You are forced to design systems that use less resources if you want to optimize your bill.
That consulting market is an optimization market. It's economics at its best.
If you are too small to have to take these things into account regardless, AWS is not for you. You're welcome to use it, but don't be surprised if you end up having to deal with these kinds of things which simply don't exist in the world of flat-price underprovisioned droplets.
>AWS pricing is a part of your design. With some exceptions (that you aren't talking about), they charge you more for using more resources. You are forced to design systems that use less resources if you want to optimize your bill.
This is marketing.
It's like saying you want to build a house and the quote you got ends up blowing up 100x overnight.
A great example is the $100k credit for startups. You can repeat "it's not for you" all you want, but their business is predicated on pricing ignorance and vendor lock-in.
The $100K credit (which I've been granted multiple times) is there because if Amazon can get you to invest serious work into their infra, they'll make up for it in the long run. It's not "lock in", it's sales. The only amazon "lock in" really is their bandwidth-out pricing, which is a sleazy tactic for sure but I'm not hesitant to call it out when it's the case.
You can get the $100/$300/$1000 tier if you are in "just checking it out" solo mode. $5k and up requires either connections, partnerships, or a serious application.
Anyway I don't know what your point is, I'm not even sure if you have one. They're not "marketing" their pricing, nor the fact that you are "forced to design systems that use less resources".
> Anyway I don't know what your point is, I'm not even sure if you have one. They're not "marketing" their pricing, nor the fact that you are "forced to design systems that use less resources".
I think they are referring to this statement:
> > AWS pricing is a part of your design. With some exceptions (that you aren't talking about), they charge you more for using more resources. You are forced to design systems that use less resources if you want to optimize your bill.
It is a defense that I've heard in many AWS talks in the past.
Where it turns into a 'marketing' blurb for me is my real-world experience with these AWS talks at the places I've worked. As a real-world example, we had a product that required -some- architectural work, but was otherwise solid, and could run on 3 live EC2 instances (2 web LB, 1 live backend) and 1 spare (spare backend).
The Consultant that AWS partnered us with? Suggested a very overdone architectural revamp, moving everything possible into AWS Specific technologies.
It's marketing in that in many of our experiences, we know there is often at least one person on a team who does -not- have the discipline and/or experience to -keep- a system using less resources as the field goes from green to brown.
Overengineering is easy and happens not just with AWS but with just about anything in software engineering.
I'm having trouble seeing how this changes what I'm saying: That with the way AWS pricing is structured, you are supposed to take it into account when designing your product.
When you reach a certain size / complexity and you have to design infrastructure, you should be making schematics, predictions on the usage peaks and troughs, how various parts of the infra will be affected, how active/idle they will be.
When you are dealing with AWS, pricing becomes extremely predictable because it can be derived from those plans. And it is far better to be dealing with that kind of model than to deal with "unlimited with a million asterisks" or something. AWS is predictable, reliable, and most notoriously has never ever increased their prices, so whatever you calculated will not go up because of Amazon's decisions.
> Suggested a very overdone architectural revamp, moving everything possible into AWS Specific technologies.
To be honest, depending on the technology, the savings could be worth it... for example, did you know you get a discount if your traffic is served over CloudFront? Even if your distribution is set to not cache any resources, you can front your APIs with CloudFront and save on networking.
How do you take pricing into your design considerations? Does it come with experience from using an AWS service in production and understanding how it's priced, combined with the usage numbers the new system might get? I'm trying to learn more about how engineers currently do this.
It's not that complicated, it's just not something engineers are usually used to doing. If you use an AWS service, you look at its pricing.
Take s3 for example: whenever you use it, you'll pay for outgoing bandwidth, PUTs, GETs, and storage.
So you seek to minimize all of these:
1. Bandwidth: use cache layers. This also minimizes GETs.
2. PUTs: design your app in a way that doesn't do unnecessary inserts into s3. Consider alternatives such as redis, postgres or filesystem depending on the need.
3. Storage: compress your objects if they compress well. If they aren't often accessed, use storage classes and auto lifecycle management.
Pricing in AWS generally reflects some kind of engineering limitations you will face at scale in the first place, so it makes sense to go through this whole exercise either way.
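A rough sketch of point 3 (compress before upload, then age objects into a colder storage class) with boto3; the bucket name, prefix, and transition window are made up for illustration:

    import gzip
    import json
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "example-archive-bucket"  # placeholder

    def put_compressed(key, record):
        """Compress a JSON record before uploading; storage and GET bandwidth shrink accordingly."""
        body = gzip.compress(json.dumps(record).encode("utf-8"))
        s3.put_object(Bucket=BUCKET, Key=key, Body=body, ContentEncoding="gzip")

    def add_archive_lifecycle():
        """Move rarely accessed objects under archive/ to a cheaper storage class after 30 days."""
        s3.put_bucket_lifecycle_configuration(
            Bucket=BUCKET,
            LifecycleConfiguration={
                "Rules": [{
                    "ID": "archive-after-30-days",
                    "Filter": {"Prefix": "archive/"},
                    "Status": "Enabled",
                    "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
                }]
            },
        )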
The shift from 'Developer/Programmer' to engineer has indeed been part of a push away from creativity towards cookie-cutter work.
An interesting analogue would be the automotive industry: as time progressed, companies focused more and more on 'engineering' versus art/tradition/etc. But as the industry evolved, "flashy" vehicles that took risks became either halo products for a brand or were relegated to luxury/boutique makers.
And, of course, there was a dark side to this shift. A good example from the 70s: the level of 'engineering' driving the design of the vehicle and its assembly didn't take the actual line worker into consideration; in Ohio the workers wound up getting overworked and burned out, and in some cases actively sabotaged the product, because they were being treated like automated machines.
Incorrect guess, and it still doesn't really change anything. You're just playing with words. It's no more useful than a full thread arguing about a misspelling; Just pure noise.
Software engineering could learn a lot from, say, civil engineering. It could also learn a lot from interface design and I'm sure even microbiologists and astronauts could teach us a lot. Engineering is not special.
This is exactly right. I host stuff in buckets/CloudFront and use a bit of Lambda/Route 53. I end up paying $4 a month.
Now, that will be very different if 10 million people suddenly decide to visit my site, but if that happens money probably won't be a problem after all.
> Even trying to read the amazon pricing for their instances, hours and what not, drives me insane.
I get your sentiment, but the pricing is that way because they want to charge you for exactly what you use, not for reserving stuff.
For example, if you deploy an EC2 instance that comes out to $15/mo total, and you deploy it on, say, the 10th of the month, do you want to be charged the whole $15? No, you want it pro-rated. But what if you only need that instance for 6 days? Then what? Are you going to do the math yourself to figure out what it would have cost, or just read the per-hour billing?
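Roughly, in Python, with illustrative numbers only (730 hours/month is the usual approximation, not a quoted rate):

    # A "$15/mo" instance billed per hour.
    hours_per_month = 730                  # approximation of hours in a month
    hourly_rate = 15 / hours_per_month     # ~$0.0205/hour

    cost_6_days = hourly_rate * 6 * 24     # you only ran it for 6 days
    print(f"~${cost_6_days:.2f} instead of the full $15.00")   # ~$2.96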
> Most apps don't need scaling anyway and if you do
Man, exactly right. Many of the guys here would love crypto once they stop asking why and start asking how.
The most lucrative projects these days are almost entirely frontend UI; they don't even need their own backends, as they just read state from the nearest node when the client connects their wallet.
Some people forgot that the scalability game was to convert traffic into money with O(log n) overhead costs. So ditch that, and remember you are in the money game.
Deeply dissatisfying to read. Ex-Googler uses connections to get his (understandable!) cloud mistake refunded.
Every time I read one of these stories, I get more and more convinced I will just simply never use scalable cloud tech for my side projects. I'm not going to risk my family's retirement savings on the all-too-possible chance that a small deep-implication error will cause runaway charges.
Your assumption is incorrect. I haven't been in touch with anyone in Google, and used 0 internal connections. Happy to make another post with my conversation + documentation to support this.
I reached out to GCP through their regular channels. This is not a paid post, and we are not sponsored by Google in any way.
You might want to take another look at your paragraph:
> Having been a Googler for ~6.5 years and written dozens of project documents, incident reports, and what not, I knew how to put the case for Google team when they would come back to work in 2 days.
That certainly reads as an advantage that most non-Googlers would not have.
FWIW, I didn't intend "deeply dissatisfying" as criticism against your writing, although I phrased that poorly so I can understand it coming across that way. If anything I feel for you when that unfair surprise hit you. It just sucks that it's possible, and that the odds feel against us when we're seeking a refund.
Yeah, that would be a very good idea. The way it is stated gives that impression, and helping people understand how to resolve an issue like this would be priceless IMO.
You only wrote half the story and no technical details... why even write a post at all? It's just clickbait; you give no information about what went wrong, how it got fixed, or what the exact technical problem was...
Lol to me it looked like part 2 wasn't written yet, but I clicked it anyway just to check and the page loaded, so I read it. No real downside in getting a 404.
I'm absolutely astounded that cloud providers allow stuff to get this far and don't even go, "ope, this looks out of the ordinary, we should look into it." Nor do they offer the ability to straight up kill all services if they exceed a certain price set by the customer.
Cloud is still good though, I believe it's the future. I just don't believe in deceiving your customers to hopefully rack up a high bill with them.
I view not having the ability to say "shut everything down if I go over $100/mo" the same as the pre-checked hidden cross-sells that MasterCard/Visa cracked down on in the adult industry a few years ago. Just money grabs.
I will definitely be putting such measures in my cloud platform.
I'm with you. I have yet to use any of these cloud computing providers for building or testing anything, and it is partly due to this (and partly due to privacy and confidentiality considerations).
"Yay! NBD! Google is the best! All I had to do was work there for a few years, rub elbows, make connections and ask for favors!" Really, google would be the best if there was no way to accidentally go over your stated budget cap by 86 million percent, or at the very least have a policy to refund people who can demonstrate that this is what happened.
I think I'll treat this as the latest in a long line of warnings about not going all-in on these cloud services until you seriously know what you're doing.
So much of it is unnecessary to begin with. You can do so much with a cheap VPS or two without thinking about lambdas or cloud functions or Kubernetes or who knows what. But these days you'd be forgiven for thinking it's dark magic.
You're not going to run up a 5 digit bill in a day by starting up on a few $10 VPSs. And you'll probably have an architecture that fits in your head to boot.
Also: The article title should really be "Saved 72k and avoided bankruptcy by being an ex-Googler."
> Had we chosen max-instances to be “2”, our costs would’ve been 500 times less. $72,000 bill would’ve been: $144. Had we chosen concurrency of “1” request, we probably wouldn’t have even noticed the bill.
> If you count the number of pages in GCP documentation, it’s probably more than pages in few novels. Understanding Pricing, Usage, is not only time consuming, but requires a deep understanding of how Cloud services work. No wonder there are full time jobs for just this purpose!
Great write-up - thanks for sharing @bharatsb! As you say, cloud pricing has become too complex for developers to understand quickly (they want to ship features, not calculate costs). Infra-as-code is great, but it has made it even harder to understand which code/config option costs what. `terraform apply` is like a checkout screen without prices.
We're trying to solve this problem with infracost.io, initially looking at Terraform. It would be interesting to get your feedback on whether such an approach might have helped you? Probably not as it doesn't look like you were using Terraform?
(Cloud Run PM here)
I am sorry for the experience described in the blog post, we could definitely be better at bill management. I am glad that it worked out in the end and the customer was not required to pay for the bill.
Based on this experience, we decided to lower the default value of "max instances" to 100 for future deployments. We believe 100 is a better trade off between allowing customers to scale out and preventing big billing surprises. Of course, customers can always decrease it or increase it up to 1,000, or even above with a simple quota increase request.
Well, the real question for all cloud providers, for which I expect crickets as an answer, is:
Why don't cloud providers allow setting a budget which cannot be exceeded? A simple, 1-click way to say: this account should never go over $500 a month. Just stop creating resources or responding to requests if it does.
This is an outage waiting to happen for every customer:
- Early dev sets a limit.
- Product launches.
- Slowly grows.
- One day suddenly the entire business grinds to a halt. Globally. Across the carefully isolated shards. Everyone scrambles to figure out why! Tens of thousands of dollars are lost because of going $10 over a budget. End-users are lost. Trust is burned. If it's providing a critical system, maybe even people are hurt.
- Google then has to explain why they built in instant, global failure mode.
They can put it behind a clear warning, do stuff like AWS does for bucket deletion (the bucket has to be empty, you have to check a box and manually type the full name of the bucket).
There are ways to design this, they can send notifications at 60% of the threshold, 80%, 90%, 95%. They can give you a grace period, put up prominent warnings in the console and for command line tools, etc. There are ways to do it, it's far from an intractable problem.
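Google actually documents a pattern along these lines: a budget that publishes to a Pub/Sub topic, plus a small function that detaches billing once the threshold is crossed. A rough sketch (the project ID is a placeholder, the function needs billing-admin rights, and detaching billing takes the project down, which is the whole point):

    import base64
    import json

    from googleapiclient import discovery

    PROJECT_ID = "my-project-id"  # placeholder
    PROJECT_NAME = f"projects/{PROJECT_ID}"

    def stop_billing(event, context):
        """Pub/Sub-triggered Cloud Function: cut billing once the budget is exceeded."""
        data = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
        if data["costAmount"] <= data["budgetAmount"]:
            return  # still under budget, do nothing

        billing = discovery.build("cloudbilling", "v1", cache_discovery=False)
        # An empty billingAccountName detaches the project from its billing account,
        # which stops billable services.
        billing.projects().updateBillingInfo(
            name=PROJECT_NAME, body={"billingAccountName": ""}
        ).execute()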
I'm not saying that it can't happen but do you want to bet that a certain percentage of their business, for all cloud providers, is from carelessness and resources still running when they shouldn't or using more than they expected? Especially for bigger companies where it's easy to miss something. Just like gym subscriptions or other kinds of subscriptions where they're banking on you not noticing for a long time ;-)
1) But they supported this before on GAE. GAE had 'spending limits'.
2) Also if they are able to figure out when you've hit your daily free quota and cut you off almost immediately, how are they not able to figure this out?
If I recall correctly, GAE is an example of something they made specifically to be a cloud product. Products like Compute Engine, GCS, Bigtable, and Pub/Sub are things developed internally and then sold publicly once they realized others might find them useful. Perhaps the products developed first for internal use weren't developed with features like measuring billing usage in real time in mind.
> Based on this experience, we decided to lower the default value of "max instances" to 100 for future deployments. We believe 100 is a better trade off between allowing customers to scale out and preventing big billing surprises.
This is good to hear. I use Cloud Run a lot for personal projects and I always set concurrency to 80, max instances to 1, memory to 128Mi (unless it's something beefy that needs the memory), and CPU to 1. If I need to scale it up, or I decide to open it up to actual usage, I'll do it when I recognize the need.
I don't understand why developers use the cloud for bootstrapping or side projects. Digital Ocean is all you need: a $5 droplet + $15 Postgres, or even better a $7 dyno on Heroku.
If I knew something was going to take more than a couple dozen milliseconds to run, it was built on the DO droplet.
Why would I pay by the CPU second for something that is taking a lot of CPU seconds? That billing model doesn't make sense.
For my super quick REST endpoints, yeah, all on Firebase, the convenience of writing + deploying makes it an obvious win. (Unless something goes wrong, debugging Firebase functions is not fun...)
> To overcome the timeout limitation, I suggested using POST requests (with URL as data) to send jobs to an instance, and use multiple instances in parallel instead of using one instance serially. Because each instance in Cloud Run would only be scraping one page, it would never time out, process all pages in parallel (scale), and also be highly optimized because Cloud Run usage is accurate to milliseconds.
> If you look closely, the flow is missing few important pieces.
> Exponential Recursion without Break: The instances wouldn’t know when to break, as there was no break statement.
> The POST requests could be of the same URLs. If there’s a back link to the previous page, the Cloud Run service will be stuck in infinite recursion, but what’s worst is, that this recursion is multiplying exponentially (our max instances were set to 1000!)
Did you not consider how to stop this blowing up before implementing it? Having one cloud function trigger another like this, with no way to control how many functions run at the same time, no simple and quickly met termination condition, and uncapped billing, is playing with fire. It's not going to be optimal either, since most of the time each function is just waiting for the URL data to download.
You need to be using something like a work queue, or just keep life simple and keep it on a single server if you can.
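For comparison, a minimal single-machine version of "work queue plus visited set" (the library choices and page cap are mine, not from the article):

    from collections import deque
    from urllib.parse import urljoin, urlparse

    import requests
    from bs4 import BeautifulSoup

    MAX_PAGES = 1000  # hard stop: the "break" the original flow was missing

    def crawl(start_url):
        """Breadth-first crawl with a visited set and a page cap, on one machine."""
        seen = {start_url}
        queue = deque([start_url])
        while queue and len(seen) < MAX_PAGES:
            url = queue.popleft()
            try:
                html = requests.get(url, timeout=10).text
            except requests.RequestException:
                continue
            for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
                link = urljoin(url, a["href"])
                # Stay on the same site and never enqueue a URL twice -- this is what
                # prevents the back-link loops that multiplied instances in the article.
                if urlparse(link).netloc == urlparse(start_url).netloc and link not in seen:
                    seen.add(link)
                    queue.append(link)
        return seen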
We've all had a program crash from a stack overflow. The problem seems to be that instead of the "serverless panacea" they were promised, the code they built can now only run on one of many Google servers, none of which are theirs. No way to kick the tires at all.
It honestly reminds me of debugging a Jenkins pipeline. Something designed to be a super-generic runtime, yet the tooling can inexplicably only live on computers that are not your local development machine, and all of it is maximally painful to stub, test, or debug, seducing you into "just running it live".
It's like the opposite of the "small agile team" thing they were talking about. If your program requires 7 API keys and some cloud environment to do a test run, I want no part of it.
> We've all had a program crash from a stack overflow.
Launching a cloud function that recursively triggers the same cloud function, that doesn't have a simple safeguard for it looping or blowing up, and where billing scales with the number of cloud functions ticks the "very high risk" and "very high impact" boxes for me. A program running on a single server isn't similar here (you could accidentally create a DoS attack though).
Typical cloud function use is some event gets triggered like a user sign up, the function executes, then it halts. The above isn't a standard use case and is so incredibly risky this approach shouldn't be attempted in my opinion.
I'm just a student, but I've spent about 10 hours trying to figure out why Azure has been charging me >$5/day for their "Basic" database at 5 DTUs and 2 GB max storage. This morning I was so exasperated I sent a letter threatening to report them for fraud if nobody could tell me why I was being charged 30x the listed rate, which so far no one has. This is an extremely cathartic post to see that I'm not alone; thanks for sharing.
Could it be listed "hourly" and you're charged "daily"? Add in VAT (equal to 25% in some countries) and you match the 30 times higher than expected charge.
Basic tier, 5 DTUs, 2 GB is listed as ~$4.8971/month or $0.0068/hour on this page. Extra storage would cost more but is not available for the basic tier.
Do you have geo-replication turned on? More regions will be an additional $5/month (plus bandwidth between regions) if you replicate. You can serve everything out of a single region but it is pretty easy to add others if you're not paying attention during initial setup.
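For what it's worth, the hourly-vs-daily misreading alone lands in roughly the range the parent is seeing (numbers from the comments above):

    listed_hourly = 0.0068               # Basic tier, $/hour, as listed
    listed_per_day = listed_hourly * 24  # ~$0.16/day
    observed_per_day = 5.0               # what the parent reports being charged
    print(observed_per_day / listed_per_day)  # ~30x the listed rate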
> Google let go of our bill as a one time gesture!
We've seen this happen with similar stories on AWS. Neither platform supports prepayment with a hard limit on costs, and this seems unlikely to change.
Yeah, a friend of mine wanted a real cert, not Let's Encrypt (I don't understand how that is more real, but OK). Being a bit of a noob, he clicked around on the AWS website and some days later had a bill of 1500 EUR. They also nulled it. Still, this scares the hell out of me.
I can sympathise with some of these stories, like the ones where an overnight DDOS attack racks up a huge unexpected bill, but this one in particular is just a story of gross incompetence and negligence. The guy hacked together some code in a few days and deployed it to a service with unlimited billing without any kind of sanity checks and without even understanding what he was paying for. He’s an ex-Googler, it’s not like he hasn’t heard stories like this before. And the takeaway? “Oops don’t deploy buggy code” and “I shouldn’t have used the default settings”. OK, sure, let me know how that works out for you.
I'm not sure I want to know how much Azure and AWS revenue comes from people spinning up test VMs or a kubernetes cluster to work through a training, and then forgetting to turn it off.
I've spent thousands extra this year because people stood up 4 MB SQL databases and let them default to charging by vCores instead of DTUs.
Much less than the amount from deals with strategic partners. The long tail of $5 a month from forgotten VMs is likely orders of magnitude less than the handshake deals you can publicly read about.
This is most developers' worst nightmare when it comes to a completely new environment generally, and Cloud solutions specifically.
It's easy and pointless to say they should have done things differently. Worse than pointless. Obviously they should have, and kudos to them about being open about the compounded mistakes.
Still, this strikes at the fears that lie in the heart of any reasonable, honest developer doing something completely new.
New developers should be cautious about cloud platforms, but they were! Not cautious enough, obviously, but they did set limits they thought would be honored.
Platforms should have hard monetary limits at the account level, clearly, as well as an option to turn them off. Shame on all of them which don't.
Cloud Run PM here: I'm sorry for the bad experience the customer shared in this article, we could certainly do better with bill management.
We picked 1,000 as the default value for "maxScale"; this can be considered high for some users, but low for users who expect infinite scaling from the service and start with a load test to evaluate it.
> We pick 1,000 as a default value for "maxScale", this can be considered high for some users, but low for users who expect infinite scaling from the service and start with a load test to evaluate it.
That seems absurd to me.
I think it makes much more sense to put the onus on the sophisticated customer to increase their maxScale to an unusual value. Users who "expect infinite scaling...and start with a load test" are sophisticated users.
E.g. set maxScale low, like 2 or 4. The sophisticated customer would recognize their oversight quickly. Click-click, fixed, restart test.
Effectively 100% of less-sophisticated customers will not need enormous scale on day 1. Customers with whom you do not have an existing billing relationship in the 10s of thousands of dollars per cycle will almost certainly not want it.
I'd consider that level of overspecification to be a strong anti-pattern.
Given that this is a common problem, and one that can bankrupt individuals or their businesses, when is AWS going to implement spending caps that are easy to set up for new developers or business owners?
> Google let go of our bill as a one time gesture!
Thank goodness.
And it looks like it had to do with not understanding the API / system on the first order, IMO.
This hit me hard a few months ago with CloudFront invalidations on AWS. I checked billing and the thing was at 30 USD in a single day, against a norm of <1 USD per month, so it was showing something like a 13,000% increase (this is for documentation of open source projects). I wrote their support and was at their mercy, so to speak; technically I did run up the bill. I ended up paying, but I secretly hoped I'd get some AWS credits for the projects, heh.
Aside: Amazon has some nice features for rule-based alarms on accounts so when you spend more than X dollars, you get an email.
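Something like this, sketched with boto3 (the threshold and email address are placeholders; billing metrics only live in us-east-1, and "Receive Billing Alerts" has to be enabled on the account first). Note it only emails you; it doesn't stop anything:

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # billing metrics live here
    sns = boto3.client("sns", region_name="us-east-1")

    topic_arn = sns.create_topic(Name="billing-alerts")["TopicArn"]
    sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="you@example.com")

    cloudwatch.put_metric_alarm(
        AlarmName="estimated-charges-over-100-usd",
        Namespace="AWS/Billing",
        MetricName="EstimatedCharges",
        Dimensions=[{"Name": "Currency", "Value": "USD"}],
        Statistic="Maximum",
        Period=6 * 60 * 60,          # the metric is only published a few times a day
        EvaluationPeriods=1,
        Threshold=100.0,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[topic_arn],
    )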
Some platforms do that:
- Heroku costs are pretty predictable, and you can easily set a maximum scalability threshold for their auto-scalable dynos, so that they will never cost you more than a predefined amount of money;
- BunnyCDN requires me to top-up their prepaid account, so that I'll never spend more than what I have on that account.
To put it into perspective: You give me $72K and I'll set you up a 1PB replicated storage infra with a total of 100+ available CPU cores and half a TB RAM.
I saw people burning through cash in the cloud, which makes you wonder whether money is any concern at all.
Learning/Administering AWS/GCP/Azure costs time and therefore money too. Maybe less money, maybe more money than doing things yourself, depending on what you're doing. But you shouldn't disregard such costs.
I've seen enough buddies spending enormous amounts of time doing AWS devops on top of paying the AWS premium when they could have gotten away easily with a less than a handful of VPS (+ optionally $100/month worth of cloudflare as a CDN).
Once an organization reaches a certain size it will need one, who ideally should be a person that can wear the dual hats of linux/bsd sysadmin and also network engineer.
If the person is already on payroll doing a number of other duties, the time/effort to set up such an environment as described in the post could be as short as a couple of days work.
I'm just trying to explain that server cost is more than just the hardware.
In most cases cloud computing is actually still a very cost effective solution to infrastructure. But with infinite scalability also comes responsibility.
In the case of the OP, had they had their own hardware they would have noticed that they had written bad code (it would have crashed or at least become very slow), but the cloud just scaled up and processed their code.
I'm not trying to defend Google in this case. Billing 72k when a 100 USD limit is set sounds like a scam.
Why full-time? It's possible to outsource IT administration to a local IT company and pay only for set-up and maintenance that is needed. Way less than 72k a year for many use cases.
Also, companies that employ a bunch of developers can find a developer who has IT administration expertise and allocate some of their time to this. Still cheaper than 72k a year or employing someone full-time, if IT requirements don't call for a full-time job.
Exactly. Or just buy a managed dedicated server - it's more expensive, but still it's a fraction of the full time sysadmin cost, and much cheaper than AWS.
No it's not. In France companies pay ~1.5-2x gross salary in total: the gross goes to the employee (with some of it deducted by the state), and the rest goes to health insurance, taxes, etc.
If you check the actual numbers here - https://entreprise.pole-emploi.fr/cout-salarie/ you'll see that for e.g. 30k€ gross yearly, a company shells out 41k, which is definitely not 1.5-2x total and still very far from the 75k mentioned above
And for half the use cases you still need one. Unless you go full SaaS (which may or may not be an option depending what you are doing, and what's on offer in that field), you still get stuff which needs to be administered, updated, patched, etc.
Maybe not the OS layer (or maybe even that), maybe not the DB (but then it might cost more), but you're not getting away from that.
You only really start saving at some scale (get a small core of cloud-literate admins, and now you can have them run thousands of systems for effectively no incremental cost)
Which is moot if you're just going to burn the 72k by shooting yourself in the foot.
And the way these cloud services go, the 72k was only detected because it was a one-off event. Turn that into a base-level inefficiency that costs the same over a year, and what do you have then?
I wonder what a full time cloud engineer costs. IMO it’s trading a simple system for a complex system, so now the maintainers cost even more than sysadmins used to.
Because electricity is free. And internet is also free. And the rooms to put the servers are also free. A/C is free. And backup generators are free. And diesel is free.
You can rent a full 42U rack in a colocation center for ~$1500/mo easily. They'll handle all of that stuff, including redundant power and redundant internet.
Of course self-hosting on real hardware is not quite as simple or cheap as GP made it out to be. But everything in your post can be solved with simple fixed pricing, which is still the main point: there are no dangers of wildly variable pricing or accidental massive bills as there are with cloud hosting providers.
only a half TB of RAM? somebody recently gave me a free 4U server with 256GB of RAM in it. for zero dollars.
If you need a number or xen or kvm VMs with a lot of RAM assigned to each one for testing something, you can fairly easily set up an older Dell R910 (quad-socket system) with 512GB of RAM for under $2000.
Well, when someone lets you off a 72k bill you generally think nicely of them. But consider that Google had no way of collecting it, that asking for it would have resulted in the loss of other business (this way they'll keep hosting there and keep paying), and that it isn't as if Google lost 72k, or as if 72k even matters to them. So it's just good PR, and good business to keep the money coming on the back end, and faster.
I have to wonder, even if they had tried to get the money, whether they would legally have been able to fight for it. From my experience with judges in Europe, they would most likely have looked at the budget being ignored, and at someone being upgraded from a free to a paid plan without consent, and told Google it was their own fault and the services weren't ordered or authorised.
I'm sure that was an important factor, but wouldn't this in any case just be a bill to a startup with no money on hand? It's hard to make companies (without money) economically responsible for anything I guess, it even seems hard to make companies with money responsible sometimes.
I can’t set a limit like that on my CC, so the first charge for $5k would have cleared meaning it would have run way longer and racked up way more usage. I’d bet you my computer I would have been out at least the $5k that cleared.
I normally receive only my "part" of my corporate credit card statement, but earlier this year I was sent more of it.
That's when I found out the card has a credit limit of over €50,000.
Reading this, I wonder if we should contact the bank and ask for another card with a lower limit, to use with various cloud services. We are 99% on-premises, but have about €200/month in various GCS/AWS usage.
Ridiculous... it's designed to charge you, upgrade you, and make you spend as much money as possible. And 24 hours after it happened they show it to you in the dashboard.
Great.
Just use dedicated servers for the start! It will hurt way less and you can easily upgrade to the cloud later, IF necessary.
Every time I see another post like this I always wonder how many people would be willing to buy "cloud insurance" where a premium would cover overages due to your mistakes in dev, outages, etc.
I don't have an exact billing model worked out, and I assume the insurance provider would mandate certain practices (e.g. setting up billing alerts that go to them, allowing them to view/manage infra), but curious if people here would be willing to pay for such a thing.
(My assumption is that most people who fall into this are too small to be willing to pay a reasonable % of their infra spend, or to change their infra practices to prevent this, but I'd be curious whether CIOs of companies who are thinking of moving to the cloud but are leery due to cost concerns would pay for it.)
You might be able to get an insurance company to sign on to that sort of thing; have you shopped for quotes?
They'd probably want to look at a representative sample of "cloud overrun" events and see how much they cost, how likely they are, what reasonable measures could help prevent them, etc. But they aren't strangers to taking unusual bets.
Apparently a lot of big moonshot / X-prize type competitions fund their prize money through insurance. You present your research stating why the problem is hard, the insurance company works out how likely it is for someone to manage the challenge, and they come up with a quote for footing the bill in the unlikely event that someone wins.
You can ask for a good faith billing adjustment. GCE or AWS is well aware that things happen, and collecting something is better than collecting nothing.
"After going through our lengthy doc on this incident sharing our side of the story, various consults, talks, and internal discussions Google let go of our bill as a one time gesture!"
For anyone who wonders, the summary of the mistake: created a web crawler without any stop condition and without adding any checks for not visiting the same page twice. The crawler just kept running for a day, making around 9 million requests per minute.
I don't understand why people keep doing this to themselves. OP is an ex-Googler and we can safely assume he knows his way around tech. Why would you, as a startup, go down the route of using the convoluted mess of tech that every cloud provider is, instead of buying a cheap dedicated server, setting up your own k8s or whatever, and just using that platform for your project?
Cloud solutions are immensely and, in most cases, unnecessarily complicated. On top of that, they have the ability to kill any dream of yours with an invoice far exceeding your darkest forecasts in less than an hour.
The time OP spent solving this issue could be better spent learning about self-hosting and avoiding this trouble all together.
I know it's not cool these days, but I strongly prefer (and advise) fully-managed cloud services like Heroku. I can fix my database size and scale/resources (dynos) easily. It's simple, and controlled.
Or just test on a good old VM, which can be had for just a few cents per hour and doesn't even allow for storage or network traffic going out of hand.
The first mistake is to deploy tests on completely opaque hyper-scalers. Pretty much any software infrastructure - from (SQL/No-SQL/In-Memory/etc.-) databases to entire web-frameworks can be found as ready-to-go VM images and containers these days.
Sure, it's a bit more work to find and set up, but in the end you gain an understanding of what the system is actually doing and how it might behave, plus the ability to deploy into any environment, from local workstations to bare metal to (a fleet of) VMs to high-level hyper-scaler services.
If there's ever been a case against using GCP or AWS this is it. You better understand those systems or you will be in for the shock of your life when you get hit with this sort of crazy billing. I got $3,000 AWS credit and was terrified of running experiments in case I made a mistake. It should not be hard to set hard billing limits, these companies know how to set hard CPU limits, why can't they do that with billing?
This is really scary. It's so unpredictable what one actually has to pay; especially for a small business, moving to the cloud is much more challenging than it should be.
When creating resources it's really unclear what one might be charged, then there are saving plans and pre-commitment options and so forth.
Might be a good startup idea, basically just sell cloud resources via a simple, predictable payment model.
Something very similar happened to me. I was using Cloud Run to fetch some subreddit posts and ended up with a "recursive" setup; because of that, billions of invocations were made... Luckily, I was in front of the computer and stopped it early, but the bill was around $4,000! I contacted Google support and explained everything, and they "forgot" my debt because of the bug.
So basically no one's testing their code anymore and just throws it into a paid service?
Great that Google was so lenient and all, but I really don't get the appeal of using a hyper-scaler when a VM with docker support can be setup in literally seconds and on-demand pricing of less than 10 cents/hour for most quick-and-dirty tasks.
As a kind of reverse of this, I once had a Google App Engine site with a reasonable $300/day limit on it. We ended up getting a bunch of international news and usage spiked massively (hooray!) and I tried to update the limit manually only to find out I had to wait 24 hours before the new limit applied.
Meh. I see this all the time with developers who want to abstract everything away and not worry about the impact their poorly performing code is having on the infrastructure or on the $$bottom line. Time out? No problem, we will just spin up more instances. I have heard that so many times. Maybe your code is just bad.
If this happens you can usually reach out to Google to see if they will refund the charge. They don't really benefit from making $72k off a solo developer's buggy code. I've done it once and their team was very helpful and reversed the charge.
The algorithm this ex-Googler came up with is hilarious; hard to believe this guy came out of Google. I thought they were big on algorithms in their interview process? Maybe I should consider putting in an application. If an applicant I was interviewing put down the marker after producing that in a whiteboard session, they would have quite a bit of work to do to get to the next interview phase. Not only did the ex-Googler not catch those mistakes, his team of 7 actually sat down and coded it without spotting the problem.
And another day that I'm again totally surprised by how some people do "engineering". Basic problems like unbounded recursion don't occur to them, they use a technology they don't understand, they are not careful with the number of instances, and then they burn ~16,000 hours (close to 2 years) of CPU time.
Why not test the algorithm locally, realize the problems and fix them? Why test on 1000 instances, not 1 or 2?
I get the "move fast, fail fast" attitude and why it is deemed beneficial by some, but this was essentially "goto fail".
writing as somebody who runs a big collection of bare-metal hypervisors for ISP infrastructure purposes... this post quite honestly just makes me smirk.
I have truly lost track of the numerous instances, and number of people who would be better served by buying a $1200 test/development 1U dual socket server with a few fast SSDs in it, and putting it in colocation somewhere for a few hundred dollars a month. The costs would be absolutely fixed and known.
On a tight budget? You don't even need to go as far as $1200, I see totally fine test/development environment suitable, Dell 1U servers on eBay right now for under $500 with 128GB of RAM.
Or that would be better off purchasing a fixed-configuration virtual machine (typically running on xen or kvm underneath) that has a certain specific amount of CPU, RAM and storage resources allocated to it which cannot balloon. For a fixed bill per month like $65 or $85.
You want to deploy your weird app on some cloud platform? sure, go for it, once you've got the possible scaling-up cost issues and possible bugs worked out on your own platform.
Please don't be a jerk on HN, especially in response to someone else's misfortune, even if they brought it on themselves. Maybe you don't need to treat these people better (though why not?) but you owe the community better if you're posting here. If you wouldn't mind reviewing the site guidelines and taking the intended spirit to heart, we'd be grateful. Note these ones: "Be kind" and "Please don't sneer"
p.s. I skimmed through your recent commenting history and it looks great—just the kind of thing we want here. Sharing some of what you know is exactly what we want users to do. But please don't be supercilious about it, as in this comment and https://news.ycombinator.com/item?id=25372847. Ignorance doesn't deserve humiliation, and that ingredient poisons the ecosystem (and eventually starts a degrading spiral, e.g. https://news.ycombinator.com/item?id=25373520). The rest is good.
Thanks for the feedback. I almost certainly shouldn't have included the part about the smirk, and I can definitely see how that could appear to be making fun of somebody else's misfortune. And the rest of it could have been phrased in a more diplomatic way.
For what it's worth it wasn't intended personally at the person who almost incurred the $72k bill, but more at the general concept of test/beta software gone rampant and out of control in an environment where billing has no limits. I think we've all tested some sort of software in development environments that caused havoc - but up until very recently it's been hard for that to immediately begin causing real world financial consequences...
> Maybe you don't need to treat these people better (though why not?)
IMHO the best argument for 'why not' would be that it's generally unethical to deploy software without first taking the time to read the manual and understand how your dependencies work. In this case the system wasn't live and the costs of this fuckup were solely externalized onto Google, which is fine because it was in large part their fault anyway. But when dealing with production deployments, this same behavior often results in users having all their private information leaked or deleted.
I think cautionary tales are important, but it's also possible, as I likely did above, to come down on people too harshly. Not everything has consequences as severe as a Therac-25.
The crazy part to me is using the cloud for testing. It’s crazy. I have a 5 year old dual CPU Xeon with 128GB of RAM and a couple NVME disks that I’ve spent about $1000 CAD total to build ($700 USD). Something in that range on Azure is about $1 / hour if you reserve a year. ~$9000 per year.
All the people running workloads that don’t require the redundancy given, like CI, blow my mind. The costs are astronomical vs buying a cheap or used server. Sure, use the cloud for you production builds, but why not augment it with something that doesn’t cost as much?
Until the server breaks and you have to drive over in the middle of the night and try to replace it but the only available server right now is a shitty one and oh shit only half the backups work cause the onsite backups are fried too etc etc etc.
There's many good arguments against high-level BaaS such as Firebase but I'm not sure that "colo is cheaper" is one.
It absolutely is (cheaper, and a good argument). As an example: we're in the process of switching a project from Digital Ocean to Hetzner, which will increase infrastructure performance (roughly memory/CPU/storage) by 4x and decrease costs by 4x. And no driving to the colo center is necessary, since it's their dedicated server, so their on-site engineers do the hardware replacement.
Also, if you are not okay with your site being down for a few hours, you can always buy two, as you would with a sensible cloud setup. It'd still come out way cheaper (and you get more perf if you can do load balancing for your usage).
Also, I don't look at it from "colo is cheaper" point of view. To me, it's "I can have several times more performance and hire a full time sysadmin to worry about it, for the same price".
It's anecdotal, but I'm convinced I'm not alone here: we've had more Amazon-related failures/outages in 3 years with AWS than we had in 4 years of colo before heading to the cloud because of the exact fear you described.
Even a cloud setup needs good management and contingency planning, and in absence of such it can fail just as hard as a colo setup.
That implies you are going to be running prod in the cloud. Unless you're developing against purely synthetic data, the data transfer costs are potentially astronomical.
Just because you use "the cloud" doesn't mean you don't need backups. "The cloud" also has downtime and other failures. When deploying to the cloud, you also have to factor in the cost of moving to another provider if/when that becomes necessary.
Dell 1U servers on eBay right now for under $500 with 128GB of RAM
In part 2 the author says "Had we chosen max-instances to be “2”, our costs would’ve been 500 times less. $72,000 bill would’ve been: $144". In other words, that $500 server is several times more expensive than it would have been if Firebase and GCP had saner defaults.
That $144 would have been for a single two-day test.
Anyway, getting caught up in specific remediations that could have prevented this is beside the point. For development you want a safe testing environment because mistakes, gaps, misunderstandings, bugs are a fundamental part of it. The entire point of tests and testing environments is to discover the problems you know exist but need to test to find.
From my point of view after doing this for 20 years, it's like seeing the past 12 years of the "put everything in the cloud" era, of new different people repeating exactly the same mistake over and over again.
It's like if you lived near a public park with particularly aggressive geese that return every year, and watched new ignorant groups of people get chased by the geese every spring.
It's not callous - it's the perspective of the people who are responsible for the hypervisors that run underneath the VMs and services that cause some of these massive billing outrages.
I quite agree with everything you've said in this and your other post.
My development environment? : My own dual-booting Windows/Linux PC with 32G RAM and a few TB of SSD. Not to mention the Nvidia RTX graphics card for gaming...
I either spin up a VM to test stuff, or spin up a Python virtualenv. PostgreSQL is also running on this machine. Whatever's needed. Need to emulate Stuff Happening From Different Servers? Just spin up another few VMs, assign them the minimum resources required to do what they need to do, set up your VM network, and so on. Any decently specced desktop machine can do that, never mind a noisy rack system, considering today's machines are way better and vastly more powerful than the PCs we had 2-5 years before, which themselves were vastly more powerful than the ones before them, and so on...
Result? Can develop at home to my heart's content, then when it comes to deployment spin up a remote VM on e.g. DigitalOcean and take it from there.
At the end of the day, "sErVeRlEsS" (I just don't like that term, for some reason it rubs me up the wrong way, perhaps because of...) just means "running stuff on someone else's kit" - the same as "tHe ClOuD", so if I'm going to be developing some system & software, I'd rather be doing it locally, setting up whatever's needed to get it running, and once satisfied, deploying it.
Like you, I see either the same people, or new people, simply Not Learning From The Past. There are many good reasons why things were done like they were - developing on a system you own, for example, rather than spinning up all sorts of Cloudy Things or "serverlessy things" right from the start.
Hardware is cheap - you don't need a supercomputer to run the beginnings of your latest Supah Scalable System[tm], you just develop and run it on a reasonably up to date box, and, sure, when you get to the stage where you need more space/bandwidth/whatever, that's the point where you deploy to some Cloudy Thing or SeRvErLeSs Thing.
My personal home office development environment at the moment, done on an ultra low budget, is a dell precision t5600 mid tower workstation PC (dual xeon, e5-2630) that I got for $350 with 64GB of RAM in it, upgraded it to 128GB, and put a $150 Samsung SATA3 SSD in. It's small and relatively quiet and sits under my desk tucked in a back corner with just a power cable and a few ethernet cables plugged into it.
Maybe some time in the near future I'll add a 2TB HDD that I have sitting around into it so that I can create VMs that have a 'fast' boot/root disk, and also give them some lvm partitions on a big slow disk.
It's running debian stable amd64 and is set up as a xen dom0 hypervisor, with 768MB of RAM assigned to the dom0 and the rest available for VMs.
The amount of capacity that's available there to create random PV or HVM VMs with as much RAM as I could want, is more than sufficient for my personal needs. If I need anything bigger I'll make it a more formal process and put it on a machine at work.
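To give a flavour of how little ceremony creating one of those VMs involves, here's a sketch of booting a PV guest with the xl toolstack - it assumes an LVM volume group vg0 and a bridge xenbr0 already exist on the dom0, so treat the disk/vif lines as placeholders for whatever your setup actually has:

    import subprocess
    import textwrap

    def create_pv_guest(name: str, memory_mb: int = 2048, vcpus: int = 2) -> None:
        """Write a minimal Xen PV guest config and boot it with the xl toolstack."""
        cfg = textwrap.dedent(f"""\
            name = "{name}"
            memory = {memory_mb}
            vcpus = {vcpus}
            bootloader = "pygrub"
            disk = ['phy:/dev/vg0/{name},xvda,w']
            vif = ['bridge=xenbr0']
        """)
        path = f"/etc/xen/{name}.cfg"
        with open(path, "w") as f:
            f.write(cfg)
        subprocess.run(["xl", "create", path], check=True)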
FaaS is what they call serverless, I guess. Anyway, it seems like a step backward to me - like going from FastCGI back to CGI - and somehow they market it as "progress".
At least they could be using OpenFaaS or something, or free-software alternatives to Firebase such as kuzzle.io or Mozilla Kinto.
> It's like if you lived near a public park with particularly aggressive geese that return every year, and watched new ignorant groups of people get chased by the geese every spring.
You're not really helping your argument here. Particularly if people have been attacked for over a decade and no one has put up an "aggressive geese" sign.
Quite literally, in my specific area the former is true, and the city government has in fact put up a number of signs around the nesting area. It still happens.
First, that $1200 server costs fixed money upfront, and then you pay per month for colo and for internet, which usually comes with a bandwidth cap or limits, with bursts you pay extra for. So no, it's not fixed.
Second, a server you have to maintain, hardware- and software-wise, is much more complex and takes much more time than a managed service. You want a database? Install it yourself, maintain it yourself, back it up yourself, monitor it yourself. And the same with everything else.
Third, there's zero redundancy in your "setup". If you want it with the most basic redundancy, you triple the costs (second server, extra networking equipment, etc.).
Fourth, geo redundancy/distributedness? Please. Good luck if you have someone far away who wants to visit your site.
Fifth, let's say you need to scale. Like, you get 10 more users today than you did yesterday, or you get featured on HN or Reddit or local news or whatever. F. You're looking at months and a lot of cash, upfront.
"A big collection of bare-metal hypervisors" makes sense in some cases, but don't pretend it doesn't come with a non-negligible time spent maintaning it and requires significant upfront capital and man hours to do the same you get easily on a public cloud platform (databases, message brokers, object storage, etc. etc . etc. etc. etc. etc.).
yes, I am serious, because as described in the original post this was somebody's test/prototype environment. Which is the ideal use case for a DIY scenario, until you're ready to send things into production.
I have seen people spend thousands of dollars on a cloud hosting platform to develop and test something when it could have been done equally well on a 4-year-old desktop PC sitting on somebody's desk. If they had only thought to bother installing the same (debian, centos, whatever) environment + packages + custom configuration on it.
But one big benefit of cloud providers is that you can spin up those servers for testing and when it doesn't work out as expected, you can just magically make them go away. If you're putting out the capital to buy servers in a rack, you have to use them all the time for the cost benefit to work out. A regular test environment that is used all the time? Yes, that could possibly be cheaper if you purchase the hardware but you also need to amortize the cost of purchasing new servers every 3 to 5 years to get the equivalent of the equipment provided by the cloud provider.
> writing as somebody who runs a big collection of bare-metal hypervisors for ISP infrastructure purposes.
I run a cloud SaaS company (3 employees). If I had the skillset that it sounds like you do, I might be inclined to host on bare metal. But I don't. I don't know what a 1U dual socket server is.
It would take me some time to build these skills, and to match the agility that the cloud offers. I don't think it's worth my time, and probably not the author's time, either.
absolutely an understandable concern. One way of abstracting away the need to own or maintain physical servers while still achieving a definitive fixed monthly cost is to do as this other commenter has done, renting dedicated servers from a company that specializes in such:
And this comment makes you look incredibly naive and narrow-minded.
Running some code on a CPU != running a startup. Great you can buy a Dell server on eBay, or you can build a powerful desktop, or rent a VM or get a droplet or scrape on lowendbox. These are not a secret and there is a great reason no one does this other than hobbyists and neckbeards.
You do your testing and it works, then what? You have to deliver scalable reliable systems in production that require identity management, security, backups, resiliency, reliability, various networking services and a million other supporting services and all the systems that come with it. Never mind actually scaling the application, monitoring it, and all the tools, systems and processes needed to run reliable systems in production.
The eBay servers provide you exactly zero of that and you've just wasted time setting up an environment that is a snowflake and doesn't represent reality. Testing on the cloud on exactly the same platform you would use for production has a lot of benefits when you look at value as limited developer time delivering value to customers and the business.
Whilst the $1200 server on eBay might be cheap today, you are entirely missing the hidden cost of lost time when your team of developers, costing $M/year, is wasting time testing in an environment that doesn't help them find and solve production issues. You don't need many hours of wasted time or downtime to lose all of your so-called cost gains.
Optimising for absolute minimum cost is a fool's errand that only slows down delivering production systems that provide value to your customers.
Please spend some time thinking bigger about the opportunity cost and value delivery of technology beyond the immediate dollars and cents - it might surprise you.
Really, "a million other supporting services and all the systems that come with it" ?
You get a server with SSH, then you need something to expose your container stacks over HTTPS like Traefik (which auto-configures), and something for alerting such as Netdata (which auto-configures too!), both of which are just a single binary to configure and set up, and it probably won't take long before you have scripts to automate that like we do[0].
Not only do you get amazing prices[1] but ...
You get to be part of an amazing community running the world on Free Software
But yeah, maybe I'm "missing the bigger picture" by "not locking myself in proprietary frameworks"
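For what it's worth, the "scripts to automate that" part really can be tiny. A minimal sketch using the Docker SDK for Python - the Traefik tag, container name and flags here are illustrative, not a drop-in for the linked setup:

    import docker  # Docker SDK for Python: pip install docker

    def bootstrap_edge() -> None:
        """Start a Traefik container that discovers other containers via the Docker socket."""
        client = docker.from_env()
        client.containers.run(
            "traefik:v2.10",
            name="edge",
            detach=True,
            restart_policy={"Name": "unless-stopped"},
            command=[
                "--providers.docker=true",        # watch the local Docker daemon for services
                "--entrypoints.web.address=:80",  # plain HTTP entrypoint; add :443 + ACME for TLS
            ],
            ports={"80/tcp": 80},
            volumes={"/var/run/docker.sock": {"bind": "/var/run/docker.sock", "mode": "ro"}},
        )

Point it at your containers via labels and you have the HTTPS front door; Netdata is a similar one-liner.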
I'm a huge fan of open source and open standards, in fact I always push for and expect portability and avoid proprietary systems where possible. Abstractions like Kubernetes are a fantastic middle ground to provide portability across platforms whilst taking advantage of cloud provider services where they exist. The same for apps and frameworks built on open standards like Kubeflow and Apache Beam.
The supporting services and systems come when you run services that require strong guarantees for reliability and resiliency, and meeting the needs of different lines of business.
If I think of a mid-size company that wants to run these kind of workloads and demands minimal downtime, resilience against local disaster and minimal data loss:
- Customer facing applications in a reasonably scalable manner to meet peaks and troughs of demand without needing to size for peak demand
- CRM/ERP systems to manage customer data, payments, sales processes and inventory that CANNOT have corrupted or lost data
- Data platforms for running reasonable scale analytics and analysis on reasonable size volumes of data (say a few Petabytes online accessible, analyse 10s of Terabytes per query)
- Capability for mid-level machine learning and access to modern acceleration hardware, up to date GPUs and maybe some NVIDIA Ampere type equipment
- Tools and platforms for operations and security that can capture, store and analyse all the logs produced by all those systems, plus some half decent cyber services - network level netflow analysis, maybe IDS if you are feeling fancy, endpoint scanning and analysis, threat intelligence capabilities to correlate against all of that data
- Tooling and platforms for developers - source control, artifact repositories, container registries, CI platforms like Jenkins ideally with automated security scanning integrated, CD for deployments like Spinnaker to canary and deploy your releases safely
- Networking for all of that equipment, ideally private backbones, leased Ethernet or MPLS - and all of that needs to be resilient, redundant and duplicated
- Storage for all of the above that meets performance and cost needs, replicated, and backed up offline
Yes, you can do all of that yourself! But let's be clear, buying a server on eBay is not even a fraction of 1% of the reality of running real infrastructure for real systems for real businesses. There ARE reasons why you might do this but that is increasingly the exception due to either extreme scale, regulatory and privacy requirements typically from data sovereignty or very unique hardware requirements.
I'm not talking about /buying/ a server, but renting one as a service.
99.9% uptime is plenty enough for 99.9% of projects and that's easy to achieve with one server, k8s is not necessary here. You're not concerned with MPLS or whatnot when you rent a server.
I can tell because I'm actually running governmental websites on this kind of server, with over a thousand admins managing thousands of user requests. I've been deploying my code on servers like that for the last 15 years and it has been great really; I've also got fintech/legaltech projects in production and much more.
I guess the project you're describing falls more in the 0.1% of projects than 99.9%.
The issue for me is that 99.9% uptime isn't a useful or meaningful metric. End users only care about the experience, and if the application isn't performant, reliable and durable it doesn't matter if the lights are flashing - they will tell you that it's not working as intended. And when you rely on SLAs from third party providers the liability is not equally shared; they might credit you some % of your bill if it's offline, but your reputational impact and opportunity cost is likely orders of magnitude greater. You also can't control how that 99.9% will happen, and more often than not it's going to happen at the worst possible time (payroll dates, reports due, the board needs statistics, etc.)
Mitigating these failures will always lead you down the path of replication, load balancing, high availability or at the very least frequent backups and restore strategies. And all of that is going to need to be done across multiple physical locations because I am never going to stake my reputation on a single physical site not losing power, connectivity or cooling. Now you are in the realms of worrying about network reliability, bandwidth availability for those replication and backup services in a way that doesn't impact user applications. And monitoring all of that, managing failures etc. etc.
As someone who helps organisations with their IT strategy and overall budget allocation process the focus is always on delivering reliable applications to customers and business users. Using a cloud provider helps us to ignore all the complexity behind the scenes that require significant investment in people and resources to manage once you hit a non-trivial scale. Paying a premium to do that is absolutely worthwhile compared to the downside of it going wrong, and the opportunity cost of wasting time on minute details that do not add value.
[Edit] And for context, I DID buy servers on eBay for testing and development, and then migrated to bare metal colo, and all the while thought I was winning and it was cheaper. But over the years I've experienced enough issues and worked with enough companies to understand this was a false economy; I now see the errors of my decisions and try to help others avoid them.
Have you seen link[0] in my comment? Automated backups are of course a big part of the plan, but replication is not an alternative to backups in my book anyway.
I'm not talking about buying servers and colo, but about renting servers as a service[1], where you get the benefits of dedicated hardware without the inconvenience.
The added security that we get by "not sharing our hardware"[2] also deserves to be mentioned here.
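For anyone wondering what "automated backups" plus an offsite copy can look like in practice, a minimal sketch - assuming Postgres, pg_dump on the PATH, and credentials for some S3-compatible bucket via boto3; paths and names are placeholders:

    import datetime
    import subprocess

    import boto3  # any S3-compatible offsite bucket works

    def backup_postgres(db_name: str, bucket: str) -> None:
        """Dump a Postgres database, compress it, and copy it offsite."""
        stamp = datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%SZ")
        dump_path = f"/var/backups/{db_name}-{stamp}.sql.gz"

        # pg_dump | gzip > dump_path; relies on local auth (e.g. a .pgpass file)
        with open(dump_path, "wb") as out:
            dump = subprocess.Popen(["pg_dump", db_name], stdout=subprocess.PIPE)
            subprocess.run(["gzip", "-c"], stdin=dump.stdout, stdout=out, check=True)
            if dump.wait() != 0:
                raise RuntimeError("pg_dump failed")

        # the offsite copy is what turns "a backup" into an actual disaster-recovery plan
        boto3.client("s3").upload_file(dump_path, bucket, f"{db_name}/{stamp}.sql.gz")

Stick that in a cron job and the "single physical site" objection mostly goes away.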
Yes I read your blog and it seems like you've written plenty of shell scripts and utils to try and abstract and automate away infrastructure, but it feels a lot like you are trying to re-invent the wheel when all of this and much more is available in major cloud providers as a service.
One item that stands out for me is that your backup is a couple of shell scripts and you mention that you would dump your database to a different RAID array. That means you are now on the hook to procure/rent, manage, update and monitor that RAID array. And you even call out that you don't include offsite backups so you are at risk of total loss because you are using a single physical site for your prod data AND backup data.
You mentioned above that you are "not locking myself in proprietary frameworks" - but in the process you have built custom one off scripted systems that are bespoke. If you leave or your consulting engagement ends it will be very hard for someone to take over and manage and maintain your systems - because your design, configuration and implementation is effectively lock in to YOU as a person and your consulting company.
Personally I would rather trust a cloud provider to offer something like backup as a service where they can handle geographic replication, snapshots, restores for me as a service and deal with all the hardware, disk replacements, hardware monitoring and network fun that comes with it. The human cost of moving to another cloud provider is not that large and I can easily hire a person or consultancy that has knowledge of Cloud Provider A and Cloud Provider B to make that transition because their services and systems are well documented, conform to a contract and there are training and certifications for how they work.
I still hold my opinion that taking advantage of services offered by cloud providers is value add in the context of running a business.
Also I would much rather trust a cloud provider with a big team of security experts to run my infrastructure than a random company renting me some servers. If you are getting them as a service then there is still a shared admin control plane, likely management type networks and infrastructure around it that are managed for you by a third party. Trusting their team, processes and security capabilities is a very high bar to meet.
Would you please stop posting unsubstantive comments to HN and stop breaking the site guidelines? You've been doing it a lot and we ban that sort of account. I don't want to ban you because your good comments are good, but the bad comments are like mercury: they build up in the system and poison things.
The rules apply regardless of how bad or wrong another comment is, or you feel it is.
It seems like you could easily build a fintech app that gives you a virtual pre-paid credit card to use with these services and only gets funded up to your budget amount. That might be a safer way to work with services like this. You could have a separate card wherever you expect a problem might occur. That would also keep one runaway service from taking down everything else you still need to pay for.
Card and billing management? Probably a legit need tbh
interesting - how did the spend break down between Cloud Run and Firebase?
did you have any limit on how many req/s you made to an individual site? It seems this would be difficult to implement with this architecture.
how did you deal with following links in circles / avoiding scraping the same page multiple times?
I had built something similar at a previous job, recursively scraping ecommerce sites. The first thing I noticed was that some of the sites we were scraping couldn't handle more than a couple of requests a second (in particular as we scraped uncached pages on sites running PHP). Other sites were quick to IP-ban.
I kept things simple, a few dozen micro instances on aws (think they were like $3 a day) running puppeteer. A single server acting as a controller, keeping a per site queue and allowing us to set per site request limits if necessary. All the state of which links were already seen just kept in memory. Of course everything was also persisted to a db, and if the controller process needed to be restarted, it could restore the queue/ seen state and resume.
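In case it helps anyone picture it, a stripped-down sketch of that controller idea - per-site queues, a shared seen-set and a per-host delay. The class name and the 1 request/second default are just illustrative, and the persist/restore part is omitted:

    import time
    from collections import deque
    from urllib.parse import urlsplit

    class CrawlController:
        """Per-site FIFO queues, a shared seen-set, and a minimum delay per host."""

        def __init__(self, min_delay_per_host: float = 1.0):
            self.queues = {}      # host -> deque of URLs waiting to be fetched
            self.last_hit = {}    # host -> monotonic timestamp of the last request
            self.seen = set()     # every URL ever enqueued (avoids circular crawls)
            self.min_delay = min_delay_per_host

        def enqueue(self, url: str) -> None:
            if url in self.seen:
                return
            self.seen.add(url)
            host = urlsplit(url).netloc
            self.queues.setdefault(host, deque()).append(url)

        def next_url(self):
            """Return a URL whose host hasn't been hit within min_delay, else None."""
            now = time.monotonic()
            for host, queue in self.queues.items():
                if queue and now - self.last_hit.get(host, 0.0) >= self.min_delay:
                    self.last_hit[host] = now
                    return queue.popleft()
            return None

Workers just call next_url() in a loop and enqueue() whatever links they discover; the seen-set handles the "links in circles" problem and the per-host delay keeps you from hammering the slow PHP sites.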
> I jumped out of the bed, logged into Google Cloud Billing, and saw a bill for ~$5,000. Super stressed, and not sure what happened, I clicked around, trying to figure out what was happening. I also started thinking of what may have happened, and how we could “possibly” pay the $5K bill.
> The problem was, every minute the bill kept going up.
> After 5 minutes, the bill read $15,000, in 20 mins, it said $25,000. I wasn’t sure where it would stop. Perhaps it won’t stop?
> After two hours, it settled at a little short of $72,000.
> By this time, my team and I were on a call, I was in a state of complete shock and had absolutely no clue about what we would do next. We disabled billing, closed all services.
1) Why wouldn't you shut off the service as soon as you saw the $5000 bill? Really doesn't sound like a "hop on a call with the team for a few hours" kind of decision.
2) Why was the person taking a nap the only person who could get a usage limit alert? One of the great benefits of a team is that you can have multiple eyes looking out for problems. Someone could have raised a flag as soon as the first unexpected alert came in.
3) If going over the free tier limit was your chief concern, why not check the usage after a quick run before letting it go overnight and unsupervised?
That the problem could get this bad is a UX failure, but the problem itself is easily foreseeable and avoidable.
Thank you for sharing. I was actually thinking of using Firebase for my project. They make it so easy to sign up for the free tier. Waiting to see what happens in part 2.
I recently put the (soft) kibosh on a project in my stable trying to switch to FireBase at the last minute.
It looks attractive but the business aspects are frankly frightening, and I'm not even talking about the risk of a large bill. Getting your metrics 24h late sounds like a deal killer for me. So much for observability!
Minor nit: many non-billing metrics are near-real time, e.g. DB concurrents, cloud functions CPU/RAM usage; any metrics that require aggregation (storage, billing) are going to be batched less frequently. This is going to be true across all platforms of non-trivial scale (eventual consistency + batch jobs).
Second note: the number of people who actually do this is very low (a few a year, of hundreds of thousands of developers). The blog posts are scary, but in my ~five years at Firebase, I'm pretty sure we refunded every one. As my boss (James Tamplin, CEO of Firebase) used to say, "There are lots of bad systems, but rarely are there bad people."
This drives me mad. There needs to be a max bill per month setting. If you're near the limit, bells and whistles go off. If you exceed it, new things aren't spun up: GCP refuses to write to storage or run compute. Or they absorb the cost.
If SpaceX can send up a massive Starship and bellyflop it, surely a company with 40,000 engineers can figure this out.
This just says Google won't focus on a thing that gives users a shitty experience but helps Google make money.
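To be fair, GCP does document a self-serve kill switch of sorts: wire a Budget's Pub/Sub notifications to a Cloud Function that detaches billing from the project. A rough sketch of that pattern - the project name and topic wiring are placeholders, the budget data itself can lag, and detaching billing can stop or even delete resources, so it's a blunt instrument rather than the hard cap people are asking for:

    import base64
    import json

    from googleapiclient import discovery  # pip install google-api-python-client

    PROJECT_NAME = "projects/my-project-id"  # placeholder

    def stop_billing(event, context):
        """Pub/Sub-triggered function: detach billing once actual cost exceeds the budget."""
        msg = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
        if msg["costAmount"] <= msg["budgetAmount"]:
            return  # still under budget, do nothing

        billing = discovery.build("cloudbilling", "v1", cache_discovery=False)
        billing.projects().updateBillingInfo(
            name=PROJECT_NAME,
            body={"billingAccountName": ""},  # empty string detaches the billing account
        ).execute()

The fact that this is something you have to build and babysit yourself, rather than a checkbox, is exactly the complaint.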
At the end of page 2, there is a good ass licking bullshit sentence:
<< It’s also a great company to collaborate with. The tools provided by Google are very developer friendly, have a great documentation (for the most part), and are consistently expanding.>>
He said that as an ex-googler and as the beneficiary of a gesture, but it contradicts the whole story he just told us.
If the docs and tools were so great, why did he fall into this situation?
Let's be realistic.
You are a team of a few persons, using a 'starting' free plan.
How could great tools and UX leave you with no clue what they are doing, and let you incur such a high cost by surprise?
Also, for example, in which world is it nice to have your service "auto upgraded" from "trial" mode to 72k "full billing" mode without your consent first?
fwiw I've had cloud vendors be relatively willing to forgive bills when something went wrong with SLAs, bad bugs, or their internal dashboards misrepresented usage.
they know their systems aren't perfect, and if you velvet hammer them long enough, they'll do the right thing.
Disclaimer I work for another cloud (not AWS), opinions are entirely my own. I try to avoid posting in a negative fashion about clouds, but holy crap this blog post...
AWS has this principle of Customer Obsession that enters in to lots of discussions, design decisions etc. "What is the customer experience of $foo?". Along with asking the positive, you ask the negative too, and explore the customer impact of shit going wrong. What does the worst experience look like, what is the impact for the customer, how might you mitigate that or make it so you can at least make it up to customers quickly, if you really can't avoid it.
I find it hard to fathom Sudeep's attitude here. So much of this article is ringing large alarm bells. These are not the things I'd want to see from a cloud provider as a customer.
Is this Stockholm Syndrome? Too much drinking of Kool-Aid as an ex-googler? Unfamiliarity with how other cloud providers operate?
(from part 1)
> Automatic Upgrade of Firebase Account to Paid Account
This is what I mean when I say look at negative vs positive use cases. I'm guessing some combination of customers having a lousy experience running in to Free Tier limits, and staff spending too long having to bump up accounts. So they implemented an automatic upgrade (What, then, is the point of a free tier? No room to experiment, no room to try it and see)
This is precisely the sort of thing that customer obsession principle is supposed to aid in. Automated upgrade certainly solves the staffing time spent bumping up accounts, and it helps customers that used to have to request limits being increased, but it massively fails in the negative customer experience side of the equation here. Someone, somewhere, should have asked the question "What if the customer has made a mistake".
Instead, make it easy and quick for anyone to click a button and get their account changed from Free to Paid, without staff engagement. Give customers easy agency to control their experience.
> Billing “Limits” don’t exist. Budgets are at least a day late.
That's insane. Clouds are about speed and dynamic scalability. Mistakes can ramp up the bill a crazy amount in a short period of time, as Sudeep found out.
How is a 24 hour delay in billing sync and budget warnings even remotely acceptable to them / Sudeep / customers?
Sure it's probably fine for the 90% cases, but that's crippling for the 10% and even if you decide you really don't give a crap about your customers, you don't want the bad press that 10% will likely give you.
Picture what financial damage someone could do if they compromised some of your credentials. You screwed up, credentials got leaked, and you won't necessarily know for a day that something has gone wrong, nor will your restrictions kick in?!
Billing is the single highest TPS service in any cloud, with Identity often a close second (billing gets requests for every transaction, and internal requests related to ongoing charges). You need to handle a high rate of requests, with low latency both in request/response and processing data received. It's a hard engineering problem, and cloud platforms try to get some of the smartest engineers working on it. An organisation of Google's caliber has more than enough smart engineers to be working on these kinds of hard problems, even by temporary secondment.
Quota / Limits in a fast changing cloud environment need to be dynamic and responsive.
>I knew how to put the case for Google team when they would come back to work in 2 days.
How is 2 days even remotely acceptable? Maybe it's just how it's written, but it reads like this is just accepted as the way things are. Why would you even have to carefully work to present your case?
Where are the 24x7 response people with the ability to forgive bills? $72k is chump change for a cloud provider, and especially for a company of the scale of Google. Give your support agents the tools and authority they need to make reasonable decisions, with some appropriate kind of oversight process, and stick in feedback mechanisms so product managers know what problems customers are having.
It's not like that would actually have cost them $72k in direct running costs either. That should have been a near-instant no-brainer. Forgive, move on, and reap the benefits of customer goodwill. That goodwill will earn you way more profit than forgiving it would have cost. You're investing in their continuing business. Sometimes those investments will fail, but most of the time they'll succeed.
>In our case, it differed by 86,585,365.85 %, or 86 million percentage points. Even when the bill was notified to us, Firebase Console dashboard still said 42,000 read+writes for the month (below the daily limit).
So it's just fake observability? What's the point? 24 hours delay here is nuts, almost to the point of being useless. It can be hard to calculate these figures out yourselves. A fast feedback cycle is critical. As Sudeep here found out, 24 hours is a great way to have zero clue what's going on until it's too late. Is there really no other way to get this information more up-to-date?
Moving on to part 2:
>I had a team of ~7 engineers/interns at this time, and it would take Google about 10 days to get back to us on this incident.
Why is a 10 day response time from Google considered even remotely acceptable for a cloud provider? Your entire platform is down, you're working out ways around this situation, stressing about potential bankruptcy, and it's just cool with you that it took 10 days for them to make a business/life changing decision over what amounts to chump change?
These kinds of mistakes happen with clouds; AWS is famous for waiving these shock bills from mistakes, and it never takes 10 days to get it done.
Billing should be the easiest and most obvious thing. If your cloud provider is creating complicated billing structures, that's a problem the cloud provider should be solving, not expecting customers to unravel the mysteries.
Companies being spun up to help people navigate your billing should be an alarm call, not something to celebrate or for customers to consider normal.
> Fail fast, learn fast with Cloud is a bad idea
It shouldn't be. With near immediate feedback you'd have known straight away that shit was bad, and cut the experiment out before it cost you an arm and a leg.
> While creating a Cloud Run service, we chose default values in the service. The max-instances is preset to 1000, and concurrency set to 80 ...... Same goes with Cloud Run! With Concurrency == 60, max_containers == 1000 and each Request taking 400ms, number of requests Cloud Run can handle 9 million requests per minute!
Why are the default values that high on a service? That seems like you're asking customers to shoot themselves in the foot. Where was the look at the negative customer experience side of the equation? Make it easy for customers to do the right thing.
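For what it's worth, the arithmetic in the quoted bit checks out. A quick sanity check with the numbers straight from the post:

    max_containers = 1000   # Cloud Run's default max-instances at the time
    concurrency = 60        # concurrent requests per container
    request_seconds = 0.4   # 400 ms per request

    # each concurrency slot turns over 60 / 0.4 = 150 requests per minute
    per_minute = max_containers * concurrency * (60 / request_seconds)
    print(f"{per_minute:,.0f} requests/minute")  # 9,000,000

Nine million requests a minute is not a "default" any small team is going to anticipate.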
Then the bit that really bugs the crap out of me:
> Thank you Google!
He's thanking Google for having had an absolutely shitty experience on their platform: 10 days of stress from needless delays in forgiving a trivially small bill, dealings with multiple lawyers, investigating bankruptcy, risk of missing product launch date, working around the clock to dig themselves out of hell...
Do not use hosted cloud services where the implementation creates publicly accessible API keys and each HTTP request results in a charge to your account. A few specific examples are Firebase, Algolia, and AWS Lambda.
All it takes is one programming mistake or one bad actor and you can find yourself in an equally precarious situation.
Yep, you should wrap them in something like Apigee or an API server that can throttle requests and keep traffic to reasonable numbers for your service.
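The throttling half of that doesn't have to be a whole product, either. A minimal token-bucket sketch you could put in front of any pay-per-request backend - the rate/capacity numbers and the threading model are up to you, and a real gateway would also need auth, logging, etc.:

    import threading
    import time

    class TokenBucket:
        """Allow roughly `rate` requests/second, with bursts up to `capacity`."""

        def __init__(self, rate: float, capacity: float):
            self.rate = rate
            self.capacity = capacity
            self.tokens = capacity
            self.updated = time.monotonic()
            self.lock = threading.Lock()

        def allow(self) -> bool:
            with self.lock:
                now = time.monotonic()
                self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
                self.updated = now
                if self.tokens >= 1.0:
                    self.tokens -= 1.0
                    return True
                return False  # caller should reject the request (e.g. HTTP 429)

With something like bucket = TokenBucket(rate=50, capacity=100), any request for which bucket.allow() returns False gets a 429 instead of ever reaching the metered API - which is exactly the cap these platforms won't give you.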