I really wish cloud providers would allow users to set a hard budget that just stops the service if you exceed a threshold. I got a surprise bill from AWS this month (fortunately orders of magnitude less than this, but still ~15x my usual) and am thinking of moving from Lambda to a VPS just so the possibility of this doesn't keep me up at night.
I use VPSs for personal use for exactly this reason - the bill's the same (small amount) every month, and if load spikes I'm happier not being able to handle it than handling it for a surprise Xx bill.
Shameless plug: I know this plug should not be here, but I think I can help some people with their cost problems via [the easiest automation platform for AWS][1].
I have a startup plan where you can run 60 executions in a month free of cost. Just reach out to me via https://totalcloud.io/solutions.html and refer YComb.
Same here, with AWS specifically. I'm used to AWS so generally stay in that ecosystem. Designed a web app and backed part of it by Lambda and API Gateway. Due to a surge in traffic, my monthly bill went from about $6/mo to $150 for a month. Didn't catch it until I got the bill for the month, the timing of which meant my next month's bill was about $80.
App Engine PM: the GAE Daily Spend limits are hard caps that will shut down services when hit. They are different from the broader GCP Billing Alerts which are simply notifications.
The issue is that they only apply to certain GAE services (compute, legacy APIs, etc.) and not across the platform.
You can limit the max concurrency of Lambda functions, which can help a bit here.
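For example, reserved concurrency puts a hard ceiling on how many copies of a function can run at once. A minimal sketch with the AWS SDK for JavaScript (v2); the function name and limit are placeholders:

    const AWS = require('aws-sdk');
    const lambda = new AWS.Lambda({ region: 'us-east-1' });

    // Cap the function at 10 concurrent executions; anything above that is throttled.
    lambda.putFunctionConcurrency({
      FunctionName: 'my-api-handler',
      ReservedConcurrentExecutions: 10,
    }, (err) => { if (err) console.error(err); });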
IMO the reason is that for business/enterprise use cases, configuring this wrong could be catastrophic; for side project/personal use cases, it's pretty well known you get one chance at a refund from the public clouds, after some manual checking to make sure you aren't trying to abuse the service.
You can set budget alerts, which aren’t as good, but they can let you know if the bill’s going off the tracks. I set up alerts at 50% and 75% of what I’m willing to pay and that’s gotten rid of my ‘unexpected bill’ stress.
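A rough sketch of wiring those two thresholds up with the AWS Budgets API via the AWS SDK for JavaScript (v2); the account id, budget amount and email address are placeholders:

    const AWS = require('aws-sdk');
    const budgets = new AWS.Budgets({ region: 'us-east-1' });

    budgets.createBudget({
      AccountId: '123456789012',
      Budget: {
        BudgetName: 'monthly-cap',
        BudgetType: 'COST',
        TimeUnit: 'MONTHLY',
        BudgetLimit: { Amount: '100', Unit: 'USD' },
      },
      // One email notification at 50% and one at 75% of the limit.
      NotificationsWithSubscribers: [50, 75].map((pct) => ({
        Notification: {
          NotificationType: 'ACTUAL',
          ComparisonOperator: 'GREATER_THAN',
          Threshold: pct,
          ThresholdType: 'PERCENTAGE',
        },
        Subscribers: [{ SubscriptionType: 'EMAIL', Address: 'ops@example.com' }],
      })),
    }, (err) => { if (err) console.error(err); });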
I think there'd be two big issues - (1) what is reasonable behaviour when you reach your budget, and (2) the consistency of billing.
On (1), if you get to, say, the 14th of the month and run out of budget, what would be the expected behaviour? Shut down all your services? Delete all your data so that's not incurring a charge too? It'd probably have to be configurable on a service-by-service basis, and that's an awful lot of complexity to introduce for a tiny minority of their revenue base.
On (2), their billing systems are almost certainly eventually consistent in logging and charging usage, so they'd either have to shut you down early in anticipation of delayed charges, or let any delayed charges bring you over budget. Either is liable to make customers fairly unhappy one way or the other.
The TL;DR version - it'd be nigh on impossible to have a good UX with such a feature, it would be complicated to implement, and it's not likely to move the needle considerably on any metric they really care about.
> it's not likely to move the needle considerably on any metric they really care about.
What about devs too scared to put their credit card number in because they've heard horror stories about being overcharged for mistakes? I just recently had to sort out a feature that a junior dev built using OpenStreetMap because they didn't want to be on the hook by putting their card into Google Maps... It was for an internal portal; usage wasn't even high enough to hit the monthly free tier!
Why would a junior dev be using _their_ card details? Surely your company has a corporate billing account with Google for this purpose?
Perhaps I've misunderstood the situation, but if I were working as a dev at a company and was expected to provide my own card details to pay for APIs, I would walk out the door.
They were just told to make the portal as a learning/teambuilding exercise; the choice of tools was entirely up to them - I even told the person the company would pony up for anything they ended up spending. HR was being a pain about getting card details (it's not easy getting that stuff in India).
The dude bounced off Google Maps as soon as he saw the form to enter payment info and went straight to OpenStreetMap.
Not everyone is a rockstar engineer in a top 5 company, my guy; people have to start somewhere, and paperwork is a huge show stopper in India in general.
From your comment history I see you're European. Europeans have way more protections and assurances in cases like this; India is a very different business environment...
Seeing as I had to learn all the GIS jargon OSM uses by default, it was bad for me. It's not a bad choice, but since we as a company can afford the ease of Google Maps, it's a waste of time.
It seems to have been a combination of a traffic surge and a third-party API being temporarily slow. The slow API meant that I ended up paying for more idle CPU time during the request round-trip.
Alarms are like a garage leaving you a voicemail message before doing $25k of repairs on your car. It’s ridiculous that I can’t set a hard cap on my spending.
I guess you could have the SNS topic invoke a Lambda that shuts down the specific service if you wanted that, but I agree a budgets feature would be nice, especially when provisioning test accounts for developers.
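A hypothetical sketch of that handler, assuming the billing alarm publishes to an SNS topic the function is subscribed to; the target function name is made up, and throttling it to zero concurrency acts as the kill switch:

    const AWS = require('aws-sdk');
    const lambda = new AWS.Lambda();

    // Invoked by the budget-alert SNS topic; rejects all new invocations of the
    // expensive function by setting its reserved concurrency to 0.
    exports.handler = async () => {
      await lambda.putFunctionConcurrency({
        FunctionName: 'my-api-handler',
        ReservedConcurrentExecutions: 0,
      }).promise();
    };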
Assuming your revenue and your AWS bill are strongly correlated, I agree. However, when a free service or a bug can increase costs by orders of magnitude without a corresponding increase in revenue your choices are to eat it (not sustainable, for most of us), throttle, or stop. Throttling is hard, so stopping is often the best choice.
Say my business has $100k in revenue per month and normal cloud spend is $2k per month, and some glitch makes spend go to $2k a day. That sucks, and I hope I have good alerts to catch it within a day or two, but telling my customers "screw you, the bill messed up" for however long it took me to sort out the bug? I would not do that.
Sounds like the OOM killer. Sounds good in theory until it starts shooting processes in the head that you didn't intend it to.
What would you like Amazon to do for stateful services? Should they stop and delete EBS volumes? What about databases? Simply shut them down? What happens when you lose data or it doesn’t come back up?
EBS volumes have a size, so there's an upper bound to the cost. Most of the storage is predictable, so if I try to allocate storage where the monthly cost of the raw storage is (e.g.) 10x my budget, I wouldn't have a problem with the request being denied.
For non-storage resources like EC2, network bandwidth, etc. I'd be fine with having a hard limit where everything just breaks, especially for stuff that's not production.
There could also be better, self managed quotas on resources. SES is a good example. AFAIK the quota is all or none across the entire account. IMO, it's not a good idea to give a user that needs to send (made up numbers) 1k emails per day credentials that can send 250k emails a day.
I have 3 AWS accounts. I don't keep anything in my main account. It's for billing only. I have a sub account for production that I try to keep pristine. I have a sub account for development and testing. It's the development account that scares me. I spend less than $50 per month. I'd rather have my whole development account de-allocated than get a bill for $1k.
Billing problems aside, it's amazing that web development has reached a stage where it's almost too easy to scale a site to 2m active sessions and 20m+ page views in two days with just off-the-shelf tools and barely any specialized skills.
Except for very simple use cases, serverless platforms in general encourage bad software design patterns.
For example, in this case, if a regular database was used instead of Firebase, "counting the number of supporters" would have been done inside the database using a query. The worst that the developer could have done is forget to index the relevant column; performance would have been sub-par but still orders of magnitude better than Firebase, because the bulk of the data would not leave the database if there was a proper query language. With Firebase, when doing complex calculations, not only does the data have to leave the database, it often reaches end users; the calculations then happen on the front end, and this is a huge problem in terms of security, performance and even correctness (due to latency and the fact that other users may be making conflicting calculations simultaneously based on slightly outdated data).
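For contrast, here is a minimal sketch of what that count looks like when the aggregation stays inside a regular SQL database, using the 'pg' client for Node; the table name and connection settings are made up / taken from the environment:

    const { Pool } = require('pg');
    const pool = new Pool();

    async function supporterCount() {
      // The database does the counting; only a single row ever leaves it.
      const { rows } = await pool.query('SELECT COUNT(*) AS total FROM supporters');
      return Number(rows[0].total);
    }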
Also, based on the article's description of the application, it sounds like it may have exposed potentially sensitive data (about supporters) to all users so it may have introduced a security vulnerability. I would not be surprised if this was the case.
> With Firebase, when doing complex calculations, not only does the data have to leave the database, it often reaches end users
Any time that sensitive data reaches untrusted users it is bad practice, but this needn't be the case on Firebase just because it is serverless.
Firebase Cloud Functions can be triggered in response to document saves (or authentications, or a plethora of other things) and run outside userspace, and they are available for exactly this sort of work -- keeping the minutiae of upkeep away from the client.
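A hedged sketch of that pattern for the supporter count, using the Firebase Admin SDK in a Cloud Function; the collection and counter document names are made up:

    const functions = require('firebase-functions');
    const admin = require('firebase-admin');
    admin.initializeApp();

    // Runs server-side on every new supporter document; clients only ever
    // read the single aggregate document at stats/supporters.
    exports.onSupporterCreated = functions.firestore
      .document('supporters/{supporterId}')
      .onCreate(() =>
        admin.firestore().doc('stats/supporters').set(
          { count: admin.firestore.FieldValue.increment(1) },
          { merge: true }
        )
      );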
That said, I agree that it's easy to make this kind of mistake with serverless development, but mostly due to a lack of existing domain knowledge. It's trivially easy for untrained developers to make similar (or worse) mistakes without a serverless environment, too. It's just a matter of becoming familiar with the toolset, and because serverless tech is newer, fewer people are as familiar with it as they are with the old things (and there are fewer people to catch those mistakes when they see them).
Not all databases support counting via indexes. This is why counts were a big issue at Parse: Mongo didn't support them, and if the count is for an unindexed query, it could be a full table scan on a multi-tenant system. Just as bad, Mongo as of 3.0 (the last version I worked with heavily) did not ever yield during an index scan, so counting large numbers on an indexed query could cause massive problems too.
For this problem, I recommend the atomic increment operator for counts that both Parse and Firestore offer.
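In the v8-style Firestore API used elsewhere in this thread (assuming the `firebase` web SDK and a `db` handle are already initialized), that looks roughly like this; the 'stats/supporters' counter document is made up:

    // Atomically bump the counter without reading it first.
    db.collection('stats').doc('supporters').set(
      { count: firebase.firestore.FieldValue.increment(1) },
      { merge: true }
    );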
> Except for very simple use cases, serverless platforms in general encourage bad software design patterns.
> For example, in this case, if a regular database was used instead of Firebase, "counting the number of supporters" would have been done inside the database using a query.
I don't think this is connected to serverless architecture. You have to write a database query either way, and it would be just as easy to count or retrieve entities one by one with a traditional approach.
That's why I don't use cloud offerings. Their pricing models are designed to trap someone with an exorbitant bill and you don't have control over it. For example, let's say you put an image on a CDN and someone who doesn't like you runs ab against it for a couple of days, making billions of requests to bankrupt you, or someone finds a page that runs costly queries and sets up curl against it. No thanks.
I think this is actually an admirable example of why Firebase is a fantastic option for a startup. This company had made no optimisations for this and would otherwise have lost a rare opportunity for virality.
Would you pay $30k for 2 million people to look at and try your site? I would for any business I was running. If your servers were on fire at the start of that, would you pay $30k to keep your servers up?
I actually switched from AWS/CloudFront to Cloudflare because pricing isn't based on bandwidth and I have greater control over security and access (firewall, rules, etc.).
I had a string of sketchy accounts using my SaaS service as a file host which ran up a bill of about 1.5k. Obviously I put some safeguards in place (firewall, rules, alerts), but the scenario you're mentioning does happen and is largely mitigated by using a service with a pricing model like Cloudflare's.
Interesting; I saved roughly 2M USD / year for customers by moving them to the cloud. Your example is silly: all of the CDNs I set up have very strict limits on how much a single IP can use, and multiple alerts for when you pass $100, $200, $500, etc. On top of that, if that is not enough, you can add more limitations to avoid exactly that scenario where a public resource can be abused to cause you financial trouble. It won't "bankrupt" you if you do it right. Just like pretty much every other technology, you need to know it.
The other problem with your comment is that you try to make it sound like using the cloud was a single-dimension decision. It is never a single-dimension question, though.
Not taking sides here, but I think the cloud is complicated enough (especially for a person who does not specialize in it) to miss one of the edge cases that can lead to a huge bill.
In a few minutes, I can set up a simple PHP script with curl that will launch 100 requests each second, using a pool of hundreds of thousands of IPs thanks to a rotating VPN.
This is an edge case of course, but it can happen.
I use the cloud myself, but only cloud servers, which allows me to control my budget better and provides me with an "escape plan" where I can switch to dedicated servers quickly.
You see, this is too complicated, and "if you do it right" implies you can get it wrong and go bankrupt. I prefer to use a dedicated server with dedicated bandwidth and Kubernetes on top of that. This makes me sleep comfortably at night. I tried the cloud and it was just too much anxiety for me to handle.
It is absolutely not complicated compared to running Kubernetes yourself. You don't have to use serverless and infinite scaling. For things like Azure App Services you don't autoscale to start with, and if you want autoscaling it's just a handful of metric-based scale rules you configure.
Yes, just like driving: if you do it wrong you go to the hospital or die; if you do it the right way you get from A to B. Personal responsibility does not go away because you are using the cloud.
I have to mention this every time I see it... The crowdfunding campaign mentioned in this article took place in "Colombia" not "Columbia." There is no country by the name of "Columbia." This problem is so aggravating to Colombians they even sell t-shirts in their airports that say "IT'S COLOMBIA NOT COLUMBIA"
The Cloud. The sky is the limit for how much you can run up on your credit card. The silver lining is what you're going to be paying to your cloud provider.
I also seem to have to pay $5 a month now. I can get past it with some element hiding and cookie clearing, but it seems like Medium is not the ideal solution anymore.
Personally, I had a go with Firefox (desktop and Android) again when they were making all the fuss about quantum, but it was still much slower than chrome for my workload - I regularly ran into common tasks that took >50% longer. Combine that with the fact that some websites work less well in FF and I just couldn't face it, however much I'd like to support them.
* Create quality service, prioritizing user experience over money
* Grow site to significant user base
* Start running out of money, because hosting and running a site does cost money
* Try to monetize service
* Get out-competed by another service still at step 1
The presence of VC money that's all about capturing market share now, and monetizing later does not help.
In the end, we have become accustomed to a world where things that cost money are given to us for free. When there are attempts to charge us so things remain sustainable, we get angry! It's a tough problem, and I don't know the solution.
You're missing a step between "Try to monetize" and "Get out-competed": the service turns into a user-abusing, steaming pile of hot garbage. This cycle does happen repeatedly (image sharing being a prime example), and services that die do so because their monetization strategies destroy almost all the good aspects.
I believe it's hard, but not impossible, to refrain from abusive monetization and stay afloat longer.
My point is that there is very little difference between "user abusing" and monetization. This is precisely because of this cycle.
Users expect these services to be free, because they are used to services being sold under-cost. The alternatives for 'platforms' that I know of are:
1) accept that 10% of users become paying customers and 90% leave; a bad plan if you rely on the mass of present users
2) go for a tiered system, where certain features are gated by a pay-wall. This is very tricky, and charging for essential features will see free competitors come up. This can work if your sheer user-base is enough of a moat. (Reddit is still trying to make this one work)
3) sell user-data
(3) is definitely seen as 'user-abusing', (2) tends to be as well, and (1) is not an option for any platform.
I can't think of a single platform that started off as 'free' and got profitable without being widely condemned for 'selling out users', except for YouTube (and I guess PornHub). The moat there is the massive infrastructure needed for video and a huge, hard-to-move catalogue of content.
That is, it is too expensive for some start-up to grow and capture the market.
I'd love to hear of any other success stories, especially one that could work for a Medium / Reddit / Flickr kind of site.
Thanks for elaborating. I agree, but I wanted to see this point mentioned explicitly. It's not just the free competitors that cause an established business's downfall; the business itself usually engages in plenty of self-destructive action in a desperate attempt to monetize users who are used to a free service.
I was just thinking that; fully agree. One of the things I learned from Startup School: get people to pay ASAP. If you really are solving a problem, they do not care that your website is a bit buggy and it's made with Bootstrap.
All these platforms rake in millions in VC money to get market share (that is fine btw), but if you do not know how to monetize it, what is the point? And of course always being free and then moving to a paid model doesn't make your users really happy.
Well, the solution is to find a market gap, not give people things they don't really need and pull a bait and switch. You can find users for a lot of things if you give them away; don't be surprised if they then don't want to pay for your thing they only used because it was free.
I know I've seen at least 4 articles about "OMG, surprise horrific billing" with them. And to be quite frank, you can buy an EC2 instance, throw Postgres on it, and have a stable bill (except for bandwidth costs, but that's trivial... usually).
I'm sure Firebase has good features, but this surprise billing is terrifying. They can't even offer a warning like "your 10-minute average indicates a bill of roughly $XX,000/mo". I could not suggest it in good faith for anything, especially since it doesn't have a hard cutoff.
I just want to point out to people who are new to Firebase that you can query your Firestore collections.
Also, Firestore Queries are cached by default, so if you try and fetch data that hasn’t changed, you shouldn’t have to pay for that read.
db.collection('Payments').where('id', '==', 'payment 1').onSnapshot(console.log) // 1 read, then cached
Whereas specifying a document and then manually getting it always counts as a read. Example:
db.collection('Payments').doc('payment 1').get() // always counted as a read
It's funny.
I've had multiple discussions with people who build apps using Firebase and then aren't able to scale to 1000 concurrent users without the BE falling apart. I want to tell them that they've done something wrong, but since none of my apps have reached those kinds of usage levels, I really don't know.
I think Firebase comes with a lot of power, and as long as you plan to scale, design your database model, have proper security rules and cache as much as you can, you can probably host 1M concurrent users without running into scaling problems. Cost though, ooh yeez. :)
TLDR: The huge bill was a result of the improper way the application was coded. They contacted Google/Firebase, who were gracious enough to waive the bill.
You can; CloudWatch can shut things down or scale down on reaching a threshold.
With cloud you really have to optimize your code/architecture as you pay for what you use.
With your own servers that is not an issue unless you overload them, then you can have an outage.
EDIT: I thought Firebase was Amazon for some reason; it is not. So in general the Google panel doesn't have that functionality (from my experience with it); they have billing alerts, but I never saw anything that would shut things off on reaching that alert.
You can usually limit by API requests but not by credit amount, so if you use a multitude of APIs it can get tricky, and they likely didn't limit anything.
That's very nice of them. However, I'd have to read all of Firebase's small print before I would consider using it.
Does their SLA guarantee data availability even if Google decide to spin it off or sunset it the way they did with Google+, Reader and such?
Does it matter what their fine print says if you have to bang your head against a buggy automated bot process to get support things done? I switched over to MS App Center recently (they added support for Cosmos DB + authentication). Their customer care is so nice by comparison.
Former Firebase PM: firebase.google.com/terms has all the details.
TL;DR: minimum 1 year deprecation policy, and you've got access to the data at any time (e.g. do a backup and get everything as JSON, or download an entire storage bucket and transfer it to S3).
Glad to hear that Google has done something with it.
Also, that's one of the reasons I try to use the Realtime Database and not Firestore. But they still charge for bandwidth there (most of it is consumed by downloading the SSL certificates from the clients).
This must be the first time I hear about Google responding timely and adequately to a specific problem of a customer. Could the publicity have something to do with it?
More likely because google responds timely and adequately to customers all the time but nobody tweets about it or writes a medium post that gets shared.
Their report reads like they had it coming big time:
> The app was running, all the supporters were able to support and the comments on social networks were that the app made it really simple to do support. We were very proud :)
> We didn’t want to release any new feature with that many users on the site, so we decided to merge a version with Angular V.6 […]. The site started to load slower, for some users it took them more than 30 seconds to load the page. That was weird. Our team was not comfortable with that and we couldn’t understand what was causing it and now we had our code with a completly new version of Angular, and probably many other bugs in production.
Am I reading this right: Their site was running well. They didn't want to interrupt it by adding new features (potential bugs), so they casually update the framework and hastily push the new version of their website to production? And instead of rolling back the release they double down and optimize their code without knowing the source of the slowdown. Like, WTF? Apparently they hadn't even opened the browser's network console to check if/what requests caused the slowdown. How did these people get $25K grant money?
Regarding grant money, the post says: NXTP Labs acceleration program - so venture capital, basically.
Looks like incompetence really: using a framework, being unable to pinpoint the bottleneck, and deciding that maybe upgrading the version will somehow fix their bad code.
Not a completely pointless idea, as the framework might indeed have changed enough to force them into using it correctly, but there are still a lot of questions.
I see. Thanks. Hadn't realised there's a link to the actual post-mortem. I thought that part was underlined for emphasis.
They really should've just linked to that Hackernoon post though, IMO; it contains a lot more detail that's very helpful in understanding what actually happened.
Writing a cost calculator for Firebase is a baffling experience. The relative difficulty of gathering precise information and modeling costs seems intentionally designed to discourage you from undertaking any estimation.
Very interesting post, especially as I am currently working on my first Firebase-based app, and making sure I don't get huge bills is one of my worries.
Just a little correction needed:
> July last year, a crowd funding campaign went viral in Columbia
I think the point he's trying to make is that even with a selection of highly "en vogue" PaaS and frameworks, you still run up against basic problems in Computer Science, and that all the fancy frameworks can actually obfuscate the source of the problem. The sentiment seems to be "perhaps you could have done this with basic tools and not made the error, rather than using something trendy".
Pretty much every PaaS provider does that. If you're exceeding the number of queries a node can process you either need a bigger node or more nodes, therefore the amount you have to pay is directly related to the number of queries you send. The only difference is that firebase automatically adds more nodes behind the scenes which makes surprise bills possible. If you're not adding more nodes, your application simply stops serving requests reliably because your DB is overloaded. Therefore the actual problem is not the billing model, it's the fact that the code is crap.