Heh. TL;DR version - Google starts recovering their costs, hits people in unexpected places.
So back when I worked there, Google had no clue what it cost to run their infrastructure at a fine-grained level. Sure, they knew the aggregate cost, that was easy, but knowing it at the application level didn't exist. This was a problem since, as more and more things were using the machines, how did you "bill" a department for their machine usage? That really crystallized when the bottom fell out in 2008 and suddenly there were going to be no more new machines/data centers for a while and everyone had to 'make do.'
They mobilized an effort to figure this out, it's not like it isn't knowable, and ever the data-driven company, the first signs of light were appearing just as I was leaving. It should not be a surprise, but they discovered many things they did not previously believe were true, and I don't doubt it has driven a lot of change going forward. One of the more interesting outcomes was that projects/products were actually getting cancelled if they cost more to run than they could generate in revenue (I'm looking at you, Goog-411).
So this knowledge is being applied to GAE, which is great; it's also another way to back-compute some of their operational efficiencies.
But that it costs money to run stuff? Well, that isn't really news, is it? That it costs that much? Well, there is the whole 'if it doesn't make money it will get cancelled' threat.
And the kicker is pricing out the scarce resource. It looks (and I've been gone over a year and a half, so I am speculating based on this move on their part) like their 'scarce' resource is web server front ends (the part labeled "Frontend instance"). Traditionally they've been, like most multi-tier web properties, split between front end machines which host the web-facing stuff, and back end machines that do the heavy lifting and storing. And from this change one can reason that residency on the 'front end' is more valuable than crunching in the 'back end.'
I'm guessing PlusFeed gets a lot of web traffic. So they spend a lot of time 'actively' on the front side, and from their numbers they do practically nothing on the back side. This fits well with the sudden massive price increase.
This gives you an insight into Google's business dynamics as well: page-views are the limiting resource, and computation is not. When you look at it that way, you can see that most of their 'revenue' has to be delivered through their front end services, and so consuming that resource reduces (potentially) their income. Hence the lopsided charges.
Now contrast that to Billy-Bob's Web Farm (a fictitious service) where every machine in the data center can be a web server and front end serving is trivial; it's all about the bandwidth. Their pricing would probably be more about gigabytes transferred.
I would not be surprised at all if it is impractical to run such 'translation' services (basically all web traffic, very little compute) on a hosted environment like Google's.
So they built a service called "App Engine" and it turns out that the "engines" that run my "app" are exactly the parts that Google doesn't have enough of, and so they are going to charge me uncompetitive amounts of money per day?
Let's suppose that Google doesn't have enough machines. Do they:
a) Massively increase the price to reduce customers?
b) Massively increase the number of machines?
Actually, no. I'm saying that they seem to have learned what their costs are and have priced their service based on those costs and whatever economic model they use. Their pricing suggests that "transactions" (which is to say, a query from the web and its response) are either a good scalar for their cost to deliver, or are one of their scarcer resources.
It is entirely possible that they discover they can't make a business out of selling 'engines' like this, it wouldn't be the first time they decided they were leaving too much money on the table.
They chose "c" - scale the price so that AppEngine is revenue positive for the company using these constants for the various moving parts.
Clearly some customers will find that it no longer makes sense for them to use AppEngine. It doesn't say anything about whether the market reach will be sufficient to sustain that business.
I'm not sure Google understands their own freemium model. Isn't the idea to let people use your cloud service at a very low cost, in hopes that they'll scale up and pay more? I feel like there's a huge gap between free and paying $400 a month. The day the app becomes popular is the day the developer will have to shut it down or get hit with large server bills.
Your last point is very confusing. You're saying you think Google can't spin up more App Engine front ends if it wants? ("every machine in their data center can [not] be a web server") Or any other kind of front end?
Google historically splits off their "front end" which is dealing with messages from the internet and their "back end" which is processing what those messages want to do. It was briefly described in the original search paper they presented.
Of necessity, most machines that Google runs have 'private' (as in not directly addressable) addresses. This is how it seems that everyone runs things when they get above a certain number of servers.
Thirdly, because one can assume that 'attacks' (whether they depend on XSS, overflows, or whatnot) are coming from the Interwebz, it's prudent to present the smallest possible attack surface on a machine which is being fed internet stuff. As you can imagine, those security constraints make for uncomfortable restrictions on what you can do in the 'front end', so a machine that is 'front end' has a different set of security constraints than a machine that is 'back end' (there are still constraints on the back end, of course, but its traffic has passed through the filter of the front end machine, which greatly attenuates the possible exploit vectors).
I believe it is this armored 'front end' resource which is the scarce resource, but again, I've been gone over a year and things change quickly inside the 'plex.
Not trying to piss all over this, since I thought you might have been saying something I didn't know. But I have worked at Google for a long time and basically everything you wrote is incorrect and has been since at least 2005 before App Engine even existed (why would a paper written in 1998 be relevant?).
Without going into too much detail, it shouldn't surprise anyone that physical machines are interchangeable via cluster management software, so application front end and back end software all end up running on the same machines. They aren't addressed through DNS but by another mechanism. There is no special system administration for application-specific front ends vs back ends. Spinning up more instances of a particular server is a 1 line change. Reverse proxies (i.e. a non-app-specific front end) are also involved but they're not a bottleneck AFAIK (this problem is well "commoditized" in open source whereas other parts of Google's stack are not).
I would be curious to see a comparison with competitors like Heroku, which I've heard are pretty expensive too. Theoretically Google should be cheaper because Heroku runs on AWS and thus is paying for Amazon's profits (I have no idea how they compare.)
The original point is "almost right". The scarce resource reflected in the new pricing model is not machines, but RAM. The old pricing model assumed CPU time was the essential resource. The new pricing model bills you for the amount of time your instance (which has a fixed RAM reservation) stays resident.
It sounds plausible that "on demand" instances, e.g. search servers, are a bottleneck, while bulk offline processing in the back end, e.g. PageRank calculations, is not such a constraint.
Yes, those offline processing instances could be suspended while dealing with a traffic peak, but peak traffic is still often a challenge, especially now that Google is pushing more and more stuff onto the web servers (Instant).
That might be what he meant.
Disclaimer - I don't work for Google, it's just a guess.
I interpreted it more as saying that their architecture allows them to parallelize all sorts of 'back end computations' across their infrastructure, such that their scarcity is in internet-facing processing and not processing as a whole.
Probably he is saying that because their expertise and focus lie in the backend, it's costlier for Google to scale frontend serving compared with, say, other web-hosting companies.
People who are downvoting me don't really understand what I'm saying. I'm finishing up my internship at Google now. This is a nomenclature problem -- what ChuckMcM is properly referring to is front-facing backend vs "back" facing backend. They're really just two different types of backend servers. SREs manage both and they are installed similarly. It could be argued that Google perhaps has certain weaknesses in frontend (i.e., UI/UX design), but this is clearly a case where engineering talent is not lacking -- they're the same engineers as the ones working on private servers.
Yep, but one would expect Google to be able to do better and make it work for lower cost at their scale and level of expertise, which is what's raising the doubt among developers.
I, and many others, spent a lot of time figuring out how to write apps that do it the "app engine way":
* Fast completes (30 second timeout)
* Offloading to task queues when you can't (see the sketch after this list)
* Channels
* Blobstore two-phase upload urls
* Mail eccentricities
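As promised above, here's roughly what the task-queue bullet looked like in practice: a minimal sketch using the stock taskqueue API (the handler paths and parameter names are mine, not from any particular app):

    from google.appengine.api import taskqueue
    from google.appengine.ext import webapp

    class FeedHandler(webapp.RequestHandler):
        def get(self):
            # Do only the fast work inline; anything that might blow the
            # 30-second request deadline gets enqueued for a worker instead.
            taskqueue.add(url='/tasks/refresh',
                          params={'feed': self.request.get('feed')})
            self.response.out.write('queued')

    class RefreshWorker(webapp.RequestHandler):
        def post(self):
            # Task queue requests get a longer deadline and automatic retries.
            feed = self.request.get('feed')
            # ... the slow fetch/parse work happens here ...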
We did so because we believed Google when they told us: "If you write your apps in this really weird way then we will be able to give you scale and cost benefits that you won't be able to get elsewhere."
We believed them, because it seemed reasonable. We laughed at those who complained that django would hit the 30-second limit: "It's not general hosting! Figure out the App Engine way!" And we educated them on how to do it right, and many were happy.
Well, it turns out that it is general purpose hosting, with all of the costs, and yet also with all of the (once rational, now bullshit) idiosyncrasies.
---
But that's not the biggest complaint. The biggest complaint is that when my friends and peers objected to App Engine, its strange requirements and its potential lock in, they were right and I am a fucking naive idiot. And I really don't like to be proven a naive idiot. I put my faith in Google's engineers and they have utterly destroyed my credibility. THIS more than anything is the cost to me.
Amen, bro. The biggest frustration was that we followed Google's preaching and spent lots of time fitting our apps to their restricted model, and then they turned around and destroyed it. I don't have the faith to do another round of optimization to fit their new restricted model. What is stopping them from changing the ground rules again? I'd rather spend the time developing to a generic model that I have more control over.
I'd recommend GAE to people who are prototyping - it's easy to do simple stuff in.
But mostly, GAE doesn't make sense for larger apps. You can't buy your way out of trouble, by putting your db on a dedicated server with fast drives and tonnes of RAM. You can't really use relational data without performance and reliability issues.
It's not just about the "app engine way". It's not like learning C or Haskell, and having to find a new way to write the code. You fundamentally cannot do big ad-hoc database operations.
And consider this - it was July last year that they introduced the MapperAPI. Before then, I don't think you could do Map-Reduce without manually re-implementing it yourself (on top of the cantankerous Appengine Datastore). Just think about that for a minute - how were you meant to do stuff the Appengine way without map-reduce?
Anyway, I don't think your credibility was "utterly destroyed". It was really hard to know whether or not the learning curve was worth climbing until you had tried. You just had to judge the book by its cover, and the "Google" brand is pretty compelling to an engineer. It's not the first time someone has been fooled into buying something because the provider has a good reputation.
I have several ideas that would scale well on GAE and neither want nor need a relational db. I'm not making do without SQL, I'm actively not using it, and it's very successful. Even when I move to EC2, I still won't be using SQL. In fact I have only one idea that needs any kind of relational data, and that is so relational that SQL is a bad fit too. EDIT: Actually, sorry, all my data is relational. It's just that I put the (small) effort into figuring out how to make it work without joins. In the last 15 years, I've not done a single project that could not have been done with a NoSQL database.
In fact, if you look at the recent comments of certain GAE engineers, they seem to believe that GAE is precisely for scaling, and that's why it now costs so much: it's only for the big boys.
The problem is that I can never become one of the "big boys" on their system, because pretty much as soon as I get any traction, I have to move to EC2 or Heroku or go broke. Their newfound belief in the scalability of their system is just arrogance. Anyone can claim to handle lots of traffic when you require that your customers run 20 times as many frontends as they should reasonably need.
Denormalize. Match your data rows to your access pattern (i.e. your UI). Naive example: if you have a webpage that displays a list of employees, and it must have their department name and boss in that list, you put that data in the employee row. What is the probability that a boss will change his or her name, causing you to have to update a ton of records? Very low (not zero, mind you, so you have to be able to do it). So why pay for the join on every query?
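A naive sketch of that denormalized row, using the Python datastore API (the property names are illustrative, not from anyone's actual app):

    from google.appengine.ext import db

    class Employee(db.Model):
        name = db.StringProperty(required=True)
        # Denormalized copies, stored on every employee so the listing
        # page is a single scan with no joins:
        dept_name = db.StringProperty()
        boss_name = db.StringProperty()

    # The employee list page then needs exactly one query:
    # employees = Employee.all().order('name').fetch(100)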
There are no longer, in my view, any situations where a SQL db is the best idea. You either want a giant NoSQL database, or you want a massive in-memory object-graph using pointers. Or you want something for $20m from Oracle or IBM.
The problem is not having to update tons of records, the problem is seeing one day, after 2 years of having the app in production, that the listing shows that employee X works in department Y and her boss is Z, but Z is not the head of Y. Bugs happen and referential constraints go a long way towards keeping your data clean.
Yeah, I was worried about all that. Hell, I was worried NoSQL couldn't possibly work at all, given my experience of SQL and the joyful things that happen there.
But I've found that my object model has evolved to handle the "scariness" of the back end. If someone wants to change the boss of an employee, they are doing it via an HTTP POST. So I've got to check that the key I was sent over HTTP is even an employee at all and not some javascript bug. Since I have to do that, I might as well read the data into my employee object. Then the code to update employee X with employee Y as a boss is pretty straightforward and thoroughly unit tested. The code to serialize an employee is thoroughly unit tested.
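Roughly, that update path looks like this (a sketch reusing the Employee model from the earlier example; the handler and field names are mine):

    from google.appengine.ext import db, webapp

    class SetBossHandler(webapp.RequestHandler):
        def post(self):
            # Check that the keys posted over HTTP really are Employees
            # before trusting them.
            emp = db.get(db.Key(self.request.get('employee')))
            boss = db.get(db.Key(self.request.get('boss')))
            if not isinstance(emp, Employee) or not isinstance(boss, Employee):
                self.error(400)
                return
            emp.boss_name = boss.name   # refresh the denormalized copy
            emp.put()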
Not saying shit can't happen. Now look me in the eye and tell me you never had some noob drop a constraint and forget to put it back.
You could periodically run a script that checks all the records for errors (especially embedded records that might have drifted from their source value and not been properly updated by the app-level constraints), and automatically correct them (plus log the error).
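Such a checker could be as simple as this sketch (it reuses the Employee model from above; find_boss stands in for whatever authoritative lookup you have and is purely hypothetical):

    import logging

    def check_employees():
        for emp in Employee.all():
            boss = find_boss(emp)   # hypothetical authoritative lookup
            if boss is not None and emp.boss_name != boss.name:
                logging.error('stale boss_name on %s, fixing', emp.key())
                emp.boss_name = boss.name
                emp.put()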
If Michael Arrington changes his job from "editor in chief" to "founder, former editor, occasional contributor, and CEO of Arrington Investments", and his old posts aren't all updated, it's not the end of the world.
It really depends on the problem domain. You wouldn't run a bank's ledger off MongoDB. On the other hand, a bank's ledger should be radically simple, with little need for normalization.
> What is the probability that a boss will change his or her name
That's obviously an example of something that will practically never happen, which is why it doesn't work all that well as a justification for ditching SQL databases altogether.
I've never used NoSQL for anything, so there must be a lot that I'm missing, and that's why I asked. But it seems to me like you'd be digging up necessary information through quite a few steps if everything is "flat".
On the contrary, it's the SQL database that's "digging up the necessary information through quite a few steps"; it's just that the massive effort required by the SQL server is hidden from you, the programmer, behind a one-line bit of text called a SQL statement. So you do it all the time. Indeed, we've been taught that normalizing is the "proper" thing to do because otherwise "Bugs happen and referential constraints go a long way towards keeping your data clean."
Digging kills you. I assert that SQL does the digging automatically, and that's exactly why it doesn't scale.
Yes, an SQL DB does the digging for you, but with NoSQL you'll be doing it yourself, right?
Your app will most likely have some kind of "entities", and then records to represent them. How much information can and should you cram into records of various "types"?
How much information do you typically end up duplicating across all those "entity records", and is it not a problem?
As I said in my original complaint: I, and many others, spent a lot of time figuring out how to write apps that do it the "app engine way"
That included learning NoSQL. At least that part was not a waste. There are no right answers to your questions, there are only right actions, starting with stepping outside the SQL box and writing an app using NoSQL. I started by thinking of a simple app that would be useful to me personally. I knew java servlets, I knew SQL, I knew all sorts of things, but after several iterations my app is architected like no app/server I've ever written before. Almost every iteration involved starting out doing it the way I knew how, running into either roadblocks or major cognitive dissonance, and then rewriting it to fit these new-fangled constraints. It's been a huge learning experience. You might like to try it.
>> What is the probability that a boss will change his or her name
> That's obviously an example of something that will practically never happen
Women changing their name when they get married? A tiny assumption like that can make our software brittle. Now every model that caches the old name needs updating and you need to make sure there aren't any overlapping saves in any of those models that'll overwrite any items in your bulk update. If a single linked model has the wrong old-name cached, your data update process is buggy.
> Now every model that caches the old name needs updating and you need to make sure there aren't any overlapping saves in any of those models that'll overwrite any items in your bulk update. If a single linked model has the wrong old-name cached, your data update process is buggy.
Well, that sounds like the kind of stuff I'd like the other guy to talk about. How does he avoid the bad sides of having all your data in a key - value store?
I would argue that all the forced scaling in App Engine makes prototyping harder. You can't use SQL. You can't reuse whatever open-source components you find on GitHub. You can't just let your app run slow and optimize later.
From what else I've read, it sounds like engineers who didn't also wear green eye-shades (or good enough ones, or who didn't possess or use good enough crystal balls) set up this debacle. And it was people wearing green eye-shades (who we can sincerely hope are also engineers) who aligned it with reality. Causing way too many people way too much pain.
Object lesson: if you're going to sell a service for cash money to others, paying close attention to your costs from the very beginning is not optional.
The problem is how one gets from 31 cpu-hours to 879 instance-hours.
You might be thinking that in the original measure they did something insane like measure only user time of a process, or only time spent executing a request, not booting or whatever (or fuck, I don't know, because honestly there is no reasonable explanation). That is to say, that the 31 cpu-hours is a misread, and if the fellow in the article ran his code on EC2, he really would need 879 EC2 instance-hours that day.
But this is not my experience. An extreme example: my app that served 14 pages was rated as taking 0.2 cpu-hours, or 720 cpu-seconds. This is entirely reasonable, if not excessive (because looking at the app it only took about 200 seconds including warmups). Under the new system, it is claimed that these 14 pages will require 2.8 instance-hours.
0.2 => 2.8
31 => 879
So when the author of the article is told his app is going to take 879 instance-hours per day, there is something seriously fucked up and wrong. It doesn't mean that the guy is running a realtime raytracing server. It means that GAE is horribly, amazingly, inefficient.
The app in the article serves 1.5 GB/day and takes 879 instance-hours. What server would you need to do that on EC2: 1 Mbit/s? The hourly cost on GAE is $1.46. Can I do that on a $0.085 EC2 instance? Yeah, I think so.
EDIT: My figures were wrong, as I was comparing a $16 (wrong) figure to a $0.80 EC2 figure. The actual figure is $1.46, not $16. So I looked at the bandwidth/cpu numbers to see if a $0.80 EC2 instance is what is required, and I don't believe that it is. I think a $0.085 instance would be enough. YMMV.
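For what it's worth, here are the ratios quoted above in plain numbers (Python 2; figures taken from the article and the 14-page example, nothing else assumed):

    print 2.8 / 0.2      # 14x: the 14-page app, instance-hours vs cpu-hours
    print 879 / 31.42    # ~28x: PlusFeed, per the article
    print 879 / 24.0     # ~37 instances resident around the clock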
> It means that GAE is horribly, amazingly, inefficient.
This. We always knew GAE was inefficient. There's no doubt about that. Serving 30 or 40 requests per second would spawn quite a few instances and start producing request errors.
This is a load a four-year-old machine could handle with ease.
Why did we put up with this? Because Google didn't make us pay for the crappiness -- the pricing made sense. You don't pay Ferrari prices for a slow car...and during a surge it scales up gracefully. Go from 30 rps to 1000 rps and it'll just work. An old machine co-located someplace won't do that.
Now under the new pricing gouge Google is making us pay for their inefficiencies. All appearances are that this is what this really costs (plus some reasonable markup)...well that's pretty piss poor. Because we're essentially paying to haul cargo in a Ferrari and it's dumb dumb dumb.
It would only start producing request errors if there was bad coding. A lot of Google's own stuff (like the Chrome updates) is served through GAE and receives no "special treatment" from GAE (except lifting the request limits, which wouldn't affect any sites you make). GAE was made to be fast - and it is. It was made to be reliable - and it is (100% uptime over more than 500 days).
Yeah to me this decision is either an admission that the idea of hosted apps is a failure or that the people making the decisions at google don't understand app engine.
I'm on the App Engine team, and I just wanted to clarify one thing: The main difference between CPU hours and Instance hours is that CPU hours are charged based on CPU usage, while instance hours are based on wallclock time. The high ratio between the two you can see with PlusFeed is because it's spending a lot of time to serve each request, most of which is spent doing nothing - likely because it's doing outgoing HTTP requests.
Previously, we had no way to account for apps like this, that take a lot of wallclock time but very little CPU time, and as a result we couldn't scale them well. Under the new model, the charges reflect the real cost here - memory pressure. Every second an instance sits around waiting is a second that the memory occupied by that instance can't be used to serve other requests.
As others have pointed out, we're in the process of launching Python 2.7 support - it's currently in Trusted Tester phase - which will support multiple concurrent requests, and services like PlusFeed are likely to be able to take great advantage of that, reducing their instance hours by a large factor. Likewise, doing asynchronous URLFetches (where that's practical) can cut a huge amount off instance time.
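For the curious, the asynchronous URLFetch pattern mentioned there looks roughly like this (the URLs are placeholders):

    from google.appengine.api import urlfetch

    # Kick off both fetches before waiting on either; the instance is
    # not pinned serially behind each remote server.
    rpc1 = urlfetch.create_rpc(deadline=10)
    rpc2 = urlfetch.create_rpc(deadline=10)
    urlfetch.make_fetch_call(rpc1, 'http://example.com/feed1')
    urlfetch.make_fetch_call(rpc2, 'http://example.com/feed2')

    result1 = rpc1.get_result()   # blocks, but both are already in flight
    result2 = rpc2.get_result()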
First of all, thank you for chiming in, here, on HN. Your presence might also be welcomed on the google forum, since I've read most of the posts there and no one has managed to answer this question.
If memory pressure is the issue, how are the trusted testers finding their memory pressure when they have a whole bunch of in-flight requests? If the PlusFeed fellow got 2.7 working, we'd expect to see 100/3.5 ≈ 28 in-flight requests (the app is CPU-busy only about 3.5% of wall-clock time, so one fully threaded instance would carry ~28 concurrent requests). Do you have data on how big the base memory vs per-thread memory requirements of these apps are? Python isn't famous for freeing up memory.
Which is to say, do you have any solid numbers that tell us that when we switch to 2.7, you won't have exactly the same memory pressure and either have to up the instance-hour cost, or start charging for RAM, or just limit in-flight requests to 3 or 4, so that our costs are only 5 times as much instead of 20?
Bottom line: what we all woke up to is the fact that as of right now:
* you set the price of an instance, and
* you get to decide how many instances I'm going to pay you for
Some of us are thinking that while that was a great idea when engineers were in charge, it's not such a great idea now that the bean-counters have taken over.
That's a good question. I can't point to published figures, since the 2.7 runtime is still fairly new, but I can say that, based on both my personal experience and fairly basic reasoning, the per-thread memory overhead is definitely a lot less than what's required by the whole instance. The entire Python standard library, along with your framework and other libraries, is shared overhead between all the threads.
The issue with charging by CPU hour was that you could occupy memory-seconds as much as you wanted without charge; that's no longer the case - by charging for instances, we're implicitly charging for the memory they use.
As far as determining how many instances you run - you can do this to a large degree, both by setting budget limits, and by setting scheduler parameters.
What's most interesting to me about this article is that the management of GAE seems to actually be getting worse over time.
GAE has always had two main disadvantages. First, there is vendor lock-in because you code specifically to the data store, worker API, and so on (though arguably there are alternative platforms that implement the GAE API). Second, you cannot run custom code (custom C in some virtual machine) or have a custom architecture (if, say, Redis might be useful to have around). These disadvantages probably aren't changing and are probably necessary for auto-scaling, security of Google's infrastructure, and so on.
However, there are lots of little things that GAE has been getting wrong for a while that are totally unnecessary. Lack of hosted SQL support. Lack of SSL for custom domains. Just little things that are probably annoying to implement and boring, but totally necessary for real websites or websites just gaining traction. (I know these are in varying stages of early support at the moment.)
But now, the GAE team almost seems to want to actively disappoint users. With hosted SQL being a request for years, Guido appears to have spent a bunch of time re-architecting the API for the datastore instead. With this pricing increase, they're pushing the many developers who came to their platform based on price (due to the very interesting scaling properties of the Google front-end) off the platform.
Yeah, the hard limit on request duration is pretty nasty too. It's a pretty limited platform overall, and never took off, so maybe they are trying to push people off it so they can eventually sunset it?
Deliberately pushing people out? Probably not. They had a major decision point available when they chose to either exit beta and step onto a release product schedule or drop the project entirely. The fact that they chose to productize it bodes well for the immediate future. Adding SLA and SSL was also non-trivial from a product perspective.
But, that does mean that you have to figure out what your revenue-producing tenants are going to look like, just as you would do in physical real estate. Yes, it looks like the high-traffic commodity-product (think Halloween store) doesn't make a good tenant. But that doesn't mean that a jewelry store or office (low throughput, high value per-square foot) wouldn't be a good tenant.
I love using GAE and have 8 apps running currently. At first the new pricing model shocked me. But please take a second look. Sure, it's more expensive than the old model. But you can set the maximum number of idle instances in your Application Settings page. Just set it down so no more than X instances get spun up:
> The Idle Instances slider allows you to control the number of idle instances available to your application at any given time. Idle Instances are pre-loaded with your application code, so when a new Instance is needed, it can serve traffic immediately. You will not be charged for instances over the specified maximum. A smaller number of idle Instances means your application costs less to run, but may encounter more startup latency during load spikes.
There is another setting for latency:
> The Pending Latency slider controls how long requests spend in the pending queue before being served by an Instance. If the minimum pending latency is high App Engine will allow requests to wait rather than start new Instances to process them. This can reduce the number of instance hours your application uses, but can result in more user-visible latency.
So if you are fine with a little higher latency for your app then you can reduce your bill by a great deal. If you want all that GAE can offer with max. instances available and lowest latency you gotta pay - as you would when you run n instances at another cloud provider.
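A rough model of that trade-off (the rate is the announced $0.08/instance-hour; the workload numbers are made up for illustration):

    RATE = 0.08  # announced USD per frontend instance-hour

    def daily_cost(idle_instances, busy_instance_hours):
        # Idle instances sit resident all 24 hours; the rest only
        # accrue while actually serving.
        return RATE * (idle_instances * 24 + busy_instance_hours)

    print daily_cost(1, 6)   # one warm instance, 6 busy hours: $2.40/day
    print daily_cost(5, 6)   # five warm instances: $10.08/day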
The problem is the price, though. Spinning up a couple EC2 micro instances to run your app is $0.02/hr per instance and you can run a few threads per instance. Appengine is $0.04/hr (double that soon), with one thread per instance.
Thanks for the info. Is there a setting for max number of simultaneous instances? If your app gets slashdotted are your potential costs completely unbounded?
They aren't, because you specify a maximum daily spend in the budget settings. So when you get slashdotted, the scheduler will spin up X hundred instances, your budget will run out in half an hour, and your site will be down for anything up to the next 23 1/2 hours.
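In numbers (the instance count and budget here are invented for illustration):

    instances = 200     # what the scheduler spins up under the spike
    rate = 0.08         # USD per instance-hour
    budget = 8.00       # maximum daily spend

    print budget / (instances * rate)   # 0.5 -- broke in half an hour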
I think Google just doesn't really get how unique GAE was. It was a fantastic platform for small apps, and inevitably some percentage of these would grow into big, paying apps.
Also (sorry for the armchair quarterbacking here, can't resist..) it was exactly what Google can do better than anyone - best server infrastructure + Guido Van Rossums - while stuff like Google+ is exactly what Google haven't a clue how to do.
This is amazing. You've hit the nail on the head. Google would be much more successful if it focused on being Google and didn't try to be Apple+Facebook+X companies all in one.
An alternate perspective: Google App Engine is still a fine platform, even with the new price increases (which they told us were coming, by the way).
And by the time the pricing takes effect the updated python runtime should bring costs down even more.
The instance costs are comparable to Heroku AND you get a high-availability data store AND the ability to store really huge amounts of data in the blobstore AND a CDN for serving images from said blobstore. Not to mention background processes, task queues, XMPP, memcache, multitenancy, and multiple versions of apps so you can easily roll things back or test out updates painlessly.
Try and replicate that setup on Heroku or AWS for anywhere near the costs and time that you can get there with app engine.
While you're fighting with AWS and playing sysadmin, or trying to think of ways to bring down the costs of Heroku's database services by using RDS instead, or being nickel-and-dimed by add-on fees, I'll be shipping code. Code that actually takes advantage of the platform's strengths.
The blobstore is basically S3, and the CDN for the blobstore is basically CloudFront. Both S3 and CF seem to me to have really simple APIs, plugins to Rails and other common frameworks, and are almost certainly battle tested by an order of magnitude more companies than the blobstore. What am I missing?
As far as I can tell, of your list, Heroku has all of the things you describe, with the possible exception of XMPP. Task queues (workers, SimpleWorker), many flavors of high availability data stores, memcache (or even better, Redis as a service), multitenancy, multiple versions of apps (git, releases).
Is there something bad about RDS that I don't know about? I had always assumed you just spun it up and could optionally replicate a MySQL instance a few times in different zones.
More broadly, it sounds like you had a really bad experience with Heroku. If so, what was it? I'm a bit curious because we are rather dependent on it.
All I'm saying is that with Heroku you have to cobble these features together, and the real cost of that is higher than people may be admitting in these comparisons.
I've worked with both platforms and I'm not saying Heroku is horrible. In fact it might be the easiest to get started with of all the PaaS providers, but it's expensive (they all are) and does not provide the same level of features out of the box that GAE does. Being able to eventually get all the features I need by cobbling together a bunch of random gems and bringing AWS into the mix is quite different from having a unified API out of the box.
The point I'm making is that there is an additional real cost to this kind of integration work (just as there are real costs to learning Google's API and accepting a certain amount of lock-in), and I don't think these costs are being fairly represented in the current discussion.
It seems as though you are actually contrasting two competing philosophies.
Bare bones with add-ons: Heroku handles deployment (via git, releases), app servers (thin), reverse proxies (varnish), and caches (varnish). (That's "bamboo" rather than "cedar", for what it's worth.) Heroku provides optional data storage via Postgres and background jobs via worker queues. For everything else, a combination of add-on providers (MongoHQ, RedisToGo, SimpleWorker, ...) and your own custom EC2/S3 code is used.
Integrated: GAE handles deployment, app servers, reverse proxies and caches. It also has channels, XMPP, workers, a data store, and a blob store built in. All of the documentation is in one place and from one provider. All of the billing is unified.
Personally, I have always assumed that the integrated approach is by necessity. GAE can't rely on other providers to provide functionality like a blob store because they don't want other providers running in their data centers. By contrast, since Heroku is in US-East, basically any SaaS can pop up to offer functionality with low latency to Heroku apps. If there's no SaaS, in the worst case, the web app author can write their own backend services within US-East to meet their own needs.
I think you're right that the inconsistency of pricing, documentation, and quality of many providers can be a problem for the bare bones approach. However, I've found in general that almost everything I want is (a) S3, which has a simple API and a well known gem, (b) Redis, which has a simple API, a well known gem, and basically one (somewhat overpriced) provider, and (c) very rarely EC2 instances if I need to run some really weird, open source Java or backend code. I'm also much more comfortable knowing that there are basically no problems that cannot be solved, at some difficulty, by the EC2 backend solution.
I could see the appeal of the integrated solution. For example, if I had a class of students, I could just point them to one documentation source with GAE, or monitor one billing page with GAE.
Do you think that web developers generally prefer (a) integrated solutions with shared documentation and the possibility that some functionality may be impossible or (b) piecemeal solutions where they may get varying quality from a set of services they select, but where basically all functionality is possible?
"An alternate perspective: Google App Engine is still a fine platform even with the new price increases."
This depends a lot on what you're doing. GAE is now pretty terrible if you're using it for high-bandwidth applications (say, a data proxy or similar). Especially when you compare it to other providers for whom bandwidth prices have been dropping (AWS, Linode, etc now offer incoming bandwidth totally free and the rates on outgoing are very low as well). If you're using it as a number-crunching backend with relatively small in/out datasets, the new prices aren't too bad at all.
c1.xlarge on EC2 is $1.16/hour. How many of the $0.08 GAE units does it take to match one of those? How many of those GAE units can run optimized SSE4/AVX? (Answer: none of them). I've run 250 c1.xlarge units at a time. GAE wasn't even possible, let alone affordable.
I agree that picking the right tool for the job is important but I just wanted to list the reasons I prefer GAE. There is quite a lot of outrage (raising prices does that) without really considering what the real costs of the alternative platforms are.
"your app can be slashdotted or tweeted by demi moore -- http://adtmag.com/blogs/watersworks/2010/10/mobile-app-creat... -- or perhaps you may need to build/host something on the scale of both the royal wedding blog and event livestream with traffic numbers that are mindblowing -- http://googleappengine.blogspot.com/2011/05/royal-wedding-be... ... these are the reasons for using App Engine. it was not meant as free/cheap generic app-hosting but to provide a premium service that's difficult to get elsewhere in the market. if you're just after the former, there are plenty of options for you."
My take-away is that GAE is hard to justify unless your usage pattern is unpredictable and spike-y. I'm taking the long weekend to give dotcloud a serious test-drive.
I still don't quite understand their market, and honestly I don't think they do either.
If you're expecting insane traffic to begin with, it makes more sense to create your own systems using a more standard stack and not tie yourself to GAE. If you're just some small to medium sized site that gets "slashdotted or tweeted by demi moore", I guess it is nice that your site will automatically scale to serve the millions of unexpected incoming users, but when you're filing for personal bankruptcy due to the unexpected and infinitely scaling GAE bill that comes along with that, how much consolation is the fact that your server stayed up during the rush going to be?
The strength of GAE is the magic scaling beans, but to take advantage of it you need to lock yourself in massively. It's probably not realistic to port an existing application that actually needs real scale to GAE given the complexity of most apps by the time they reach that point. Therefore, the key funnel for them is new apps with hopes of becoming truly massive. Fortunately for Google way more people dream of scaling than actually will, but they need to A) not scare poor startups away with the price today and B) not scare them that they're going to get bent over and raped on price changes later.
Personally I'll never touch GAE with a 10' pole simply because of the support issue and perpetual-beta-culture uncertainties.
We tried to use GAE several times for our heavy-traffic production systems, and it was just too slow. We contacted Google several times, but never received any help from them. So, thanks to this, we don't use them and we are not affected by the price change.
That's unreasonably expensive. For the $68.46 he paid for a day, you can have an "el-cheapo" dedicated box or a great VPS for a month. I still don't understand why people trade a bit of system administration and superb performance for vendor lock-in and insane prices.
But it's 880 instance-hours per day. So assuming that "el-cheapo" dedicated box has similar performance characteristics to one GAE instance, then you'd need 37 of those boxes.
My question is why the heck 31.42 CPU hours turned into 880 frontend instance hours, and whether this is something a lot of other GAE apps are seeing as well.
Google's instance-hour is a process-instance-hour. The current GAE Python can only serve one web request per process; multiple web requests coming in spin up multiple processes. Besides the CPU, Google charges the app while the process is waiting on I/O, like receiving from or sending to the browser.
But measuring process-instance-hours is misleading. In a typical web box, spinning up an extra process takes very little resource, since most process memory is shared with the parent process, and most web apps are I/O bound - waiting on network/file/DB, idling, and taking little CPU. Many processes can be crammed into one server box (like a shared hosting box). But Google counts those duplicate processes as separate instances and charges for the idle time these processes accrue. That's why you see the outrageous bill.
What's more insane is that Google charges for a process-instance-hour like a machine-instance-hour: GAE charges $0.08/hr per process-instance while AWS charges $0.02 to $0.08/hr for a whole machine.
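To put that in numbers (a sketch under the commenter's own assumptions; the process count per box is my invention):

    gae_rate = 0.08   # USD per process-instance-hour
    ec2_rate = 0.08   # USD/hr for a whole box, top of the range quoted above
    processes = 10    # mostly-idle Python processes one box could hold

    print gae_rate * processes * 24   # $19.20/day billed as 10 GAE instances
    print ec2_rate * 24               # $1.92/day for the one EC2 machine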
He has a lot of instances spun up waiting to do work. This is your choice, but if you want to be able to handle spikes without latency troubles, it's what you should do. None appear to be doing a lot of work in this case, so he should have definitely tried messing with the settings before closing shop.
Yeah I still don't understand why the Khan Academy and other educational/open source developers are using Google App Engine when it suffers from vendor lock-in.
Some of it, yes, but if you write modular code then it's a non-issue. For Java development, GAE supports most of Java EE and the popular frameworks, so if done correctly, porting would not be an issue. I can't say the same about the Python stack.
Unless you're doing regular integration testing on an alternative platform, you're bound to inadvertently introduce some painful GAE-specific dependencies.
It's still an issue -- you have all this data that you have to migrate off of app engine. Some of my friends who migrated on to GAE said it took them almost 2 full weeks to load data into GAE, and now I'm sure it would cost them at least a full month to load data off GAE. The code changes are probably the easy part of the migration.
Our charges are going to be increasing from $0/day to $5-11/day. While bearable, it's a serious problem given so little notice, and disappointing since we invested in their infrastructure & optimized for their previous pricing plan. This is a serious hit for a bootstrapped startup getting off the ground. No doubt it will kill a lot of startups.
The problem is that Google counts each process of the Python runtime as a separate instance, and counts them even when they run only partial hours (Amazon counts this way too, but you can run many, many Python programs at the same time, even on a micro instance).
It's not fair to make assumptions like this. I have a Google App Engine app which doesn't push out a lot of MBs each day, but does use a fair amount of CPU, and I'll pay for it. It's not a poorly designed service, it's a service which doesn't generate a lot of traffic (when measured in MB), but I'm getting value from it elsewhere. In this case, my app on GAE is an API which my front end (hosted elsewhere) is consuming.
My understanding of this whole GAE pricing story is that Google decided GAE is not a strategic business, and that such business units need to break even or they will be cancelled. Does this make sense?
If that were the case, wouldn't apps like Google Calendar have been cancelled a long time ago? I doubt the ads on Google Calendar are sufficient to sustain the number of users it has on a daily basis.
I'm wondering if Google is just trying to encourage an architecture which is less costly for their infrastructure.
I'm guessing a small minority of apps were doing things in a way that was eating up tons more resources than they were paying for. I bet for many apps, this could end up no worse or better.
I don't think it's a small minority of apps. 100% of the people I know who are using GAE are seeing the minimum of a 4x price increase for their app (before the 50% discount). The price increase would actually prevent some small ISVs from reaching ramen profitability. There's a thread somewhere on the GAE google groups containing many angry users.
I'm seeing a 50x (5000%) $ increase and I wrote my app from scratch "by the GAE book". It almost seems as if those of us who put the effort in to do it the GAE way are worse off than those who just uploaded django.
Even if most of my apps increased by 10x, they would still be cheaper than running a single AWS instance for them. That's not even taking into account the ease of management and automatic redundancy that GAE offers.
As Wesley Chun noted in his reply on the GAE mailing list, GAE is designed for companies that get very unpredictable traffic. If you get very predictable traffic, it's significantly cheaper to host on AWS. Some of my apps previously cost $40 a month and now cost almost $300 per month. I can definitely run these apps on Linode for a fraction of the cost.
I picked GAE mainly because it was fast to setup, but I could have equally chosen Heroku over app engine when I first built it. In hindsight, I wish I had.
I seriously doubt that. One AWS instance can run multiple instances of your web app. On GAE you're going to pay about $40/month for each of those instances!
It sounds to me like the scarce resource was found to be RAM, not CPU. If an "instance" uses a big piece of RAM to serve a request, and doesn't give it up while waiting for a backend, then scalability is RAM-bound, not CPU-bound, and they should probably charge that way.
The $2.63 a day comes out to $81 a month which for hosting is a non trivial amount. It does come down to whether Google can turn a profit at these levels for someone using this amount of resources but I don't think customers willing to pay $50-$100 a month are ones you want to push away.
$81/month for a fun project that gets you noticed and looks good on your resume is nothing to sneeze at. Blog about it, and suddenly you have a hidden revenue stream from it. Let people know it's yours and suddenly you have some weight to throw around.
I've been playing with ep.io for a while (nothing serious); it's a standard Python + PostgreSQL stack using a few INI files for configuration. Their pricing is a bit high on bandwidth though, so I'll probably wait for Heroku Cedar to officially support Python and use that instead.
In my mind, the problem is not so much the extreme pricing difference, but the change in what's being measured, so not only is there a price increase, but it's nearly impossible to figure out how to compare the old vs. new schemes vs. competitors.
The huge price jump is a serious problem, but the weird metrics switch makes it feel like such a bait and switch. I'm going to be highly surprised if this isn't challenged in court.
I am currently playing with the idea of moving an application from all-GAE to part GAE, part EC2 with some spot instances running jobs when cheap enough. According to my sloppy math, this should reduce the load on the GAE side by at least 60%.
Anyway, this is all theoretical (in the worst sense of the word) - the app is not even public.
But is it really worth the headache of such a Frankenstein architecture?
I initially fell in love with the "no sysadmin" aspect of App Engine, and started building apps around it. Eventually I realized that (for me, anyway) the upside isn't really worth the trouble of having to contort my apps to work in Google's sandbox- can't run SQL, have to deal with datastore timeouts, CPU timeouts, etc. When you're done coding work-arounds for all of these things, are you really coming out ahead?
This is so true. The first step to correcting a problem is admitting you have one, and everyone on App Engine clearly does (locked-in). What I'm doing about it? Moving to Tornado/MongoDB/EC2 as quickly as possible.
That is my current project's design. Web frontend stuff in GAE and AWS doing the backend processing. AWS is needed to run media servers which can't run in GAE. I was going with it despite the disadvantage of the split architecture because I liked GAE. Now with the new pricing model kicking in, the frontend cost has skyrocketed and it doesn't make sense to run frontend stuff in GAE.
Also, the way Google is handling this confirms my fear of proprietary lock-in, leaving a really bad taste in the mouth. I mean, I've spent lots of time developing in their framework, working around their quirks and limitations. They have been advocating that developers optimize for CPU time, and that's what we did. Now they've found out they don't make enough money with CPU optimization and changed the whole billing scheme, throwing away all the effort developers put in. What would stop them from changing again tomorrow if they find instance-hours are not making enough money for them?
I went through this thought process. I really liked Python and auto-scaling, so GAE seemed really appealing. But the combination of lack of SQL and inflexibility of architecture led me to want to create the sort of combination architectures you describe with EC2 instances or whatever.
Ultimately, after a lengthy debate with a friend, I concluded that Heroku was a sufficiently better option that it was worth learning Ruby (which I've been writing more or less as an uglier Python anyway).
A question for anyone on the AppEngine team that may also be useful for other developers:
I have an app that I want to deploy and control my costs. I would like to pay for 1 FE instance to always be running and limit temporary FE instances to a maximum of 1 (free?) instance when needed. I expect my app to have a relatively small number of users, but they will be active.
I would like to prevent the scheduler from ever spawning more than these 2 FE instances. Occasional unavailability when the site is busy is OK. Except for bandwidth and storage, I would like to know roughly what my costs will be.
Can anyone tell me how the new and old prices compare to running a similar app on AWS, Heroku or other competitors?
ie was GAE ridiculously cheap before compared to other options, and now comparable? Or was it somewhat cheaper than competitors before, and now somewhat more expensive?
I realise it's never that simple, but nearly everyone's complaints seem to be (understandably) given in relative terms of before vs. after. I'd be interested to know how it stacks up before vs. after vs. if-we'd-taken-another-route.
It was cheap by comparison. Now it appears to be orders of magnitude more expensive. I base this on the fact that their billing estimator is claiming that 1 cpu-hour on the old system equates to 100 instance-hours on the new system. It's not the $0.04/hour cost that worries me - that is a competitive number - it's that their instances only appear to be able to do the work of an 80286.
It appears that the chart is being misinterpreted. The cost per hour is less under the new pricing and I believe the total hours is calculated for the month.
I wish that were true. Could there be some huge fuckup on the part of the kid programming the chart? I'm just going to keep clicking my heels and pretend that Google doesn't really want to take my monthly cost and charge me that per day.
That's the point. App Engine is changing their pricing from CPU hours to instance hours, so an app that previously billed 3-4 hours a day is now going to bill 48-72 (two or three resident instances times 24 wall-clock hours).
Damn it. It appears the only solution is limiting the amount of instances in the application settings. I manage a free sync service with approximately 20,000 active users on App Engine. I just checked the billing history for our application and saw similar results, the cost will increase 10 fold.
I also have an app on GAE. http://www.tubesmix.com
Right now there's no charge as there's only a couple hundred users and the resources used are low.
But this change worries me a lot. Even more so since all my backend code is tied to GAE infrastructure. This is very disappointing coming from Google. Since when did they become so nickel-and-dimey?!
I believe frontend means requests handled through normal frontend hits from end users, while backend instances are for longer running background processes.
Everything Google does, they do in a half-assed sort of way. And it's getting really annoying. Even their core business of search is showing signs of neglect. SEO experts have learned to game the system such that the quality of search results is pretty abysmal now.
GAE is a half-baked AWS. Google+ is a half-baked Facebook. Google Docs is a half-baked MS Office. They have no skin in the game on these projects and don't really care whether they succeed or not, which coincidentally means they probably won't.
Google is getting absolutely clobbered in every category other than search advertising dollars. Their products reek of ambivalence and neglect, and I'm surprised anyone expected GAE to be a good platform.
> Google is getting absolutely clobbered in every category other than search advertising dollars. Their products reek of ambivalence and neglect, and I'm surprised anyone expected GAE to be a good platform.
Except for the largest business on the internet... Well search, video delivery, mobile phones, email, mapping, news, RSS, web analytics, and web browsers. Getting clobbered in almost everything else though.
They make no money on those products. They can't charge money for their products because they are half-assed knockoffs that nobody will pay anything for.
They make a ton of money on those products. It's all part of the ecosystem. Android users use Google search. So do Chrome users. Analytics users use AdWords (get too many hits with Analytics and you need to start spending some sweet AdWords bucks). YouTube and Gmail users see ads (Google Apps business users pay money directly).
Nonsense. Google makes 97% of their revenue from search advertising, they can't make any money elsewhere and they even say so themselves. Everything else they do is to keep up appearances that they are a market leader, which is very far from reality.
Android is a junky iOS-knockoff operating system they pawn off for free because it can't be sold. This company can't produce a single product worth paying for other than search ads, and even that is going down the tube. Sure, maybe their shareholders have been fooled, but the paying consumer hasn't. Whether they admit it or not, Google is in severe trouble if search advertising decreases even a few percent.
Everyone seems to have a crush on Google, but they are going to get fucked pretty badly long term unless they find some alternative, major, multi-billion dollar sources of revenue of which they've found zero so far. Self-driving cars maybe?
> Nonsense. Google makes 97% of their revenue from search advertising
No, Google makes 97% of their revenue from advertising, not just search advertising. 31% of the revenue comes from "Google Network web sites" (AdSense).
Would it be if they charged for it? Android has massive issues across development, the market, the user experience and poor vendor practices that are routinely ignored by the same people that jump on every mistake Apple makes.
Don't get me wrong, Android has a lot of promise, but it's popular because it's on so many phones. That's not exactly a "winner" if it takes dozens of phones to compete against one phone. The same goes for tablets.
The more Android devices you make, the more iOS devices stand out among them.
I said they are half-assed products. Google is not committed to GAE, or anything for that matter. Perhaps these products lead in indirect ways to advertising revenue. But that's no excuse for having no support at all for any of their products. The poor quality of GAE is a symptom of an ambivalent company that doesn't deserve your respect.
Poor quality? Did you not read the posts of all the devs saying they really like GAE, some of them so much that the pricing change isn't really even an issue?
You're not contributing to this conversation at all, you're just spewing opinionated venom with no support for your claims.
My original point, that Google builds half-assed products they don't care about, was succinct. It's others on this board who keep dragging on and defending the company for some reason or another. Perhaps it's because they're employed by them?
I've been developing for App Engine since 2008, when it came out, and absolutely love it. The price changes are a result of turning a successful and massively growing product into a profitable one, à la search, YouTube, etc. Google should be praised for this. The changes in price also come with an SLA and a guarantee that developers will receive three years' notice before a breaking API change or service shutdown.
The SSL problem is a limitation in some browsers: the type of certificate GAE needs for CNAME-based (rather than IP-based) routing causes those browsers to display huge warnings.
He wrote software that many people found useful and gave it away for free. That's awfully damn generous as far as I'm concerned. You see an improvement that can be made? Then make it. But stop complaining.
On the other hand, making a huge stink about a 400-line script that is poorly designed and takes advantage of none of the APIs App Engine provides to make things performant - like backend instances or task queues - is a little disingenuous.
Global variables galore, tabs no spaces, no documentation, compares to empty not just once but multiple times, pokemon style exception handling (you gotta catch them all!), 'is not None' lol, type checking instead of duck typing... Just to name a few.
14 people forked it already; I hope those are GitHub bots.
More importantly, the script works, and was apparently used by quite a few people. Rewriting something that is already known to work, just because you don't like its style, is generally a bad idea.
This code is a complete abomination, why would you stick your neck out and defend it? What igorgue is saying is, even if somebody gave me this code, I wouldn't take it, and I concur.