How Far Out Is AWS Fargate? (iopipe.com)
117 points by kiyanwang on Aug 15, 2019 | 119 comments



The problem with Fargate, like all AWS "compound tools" that are meant to be an answer to competitor X, is that it's painful to use if you don't have an infra team.

Take Elastic Beanstalk: it's meant to be a competitor to Heroku (that's how it was pitched to us in $big company), but my word it misses the mark.

Lambda and API Gateway are a massive faff to set up manually, but Serverless and Zappa make it really, really simple to do. That and its cheapness is why it's caught on: it's fast (to iterate), simple enough and super cheap.

Fargate is kinda aimed at people who have outgrown Lambda, but at that point you'll be evaluating the whole hosting ecosystem. And if you're evaluating hosted K8s, I don't know why you wouldn't just plump for GKE.

Mind you, avoiding K8s and sticking with Lambda + ECS for long-running stuff is far simpler to understand, even if updating it with CF is a massive ball ache.


> Mind you, avoiding K8s and sticking with Lambda + ECS for long-running stuff is far simpler to understand, even if updating it with CF is a massive ball ache.

Actually, with GKE you can run your "AppEngine" apps directly on GKE: https://cloud.google.com/appengine/docs/flexible/python/run-...

Better yet there is the "next gen" stuff from AppEngine (Cloud Run) which is like Fargate, but can ALSO run on your own GKE: https://cloud.google.com/run/

The only thing which I still think is way more expensive/worse than the AWS counterpart is the RDS equivalent (Google has a hosted DB service, but that feels worse than RDS).


ECS & Fargate are probably closer to the Heroku experience than Beanstalk is these days. You've got the same primitives you would with Heroku, in the form of Fargate tasks being orchestrated by an ECS service.

It's definitely more involved than just a git push; you'll have to deal with RDS for databases, for example, and get credentials for that into your application. Having said that, I think the extra effort needed for ECS is worth it: you end up paying less for the resources you're consuming (in some cases significantly less), and you're sitting within AWS, so integrating with the other products they provide is a lot smoother.


If you can understand Lambda, ECS, and CF, you can understand Kubernetes. The problem with Kubernetes is not that it's hard, it's that the Kubernetes lexicon is new and people are still learning it.

Now that I'm knee deep in the world of Kubernetes I actually find it easier to use and run. CF is an absolute abomination in comparison. God help you if you need to roll back in CF or debug something. I have none of those problems in Kubernetes.

Kubeless is a Lambda replacement that is easy to set up and run. Apache OpenWhisk is another alternative.

Why would I ever want to lock my company down to a specific vendor and their tooling when I don't have to? Eventually cloud providers will need to contend with the idea that most of the stuff they're doing will be replicated in Kubernetes. Kubeless and projects like cert-manager are perfect examples.


Feels recognisable. We've outgrown Lambda and moved to Fargate. But now we're wondering whether we should dive into K8s. Worth the investment in your opinion?


It's a lot of work. It's useful if you want to merge platforms with other departments.

However, for a small team, I don't think it's a great fit, yet.


I'm pretty sure Beanstalk predates Heroku. I'm curious how you think it misses the mark though. I've found it to be really easy to operate although that's mostly low-traffic systems.


Heroku was 2007. Beanstalk was 2011.

Salesforce purchased Heroku in 2010 for $212MM in cash before Beanstalk was even announced.


I don't see how you could say it doesn't miss the mark of being a Heroku-level PaaS competitor. Heroku is leaps and bounds simpler to use and configure for most use cases. I mean, getting started is either `heroku create; git push` or just pointing it at an existing public repo.


Beanstalk was announced in 2011. Heroku came out of beta in 2009.


Disclaimer: I work for Salesforce, Heroku’s parent company.

When Fargate was released I was very curious about it as it seemed like AWS was moving up the ladder towards PaaS and I wanted to know how it compares to Elastic Beanstalk and also Heroku.

I started going through this AWS written tutorial:

https://aws.amazon.com/getting-started/projects/build-modern...

The app they use as an example is of course meant to demonstrate how a plethora of AWS services can work together.

I still couldn’t help but be surprised at the sheer number of different things I had to learn and configure, sometimes involving editing yaml.

For long time AWS users: is that tutorial representative of how you really build apps in AWS?


AWS couldn't build a Heroku-like PaaS if their entire business depended on it. They're organizationally incapable of something so simple, beautiful, and productive.

I legitimately believe there's a checklist at the bottom of every new AWS service launch, and their internal requirements include a line like "Is this product so amazingly incomprehensible that customers will take weeks to integrate with it and be forced to upgrade to a technical support plan?"

The closest any major cloud player has gotten is App Engine. And it's pretty close; it has issues, and some inherent complexity, but generally I recommend any new startup look to either App Engine or Heroku for hosting. Avoid AWS like the plague.


> Avoid AWS like the plague

Autoscaling: Heroku - 2017, AWS - 2009.


Yes. The Heroku developer experience is still fantastic and pretty unique, and hasn't been replicated (yet). If Heroku intends to maintain the lead, it'll have to keep moving - the new offerings like Kafka and Redis are pretty good - providing S3, SNS and SQS as addons would also be pretty great - right now one still needs to drop into AWS for these services.

Heroku also really needs to wake up to the world outside the US and Europe. Just enabling the other regions (for general use, not the super expensive Private Spaces) would be a step in the right direction.


That is exactly it, except sometimes you're editing or copy-pastaing JSON instead of YAML.

I've often assumed that no one's KPIs are tied to making things more usable or docs more readable, and everyone's KPIs are tied to shipping new features.


As an AWS employee on the container services team, one of my KPIs is definitely making things more usable. Have you seen the AWS Cloud Development Kit? (https://github.com/aws/aws-cdk) It is a more powerful infrastructure-as-code framework that can docker build, docker push, and run an AWS Fargate task with just a few lines of declarative TypeScript or Python code. You can see some examples here in the "ecs-patterns" module: https://docs.aws.amazon.com/cdk/api/latest/docs/aws-ecs-patt...
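Here's a rough sketch of the ecs-patterns approach (the directory name is made up and exact property names may differ between CDK versions):

    import * as cdk from '@aws-cdk/core';
    import * as ec2 from '@aws-cdk/aws-ec2';
    import * as ecs from '@aws-cdk/aws-ecs';
    import * as ecs_patterns from '@aws-cdk/aws-ecs-patterns';

    const app = new cdk.App();
    const stack = new cdk.Stack(app, 'FargateDemo');

    // A VPC and an ECS cluster to put the service in
    const vpc = new ec2.Vpc(stack, 'Vpc', { maxAzs: 2 });
    const cluster = new ecs.Cluster(stack, 'Cluster', { vpc });

    // Builds the Docker image from ./app, pushes it, and runs it
    // behind a load balancer as a Fargate service
    new ecs_patterns.ApplicationLoadBalancedFargateService(stack, 'Web', {
      cluster,
      cpu: 256,
      memoryLimitMiB: 512,
      taskImageOptions: { image: ecs.ContainerImage.fromAsset('./app') },
    });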

In general I'd say that AWS takes the approach of offering a lot of configurability if you need to tweak things. You can go into that low level JSON or YAML if you want, but we are also providing higher level tools like AWS CDK which do the work for you, and let you just focus on building your app with just a few lines of code.

I'd say that the referenced tutorial is aimed at being "mid level". It abstracts some things but still shows a lot of the lower level settings you can tweak directly. If you want a more abstracted, higher level getting started experience you should start with a tutorial for a higher level tool like AWS Cloud Development Kit, and use that to automatically setup and deploy to AWS Fargate under the hood.

If you (or anyone else reading this) have any specific feedback on things you want to see made more usable, or docs that aren't readable feel free to email me using the address in my HN profile.


Thanks for making the cdk! It’s awesome. Would love docs / examples on how to connect to aurora serverless from fargate with secrets in kms. Also, auto rotating certificates for the load balancer. And lastly, a fully functional ci/cd pipeline with staging/prod and backfilling aurora serverless from prod->staging, and also how to import data into aurora serverless from a postgres dump, and how to get aurora serverless data copied to a local postgres. Thanks!


Great suggestions! Thanks for the feedback!


That tutorial is representative of if you want to go all in on AWS and have your developers live and breathe in AWS.

Everything in the Architecture Diagram of Module 2C is a necessary piece of modern software development; you could easily swap the AWS-specific pieces out as follows:

AWS Cloud9 --> VSCode

AWS CodeCommit --> GitHub

AWS CodePipeline + AWS CodeBuild --> CircleCI

So in any case you are configuring _something_.


Yeah, seems about right for a tutorial. In real life things are never as simple or as well suited to the underlying technology, so it's lots of painful little missteps until you get it "just right" for your use case and the strange rules you're subject to because of your organization.


I use Fargate because it's not a PaaS. I don't want to use a platform, I want something to schedule my containers for me. (Granted, AWS is a platform, but not how you mean, I guess)

Most people don't write good tutorials. Often what they're trying to teach you is ambiguous, they don't clearly define what the dependencies are, and they don't provide clear steps. AWS's docs are not great, but same goes for everybody else.


"Heroku’s physical infrastructure is hosted and managed within Amazon’s secure data centers and utilize the Amazon Web Service (AWS) technology", I did not know that.

https://www.heroku.com/policy/security


> surprised at the sheer number of different things I had to learn and configure

AWS is just complex enough for production infra requirements. They don't make some stuff up just for fun :) Heroku is more for prototyping or personal blogs, imho.


Disclaimer: I'm employed by Salesforce, and I work on Heroku.

I agree Heroku has the perception of being only for prototyping projects or small projects. It does work great for those.

But also, there are lots of large companies (with non-trivial and large scale deployments) using Heroku, such as Macy's and Toyota and many others I can't mention without their permission.

https://www.heroku.com/customers/case-studies


Those companies probably have not run a price/performance analysis yet.


Possibly, but Heroku does not try to optimize for price/performance. It tries to optimize for price versus shipping production software, which includes performance but also lots of other things, like the ability for application developers to iterate quickly and the human hours saved configuring and operating infrastructure.


The best way to describe a service, rather than 1000 words of buzzword heavy text, is a service diagram, a hello world example of the config, and a more complex example showing off the best/main features.

I don't want to hear that 'it's a bit like kubernetes but not quite'. I want to try it inside 30 seconds and see for myself.


> I want to try it inside 30 seconds and see for myself.

So let me get this straight, you want to try something as complicated as Kubernetes, inside of 30 seconds?

I'm thinking you might have some unrealistic expectations.


I think it is a realistic expectation. Think of EC2 in 2006. Getting started was almost immediate if you already had an Amazon account. You downloaded a CLI tool and your credentials, and started a VM. Prior to this, it might have seemed unrealistic to provide such a streamlined experience.

I would love to see the same done for Kubernetes. What I want is a kubeconfig file that links me to a paid account somewhere, and whenever I run `kubectl apply -f foo.yml`, I pay by the millisecond for whatever resources get created. Zero ops for me the customer, and all the complexity will be on the side of the company offering this service.


I think Okteto is an example of what you're describing. (This is for K8s, I just learned about this service yesterday and it does what you're describing, kind of.)

okteto.com

I'm not sure this is relevant, but it's a great example of a service that has a tutorial, that covers multiple use cases, and lasts less than about 2 minutes, leaving you with a pretty clear understanding of what else you're meant to do.

This is how all elevator demos should be.


Thanks for the link, I will check it out.


See also, k3sup

Since it was already easy to get a cloud machine on the Internet!


> I would love to see the same done for Kubernetes. What I want is a kubeconfig file that links me to a paid account somewhere, and whenever I run `kubectl apply -f foo.yml`, I pay by the millisecond for whatever resources get created.

Have you tried digitalocean's kubernetes offering? It takes about 10 clicks to create a cluster and download a config.


What I'm envisioning wouldn't involve creating a cluster. As soon as I set up the account, I could download the kubeconfig and start creating resources. The cluster would be invisible to me, aside from the namespace I was given.


I have built this on Digital Ocean's dropletkit API client gem for Ruby, but without the "we send you a bill" part.

Two or three clicks to kubeconfig, then your cluster is deleted in about 4 days. I call it Hephynator and it's not open source yet, but I would definitely consider it. (This model works for me, because I received Open Source credits from DO. :thanks:)

I don't know how much that helps, but DigitalOcean's built-in interface to creating clusters is about that easy. It's nicer than my stripped-down version. Things like "how long until your cluster is ready" -- I didn't take care of that in my DropletKit client, but DOK8s does in their web interface.

My driver to build this little widget was the fact that the kubeconfig files issued by DigitalOcean's interface expire after 7 days by default, so I either needed a way for the OSS contributors I hand these clusters out to, to get another kubeconfig when it expired without waiting on me... or their cluster would not live as long as the expiration date, which seemed to be the more reasonable, economy-driven decision.

I decided to make the clusters last 4 days and then delete themselves. It was a fun project, and now we can use it to make more fun Open Source.

I should open source it. It's a very simple rails app. It does exactly what you describe, I just push "Create cluster" and then confirm some parameters, then get a "Download Kubeconfig" button which is the last step where you have to interact with anyone other than K8s API. It needs to be made pretty, before I'd consider publishing it. But for now, it does the job and my team is using it fruitfully :)


I don't think he meant 30 secs to "evaluate" the service. But you should be able to get enough from a pitch page to know whether or not it would be worth clicking through to "Getting Started" and installing the prereqs.


Yep. For new AWS services, I always end up having to open the web interface and click through the GUI before any of the rest of the explanations make sense.


Yes, I generally will read about a service, then click around the GUI to make "something". Then I'll put something in terraform (roles, needed buckets, images) and then try to make it go. As someone else said, a diagram with a quick-starter implementation would go far in getting people interested.


K8s is a software suite that allows you to partition and use a set of resources (e.g., partition CPU/memory/storage to run Docker containers). Naively, it consists of a master app and a bunch of agent apps. Each agent app needs to run on the resource you want to put together as a cluster. If you want to run your own K8s cluster, not only do you need the actual resources (e.g. a bunch of computers), but you need to run the master app and the agents on each computer you want as part of the cluster, with all the work required by this for keeping things up to date, making sure the master is available, etc.

EKS and GKE are the "managed" K8s solutions offered by Amazon and Google. Not sure how much of the management they do here, never researched the subject; the advantage is that your automation that uses the K8s "language" can work with both. I repeat, I don't know the low-level details, e.g. how you get some resources into such a cluster.

ECS is the pure AWS alternative to K8s, i.e. a software solution to manage Docker containers. It is not compatible "language"-wise with K8s tools afaik. But, surprise surprise, it is very well integrated with a host of AWS services.

ECS can work with two different types of resources, EC2 clusters and Fargate clusters.

EC2 clusters are sets of EC2 instances; you start them in whatever way you want, make them as big or small as you want, you just need to run the ECS Agent on them. If you use the Amazon AMI optimised for ECS, this is trivial. The drawback is that you need to manage these instances: software updates, starting new ones if you need more capacity, stopping them when you are wasting resources, etc. Obviously, AWS has a bunch of other services you can use to make this easier (CloudFormation etc). The advantage is that you can customise the AMI if you need to and you can leave data behind after a container terminates. For example, all my tasks in this particular ECS cluster need access to the same S3 bucket of data, so I'm precaching it on the machines and then simply mounting a volume in each Docker container that needs it.
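(For what it's worth, joining an instance to a cluster is basically one line of agent config - the cluster name here is made up:)

    # /etc/ecs/ecs.config on each EC2 instance
    ECS_CLUSTER=my-ec2-cluster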

A Fargate cluster takes away all of this complexity: you just need to specify a name. Then you tell ECS to start tasks/services (services are just tasks that need to run in perpetuity) on the Fargate cluster and the thing will scale up and down, and scale your price up and down, as you need it. Two drawbacks: you can't customise the underlying host and you can't have persistent data left on the host after a container exits. Therefore, the project I'm working on atm would end up costing more to run, since I can't cache the S3 data and every container that needs that data will end up re-reading it from S3.
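(A rough example of kicking off a one-off task on a Fargate cluster with the AWS CLI - the cluster, task definition and subnet names are made up:)

    aws ecs run-task \
      --cluster my-fargate-cluster \
      --launch-type FARGATE \
      --task-definition my-task:1 \
      --network-configuration 'awsvpcConfiguration={subnets=[subnet-0abc1234],assignPublicIp=ENABLED}'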

Best I could do :)

/edit: GKS -> GKE


Hello,

I have a different question, if you don't mind.

Under what conditions or use cases, do you think it may make sense to migrate from ec2 to ecs? What are the advantages or disadvantages that one needs to take into consideration?

Thanks


What do you mean by that? ECS can work with clusters made out of EC2 instances. You need to give me more details on how you are running things now.


So... I have a couple of services (Sentry, Logstash and some others) which are running on their own EC2 servers. I am deliberating whether it may make sense to move them to container instances.

I guess what I am asking is that what points should I take care of or give thought to if I decide to migrate from ec2 instances to ec2 instances inside of ecs?

Does it clarify now?


Well, I'm 4 pints in and still typing this (started about 3 hours ago), so I have to apologize that I didn't really answer your question. I'm really fascinated by this kind of tech and it makes me sad when people get hung up on buzzwords and fail to understand the underlying problems/solutions. You could solve all of the above using very old tech like PXE boot with POE and some custom code. We could have a chat at some point, when I can learn more about your use case, your choices and why you picked them, and then I might be able to come up with a proper strategy. But trying to solve a very generic thing is not working, and I'm afraid the very best answer I can give you is to RTFM and play with all the toys you can find, learn what they do, how they do it, how they are different, and apply all of that to your problem. I'm going to leave what I typed in the past 3 hours; I did try to answer your question, but I'm afraid I can't. There is no one-size-fits-all answer here.

-----

I have just a very, very shallow knowledge of these two services, so I'm not going to comment on them. Also, to make sure there are no misunderstandings: clusters of EC2 instances and Fargate clusters are the things you put stuff in, like barrels; ECS is the tool you use to get water out of the river and put it in the barrels.

At this point, I don't really see any benefit in running 3rd party apps, like I assume Logstash and the other one are for you, directly on the EC2 instance.

As a note, I would not run my own db server/cluster or redis or what have you, if I can avoid it. I would pay for it (and do atm, we are using RDS dbs).

The question is if you can benefit from using ECS to start docker containers or not.

You don't need ECS to run a Docker container on an EC2 instance. All you need to do is ssh onto the instance, install Docker, docker pull and docker run. There, assuming you are building your own Docker images with all the configs required, this is all it takes to start a new instance of your app. You can even automate some of this with a Launch Template. Two big advantages of doing this: fairly simple setup, and it can be cheap - if you prepay for your instance, say for 1 year or more, you can get huge discounts.
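(Roughly, assuming Amazon Linux and a made-up image name:)

    sudo yum install -y docker && sudo service docker start
    sudo docker pull example/myapp:latest        # hypothetical image
    sudo docker run -d -p 80:8080 --restart always example/myapp:latest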

There is a drawback to this method, but it depends on your use case: you can't react quickly to spikes. The way to increase capacity, with this setup, is to simply replace the EC2 instance with a bigger one, then swap the smaller one back afterwards. Since you have to do this manually, I think it is fairly obvious why this is a drawback.

So far, the assumption is that your app can only scale vertically, eg to get moar power you need a bigger server. There is also the case where you can scale horizontally, eg if you need more power, you just spin up another instance of the same app and share the load.

If you can do this, eg scale horizontally, then instead of replacing a t2.medium with a t2.2xlarge, you simply start 3 more t2.mediums and all 4 can cope with the load, then shut the extra 3 off when the spike is over (I have no idea if 4 x t2.mediums can do the job of a t2.2xlarge, just giving an example). You can do this manually or automate it via autoscaling/CloudFormation or even your own scripts that read crap from CloudWatch and use the API to do things.

You noticed that I didn't say when to use ECS yet, and this is because ECS is the tool you use to automate most of the above. The truth is you can live a happy ops-guy life without ever touching ECS, or EKS, or GKE, or K8s, or Chef or Ansible or what have you. But knowing how these tools work is going to make your life so much easier.


This is really helpful, thank you!


Watching this from a distance, I vaguely understood Docker. Then everyone was talking Kubernetes, and I vaguely grasped it was some kind of meta solution for something or other. And evidently really complicated but really buzzwordy. And then came Serverless which seems to really mean 'sortof stateless' and reminds me of MTS, an early Microsoft effort that let you run COM objects as a service on an NT server, but only if they were stateless. And now this thing, which I can't begin to fathom. In short, I am lost, but I'm beginning to wonder if all the cloud complexity might one day wrap back around to something very simple again.


You are exactly right, serverless is very much like COM (and wow, haven't thought of MTS in decades), except (!) that there is no mechanism for defining standard interfaces provided by a component, and no registry in which to indicate that said component provides said interfaces...yet. The "ServiceCatalog" cloud product will be the vehicle through which the concept of interfaces is delivered to serverless. Absent interfaces, you might think serverless is spaghetti messiness, and you would be right.

We are in a very messy infrastructure stage at the moment, which really just revolves around 3 concepts- being able to run code on a lot of machines as "easily" as one machine (kubernetes), being able to support in that context multiple runtime languages consistently (docker), while also protecting the machines the code is running on from malicious and/or poorly behaved code (kubernetes and docker together). So maybe we will soon get back to a place of "simplicity" with a stateless function-oriented programming model that looks a lot like COM/DCOM, hopefully with better ergonomics.


When viewed from the lens of a pure app developer, yes these technologies tend to be just more to grok.

But the benefits aren't really reaped by the app developers - they are reaped by operations. These technologies save money by reducing operational complexity. This happens by decoupling the infrastructure from the app. For MTS you still had to configure and manage your server or pay someone a lot to do it. I guess we have come full circle but it's probably more flexible and cheaper this time around.

Here is my 2 cents.

Instead of your team managing a fleet of servers with operating systems (which includes security patching, user management, log rotation, etc.) you move to raw containers running on a self-managed server fleet. Now you have (largely) decoupled the app from the infrastructure it's deployed to via the container interface. Remember CodeDeploy scripts? You don't need those any more, since devs are delivering immutable images; that code is moved into the Dockerfile.
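(For illustration, such an immutable image can be described by a Dockerfile as small as this - the base image and commands are just an example:)

    FROM node:10-alpine
    WORKDIR /app
    COPY . .
    RUN npm ci --production
    CMD ["node", "server.js"]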

Ok so you've gotten this far and it's great, but how do I do autoscaling, failover, etc. Well you can have your ops team do a bunch of work to make it happen on your self managed container host or you can run your containers on a managed Kubernetes fleet. Bam, now you have autoscaling, failover, etc. And the best part is that the Kubernetes API is platform-independent so it's much easier to move it to a different cloud. Or you do Fargate here if Kubernetes is too complex for your needs.

But your app is dead simple and you don't care about configuring all this crap. So you ditch Kubernetes (or you jump straight here) and you write serverless. Now the ops work is dead simple: deploy serverless app with 10 configuration options. No autoscaling to manage, way less security to manage, no user accounts to worry about. Just make sure it's written correctly and runs on the specified interpreter.

IMO there is no way that these technologies are going away. They provide way too much convenience. If you're a small startup (maybe no ops specialists?) and need to run a self-managed off the shelf OSS server, do you want to waste time dealing with ssh keys, choosing the OS, configuration, etc? Or "just" grab the container and throw it into your EKS cluster and forget about it (This step will get easier). Need to deploy a stateless app? Throw it into Lambda and don't even worry about EKS.

Disclaimer: Yes each benefit that I mentioned is possible as you go down layers. That's because the higher layers are built on the lower ones ;-) This post is about how much work it takes to go from 0 to live and manage the live system indefinitely.


I'm currently working on a project that I've suspected from the very beginning is way too complicated for a serverless solution. Nine months into it and I'm thoroughly convinced. The scaled container approach (Kubernetes/Docker) you've described would have been much more pragmatic.


> wonder if all the cloud complexity might one day wrap back around to something very simple again

Yes, all these solutions are gradually converging back to CGI.


CGI? CICS!


How so?


Actually, it is not very complicated. The main problem is that people from both sides of the fence, i.e. customers and the AWS/Goog/MS people, are conflating things, either deliberately or simply through lack of understanding.

I'm going to simplify some of the things, please don't get your pitchfork out if I gloss over some aspects.

Consider a web app: you most likely store your data in a database, have some code for the business logic, say PHP, and a bunch of front-end stuff, HTML and JS. You set up a web server, say Apache, with an interpreter for your business logic code, say mod_php, and a DB server, MySQL, on the same physical server (or virtual server). Your www.something.com is up and running.

Fast forward some time and your business grows. You notice that access to your website is very slow, since there are so many people trying to use your website. You move the db on another physical server to free more resources for the business logic. This doesn't last for long and your website is slow again. You also notice that your db server is not doing that much, yet the app server is being hammered.

You set up another app server, call it ww2.something.com, rename your original one to ww1.something.com and connect both of them to your db server. You also set up a third server, way smaller than the app servers, that on requests to www.something.com, based on some heuristic, will http redirect the user to either ww1. or ww2. [glossing over the db details here]. You see this works and you keep adding wwxxx.something.com. One day, on 20th of December, after you finished deploying ww1337.something.com, you realize you have another problem. How on earth are you going to deploy the brand new version of your webapp, that your team worked hard to finish before Christmas, in time for 25th? And there was a new CVE published, apache ver 1234 is vulnerable to the Grinch exploit. Not to mention that you expect a huge spike in traffic from 25th to 10th of January. So you need to add more servers. You also need to pay for these extra servers all the way to 25th of January and you start to wonder if this is worth it. And ww1256 has a hardware failure. Or was it ws1265?

Some time after, you had a chat with the CEO of this hosting company: he wants you to stop using bare metal and instead move your stuff onto virtual machines, and they will make sure that all your VMs will run, with an uptime of 99.99%, on their cluster. They also give you some tooling to build virtual machine images, thus a new deploy takes hours now, instead of days. No more worries about hardware failures [glossing over], and if you find vulnerabilities, you can react in hours.

With the extra money made by being able to quickly deploy new code, hence develop new features, you hire more people and grow your business even more. One of the new people pokes around for some time and comes up with a list of things she thinks can get better: your web app has power users, and the HTTP-redirect load balancing makes everybody who happens to end up sharing a server with such a user terribly miserable, so you should load balance based on request, not session. Most of the features are fairly light, resource-wise; just a handful are heavier. You should offload those requests to another set of beefier, more expensive VMs, so while the main app servers wait for the forwarded request to happen, they can answer a bunch of the lighter requests. You agree and add the final piece of the puzzle, the application load balancer, and move some of the features to a different set of resources.

Hope this clarifies what some of these pieces do. Next, a brief list of real life equivalents for the above:

1) vms: any technology that allows you to have a piece of code run in isolation. VMWare images, Virtual Private Servers, docker containers, this is NOT a new technology.

2) cluster: any group of CPU/Memory/Storage you can use to run vms. AWS Fargate is one, the entire EC2 is another one. An ECS Cluster of EC2 instances is yet another one.

3) tooling: whatever you use to create vms and instruct the cluster to run your vms. ECS is the pure AWS solution. Docker compose is another one. Kubernetes is another one. EKS is the aws offering for managed kubernetes. GKE is the google version.

/edit: GKS -> GKE


> "Compare that with AWS EKS pricing where the Kubernetes cluster alone is going to cost you $144 a month. A cluster isn’t terribly useful without something running on it, but just wanted to highlight that Fargate doesn’t come with a cost overhead."

That $144 a month is a deal considering it's just the price of a few instances that you'd normally provision anyway if you were rolling your own k8s cluster, and it manages the control plane administration for you. The moment your hand-rolled k8s cluster goes the way of so many horror stories [1], that savings evaporates into expensive engineering hours wasted.

[1] https://k8s.af


EKS has its own share of problems, and in several cases they can't be mitigated due to the limitations that EKS imposes. For example:

- Service cluster IP range is not definable, making it difficult to integrate with an existing network topology [1]

- Limited choice of CNIs, e.g. Calico can not be used

- No accessible etcd snapshots for recovery

I tend to avoid EKS and prefer to create most clusters by using kubeadm as the foundation.

[1] https://github.com/aws/containers-roadmap/issues/216


EKS Engineer here, thanks for the feedback.

Service IP configurability is a very common ask, and as you’ve linked, is on our roadmap along with a slew of other control plane configuration options.

You can delete the AWS VPC CNI DaemonSet and install any CNI plugin you’d like.
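Concretely, that's along these lines (the Calico manifest URL is just illustrative):

    kubectl delete daemonset -n kube-system aws-node
    # then install the CNI of your choice, e.g. Calico
    kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml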

EKS regularly backs up etcd and has automatic restore in the case of a failure. Manually restoring to an old snapshot would be quite disruptive. What is your use case, and what would be the interface you’d like to see?


Indeed, the networking layer that EKS forces upon you is quite strange. For example, you can only run 17 pods on a t3.medium instance because of how IP addresses are managed in your VPC (and because of various mandatory daemonsets that run pods on every machine, like aws-node). Most people will likely find out about this very low limit not in normal operations, but when one AZ blows up and pods are rescheduled on other AZs. You will hit the pod limit, things will be down and nothing new will schedule. And Amazon provides no meaningful monitoring for health at the control plane level; you need to bring all that yourself.

Managing worker nodes is also a huge pain. They recently released a tool that manages some of it, but it doesn't work with older clusters. To do something simple like fix a vulnerability in the linux kernel, you have to make a new CloudFormation stack (involving cutting and pasting a ton of random stuff; the template ID, the AMI ID, etc.), edit several security groups, start the new nodes, make sure they join, drain the old nodes, delete the old stack, edit security groups. Upgrading the Kubernetes version in the cluster is a similar story, especially if you skipped an intermediate version. (That said, incidental changes like adding more nodes is easy. It's a lot rarer than upgrading Linux or the k8s version, however.)

Their load balancers other than Classic don't work very well either. I would prefer to use an NLB for all incoming traffic (terminated by Envoy in the cluster, which does the complicated routing), but that apparently results in kube-proxy not working right... and every time you change a set of pods whose selector matches the NLB service, it changes all the IP addresses in DNS, and traffic just stops arriving until the DNS TTL expires. It is very broken and the docs should say "never even think about using this" instead of "it's beta and here are some weird caveats that probably won't dissuade you from using it".

Anyway, my overall impression of EKS is that Jeff Bezos walked into someone's office, said "you're all fired if we don't have some half-ass Kubernetes service in the next 3 months" and walked out. The result is EKS. It's wayyyyyy better than being locked into Amazon's crazy stuff (CloudFormation, etc.) but it's not as good as what other managed k8s providers offer.

If I were to setup k8s again, I would just buy my own servers and self-host everything. Dealing with Things That Can Go Wrong With Servers is much easier than dealing with Things That Can Go Wrong With AWS. (But I already have a datacenter, can get the machines a 100Gbps connection to the Internet for free, and have a team of network engineers to tell me which CNI to use. If I didn't have that I would just use GCP.)



Calico can only be used as the network policy engine on EKS, not as the CNI


EKS Engineer here.

Calico policy can be used with the AWS VPC CNI, but you can remove the default CNI and install Calico or any other CNI plugin you’d like.


In theory, you could replace the CNI on worker nodes, but is that something that is practically useful (when it can't be done on master nodes in EKS) and supported? How would the kube-apiserver, for example, communicate to the metrics-server if it is not connected to the Calico network?


You are correct that the API server is only aware of the VPC network, and not any overlays. One solution to the metrics-server or other webhooks is to use host-networking mode so the API server can have connectivity.


GKE, the equivalent GCP service, is free though. You pay for your nodes at regular Compute Engine rates, but the master comes at no extra cost.


Same story for Azure and Digital Ocean, AWS is the outlier here.


Absolutely, but it's worth mentioning that it's not zero-cost for those providers. (I have no idea what their actual cost is, and I assume they take steps to minimize it...)

EKS is priced competitively with Amazon's other offerings. I was a part of the chorus of voices saying "wtf Amazon, control planes ought not cost, nobody else is charging for them" but I think this is fairly priced for what it is... it only takes four reserved M4.large instances to eclipse the cost of an EKS cluster, and you will likely want more than that if you are aiming for real High Availability and trying to build it on your own EC2 nodes.

The default configuration of EKS is fully HA and, AIUI, is built to be resilient to faults like the failure of an entire availability zone.

If you don't want to pay for the Kubernetes API, the interop and chances of engaging outside support that it gives you, and the associated complexity required to support it and keep pace with the release cadence, then there is also Fargate, which is cheaper at low scale (or, my preference, go with the competition, who have all agreed to undercut AWS - so why not take advantage, as it's clearly favorable to run at least some of those workloads elsewhere).

And hey, there's even Virtual Kubelet on Fargate if you want to get the best of both worlds, right? Of course that one comes with a big warning, DO NOT run any Production...


As much as I like the described move from "functions" to "containers", solutions like KNative / Cloud Run are still inherently more hackable than solutions like Fargate.

It's good to have properly defined, stable, open interface for compute workloads (containers as defined by OCI), so we don't have to lock ourselves into whatever shape of runtime environment our current cloud provides for their flavor of FaaS. So we don't have to learn cloud-provider-specific tooling, getting certifications for handling various tasks at AWS, becoming 2010s variants of Cisco-certified network engineers of the previous age.

But it's even better to have a properly defined, stable, open interface for orchestration too. To be able to run stuff locally. To be able to extend things. To ease the lock-in cloud providers currently have. To be able to actually understand what happens under the hood. And last but not least: to enable rise of open source solutions for higher-level abstractions, like KubeDB.


Funny that both sides have their gripes with Fargate.

For a serverless proponent like me it's still too much config.

For a no lock-in proponent like you it's not enough config.


The way to satisfy both camps is sane defaults.


“A good compromise is when both parties are dissatisfied”


A bad compromise leaves both parties dissatisfied, too.


Fargate is a very useful service for anyone hopping on the serverless/lambda train.

Here's a simple use case: say you have a simple banking website with a frontend and an API and whatever. People log in and check their balances, etc. Typically, this requires requests that take on the order of milliseconds, so lambda works for this [0]. However, every month, you want to go in and calculate what you owe in interest for everyone's account. Say that you have 100,000 users and it takes 0.01 seconds to do each calculation, so this job will take 1,000 seconds, or 16 minutes to run. Lambda functions are automatically killed after 15 seconds, so that won't work.

That's where Fargate comes in. You dockerize your environment and then run a command in Fargate. Now, it runs for 16 minutes for the task itself + 1 or 2 minutes to wind up and then it automatically kills itself when it's done. You pay for the 16 minutes of run time and call it a day.

[0] One issue with lambda/serverless is dealing with cold starts when your application hasn't been used in a while, or you need more concurrent instances running, so it takes some time to "warm up". This is a good reason not to use serverless frameworks in many cases, but I won't get into that.


Mild correction, but AWS lambdas are limited to 15 minutes of execution time as of October 2018 [0]. Prior to that, they were limited to 5 minutes.

> One issue with lambda/serverless is... when you need more concurrent instances running...

Lambda can have 1,000-3,000 concurrent burst invocations for most of the popular regions and 500 for others[1]. Are you saying that's not enough?

[0] https://aws.amazon.com/about-aws/whats-new/2018/10/aws-lambd...

[1] https://docs.aws.amazon.com/lambda/latest/dg/scaling.html


> Mild correction, but AWS lambdas are limited to 15 minutes of execution

Whoops that's what I meant

> Lambda can have 1,000-3,000 concurrent burst invocations for most of the popular regions and 500 for others[1]. Are you saying that's not enough?

No, that's plenty for most use cases. I'm referring to "cold starts". When you haven't used a lambda function in a while, it takes a few seconds for it to get going. But when you hit it a second time, it responds more quickly.


I've run into this problem. But I believe the parent was referring to the situation where new lambdas spin up and aren't immediately ready to respond to invocations. The delay can be significant in some cases though almost immeasurable in aggregate.


3000 is not really all that many, given the prime use cases for Lambdas. Though at that point you'll probably be running into lots of other issues as well.


I recently had a use-case where that wasn't enough, and AWS would not give us enough concurrent invocations either.


Wouldn’t it be better to break your tasks into smaller, discrete units instead of a 15 minute job, such as using a message queue (SQS)?


Yes, that's ideal. But:

* Sometimes it's not efficient to break a task down, or it simply can't be done.

* Sometimes it just makes the code hard to maintain, if breaking it down into different components is a difficult task.

An example I can think of is loading a very large CSV file into a database. Lambda doesn't have enough memory to read the whole file and even if it did, it might take more than 16 minutes to run. I can try to break it out into multiple tasks and it'll probably be faster, but reading that code is (generally) gonna be much harder.


If your unit of work is a transaction that takes 15+ minutes that's not possible. A common example is nightly ETL jobs that update the world.


That’s not a good use case for lambda.


Well duh. That's why the top level comment is saying, "That's where Fargate comes in...".


> If your unit of work is a transaction that takes 15+ minutes that's not possible. A common example is nightly ETL jobs that update the world.

So what are you suggesting here?


How is it different from k8s jobs and cronjobs?


I really don't get the hate for, or disinterest in, ECS or Fargate.

It took me all of 4 days to come up with a CI/CD pipeline/production environment using ECS and Fargate for the startup I was the CTO of.

We are now looking at Kubernetes or reserved instances to save some money as we have scaled from angel->seed->series A. Though tbh it's looking like the extra cost is worth it given the out-of-the-box one-click scaling, blue/green deployments and "container as a server" level monitoring and logging.


I have a CI/CD-driven AWS blueprint for Fargate - with a bunch of other production-grade best practices built in. It should get you up and running in under an hour (much less if you're experienced with AWS). https://github.com/rynop/aws-blueprint


Humans are more expensive than infrastructure!


What took you 4 days takes seconds using tools like Jenkins X and Kubernetes. Just sayin...


If you don't know Jenkins or Kubernetes it definitely will not take you seconds to learn them, spin them up, configure them, and put them into production.


We use Fargate a lot for internal tooling. It's pretty great! Being able to schedule containers instead of Lambdas is nice, and generally not having the restrictions of a Lambda environment is nice.

We have, no joke, Lambdas that kick off Fargate tasks to do "actual work". Why do we have Lambdas? Because that's the integration point AWS has everywhere. "Run code when X happens" on AWS means a Lambda, so that's what we use.


I think that this should be the only use case for lambdas. They should respond to events and act as the glue between aws services. If you're doing anything that requires a running time of over 200ms you should be using a proper app server, probably inside a docker container.


I have on my todo list a Lambda function that will get invoked from Bitbucket and will kick off a bunch of tasks, then another one that can be called multiple times and will let you know whether those tasks finished, and their exit codes.


Fargate is pretty useful and it works well... when it works. One of the most annoying things is that you can't (as far as I know, and I hope someone will correct me and show me the light) see the errors from the Docker engine. This means that if there is some issue while spinning up your Docker instances that you'd normally see in the Docker engine's log, you won't know what is going on when using Fargate. This makes debugging these issues a nightmare.

Again, I hope I am just stupid and that there is some obvious way with which I CAN see the logs and someone will point out to me how.


If your container fails to even run, then no logs are generated. Logs are only generated after your container spawns. So if you get errors in your `ENTRYPOINT` or `CMD` commands, the only way is to override those commands with something like [1] and debug why it's failing.

[1]: `command: ["/bin/sh", "-c", "sleep 36000"]`
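(For an ad-hoc debug run, the same override can also be passed straight to run-task - the container name here is hypothetical and the usual network configuration is omitted for brevity:)

    aws ecs run-task \
      --cluster my-cluster \
      --launch-type FARGATE \
      --task-definition my-task:1 \
      --overrides '{"containerOverrides":[{"name":"app","command":["/bin/sh","-c","sleep 36000"]}]}'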


That is what I thought. How would your example command allow me to debug what is failing? Because the container will then spawn and allow me to enter it via ssh?


"tail -f /dev/null"


That is a nice trick.


I don't think I've ever seen an error from the "docker engine" on AWS Fargate. There is a task status field on the task, so if your container completely fails to start because you have a bad entrypoint command that can't be executed or your process immediately exits with a non zero exit code without outputting anything to stdout/stderr you will see a task status that says that the task stopped unexpectedly because the container exited, and why the container exited.

If your containerized process does start and output to stdout/stderr, but then crashes for some reason you can see those stdout/stderr logs in CloudWatch, via the awslogs logging driver.
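The relevant bit of the container definition looks roughly like this (log group name and region are placeholders):

    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/my-app",
        "awslogs-region": "us-east-1",
        "awslogs-stream-prefix": "app"
      }
    }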

In general I'd say the docker engine logs are the wrong place to try to diagnose an app running AWS Fargate. The point of AWS Fargate is to abstract away the Docker daemon, and just run your containers for you. So if the Docker engine itself was failing for some reason that is an issue for the AWS team on call to fix. Your area of responsibility starts with the application code that is running inside your container. The bits outside the container are the responsibility of an AWS employee.


In practice there are situations where you do want to see the error logs about your containers. I can't think of an example right now, but sometimes container configurations can be the cause of errors and seeing the logs is essential in not spending hours on investigating.


With AWS Fargate you can't interact directly with Docker in any way, you just use Fargate via an AWS API, and Fargate handles the rest.

So we do validation in the task definition you supply via API to make sure that things check out before AWS Fargate launches the container. So you shouldn't see a "container configuration" based error because our API catches config issues first and won't let you create a task definition that is invalid. If there was such an error that caused Fargate to construct a bad container run command and fail to launch the container that would be a missing validation or bug inside our code and we'd fix that.

If you do find something like this please reach out to me or open an AWS support ticket for the oncall team to look at, and we'll fix it!


One thing I'm keeping my eye on is the fargate virtual kubelet [1]. It seems to offer a way to use the familiar kubernetes tooling with a managed "clusterless" offering like fargate.

[1] https://github.com/virtual-kubelet/aws-fargate


Why deal with the complexity of one when you can have both!

(/s...)


Small note if OP is the author or the author is in the thread: there are two typos of the word "Firecracker" spelled as "Firecrracker". It's spelled correctly in between and right after the second misspelling.

> Under the hood, they share the same virtualization technology called Firecrracker. Firecracker is a KVM-based virtualization layer that creates and manages minimalistic container-like virtual machines called “microVMs”. Firecrracker is still relatively new and was designed from the ground up to address issues identified with the previous generation of AWS Lambda.


Fixed, thanks for the heads up!


My experience with Fargate:

I had a Node/Express microservice running in lambda using the lambda proxy integration to send and receive files from/to S3.

I knew in advance that the 6MB request limit for lambda was going to hit us, but it was good enough for what we needed right then.

When it came time, I knew I was going to have to convert to Fargate but I didn’t know anything about Docker or Fargate.

It took less than a day for me to get the API up and running in Fargate following this tutorial.

https://medium.com/@ariklevliber/aws-fargate-from-start-to-f...

And automating it with Cloud Formation

https://github.com/1Strategy/fargate-cloudformation-example

To be fair, I did have experience with the other fiddly bits of AWS and CloudFormation so I knew how to modify the template to work in our own environment.


I stopped reading when the author claimed Fargate gave Kubernetes a run for its money, which is blatantly false. They don't even occupy the same niche. Also, no, a shitty Lambda-for-containers does not give a hyperscaling container / virtual networking / etc. orchestrator designed to manage millions of persistent containers a "run for its money".


What's the Azure equivalent of Fargate? I have to migrate a bunch of small containerised PHP apps from AWS ECS into Azure, but while ECS has been easy, I'm sceptical of our ability to work with Kubernetes. I've looked at App Service, but I don't know anyone who's had experience with it.

If anyone has any feedback or advice I'd happily hear it.


There's also a service called Azure ACI, which is pretty much the same thing as Fargate.


(disclaimer: I work for Azure on ACI and other container offerings) Azure Container Instances (ACI) is what you're looking for. The intent behind ACI is to provide the easiest/simplest way for you to deploy serverless containers in Azure (no management of underlying infrastructure). See this to get started: https://docs.microsoft.com/en-us/azure/container-instances/c....


Not sure there is a direct equivalent but Azure App Service will let you deploy a docker image and autoscale. The configuration is more analogous to AWS Elastic Beanstalk but they serve similar purposes.


Approachability-wise, wouldn't an even better approach be to drop ECS Fargate (in terms of API) and simply add to EC2 the ability to execute Docker images, instead of AMIs?

Feels like if AWS started with dedicated boxes, they'd now be calling EC2 the "Elastic VM Service Fargate".


Running Docker images directly on EC2 instances is also supported by ECS; in fact, the first iteration of ECS required you to run a cluster of EC2 instances. The API to do so is identical to the one used for Fargate.


Right but with ECS on EC2 you are still responsible for maintaining the underlying AMI. Security patches etc.

This was the major reason we wanted to move to Fargate in our org.


> I’ll also introduce you to a CLI tool that is Fargate’s equivalent to Kubernetes’ kubectl that can take you from a Dockerfile to a running web service in just two commands.

Is anyone aware of this tool? Is it something I can check out now, or do I have to wait for the next installment? This sounds compelling, and even if I don't want to use Fargate, it will help me to relate with people that have used Fargate, and give us more common ground so we can talk on equal footing.

(Subscribe me to your newsletter!)


Author here, the CLI tool is: https://github.com/jpignata/fargate

Part 2 is still planned and is now being fast tracked since there's interest in Fargate.


Thanks, this is awesome!

I am not a fan of Fargate based on personal bias mostly towards K8s and a sour taste I got when Fargate and EKS were first announced together, (with Fargate "available now" and EKS coming soon, it felt very desperate, but that's neither here nor there...)

Was very impressed by this screencast and will definitely try Fargate now thanks to your work here.

When I got to the point in the screencast where you build the image locally, I was on the edge of my seat hoping it would include some CodeBuild setup automation too; maybe you have already planned that, (or maybe that would ruin its simplicity.) That was a great demo, even without this part.


AWS Fargate seems very similar to what Hyper.sh used to offer (before they shut down).


are you implying Fargate will suffer the same fate?


Iterations are still crazy slow compared to AWS Lambda. :(

https://s.natalian.org/2019-08-15/cdk-containers.mp4


AWS Serverless Community Hero and CTO/Co-founder of IOpipe here.

AMA!


You can run pure ECS without using Fargate; all it takes is writing a bit of extra CloudFormation code. Worked for me, with autoscaling.



