Hacker News
Building blocks of Amazon ECS (amazon.com)
116 points by ifcologne on Jan 24, 2018 | 54 comments



"Let's begin with an analogy"

[4 paragraphs of fantasy explorer game references]

All I've learned is my brain hurts.


Yeah seriously what the hell.

And it's not as if the game references are universals.

All this obfuscation for an obtuse and needless analogy.


This is like the whole GraphQL Star Wars schtick. I really don’t enjoy Star Wars, so reading their documentation was a total drag.


There are dozens of us!


So good to know. <3


GraphQL just uses it as placeholder data, no? I don't think you have to know or understand anything about SW to read the docs.


Cringing in anticipation of the freemium / in app purchase analogy.


Terrible doc writing.

'For your territory to be a part of Royaume, the land ordinance requires construction of a building (container), specifically a castle, from which your territory’s lord (agent)* rules.'


It's a blog post and not docs. I don't think it's unreasonable to spice up blog posts whereas docs should be straight-up.


Of course it's not unreasonable to do that. The problem is that the analogy was a terrible one.


Suggestions on how to improve it?


The author is trying to have some fun. Perhaps she’s trying to empathize with a beginner getting their head around what ECS is.

We all have different perspectives because the lenses through which we see the world are different. It’s possible you didn’t enjoy it because you’re not an avid fantasy explorer gamer. Maybe you didn’t need an introduction to what ECS is.


> Perhaps she’s trying to empathize with a beginner getting their head around what ECS is.

I think the complaint was that it made it _harder_ to understand. Analogies are helpful if the relationships they describe are already well-known, but just dropping _Royaume_ into a sentence is dumb and hurts readability (instead of, e.g., sticking with "kingdom", from the analogy).


Suggestions on how to improve it? Whether it's kingdom or not, it's just a cluster name which can be anything.


If you insist on analogies, just say "ECS is like a burrito" and get it over with.


I know right? I can't believe this article was approved with that analogy.


Suggestions on how to improve it?


I think a video could have been a better choice...


Even though I skipped the whole spaceship analogy section (I did), all in all I'm really glad to see more easily accessible explanations of AWS technology than the main documentation. I love how approachable this is, and the diagrams help.


What this post doesn't explain well is the benefit of using ECS rather than just uploading a docker image to an EC2 instance and running it.


Using a container orchestration system gives you far more resilience than just running a docker image on an EC2 instance.

When you run a docker image on an EC2 instance, if that container crashes, what will restart it? What if the EC2 instance itself fails?

With ECS you give the container orchestrator an intent:

"Run X copies of this container at all times, associated with this load balancer"

Then you give the container orchestrator a place to launch stuff: either a cluster of EC2 instances, or AWS Fargate.

Then the orchestrator continuously validates the state of your deployment and relaunches or moves containers as needed to serve your intent. You can even run spot instances to get up to 80% savings on your AWS bill, and if you get outbid on the spot market the container orchestration layer will move your containers to another machine.

That in a nutshell is the benefit: the orchestrator lives outside of your instance and prevents the instance from being a failure point in your system, by separating your intent to run an application from the underlying resources that actually run it.
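The "intent" described above maps to an ECS task definition plus a service. Here is a minimal sketch of the two API payloads, with placeholder names and ARNs (none of these identifiers are from the thread); with credentials configured they would be submitted via boto3:

```python
# Sketch of the two payloads behind "run X copies of this container at
# all times, associated with this load balancer". All names, images,
# and ARNs are illustrative placeholders.
task_definition = {
    "family": "web-app",
    "containerDefinitions": [{
        "name": "web",
        "image": "nginx:latest",
        "memory": 512,
        "portMappings": [{"containerPort": 80}],
        "essential": True,
    }],
}

service = {
    "cluster": "my-cluster",
    "serviceName": "web-service",
    "taskDefinition": "web-app",
    "desiredCount": 3,  # the orchestrator keeps 3 copies running
    "loadBalancers": [{
        "targetGroupArn": "arn:aws:elasticloadbalancing:...",  # placeholder
        "containerName": "web",
        "containerPort": 80,
    }],
}

# With AWS credentials configured, these would be submitted as:
#   ecs = boto3.client("ecs")
#   ecs.register_task_definition(**task_definition)
#   ecs.create_service(**service)
```

The point of the split is exactly the one the parent makes: the service is a declaration of intent that outlives any particular instance or container.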


Even the smallest EC2 instances can run multiple containers if your containers are small enough, and either way it makes sense to add a level of abstraction over your pool of EC2 instances - i.e. treat them as a pool as opposed to a set of discrete blocks. ECS lets you specify exactly what resources each container needs, and then lets you shrink or grow the pool as necessary. The pool can also be a heterogeneous mixture of instance types, even a combination including on-premise (?) ones. And could also be spot instances, which are way cheaper. Could also be whichever spot instance types happen to be cheapest at the moment.

So yeah, being able to pool your EC2 instances is useful.


> And could also be spot instances, which are way cheaper.

This is what we are currently using ECS for. Keep a set of on-demand instances to handle a base load and scale out on spots. Any compute that is not time-dependent is on spots 100%.

We have been using AWS for a long time, and wrote/still have a system to scale out instances with AMIs. This system is heavyweight though, because multiple AMIs cannot share an instance.

Kubs is our next step.


I've had a lot of good experiences with ECS so far, except for their scheduled tasks system. With cron, you check /var/mail. When a scheduled task doesn't run on ECS... you're SOL.


The scheduled tasks system is built on top of CloudWatch Events, which is an asynchronous system. You would need to set up a rule in CloudWatch Events to watch for task events that indicate a task stopped with a nonzero exit code, or that a task was unable to be placed.

It's definitely more complicated than just checking /var/mail, but with the full set of features enabled you can get visibility:

- If the task fails to be placed, or stops with a non-zero exit code, the CloudWatch Events rule for the task event triggers, allowing you to respond as you see fit: email someone? Trigger a PagerDuty alert?

- If task logging was enabled, you can go to the CloudWatch log stream for the task, see the actual stdout from the container, and see why it crashed.
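For reference, the task-state rule described above can be sketched like this. Classic event patterns only match exact values, so in this sketch the non-zero-exit-code check is pushed into the rule's target; the cluster name and event shape are illustrative, abridged from the documented ECS event format:

```python
# Event pattern for an "ECS Task State Change" CloudWatch Events rule.
# Matches any task that reaches STOPPED; the target decides whether the
# stop was actually a failure.
event_pattern = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Task State Change"],
    "detail": {"lastStatus": ["STOPPED"]},
}

def task_failed(event):
    """Target-side check: did any container in the task exit non-zero?"""
    containers = event.get("detail", {}).get("containers", [])
    return any(c.get("exitCode", 0) != 0 for c in containers)

# Abridged example of the event shape the rule would deliver:
sample = {"detail": {"lastStatus": "STOPPED",
                     "containers": [{"name": "web", "exitCode": 1}]}}
```

The target (a Lambda, say) can then email someone or page on-call only when `task_failed` is true, instead of on every routine stop.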


I ran into this problem a few weeks ago and couldn't believe how poor the error handling is. I eventually found the FailedInvocations graph in CloudWatch, but I didn't see any way to actually find out what went wrong.


How did you solve it?


I haven't yet. It wasn't a priority, so I stopped looking into it. I did find this forum thread though: https://forums.aws.amazon.com/thread.jspa?threadID=269884

I tried tweeting @AWSSupport, but they just referred me to the scheduled tasks docs that I had already read.

I suspect the issue is that I set up my tasks to run in Fargate mode, which I didn't realize at the time was brand new. Maybe it's not compatible with scheduled tasks yet.


I haven’t used ECS scheduled tasks, only the Elastic Beanstalk equivalent (which I don’t like very much).

For my “worker” ECS containers, I set them up to run their tasks using an app-level scheduling library. When a task finishes, the app phones home an event to Datadog. If Datadog doesn’t get notified within a certain amount of time, I get an email. So far it’s proven to be pretty reliable.

I’m sure similar functionality exists somewhere in AWS, but DD is so easy and we use it for a bunch of other stuff anyway.
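The phone-home pattern above is a dead-man's switch; here's a minimal sketch with an in-memory heartbeat store standing in for Datadog (which is just the messenger here - any monitoring backend works):

```python
import time

# In-memory stand-in for the monitoring backend's "last seen" state.
heartbeats = {}

def record_heartbeat(task_name, now=None):
    """Called by the worker when a scheduled task finishes successfully."""
    heartbeats[task_name] = now if now is not None else time.time()

def overdue(task_name, max_age_seconds, now=None):
    """Monitor side: True if the task hasn't phoned home recently enough."""
    now = now if now is not None else time.time()
    last = heartbeats.get(task_name)
    return last is None or (now - last) > max_age_seconds

# A nightly task reports in; the monitor checks it against a window.
record_heartbeat("nightly-report", now=1000.0)
overdue("nightly-report", max_age_seconds=3600, now=2000.0)  # within window
overdue("nightly-report", max_age_seconds=3600, now=9999.0)  # overdue, alert
```

The key design property is that silence triggers the alert: a task that never even starts still gets noticed, which is exactly the failure mode ECS scheduled tasks make hard to see.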


It sounds like you'd benefit more from AWS Batch for the scheduled tasks.

After everything is up and running, all you need to do is submit a job (from a lambda function, CLI, whatever works best for you).

Then, in CloudWatch Events rules, you can set up a new rule that only triggers on failed jobs, which can then have an SNS/Lambda target.

Or, alternatively, set up a CloudWatch alarm on the actual event rule invocations.
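The failed-jobs rule described above can be sketched as follows; the rule name and SNS topic ARN are placeholders, not from the thread:

```python
import json

# Sketch of a CloudWatch Events rule that fires only on failed AWS Batch
# jobs and hands them to an SNS topic.
failed_jobs_rule = {
    "Name": "batch-job-failed",  # placeholder rule name
    "EventPattern": json.dumps({
        "source": ["aws.batch"],
        "detail-type": ["Batch Job State Change"],
        "detail": {"status": ["FAILED"]},
    }),
    "State": "ENABLED",
}

targets = {
    "Rule": "batch-job-failed",
    "Targets": [{"Id": "notify",
                 "Arn": "arn:aws:sns:us-east-1:111122223333:alerts"}],  # placeholder
}

# With credentials configured these would be submitted via boto3:
#   events = boto3.client("events")
#   events.put_rule(**failed_jobs_rule)
#   events.put_targets(**targets)
```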


Do failed scheduled tasks not show up in the container's CloudWatch logs?


It's a failed CloudWatch Events invocation, which AFAIK means the event doesn't even hit the container/instance. There's some kind of permission that isn't set up right, but I've no idea what it is.


Likely an issue with ecsInstanceRole (either not assigning it to instance, or an API call that role can't access)


For those looking for a more in-depth explanation of Amazon ECS, I highly recommend the videos here https://awsdevops.io/p/hitchhikers-video-guide-aws-docker


Why even use ECS when you can just start an Elastic Beanstalk docker-based cluster and get a superior, more focused web UI (at least the UI for EB is superior at present) along with much easier configuration?

If I sound bitter, it's because I am -- I recently spent about 2 days straight trying to build an ECS cluster reliably with CloudFormation, and while I must admit I was newer than normal to CloudFormation templates, the number of errors I ran into (and the incredibly slow provision/creation/update times), along with the broken edge cases (try to make CloudFormation work with an existing S3 bucket), was infuriating. Don't read any further if you don't want to read a rant.

While ECS is great in theory (or if you set it up by clicking tons of buttons on AWS's UI), it's significantly harder to automatically deploy than Beanstalk is, from where I'm standing. All I have to do is get a container with the eb command line tool installed, do a bunch of configuration finagling (which honestly is ridiculous as well -- just let me give you a file with all my configuration in it, or read it from ENV, and stop trying to provide interactive CLIs for everything, for fucks sake) and I can run `eb deploy` and push a zip file with nothing but a docker container (multi-container setups are also allowed) specified up to the cloud. Later I'm going to look into using CloudFormation to set up EB, but I know even that will be simpler, because I won't have to try and make any ECS::Service objects or ECS::TaskDefinitions.

Trying to use ECS made me so glad that Kubernetes exists. Unfortunately I can't use it at work currently, because that would mean the rest of the team stopping to learn it, but CloudFormation + ECS is a super shitty version of setting up a Kubernetes cluster and using a Kubernetes resource config file. I think the best part about Kubernetes is that if the community keeps going the way it's going, cloud providers will finally be reduced to the status of hardware vendors -- all I'll need from EC2 is an (overpriced) machine, network access, and an IP, not to keep up with its sometimes asinine contraptions and processes, or be locked into their ecosystem.


My understanding is ECS can run multiple tasks and services, while an Elastic Beanstalk environment just runs one.


Yeah maybe it just wasn't for me -- with RDS as the database and ElastiCache for redis, all we needed was to run the API server (one container, we don't even have any like queue job workers or anything). When I started looking into how to run containers on AWS ECS seemed like the best fit.


Sort of. You can actually do a multi-container deployment in Elastic Beanstalk. I did that the other day with 1 EC2 instance and it deployed both containers to the same instance. I don't know what the behavior is if you have multiple containers and multiple VMs though.

One of the other annoying limitations of a single container deployment on Elastic Beanstalk is that you can only open a single port to the docker container - which is really problematic in a lot of situations.


I am probably wrong, but it sounds like your problems were mostly with CloudFormation. I think there are other options for provisioning: you could use the CLI, or third-party tools like Terraform.

I haven’t tried it, but maybe Fargate would be an option?


I almost didn't write the original post because this thought crossed my mind as well -- but I think it IS a problem with ECS, because just about every other resource was easy to create and consistent when I applied the configurations. The ECS Service just seemed to be less stable.

Also, I was using the AWS code SDK to build a little deploy script, and it actually worked really well. The reason I can't use a third-party tool like Terraform is that I can't force the rest of the team to learn it (similarly why I can't use Kubernetes).


Once you have a pipeline built for creating and deploying cloudformation templates, it gets a lot easier.

If you’re building something in aws using the web ui, either you’re doing something on a very small scale or you are fucking up.


Right, that's what I mean -- I use the web UI the first time to get a feel for the service (in this case ECS), then try to make it permanent with CloudFormation, prompting a deep dive into the reams of documentation and CloudFormation API docs. My problem was the number of surprises I ran into trying to write a consistently working CloudFormation template for ECS. I wrote a CloudFormation config that would succeed, then I'd shut it down, then it'd fail -- that happening absolutely decimated any trust I had in the effectiveness of ECS orchestration.

Out of all the pieces being spun up/down, the ECS Service was the shakiest/hardest to model in my own mind -- every time I got an error from it, it would make me scratch my head, even looking at the created instance in the web UI. Maybe this is all just attributable to lack of operator knowledge, but I don't think so, because at this point I know _exactly_ the EC2 objects I want to make, but after my experience trying to set up ECS I'd rather not try again. The documentation is extensive though, so I am very thankful to AWS for that.

Also, of course, all of this takes a pretty long time -- though to be fair, AWS is instantiating AMIs in that time, which is much harder than just spinning up a container, as a Kubernetes cluster would.


That said, I don't think it's unreasonable to want the UI to be good; it's just that for AWS it typically is not, and it doesn't lend itself to creating good cross-region, immutable-style infrastructure. I see no reason why the UI couldn't theoretically be good enough here for most typical use cases. I guess that just goes against AWS's approach of giving you every building block imaginable rather than a coherent platform.


ECS is pretty nice. I wish they did what Rancher did by allowing docker-compose.yml and an extra YML file for Rancher-specific setup.


ecs-cli lets you do that: you specify a compose file that's mostly compatible with docker-compose.yml, plus an additional yml file for ECS-specific parameters:

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/...
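The extra file is conventionally named ecs-params.yml; a minimal sketch (all values are illustrative, not from the thread):

```yaml
# ecs-params.yml -- ECS-specific settings that docker-compose.yml can't express
version: 1
task_definition:
  ecs_network_mode: bridge
  task_size:
    cpu_limit: 256
    mem_limit: 512
  services:
    web:            # must match a service name in docker-compose.yml
      essential: true
```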


Except it doesn't work with anything complicated, like volumes, in my experience.


We use one docker-compose.yml for local dev and another for deploying to ECS. Locally the docker-compose.yml configures volumes, with ECS we do it in the task definition. For shared volumes to gather logs we are using EFS which seems to work and mount in the same way as NFS.


So many things are janky in ECS that it ends up costing too much time and energy to get anything done. I've posted at length before about all the specific problems I hit while building on ECS, but lately I've just been telling people that it's like that janky toolbox in the funny old "PHP is a fractal of bad design" blog post [0]. Of all the AWS services I ever used, I find ECS by far the most terrible. Even worse than Amazon's documentation. So this really stunted and awkward spaceship analogy is pretty hilarious, given that it's about ECS.

[0] https://eev.ee/blog/2012/04/09/php-a-fractal-of-bad-design/


That’s probably why EKS is well under way. ECS was built at a time when there was no clear winner in the orchestration game. Now there is. AWS pretty much admitted that they would throw themselves behind hosted Kubernetes going forward.


AWS employee on the container services team here.

We are definitely building out a hosted Kubernetes solution with EKS, but it's important to realize that Kubernetes is just one part of the container services ecosystem at AWS, living alongside ECS.

Just like we have multiple database solutions (DynamoDB, MySQL/Postgres in RDS, Aurora, Redshift, etc) we also have multiple orchestrators (EKS managed Kubernetes and ECS).

ECS is like DynamoDB: it is the AWS native solution to the problem. Built to be incredibly scalable, and built for AWS patterns and best practices that we know will scale for our customers.

EKS is like RDS: it's the AWS-managed hosting for the open source software, designed to solve some of your scaling and configuration problems, and it gives you more flexibility, but that flexibility also gives you the power to do things that may not scale as well.

In ECS you may not be able to do everything you can do with Kubernetes (just like DynamoDB limits what you can do compared to the capabilities of SQL), but just like DynamoDB can scale far past SQL because of its limits, ECS is more scalable than Kubernetes because of its limits.

You can run a much larger cluster in ECS than you can in Kubernetes. We would actually still recommend to our largest customers to use ECS instead of EKS if they have any significant scale. The thing to realize is that ECS is a multi tenant control plane: it already keeps track of every single container on every host in every cluster from every ECS customer per region, and that scale is tremendous, far larger than anything Kubernetes can do. For reference at re:Invent we shared that we have millions of EC2 instances under ECS management each month, organized into more than 100,000 customer clusters, and we launch hundreds of millions of containers every week.

With ECS we can add a new customer with >2k instances into the multi tenant control plane with ease. On the other hand configuring and managing Kubernetes to handle this is a fun challenge. (https://blog.openai.com/scaling-kubernetes-to-2500-nodes/) For a lot of orgs on AWS they would rather have the boring, simple power of ECS.

Anyway I hope this provides some perspective on it. We are excited about Kubernetes and excited to offer EKS, but ECS is still our recommendation for the largest customers.


But with the inefficiencies and limitations of ECS versus something like Kube or Swarm or whatever, you're scaling your costs way up, too. Also just preemptively I realize I came off pretty hostile in my parent comment and I want to apologize and say it's not meant personally against anybody working on ECS. Having interacted with ECS engineers from quite early on in the lifetime of ECS, I could tell that you folks were under a lot of pressure to get something out the door as quickly as possible.


No problem! I'd love to hear more from you about which aspects of ECS you find most inefficient or limiting though, because obviously one of our goals is for ECS to reduce your costs, that's why it is a free service after all.

Every bit of feedback we get from the community helps us make the service better. Please email me at peckn@amazon.com (and anyone else who reads this and wants to chat feel free to as well).


This makes me want to die


I happen to be working on this and since there's a thread why not ask...

Has anyone gotten "multi-tenant" (default-restricted) overlay network(s) working on ECS in a way that they like?



