"All problems in computer science can be solved by another level of indirection" - David Wheeler
That's what "containers" are, of course. There's so much state in OS file namespaces that running any complex program requires "installation" first. That's such a mess that virtual machines were created to allow a custom OS environment for a program. Then that turned into a mess, with, for example, a large number of canned AWS instances to choose from. So now we have another level of indirection, "containers".
Next I expect we'll have container logistics management startups. These will store your container in a cloud-based "warehouse", and will continuously take bids for container execution resources. Containers will be automatically moved around from Amazon to Google to Rackspace, etc. depending on who's offering the lowest bid right now.
It's more like ping-pong. Things start off simply, but over time, as the layers of abstraction pile up, they become brittle and unworkable.
I view containers as more of a reworking of a key computational abstraction (VMs) than an evolution of them. We finally have operating systems with enough inter-process isolation, sufficiently capable filesystems (layering), etc. that we can throw out 80% of the other unnecessary junk of VMs like second kernels, duplicate schedulers, endless duplication of standard system libraries, etc.
So it's more like we've hacked/refactored virtualization into a more usable state, and gotten rid of a lot of useless garbage that it turns out we didn't actually need. It's a lot like how a big software system evolves, now that I think about it.
I'm genuinely curious, although a bit naive WRT containers. Outside of an aesthetic preference (for being able to remove 80% unnecessary cruft), what is the advantage of containers? I was under the impression that VM overhead was marginal in terms of today's computing.
I ask because I'm familiar with VMs, having worked with them extensively for a number of years. VMs work quite well for any application I've needed, so what would be the benefit of switching to containers? I've got lots to do, and lots to learn, but I can't see learning containers (and being out of sync with the rest of my coworkers) being a priority.
But I'm willing to change my mind if there's a concrete benefit. Right now, VMs work just fine, but maybe there's something I'm missing...
VM overhead isn't trivial. It still remains a pretty big factor in terms of cost bloat for CPU-bound stuff. Also, VMs take a godawful long time to start up; if you care about, say, responding to load within ten seconds, VMs aren't a great choice.
They're fine for a lot of things, of course. I use them all the time. But I use containers for other things.
I recall several reliable testers confirming that the CPU overhead of virtualisation was negligible, somewhere around 2%. Unfortunately I could not quickly find those papers now, but I did find an old VMware whitepaper[1] showing they had ~7% overhead 5+ years ago, which sounds about right considering what kind of advancements they would have made in half a decade.
Sounds feasible, but CPU usage isn't really talked about as an advantage of containers.
I expect startup time and memory usage would be lower, but to my mind the advantages are mainly around flexibility... e.g. How long it takes to create or upload an image file. How long it takes to set up a minimal infrastructure with several components to it on a single EC2 instance. Decoupling the operating system patch cycle from the app deployment image generation cycle. etc.
It's just MUCH more memory-efficient to run containers, and VMs typically have worse I/O throughput.
CPU scores are fine though.
As an example, I am running around 20 containerized servers on my laptop in a 4GB VM, a workload that would typically be run on 20 distinct VMs on one or more hypervisors. It's not very fast, but the density of servers you can put on your hardware is MUCH higher.
Ah, sorry! I didn't think you meant literally "10 seconds"; I was assuming you just meant quickly (a few minutes).
I can't really think of a use case, though, where someone would need more capacity in under 10 seconds. Maybe if you only intend to scale horizontally with a bunch of 500MB instances and have little to no room to set an appropriate scaling threshold? What would be a couple of examples? With the apps I've seen over the past several years, they generally have scaling thresholds at 'X' resource, and 3 minutes is more than enough to provision extra capacity for their needs.
Containers are just a way to launch processes without polluting your local namespace or system. It's a way to say "hey, this stuff shouldn't interfere with anything else".
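If it helps to see how thin that layer really is, here's a minimal Go sketch (Linux-only, needs root or a user namespace; nothing here is Docker-specific) that launches a shell in its own UTS, PID, and mount namespaces, which is roughly the primitive every container runtime builds on:

    package main

    import (
        "os"
        "os/exec"
        "syscall"
    )

    func main() {
        // Start a shell in fresh UTS, PID, and mount namespaces. Hostname
        // changes made inside stay inside, and the shell's children get
        // their own PID numbering.
        cmd := exec.Command("/bin/sh")
        cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
        cmd.SysProcAttr = &syscall.SysProcAttr{
            Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
        }
        if err := cmd.Run(); err != nil {
            panic(err)
        }
    }

Real runtimes layer cgroups, a network namespace, and an image-based root filesystem on top of that, but the isolation itself is just kernel bookkeeping.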
Well, we've had various containers, such as BSD jails, for decades. The useless garbage wasn't necessary. Seems like the ping-pong happens whenever "kids these days" don't know why the status is quo and then have to relearn the old lessons.
IMO, the problem is that your standard OS has way too much stuff running.
A SaaS app running in production should be about the size of your binary and the libraries it uses. Instead, we have X, SMTP, terminals, and a full filesystem running. Home directories and UIDs make no sense in an app that uses no Unix users except for the one you're forced to use.
I'd really like to see a much smaller, simpler, non-POSIX OS for running server apps.
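As a rough sketch of the "binary plus the libraries it uses" idea, a Go service like the one below compiles (with CGO_ENABLED=0) into a single static binary that can run on a bare kernel and an otherwise empty filesystem: no shell, no SMTP, no home directories.

    package main

    import (
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        // A whole "server app": one statically linked binary, no distro underneath.
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintln(w, "hello from a single static binary")
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }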
"OSv was designed from the ground up to execute a single application on top of a hypervisor... OSv... runs unmodified Linux applications (most of Linux's ABI is supported) and in particular can run an unmodified JVM, and applications built on top of one."
> I'd really like to see a much smaller, simpler, non-POSIX OS for running server apps.
The POSIX system interfaces (read, write, open, close, etc.) are OK. It's the Commands and Utilities that are the problem. Do you really need Bash available? How much of the 50,000,000 lines of Linux needs to be inside the VM running your one web application? How much attack surface is provided by the presence of all that stuff?
There's a project which has taken the C runtime library and made it run on a bare VM, so you don't need an OS instance at all. If you're just running one program, that makes a lot of sense.
This doesn't really pencil out... "your binary, and the libraries it uses" can easily get into the gigabytes when you include components like the .NET Framework or the Java base class library. I don't know exactly how large a fully loaded npm repo with a warm cache or a warmed-up rvm installation directory is, but it isn't tiny.
Second, POSIX is a standard for how the operating system API works that has nothing to do with what packages are installed -- and it's a pretty low-level API, for doing stuff like read, write, fork, exec, etc. This isn't what's adding bloat.
In this case, the problem isn't being solved -- solving the problem would mean moving away from dependencies on the global OS namespace by relearning how to write self-contained applications (some people never forgot).
Containers are just a big wad of duct tape holding together the ball of mud that comprises most web applications' server-side components.
Add containers, and you haven't solved the problem, you've just made two problems.
It's great for those nasty legacy apps that only work on old, unmaintained versions of Rails, or on old OS versions, etc.
Take all the nastiness and throw it into a box, without needing to contact Ops to reserve memory and provision a VM.
IMO, it's one of the major reasons why Enterprises get so excited about Docker. Legacy app dependency issues are horrible once you get past a certain scale.
VMs are expensive and non-self-service at most orgs, since they tie up RAM and licenses.
Functions are just a big wad of duct tape holding together the ball of mud that comprises most applications' lines of code. Add functions, and you haven't solved the problem, you've just made two problems.
Does this sound true to you? What makes containers any different from the organization that the abstraction of "functions" bring to ordinary sequential programs?
Functions (should) abstract over irreducible complexity.
As long as we're asking hypotheticals — why do applications need to control the global OS namespace and the dependencies between elements in that namespace to a degree that the applications themselves can't be easily deployed without containers?
It's not really adding another level of indirection, it's taking one away. The pain of change remains in that you have to internalize yet another new layer, BUT at least this way you get to leave VMs behind. It's trading one layer for another slightly more granular one instead of piling another one on top.
Docker in general is just another swing of the granularity pendulum. Since the rise of distributed environments in the late 1980s, the pendulum has swung back and forth between microservices (which become a version-control tangle as they move independently) and monolithic applications (which become a bloatware problem as they have whole kitchen sinks to move around). The core problem is that software is complex, and at a certain level you can't take complexity away, just push it around here and there. A large number of small pieces, or a small number of large pieces. Which kneecap do you want to be shot in?
After a few years of trending toward monoliths via chef/puppet/ansible DevOps automation, Docker is going in a different direction, toward fragmented SOA. It'll go that way for a while until it becomes too painful, and then new tech will come to push us back to the monolithic approach, until that hurts too much...
The good thing is, these cycles come in response to improvements in technology and performance. Our tools get better all the time, and configuration management struggles to keep up. It's awesome! Docker will rule for a while and then be passed by in favor of something new, but it'll leave a permanent mark, just as Chef did, and Maven, and Subversion, and Ant, and Make, and CVS, and every other game-changer.
Security-wise, if I understand correctly, this is a very interesting offering.
1. The containers live on "your" VMs, so you get the isolation of a virtual machine and don't have to worry about other tenants' containers.
2. The VMs are part of a "private cloud", i.e., the internal network is not accessible by other tenants' VMs and containers.
#2 is what worried me the most about other container service offerings. It's easy to overlook protecting your internal IPs when you manage VMs, and it's even easier (and expected) when you deploy containers.
I'm here at AWS re:Invent and just saw the EC2 Container Service presentation. They specifically targeted security as part of their design.
Basically, you launch a cluster of EC2 instances that are "available" for containers to launch into. So these are your instances, running in your VPCs. It's really the same security profile as the standard VPCs plus any other security issues your particular docker containers expose.
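If the preview works the way the presentation described, driving it should look roughly like this hedged Go sketch (the cluster name, task family, region, and an aws-sdk-go-style ECS client are my assumptions, not anything from the preview docs): you create a cluster out of your own instances, then ask the service to place tasks onto it.

    package main

    import (
        "fmt"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/ecs"
    )

    func main() {
        sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
        svc := ecs.New(sess)

        // The cluster is just a logical grouping; the EC2 instances registered
        // into it are your own, running inside your own VPC.
        if _, err := svc.CreateCluster(&ecs.CreateClusterInput{ClusterName: aws.String("demo")}); err != nil {
            panic(err)
        }

        // Ask the service to place one copy of a registered task definition
        // ("web" is a hypothetical family name) somewhere in the cluster.
        out, err := svc.RunTask(&ecs.RunTaskInput{
            Cluster:        aws.String("demo"),
            TaskDefinition: aws.String("web"),
            Count:          aws.Int64(1),
        })
        if err != nil {
            panic(err)
        }
        fmt.Println("started tasks:", out.Tasks)
    }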
Digital Ocean has something called "Private Networking" that's internal to the data center but shared with all other customers. It's not obvious from reading the website that this is the case.
I'm disappointed that this requires an invite, particularly so soon after Container Engine, which I was able to try out immediately while still watching Cloud Platform Live the other day.
Is this typical for new AWS offerings?
It makes me wonder if it's something that truly isn't ready for prime time, but is being rushed out by the mounting Docker hype and the GKE announcement.
Considering they've been tweeting about it [1] since before their competitors announced things, I'd say it's unlikely to be a "response". It's far more likely that Docker has now been out long enough for the various providers to build services around it. AWS already had some Docker support built in back in April [2]. It's also pretty common to release services as previews; Google lists theirs as an alpha-quality product.
Given that Kubernetes (the project behind Container Engine) was open sourced in early June, I hardly think a tweet from a week and a half ago shows it's not a response to Google.
He also mentions the Elastic Beanstalk support for Docker from April. It's quite obvious that everyone has been working on Docker support for a while now anyway.
According to one of the AWS devs, they plan to start honoring invite requests in about 2 - 4 weeks. It appears to be in preview right now mostly b/c the loose ends aren't tied up yet. For example, in their demo today, they launched EC2 instances in a cluster using an AMI that's specially enabled for the EC2 Container Service but which is not yet publicly available.
Anyone have any insight into whether this handles service discovery? It claims "cluster management", which usually means discovery, but there is no mention of it. Maybe Amazon is expecting you to handle that?
I was wondering this as well. It seems that they will provide for constraints around co-located containers (similar to pods in Kubernetes) but I'm not sure how discovery for containers scheduled across hosts is meant to take place.
> ...including the Docker repository and image, memory and CPU requirements, and how the containers are linked to each other. You can launch as many tasks as you want from a single task definition file that you can register with the service.
Very few details, but it looks like container linking across hosts. If so, this is great news.
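Presumably registering one of those task definitions ends up looking something like this hedged Go sketch (the family name, images, resource numbers, ports, and an aws-sdk-go-style client are all my assumptions):

    package main

    import (
        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/ecs"
    )

    func main() {
        sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
        svc := ecs.New(sess)

        // One task = one or more containers, each with its image, its
        // CPU/memory requirements, and its links to the other containers.
        _, err := svc.RegisterTaskDefinition(&ecs.RegisterTaskDefinitionInput{
            Family: aws.String("web"), // hypothetical family name
            ContainerDefinitions: []*ecs.ContainerDefinition{
                {
                    Name:   aws.String("app"),
                    Image:  aws.String("example/app:latest"), // Docker repository + image
                    Cpu:    aws.Int64(256),                   // CPU units
                    Memory: aws.Int64(512),                   // MB
                    Links:  []*string{aws.String("db")},      // how the containers are linked
                    PortMappings: []*ecs.PortMapping{
                        {ContainerPort: aws.Int64(80), HostPort: aws.Int64(8080)},
                    },
                },
                {
                    Name:   aws.String("db"),
                    Image:  aws.String("postgres:9.3"),
                    Cpu:    aws.Int64(256),
                    Memory: aws.Int64(512),
                },
            },
        })
        if err != nil {
            panic(err)
        }
    }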
Yes, but there are a lot of ways that "the containers are linked together" could be implemented, and some of them (e.g. a key-value store) require modifying application code quite a bit, whereas others (e.g. DNS) do not.
Wasn't there just an AWS announcement yesterday about the ability to register VPC-private DNS records in Route 53? It screamed "SkyDNS competitor" to me but I couldn't figure out what Amazon wanted such a thing for. Makes sense now.
Route 53 launched private (VPC) DNS last week. It's actually a common pattern to manage EC2 instances via DNS records. Many people had built this on top of the public Route 53 offering; see zonify from Airbnb as an example. Private DNS improves on that model, as the VPC instances never have to communicate with the public internet now.
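In that model, registering an instance or container endpoint is just an UPSERT into the private hosted zone; a hedged Go sketch (the zone ID, record name, and IP are placeholders) might look like this:

    package main

    import (
        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/route53"
    )

    func main() {
        sess := session.Must(session.NewSession())
        svc := route53.New(sess)

        // UPSERT an A record in a VPC-private hosted zone so other instances
        // in the VPC can find this service by name, without touching public DNS.
        _, err := svc.ChangeResourceRecordSets(&route53.ChangeResourceRecordSetsInput{
            HostedZoneId: aws.String("ZPRIVATEZONEID"), // placeholder zone ID
            ChangeBatch: &route53.ChangeBatch{
                Changes: []*route53.Change{{
                    Action: aws.String("UPSERT"),
                    ResourceRecordSet: &route53.ResourceRecordSet{
                        Name: aws.String("app.internal.example."),
                        Type: aws.String("A"),
                        TTL:  aws.Int64(60),
                        ResourceRecords: []*route53.ResourceRecord{
                            {Value: aws.String("10.0.1.25")},
                        },
                    },
                }},
            },
        })
        if err != nil {
            panic(err)
        }
    }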
You install an ECS agent on each host; it runs alongside dockerd and reports state back to the central ECS API. You can query that API for service discovery.
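So a crude form of discovery is simply asking that API what is running where; here's a hedged Go sketch (the cluster name and an aws-sdk-go-style client are my assumptions):

    package main

    import (
        "fmt"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/ecs"
    )

    func main() {
        sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
        svc := ecs.New(sess)

        // Ask the central API which tasks are currently running in the cluster...
        tasks, err := svc.ListTasks(&ecs.ListTasksInput{Cluster: aws.String("demo")})
        if err != nil {
            panic(err)
        }

        // ...then describe them to learn which container instance each one
        // landed on. That pair of calls is a crude service registry.
        detail, err := svc.DescribeTasks(&ecs.DescribeTasksInput{
            Cluster: aws.String("demo"),
            Tasks:   tasks.TaskArns,
        })
        if err != nil {
            panic(err)
        }
        for _, t := range detail.Tasks {
            fmt.Println(aws.StringValue(t.TaskArn), "on", aws.StringValue(t.ContainerInstanceArn))
        }
    }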
No mention of Elastic Load Balancing integration or even EBS integration. Thus avoiding the 2 hardest problems in container management.
To make this not suck, you will still need a proxy layer that maps ELB listeners to your containers, and if you intend to run containers with persistent storage, you are going to be in for a fun ride.
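The proxy layer doesn't have to be fancy; something like the Go sketch below (port 49153 is a placeholder for whatever host port Docker published) is the kind of shim that ends up sitting between an ELB listener and the container:

    package main

    import (
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
    )

    func main() {
        // The ELB listener forwards traffic to :8080 on the instance; we relay
        // it to whatever host port Docker happened to publish for the container.
        target, err := url.Parse("http://127.0.0.1:49153")
        if err != nil {
            log.Fatal(err)
        }
        proxy := httputil.NewSingleHostReverseProxy(target)
        log.Fatal(http.ListenAndServe(":8080", proxy))
    }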
It's probably best to integrate functionality for interacting with storage systems into Docker itself, perhaps as a script-hook interface similar to the way Xen works.
So Azure, GCE, and now EC2 all support Docker natively. Sorry, Canonical and LXD, but Docker has basically won at this point. There simply isn't a good reason to "compete" when you can just add features to Docker instead.
Well, LXD doesn't actually exist yet, so Docker couldn't possibly be built on it.
You are likely confusing LXD with present-day LXC, which is understandable. Thinking that Docker is built on LXC is also understandable, but not quite right.
Docker used LXC as the default container implementation for much of its lifespan. However, it has since dropped LXC as the default and uses libcontainer instead.
The Linux kernel does not actually have a specific container implementation. Userland tools such as LXC ( https://linuxcontainers.org/ ) and libcontainer ( https://github.com/docker/libcontainer ) use kernel namespaces, cgroups, and a variety of other features to deliver the container experience.
Incorrect; Docker is built on top of Linux kernel namespaces and cgroups. The first backend was LXC (notice the C, not D). The current backend is libcontainer, which is a native Go re-implementation (more or less) of LXC. There is a Docker backend for LXC, but it is not the default and is likely not used very heavily.
Note that there isn't really a Linux kernel feature called LXC. The LXC userspace just ties all of the namespacing and cgroup functionality into one coherent super-chroot-style environment, which is the same thing Docker does with libcontainer.
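For the cgroup half of that, the kernel interface is literally just files; here's a minimal Linux-only Go sketch (cgroup v1 layout, needs root, and "demo" is a made-up group name) that caps a child command at 64 MB of memory:

    package main

    import (
        "os"
        "os/exec"
        "path/filepath"
        "strconv"
    )

    func main() {
        // Create a memory cgroup and cap it at 64 MB.
        cg := "/sys/fs/cgroup/memory/demo"
        if err := os.MkdirAll(cg, 0755); err != nil {
            panic(err)
        }
        must(os.WriteFile(filepath.Join(cg, "memory.limit_in_bytes"), []byte("67108864"), 0644))

        // Move this process into the group; children inherit the membership,
        // so the command below runs under the 64 MB cap.
        must(os.WriteFile(filepath.Join(cg, "tasks"), []byte(strconv.Itoa(os.Getpid())), 0644))
        cmd := exec.Command("/bin/sh", "-c", "cat /proc/self/cgroup")
        cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
        must(cmd.Run())
    }

    func must(err error) {
        if err != nil {
            panic(err)
        }
    }

Container runtimes do essentially this, plus the namespace setup, plus a prepared root filesystem, all wrapped in one tool.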
Is there a point-by-point (and detailed) comparison somewhere between FreeBSD jails and LXC (or libcontainer)? It would be very helpful to see that comparison.
This is actually a pretty reasonable comparison. It is somewhat obvious the guy knows more about BSD Jails and elaborates a bit more on it, but overall, this is pretty accurate:
I just asked that on Twitter, because I also am not getting it. It seems to me that ZFS and Jails provide identical functionality, but without Docker's networking headache.
That said, even if my simplistic synopsis is correct, bringing a Jails-like experience to Linux would still be a really solid step forward. Besides which, Jails have been underutilized at shops I've worked at. If Docker popularizes the concept, that's still a huge win.
No, Docker is not built on LXD. It used to be built on LXC, which is now an optional backend.
The grandparent is correct that LXD is a newly introduced competitor to Docker.
I won't comment on whether it's good or bad, but it's an objective fact that Canonical decided to reimplement what Docker does rather than contribute to it.
Don't worry Solomon, we still love Docker and will continue to use it :)
That being said, if they do somehow manage to get hardware-assisted containment for containers, I see it as a no-brainer for Docker to adopt it as soon as reasonably possible. Are there any plans (that you can speak of) regarding something like this, or are you waiting for LXD to be more than vaporware at this point?
I can say that there are employees of major silicon companies already working on contributing all of this to upstream Docker. I was shown a real proof of concept already, it's very promising and not at all vaporware :)
> it's an objective fact that Canonical decided to reimplement what Docker does rather than contribute to it.
What was the reason that Docker reimplemented much of LXC rather than contribute patches upstream?
Latest LXC supports features that Docker's reimplementation doesn't, and it seems likely to get further ahead feature-wise now that Ubuntu is pouring more resources into LXC/LXD.
You forgot to quote the part where I said "I won't comment on whether it's good or bad".
Canonical doesn't need to justify itself to me, no more than the Docker maintainers need to justify themselves to you. It's just how open-source works: you weigh the pros and cons of re-using vs re-implementing, make a decision, and see if the community follows you. In the case of Docker, the community followed. In the case of lxd, I guess we'll see. Either way, more choice and competition means the user wins.
Happened to find the pull request discussing the lxc-driver issue. It sounds like there was some interest in contributing upstream, but it didn't really go anywhere.
https://github.com/docker/docker/pull/5797
From the thread, it seems like the concern was just that lxc-exec wasn't well maintained and the LXC interfaces weren't stable, since LXC was undergoing heavy development. I think that's changed recently, with both LXC and Docker now past their 1.0 releases and 'production-ready'.
Docker still uses LXC if you want it to, via the lxc-exec driver and the --lxc-conf option. It's just not the default, which probably makes sense, since the LXC options mainly apply to advanced users.
So by default, Docker uses libcontainer for a simpler installation experience. But for advanced users, using the lxc driver is an option to look into.
IMO, Docker wins if it continues to play nicely with the other open-source projects people use alongside it, and also gives credit where it's due.
That's what "containers" are, of course. There's so much state in OS file namespaces that running any complex program requires "installation" first. That's such a mess that virtual machines were created to allow a custom OS environment for a program. Then that turned into a mess, with, for example, a large number of canned AWS instances to choose from. So now we have another level of indirection, "containers".
Next I expect we'll have container logistics management startups. These will store your container in a cloud-based "warehouse", and will continuously take bids for container execution resources. Containers will be automatically moved around from Amazon to Google to Rackspace, etc. depending on who's offering the lowest bid right now.