Thanks for sharing your experience as another cautionary tale.
At Convox we have been running Docker in prod for 18 months successfully.
The secrets?
1. Don't DIY. Building a custom deployment system with any tech (Docker, Kubernetes, Ansible, Packer, etc) is a challenge. All the small problems add up to one big burden on you. 6 months later you look back at a lot of wasted time...
2. Don't use all of Docker. Images, containers and the logging drivers are all simple and great. Volumes, networks and orchestration are complex.
3. Use services. Using VPC is far simpler than Docker networking. Using ECS is much easier than maintaining your own etcd or Swarm cluster. Using Cloudwatch Logs is cheaper and more reliable than deploying a logging contraption into your cluster. Using a DB service like RDS is far, far easier than building your own reliable data layer.
Again thanks for sharing your experience as a cautionary tale.
If you are starting a new business you should not take on building a deployment system as part of the challenge.
Use a well-built and peer reviewed platform like Heroku, Elastic Beanstalk or Convox.
At Faraday, we've also been running Docker in production for over a year, and we're very happy with how well things have been working.
As nzoschke recommends, we rely heavily on ECS, RDS and other managed services. We are very careful about using exotic new Docker features until somebody else has successfully used them in production. We use DNS, load balancers and regular networking for discovering containers and communicating between them.
And it all basically just works. The worst problem we've encountered is that twice a year, we deploy a new version of ecs-agent to our staging cluster and need to revert it because of issues.
For us, the biggest challenge has been setting up a good local workflow for developing Docker apps with multiple services and multiple underlying git repositories. The docker-compose tool is great but it doesn't go far enough and doesn't provide enough structure. We've open sourced some of our internal Docker dev tools here http://blog.faraday.io/announcing-cage-develop-and-deploy-co... but we think there's a lot more that could be done to make it easy to develop complicated apps.
As another Faraday employee, I'll add that we didn't start right away with deployment. Instead, we used docker to provide a consistent way to test our apps. Once all of our apps were properly dockerized to run tests, we were able to move to deploying the app with a simple docker-machine + docker-compose setup. After that, it was a relatively easy move to ECS. docker-compose can probably handle 80% of your development needs, but the tool we wrote allowed us to quickly switch our apps from a canonical "image mode" to "source mode" while developing. It also provided a way to run acceptance tests locally with our entire app cluster running locally. A little more info: https://dkastner.github.io/2015/07/25/accidental-continuous-...
Re 1: Why wouldn't you recommend going with kubernetes? From someone that hasn't deployed anything with it (yet), it seems rather straight-forward if run on a managed cluster (gke). I might be wrong and am thus genuinely interested in what you did for deployments.
I'm relieved, thanks guys. I really don't see the complexity when deploying to a (managed) k8s cluster. Sure, we'll still have to configure a deployment pipeline, add some scripts to push and deploy the right images etc., but overall, k8s seems really well suited to the task.
I've been running Docker in production for half this time and came to pretty much the same conclusion. ECS has its issues, but is still much more manageable than anything else (on AWS). And if I were starting today, I would've definitely used Convox (iirc, at the time you didn't support private subnets).
I never liked Cloudwatch Logs and recently switched it for an ELK stack (hosted by logit.io).
If the argument is "don't DIY" then the software is kinda broken. The whole purpose of a software project is to have users... use it. If you're going the way of outsourcing why stop at using ECS, just pay someone to run your software business and focus on... idk, marketing?
With all due disrespect and without commenting on the efficacy of using convox or any other platform, this becomes too self serving.
If the problem is Docker complexity, that needs to be solved. If the problem is the difficulty of running stateless apps, or distributed storage and networking, these need to be highlighted and widely understood.
Using Convox or any other platform will not magically make them disappear. You still need to troubleshoot and understand what you are running. Another layer to hide the complexity can hardly help.
> Another layer to hide the complexity can hardly help.
I strongly agree! I'm sensitive to self-serving, but while Convox is another layer, it does make problems disappear by disallowing them.
Use Convox and try to boot or deploy a docker-compose.yml that uses networking or uses volumes incorrectly. You are blocked from doing so with a nice reason why.
> If the problem is difficulties of running stateless apps
This is not the problem. In fact this is a solved problem.
> or distributed storage and networking
These are hard problems.
Networking is largely solved with AWS VPC. Other providers do this very well too. This lends weight to your earlier point: adding the Docker networking layer generally doesn't help the problem. I take it for granted that any new networking stack shouldn't be trusted.
I think the problem started far earlier. The author couldn't name a single clear rationale for using docker at all. One client asked them to, but why continue? They were using virtualenv and it mostly worked fine. So why docker period? Complexity for the sake of complexity? The rest of the article then basically says complex things are complex.
Well, I didn't name the reasons why we continued using Docker because I thought they were obvious to any serious docker user. The reasons are all the promises that docker makes to you without actually telling you the complexities involved. Namely: the promise of better portability, better dev-prod environment replication (you can even run vagrant for dev portability but you wouldn't use it for prod, whereas with docker, it's just docker everywhere), better scalability with managing clusters and deployments, and so on... without actually running heavy virtual machines. And no, as per the docker promise, it wasn't supposed to be complex.
I'm a big Docker fan, but this is interesting to me mostly because it shows how it's hard to get started, particularly the opening few sentences about the paucity of documentation for helping you get going from scratch. There have been good guides, but they go out of date quickly! There's so much change (because it's still very much under development) that it's hard for someone coming to it fresh to figure out what's up-to-date and what isn't. For those of us working in container tech and tools this is a good lesson about making sure the entry curve isn't too steep.
I'd argue that it wouldn't be that steep if Docker wasn't trying to do too much. There are so many moving parts - apart from regular process isolation, you've got to manage storage (which can be a pain in the ass) and firewall configuration (which is another very complicated and not well documented topic). In any non-trivial setup those two topics are already big.
Now add container management - understanding the split between images and containers, then how to publish and download images (which IMHO should be a completely separate project), how caching works so you could write a decent Dockerfile... And then there are features like Docker Swarm which I never touched and seem particularly complex to me.
It's way too many projects stuffed into one thing.
The feeling you describe reminds me of how I felt evaluating puppet and chef some years back - with a particular view to running the open source parts (not being dependent on third party support).
It felt like every other link led back to the various sales teams for the enterprise solutions, and there existed no page with a straightforward overview of all the components and what puppet / chef actually did.
Ansible was a breath of fresh air - as well as saltstack (but at the time it had too many holes in its security/transport story).
What resources would you recommend to get started for someone new to Docker? Given that it changes quickly it would be great to have recommendations from someone working with it.
I would strongly recommend first making sure you are comfortable with the concepts and architecture of decoupled systems and services. The biggest pain I see in practice is with groups that come from monolithic software development backgrounds.
This article is a great example: using docker for the first time, they chose to run the state layer (the database) in a container. State layers need to be always available and are difficult to scale horizontally anyway, which cancels out a decent chunk of the benefits of containers in production. If that team had more experience with service-based architecture they would have known to use a third-party database service (RDS) or host their own on a persistent server, especially on their first experience with containers.
Echoing the above, this is extremely important. Before jumping into any of the container orchestration tools you should familiarize yourself with Docker's terminology and concepts. I made the mistake of jumping into Swarm (still unstable IMO) and Docker Compose before fully grasping how containers talked to each other, along with many other concepts I should have read up on first. (Un)fortunately, Docker is changing rapidly and the documentation doesn't always keep up. If something in the official docs doesn't make sense, Google around to gain better context.
> This article is a great example: using docker for the first time, they chose to run the state layer (the database) in a container.
One question I've had as I evaluate moving a Rails/Postgres app to GKE is whether I should abandon the thought and go with AWS/RDS, or containerize Postgres on GKE, or some third option. Has anyone else been part of a similar migration onto GKE, and how did it go?
I support the idea of declarative configuration and dependencies, but I also realize that this is a very, very hard thing to do when each language has its own preferred package manager and they all expect a mutable environment.
(I tried NixOS for 2 months and switched back to Ubuntu. The issues are too numerous so far, despite heroic work of NixOS dev team)
I think the configuration.nix of NixOS is a nice, small way to get your dev machines configured and installed with the stuff you like. Just copy it to your new machine and off you go :)
I don't know if I would ever use Nix instead of the more language-specific package managers.
Nix derivations and nix-store-managed artifacts are a superior model to many language-specific package managers (though the learning curve is non-trivial); we use it with Haskell extensively, for instance.
I can't wait for someone to write a great book on Nix. I was so lost in it. I had it running on my home server, and needing to compile my own things because of what Nixpkgs lacked was a constant struggle.
It was a very frustrating experience. A frustration driven by the fact that I could tell how powerful NixOS was - if only I could grok it.
Yeah a book on nixos would be valuable I think. Understanding nixos really comes down to reading all the docs then trying to accomplish what you need to do, learning by doing and reading source code.
I love nix, my biggest gripe would be that the Nix language is dynamically typed...
No. They do not. Docker wants to manage containers but nix wants them in the store. I went so far as writing a derivation and tool for pulling a container into the nix store without docker so it could be docker loaded.
At the very least Nix would make your Docker builds more reliable. Granted, you could then start making arguments for "why use Docker at all", but i can't answer that.
I hate to say it but this article sounds like it was written by your typical one trick pony, technology averse client services agency.
They are the last to adopt new things, as they tend to be the lowest on the talent chain, and meeting estimates, often via client coercion or moving goalposts, is much more important to them than successful solutions over the mid-to-long term.
For anyone reading this that is part of that group: leave stuff like containers to outside organizations with stronger engineering talent and focus on what ultimately makes you money, which is client networking and sales.
>> written by your typical one trick pony, technology averse client services agency.
You can be forgiven to have that perception. Sorry, but that's not true.
>> They are the last to adopt new things
Are you actually complaining about this? Well, it just shows your immaturity as an engineer and your lack of understanding of how tech startups are run. This is something my very first manager drilled into me on my first job a decade ago, as yours should have - not to use any v1.0 software for ANY CLIENT PROJECT, but only for your hobby projects. Wait at least for a v1.1.
>> as they tend to be the lowest on the talent chain as meeting estimates
Please don't embarrass yourself. There are comments on this thread from the maintainers of both docker and kubernetes projects as well as people who have been running docker in production for much longer than us. None of them think that this article is stupid. Now, if you think you're more talented than us and all of them combined, just tell us why?
Your comments would be much more useful if you could say something like -
"They are so stupid because they couldn't figure out X which was as easy as doing Y."
Anytime you want to go toe to toe let me know. Being cheaper and more predictable is so different from actually having to make the best possible product.
This is absolutely the truth. Docker isn't something you can get an afternoon briefing on in a meeting room and then kinda wing it on implementation. How much you actually read the documentation will be immediately and plainly obvious and inversely proportional to the amount of problems you run into.
>> afternoon briefing on in a meeting room and then kinda wing it on implementation
I don't know where you got that impression from. The article clearly states we ran docker in PRODUCTION for 6 months. The article wasn't written after a docker trial over a weekend.
>> How much you actually read the documentation will be immediately and plainly obvious and inversely proportional to the amount of problems you run into.
Pretty much everyone who's commented here and has had docker experience in production agrees that the docker documentation sucks. Maintainers of the docker project have already conceded that this is an issue, so I don't know why you're saying this.
Reading these articles and comments is fun. We are a very large organization just starting to use docker, and we love making things even more complex than other folks, so I can tell this is going to be a big mess. Fundamentally we (as in this industry) seem to love simple ideas that become massively complex.
What I've found is that as long as you have someone on the books that has more than a passing interest in learning about how Docker works, you'll be fine. The vast majority of problems with Docker boil down to not fully understanding the features being used. Docker is very easy to get going, and a very deep technical subject at the end of the day. There is a lot to absorb, and if you're going to throw it into production, it would behoove you to fully understand the technology. This isn't something you can expect to be without problems unless you invest in reading the documentation. This opinion isn't popular, because everybody wants to think of themselves as truly and fully understanding what they're using. But the truth is right there in the problems people run into.
Not being a fan of docker but "Since images sizes can be as high as few GB, its easy to run out of disk space. This is another problem with docker which you have to figure out yourself. Despite the fact that everyone who’s ever used docker seriously has to come across this issue sooner or later; no one tells you about this at the outset" -
monitoring free disk space is something you definitely want to do regardless of using docker or not.
You need to be sure that you don't lose important data when running something like this in your setup, but it works nicely to remove old images. This script is deployed to non-coreos servers as well.
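The script itself isn't reproduced here, but a minimal sketch of that kind of cleanup (an assumption about what it roughly does, not the actual script - test it before pointing it at hosts you care about) looks like:

    #!/bin/sh
    # Remove stopped containers, then any dangling (untagged) images.
    # Uses the pre-1.13 CLI; 1.13+ is expected to add a built-in prune command.
    docker ps -aq --filter status=exited | xargs -r docker rm
    docker images -q --filter dangling=true | xargs -r docker rmi

Run from cron on each host, it keeps disk usage from creeping up between deploys.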
I want to use Docker in production so I don't have to write hacks like this.
I want to put my application, any application, in a nice tidy box, ship it to a server, any server, and be confident that it'll work. That's the promise of Docker. To me, Docker doesn't deliver on that promise if I have to copy & paste a 12-line bash script from HN to be able to do that, or else face out-of-space problems at surely the worst possible moment. If I have to jump through hoops like that, then what's the gain?
Might as well just install an Ubuntu VM, apt-get everything I need by hand or with ansible and add my app to the startup script. I'll have similar complexity in a more mature and well-understood environment.
The size of docker images ended up being the reason we aren't using it. Transferring them took forever, and so if we had an emergency update to make, we'd have to go outside our normal routines, which is brutal.
I'm sure we'll end up finding another way that's similar (provides the same benefits for us) but without such crazy image sizes.
Is your app gigabytes in size? Docker can't help you with that, but nothing else is going to help you, either. There has been a ton of momentum lately in getting Docker images pared down to minuscule sizes (e.g. ~25MB... megabytes) with, for example, alpine. If you're using something like "ubuntu:14.04" for your images, this is absolutely a problem you can solve.
I just switched to this, problem solved. But I tried several other solutions and found many were outdated and no longer worked well, with no documentation on which version of docker the approach worked with. This should be built into docker IMO, and I've heard 1.13 will have a clean-up facility built in.
For those interested, this is built into rkt with `rkt gc`, for an easy cron job experience. You can set a grace period for how old an image has to be to qualify.
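For example (flag names from memory, so double-check them against `rkt gc --help` on your version), a daily cron entry might look like:

    # /etc/cron.d/rkt-gc -- illustrative; tune the grace periods to taste
    0 3 * * * root rkt gc --grace-period=24h && rkt image gc --grace-period=72h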
A lot of this was the motivation for writing the book we wrote on Docker in Practice [1]. The reality of implementing such a technology requires a combination of improvisation, technical nous, bluffing, and a willingness to work through the inevitable problems [2].
I've talked about the relative immaturity of Docker as a used system (outside of dev) [3] and am struck often by how rarely people understand that it's still a work in progress, albeit one that can massively transform your business. The hype works.
That said, Docker can work fantastically in production, but you need to understand its limits and start small.
As someone starting considering Docker (and possibly Swarm), these seem to be pretty serious criticisms. Any experiences to corroborate / counter these two posts? Going by what's written here it would be suicide to use Docker, but many people are...
The central themes of the recent criticisms of Docker are made from the perspective of enterprise. Rapid iteration by Docker-the-Company is a bigger concern when the use case is closer to the Cobol-in-a-Container end of the spectrum. The alternatives are coming from organizations like RedHat and CoreOS and the big cloud providers that are focused on selling into enterprise.
At the container level, Docker containers are the basis for the container specification from the Open Container Initiative. The kerfuffle is over orchestration. Swarm is a feature from Docker-the-Company that tries to make 'Hello World' container orchestration Ruby-on-Rails easy. Right now the alternative orchestration layers are more toward the "Apache server man page" end of the spectrum.
Essentially, Docker-the-Company and the Docker critics are focused on different contexts. Docker-the-Company thinks container orchestration on a Raspberry Pi is worth pursuing. The Docker critics are coming from a world where CentOS 5 is still relevant (metaphorically).
My advice — do not use Swarm. Start directly with Kubernetes or DCOS. You might think that those are overkill for smaller projects — well, I also thought this way, and got into similar issues as this article describes. When I started again with Kubernetes, life became much simpler.
Sorry I'm really confused and rather new to the devops world, aside from pushing an app to a single VPS. What problem does Kubernetes solve exactly? Is it to tie a single application spread among multiple servers?
The Docker way is to structure your apps as "one service per container" — Docker is designed around this. Even the simplest applications require multiple containers (one for the front-end, one for the back-end, one for the database etc) — even without clustering or high availability, which complicate things even more.
When you have multiple containers, it is imperative that they are managed as a group, with clearly defined dependencies. This is why Docker requires an orchestration tool. And when the number of your production nodes is greater than one, Docker on its own becomes inadequate for that.
I can understand the use of Kubernetes to spread over multiple hosts (but that's generally for scaling & availability).
If you have multiple devs... how does docker-compose fall short? (I'm asking because I wanted to suggest using docker for my team and wanted to know its shortcomings before I suggested it.)
From my understanding: you can use compose to orchestrate databases, cache, app all on one machine quite easily (and pass environment variables, get them linked etc.)
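(For concreteness, this is the kind of single-machine compose file I mean - the service names and images are just illustrative:

    # docker-compose.yml
    version: '2'
    services:
      web:
        build: .
        ports:
          - "8000:8000"
        environment:
          DATABASE_URL: postgres://postgres@db/postgres
          REDIS_URL: redis://cache:6379/0
        depends_on:
          - db
          - cache
      db:
        image: postgres:9.6
      cache:
        image: redis:3

One `docker-compose up` brings up the app plus its backing services, linked by name on a private network.)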
In addition to multi-node distribution, there are a bunch of things Kubernetes gives you. Things that you may think you don't need, but you probably will if you ever intend to put something into production. To name just a few:
- Secret distribution
- Managing persistent volumes
- Monitoring containers for failure, restarting according to policy
- Service discovery and DNS integration
- Integration with load balancers, setting up routes, etc.
Kubernetes is an OODA loop- it schedules containers to run on servers, and reschedules in response to events. The hope is that it takes a lot of plumbing work off the table and handles it for you.
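To make that concrete, here is a rough sketch of the two objects you'd write for a simple web service (the names, image, and 2016-era API version are illustrative, not from any particular project):

    apiVersion: extensions/v1beta1
    kind: Deployment          # keeps N replicas running, restarts them on failure
    metadata:
      name: web
    spec:
      replicas: 3
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: web
            image: example/web:1.0
            ports:
            - containerPort: 8000
    ---
    apiVersion: v1
    kind: Service             # stable DNS name + virtual IP in front of the pods
    metadata:
      name: web
    spec:
      selector:
        app: web
      ports:
      - port: 80
        targetPort: 8000

`kubectl apply -f` both and the scheduler takes it from there; other pods can reach the service simply as http://web.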
I would put it this way - how much value do you expect to gain by integrating docker into your processes? Is it a pressing need right now? How badly would it affect you if it doesn't work out? The article is the summary of our small-scale experimentation. You can definitely experiment and weigh the gains for yourself. But it's good to have the issues listed in the article at the back of your mind.
Thanks! My position is, I'm starting from a blank slate with a new project. I want feature set X, and Docker and Swarm supports feature set X. (where X is something like running a number of little microservices with failover, load balancing, local integration testing, mixed language platforms, sharing development across organisations). I could get them another way, but I want to start on the right foot with a new project.
> failover, load balancing, local integration testing, mixed language platforms,
Docker does none of that.
It's only a packaging and deployment system. You package the app as a docker image, then you can call a docker command on any system to grab that image and start it.
Without docker:
1) You'd make a zip/deb/rpm of your application.
2) Download the zip to some servers
3) Update the dependencies & systems stuff
4) Start the app
With docker:
1) You'd make a docker image [basically: run a script to install the app and the dependencies, as in the previous steps]
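That "script" is the Dockerfile. A minimal, hypothetical one for a small Python app (the file names are illustrative):

    FROM python:3.5
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install -r requirements.txt   # dependencies are baked into the image
    COPY . /app
    CMD ["python", "app.py"]

Build it once with `docker build -t myorg/myapp .`, push it to a registry, and any host running docker can `docker run myorg/myapp` without installing anything else.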
That's a very nice summary of Docker, and kind of highlights why the hype is somewhat... misplaced.
Jails (from BSD) and chroots are a great idea. But a way is needed to manage the file system of a) the chroot (the C libraries, the configuration files, the application code) - this is the docker image; and b) persistent data (database, images and binary user data etc). As far as I can tell docker doesn't really come with a compelling story here - something that's easy to manage and gives high performance (say, something that competes with iscsi for database files, and a solid out-of-the-box clustered filesystem).
Now, docker gets (justified) hype for pushing the jail/chroot (aka "container") idea. But I think a lot of people (possibly including docker Inc) think that docker does much more (and do those things well) beyond being a nice-ish set of tools for building and managing self-contained chroot file systems for applications ("images").
Docker Inc certainly is working on "everything else" - but I think most would be well served to look at Lxd/lxc if what you want is "lightweight Linux vms", or kubernetes if what you want is to move towards a "container/chroot (micro) service paradigm".
Kubernetes might seem a bit complex, but that is because it tries to solve a complex set of problems.
Docker is more like "yo! Synchronise your /etc/passwd file across systems so you can log in to all your machines", while kubernetes is more like LDAP+dns+kerberos. More complex, but more sane. And built not just to get started, but continue to work as your system evolves.
And I've been quite happy playing with Ubuntu and Lxd + zfs on the other end - the simple light weight Linux vm end.
I've used Docker in production environments and found that it was helpful on some projects, but a disaster on others. The article brings up many of the pain points that you'll find using Docker for anything more complex than toy projects. It also gives some really good advice (like not putting your DB into a container). I'm currently finding that docker-compose is great for local development, but we've learned to be very cautious about introducing it (Docker) for anything production based.
I would suggest starting off by stepping back from picking specific technologies and architecting your system properly first. Where do you need load balancing? What are the different parts of the system? How can you break those parts up so that different teams can work independently from each other? What microservices do you really need? Do you actually need microservices? How will they communicate?
If you start off by having to design everything around Docker (or any other implementation detail) then you're going to have a very brittle system that's going to cause you pain in the long run. After designing things fully you may realise that you've managed to eliminate most of the initial perceived complexity and can actually work just fine with more boring tools.
Docker and swarm don't actually solve any of feature set X. You have to do that yourself, and while they may help facilitate a solution, they're not going to really do anything for you. Lots of people have jumped onto Docker without really understanding what it's doing for them, which is why it seems like everybody's using it.
If you don't know which of your problems Docker is a good fit for solving, don't use it yet. If you can't fit it into your process at a later date, it probably wasn't a good solution for you in the first place, so you'll have saved yourself a headache.
I can assure you, you can do all that without getting yourself into the microservices mess and docker. As you said, please get them another way - especially if this is a serious project with deadlines and client money at stake.
Honestly if you're new like you said (blank slate), I'd recommend using Docker for your development. Get comfortable with the idea of containers first. Start with one, then try using more (your app, redis, rabbit, etc all containerized... but personally I run DB locally).
> Orchestration across multiple physical server gets even more nasty where you’d have to use something like Swarm. We’ve since realised that swarm is one of the lesser preferred options for orchestrating clusters.
Could you elaborate on this? Did you settle on an orchestrator, and if so, which one?
We didn't use orchestration across multiple physical machines, so it was possible for us to get away with just docker-compose. Otherwise we would have had to go with swarm. Kubernetes is a way better choice than both compose and swarm, but it has an even steeper learning curve.
Just a quick heads up: On chrome v54.0.284.0.68 on my Android nexus 7 tablet the new tutorial page is half cut. I only see "asics" instead of what I assume is supposed to be Basics, all the text below is also cut:
Overview
asics
through of the basics of the Kubernetes cluster
module contains some background information
...
There's no way to zoom out, and there are 2 hamburger menus: one white-on-black on the left and one blue-on-white on the right; both are broken.
The 2 other links work well
I won't deny that when I started on k8s a year ago it was confusing and I was pretty lost. But I also didn't have docker and docker-compose experience. If you have the latter I feel that learning k8s should be pretty simple. The most difficult parts of k8s to me now are the more abstract features, like ingress, rbac/abac and things of that nature.
The system has matured incredibly over the last year. There are also third party tools like kompose/compose2kube which will convert docker-compose deployments into kubernetes manifests.
This article reflects our transition experience with Docker. It took us two years to finally feel comfortable (old chaps from decades of corp dev), but the end result is very positive and rewarding. It is fair to say that this should be expected for any infrastructure migration; there is nothing wrong with being slow and careful as long as we are moving forward. Based on experience from our team, the problem has never been finding help or answers. The problem we were facing was, on one hand, an encyclopedia of single-page documentation like the Dockerfile and docker run references, while practical guidance and gotchas are scattered around the rest of the Internet, blended with personal and business-specific opinions. It is hard, but eventually we figured out the best way to work with containers from git to build server to deployment that fits well with our productivity workflow, as well as where not to use docker.
We spent hours figuring out a good way to use a database in both dev and production with docker. It was tricky since docker containers don't support persistence unless you use a mount point. There were a few patterns documented which didn't work for us or which we didn't really like. We had to figure it out by ourselves. This is another area where you're expected to figure out by yourself whether it is a good idea to use docker to run your production database. Hint: it's not.
I think that a problem that leads to a lot of these articles being written is that the motivation of the authors to use Docker is unclear.
Why bother to put your database inside a container if the way you ran your databases before worked just fine?
You shouldn't just rush to put all your processes in containers because it is cool and Containerization Is A Good Thing. You should use technology that makes sense given the problems you are trying to solve at a given moment.
>> Why bother to put your database inside a container if the way you ran your databases before worked just fine?
For the same reason you'd bother putting anything in a container at all. The purpose of docker - as advertised by docker - is to containerize EVERY service. Unless you can point me to a single mention in the official documentation that databases are an exception. To date, I haven't found any, and thus this blog.
Completely agree. I don't know where or how people started thinking that putting a stateful service inside of a stateless container was a best practice, but no one should be doing that in production.
I wouldn't contemplate it, except that I'm looking at migrating a Rails/Postgrep app to GKE, and containerizing PG seems like the lowest-friction path. Are there better alternatives for getting to GKE, or is GKE a bad fit for this scenario?
As I said, if it is really as obvious as you say it is, please point me to a single mention in official documentation that databases should not be containerized.
Why would documentation about docker include basic fundamental knowledge about deploying services in production? This isn't docker knowledge, this is service based architecture 101. Why is it docker's responsibility to teach you general software engineering concepts?
If anyone is looking into orchestration systems, I can't speak highly enough of Kontena (https://github.com/kontena/kontena). While it is still in the early stages of development, it is a great "small-mid level" platform with a ton of features. The config and concepts are also easy to wrap your head around. We chose Kontena as Swarm is still unstable (imo not "production-ready" but I've seen counter-anecdotes) and Kubernetes was lacking some features we required. The devs have been super helpful with any problems we've had in deployment as well.
There's a lot of "figured it out ourselves" without actually sharing what they figured out. It's like this post is a whole lot of nothing: we know docker in production is not easy but how about telling us more details
Isn't every "figured it out ourselves" followed by a sentence describing what we figured? I see only one section where I didn't describe what we figured, which is the workflow part, and I mentioned there that I would write a separate blog.
I agree with most of the points here, but some stuff is misleading. For example the "you have to rebuild after every code change" is ridiculous. Just use "COPY SRC/ /app" in your dockerfile, and in dev mount SRC/ as a volume over /app. There, hot reloading sorted for development.
Don't get me wrong, docker is one of the most frustrating technologies I've used (partly because it shows such promise), but a lot of the problems he describes can be sorted with the most cursory Google.
> I agree with most of the points here, but some stuff is misleading. For example the "you have to rebuild after every code change" is ridiculous. Just use "COPY SRC/ /app" in your dockerfile, and in dev mount SRC/ as a volume over /app. There, hot reloading sorted for development.
The article _specifically_ refers to production environments, and, by extension, staging servers:
“Good practices dictate that you don’t mount your source code directory in the docker container in production. Which means you also have to rebuild the image on test/staging server every time you make a single line of code change.”
Then it also states in the “logging” section that, contrary to the development environment “On production, since your source code directory isn’t mounted in container […];” so they clearly _are_ mounting the source code in the development containers.
Oh right, sorry, I thought it was obvious that you have to 'rebuild' the image when the code changes. But if you have a "COPY src/ /app" as the last command in your dockerfile then that's not an expensive operation, as all the previous layers are cached.
Unless you change your system packages or add a new line to your requirements.txt, in which case the cache would be invalidated and the build takes longer.
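In other words, the layer order is what keeps rebuilds cheap. A hypothetical Dockerfile fragment:

    COPY requirements.txt /app/requirements.txt
    RUN pip install -r /app/requirements.txt   # re-runs only when requirements change
    COPY src/ /app                             # the only layer rebuilt on a code change

and for development, the mount that shadows the baked-in copy can live in a docker-compose.override.yml (compose merges it automatically), e.g.:

    version: '2'
    services:
      web:
        volumes:
          - ./src:/app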
Perhaps the article should rephrase it as "it's very complicated to learn how to do it properly, and not obvious when you're doing it wrong".
My experience with Docker is that Google searches often turn up with configurations that other Google searches will say are a bad idea. There don't seem to be any kind of well-documented emergent best practices.
Yes, that's pretty much what I meant. Our usage scope has been limited, and so have the time and resources we were willing to invest in researching fixes for things which were obvious to us before.
Because the source code folder was not mounted in the container, we had to rebuild the image with the updated code and then restart the container. On dev machines, since the source folder is mounted, django runserver's autoreload function works as usual.
Grandparent is referring to hot reloading, wherein the application itself recognizes changed source files and reloads them on the fly. A common feature of server-side MVC frameworks like Django, Rails, etc.
Often times I'll just run the container interactively with a shell and start/stop the app manually as if it was running locally while doing development work, almost as if it's a super lightweight Vagrant setup.
So, something like
docker run -it -v /path/to/source:/app my_image /bin/bash
I use pm2 with the --watch parameter inside the container; the code is mounted, so I can edit outside, but I don't need to restart the app. I do webpack -w locally though to simplify.
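Roughly like this, assuming pm2 is installed as a project dependency (the image, paths and flags are illustrative - check them against your pm2 version):

    version: '2'
    services:
      app:
        image: node:6
        working_dir: /app
        volumes:
          - ./:/app          # edit on the host, runs inside the container
        command: ./node_modules/.bin/pm2 start server.js --watch --no-daemon

--no-daemon keeps pm2 in the foreground so the container stays up; --watch restarts the app when mounted files change.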
> Since the client wasn’t keen on spending on getting a more private repos, we managed with the single repo to save our base image with the most of the dependencies bundled into it.
Well, most of your clients are bound to do this. The point is we managed to get away with only one private repo. My guess is you'd need more than one private repo very rarely.
Why is running a db in a container with the data directory mounted from the host such a bad idea? Dropping the container on a new host with a copy of the data directory looks easier than having to install the db from scratch, even with automated tools. Are there performance penalties at run time?
It is fine for dev, but not for production. To quote from the link at the end of the article-
"Docker is meant to be stateless. Containers have no permanent disk storage, whatever happens is ephemeral and is gone when the container stops. Containers are not meant to store data. Actually, they are meant by design to NOT store data. Any attempt to go against this philosophy is bound to disaster.
Moreover. Docker is locking away processes and files through its abstraction, they are unreachable as if they didn’t exist. It prevents from doing any sort of recovery if something goes wrong"
"A crash would destroy the database and affect all systems connecting to it. It is an erratic bug, triggered more frequently under intensive usage. A database is the ultimate IO intensive load, that’s a guaranteed kernel panic. Plus, there is another bug that can corrupt the docker mount (destroying all data) and possibly the system filesystem as well (if they’re on the same disk)."
1. Containers are not ephemeral. They have a lifecycle. Data written in the container is persisted to disk and available after the container is stopped and then started again.
2. Processes/files/etc are not locked away as if they don't exist. See `ps aux` on the host. You will see all the processes running. You can inspect the filesystems for each container, etc. There is no magic here.
3. A database crash could cause data corruption inside a container or not. This has nothing to do with the container, and chances of a database crash are not made worse by being in a container.
That said, I would let a volume driver manage persistent storage rather than manually managing this through the host fs... but that's my preference.
--- EDIT ---
Disclaimer: I work at Docker Inc, and am a maintainer on the Docker project.
0. The lifecycle of docker containers is an extremely complex topic with limited documentation. It's safe to assume that it's out of reach for 9X% of readers here. One needs to fully understand the lifecycle of their containers to attempt to run databases in Docker; that's a huge barrier to entry. Advising 100% of people to run production (i.e. permanent, long-lived) databases in Docker is terrible advice.
1. The entire concept of containers is based on being ephemeral. They do have storage (in /var/lib/docker/<cryptic-structure>) and they should be started with --rm to make sure that everything they did is cleaned up automatically after they exit. If you want to keep the data and build something around that, good luck with that!
2. Wrong. There is a truckload of magic going on here from filesystems to networking. Docker is hell to debug. A fucked database hidden away in Docker will be close to impossible to debug. If you're a sysadmin, you do not want to be in that position, trust me.
3. The odds of database issues are at least 3 orders of magnitude higher if running within Docker. The docker ecosystem is notoriously unstable and the filesystems are unreliable. (Plus databases are IO intensive, which is gonna trigger all the rare bugs and race conditions).
Seriously. If you got a brain cell at Docker Corp. PLEASE STOP overselling your product and advising it for absolutely everything without considerations for what people are doing.
Every time one of you guys advises running databases in Docker, you're objecting to everything that docker stands for (i.e. statelessness). Not only is it confusing the hell out of people, but it's putting them on a guaranteed path to future catastrophic failures.
Running production databases inside docker. Just because it's not strictly impossible doesn't mean it's a good idea.
[See RFC1925 https://tools.ietf.org/html/rfc1925 ]
(3) With sufficient thrust, pigs fly just fine. However, this is
not necessarily a good idea. It is hard to be sure where they
are going to land, and it could be dangerous sitting under them
as they fly overhead.
0. There is a plethora of documentation. Even the CLI suggests the lifecycle (start, stop, restart, pause, unpause).
1. This is simply not true. Your understanding is that they are based on being ephemeral, but this is not inherent in any sort of design of containers.
2. Magic is not really magic when you understand what's happening. Cgroups apply resource limits on a process, namespaces limit what a process can see. These come together to make containers. The host still has full visibility on these processes just like any other process on the system.
3. Do you have data to back this up? A container is just a process that is namespaced and resource limited. If you are writing to the copy-on-write filesystem provided for the container with a database, then you are doing it wrong (in 99% of cases). For that matter, you can even use ZFS for the container FS, which has been in use in production scenarios for quite some time... performance may not be great with ZFS here but integrity will be (not that I'm advocating for writing directly to the container FS... not at all, really).
There is nothing about Docker and statelessness. It can sure make cleaning up after a process a bit simpler but this doesn't mean that docker equates to statelessness.
Storage is hard whether you are in a container or not. Process isolation does not affect this.
0. That doesn't explain anything about what's happening underneath. It's far from enough to even form a mental model about Docker operations.
1. The statelessness & the ephemeralness & the tooling. It all goes together. Just because it's not enforced all the time at every level doesn't mean that it's a good idea to diverge from it.
2. What about the networking? the DNS magic? the storage? the filesystems? the lifecycle of data across containers & images and containers & further containers? the log management? the logging drivers? It would take multiple books to cover these topics.
3. Again the filesystem and storage issue should cover an entire book. There are many blog posts and issues talking about that. ZFS only became available very recently and exclusively to Ubuntu, it's ridiculous to consider that as a real world scenario.
Docker equals statelessness. That's the only thing it's supposed to do and could do well. Maybe you should consider focusing on the one use case that Docker does well (i.e. packaging & deploying stateless applications). That would make for better documentation and explanations and goals ;)
(IMO. After reading your comments, it seems that you have no clue whatsoever about systems internals [or maybe we just don't communicate well on that]. That's scary if Docker itself doesn't have a clue about what it is nor what it should be.)
> For that matter, you can even use ZFS for the container FS, which has been in use in production scenarios for quite some time... performance may not be great with ZFS here but integrity will be
It's not a very good fit for a production database if "performance may not be great"?
> (not that I'm advocating for writing directly to the container FS... not at all, really).
> There is nothing about Docker and statelessness.
You just recommended against storing state in the container FS on the previous line. What kind of state are you advocating a container should keep (that is different from what is captured in the docker file and any separate data volumes)?
> Storage is hard whether you are in a container or not. Process isolation does not affect this.
But abstraction does. Normally for a database, you'd have a mirrored set of ssds, lots of ram, spread over a couple of physical nodes. Maybe with a loadbalancer thrown in.
Or maybe you'd run your nodes as a vm, with iscsi or some other nas/das. I can't recall seeing reasonable advice on how to set up such a production system with docker (but I haven't looked all that hard!).
Last time i checked, I couldn't find any suggestions for high-performance, well-tested container storage?
Depends on whether high performance is what you need, but this was just an example of how even the container FS can have incredible integrity.
Why would a container keep you from using mirrored sets of SSDs, RAM, or an LB?
In the absolute worst case you can set these up manually on your host and map the directories into the container.
A better scenario: the various storage systems out there (EMC, NetApp, Ceph, you name it) have volume plugins integrating with Docker, Kube, etc.
How to handle storage in the container depends on your needs, just as it would for a VM or a physical machine... and ultimately the setup is in the worst of cases no different.
I think there's a bit of mix up between what an image is and what a container is. You normally don't write data to an image, but you can write data to the container. You can keep this data so long as you keep the container, and you can commit that container's data to the image if you wish.
Yes. The filesystem in the container is a real filesystem backed by the disk.
How this happens is dependent on the storage driver used.
The `aufs` driver (default when available), as well as `overlay(2)` and `vfs` drivers just sit on top of the existing filesystem at `/var/lib/docker` (or the defined docker root).
BTRFS, ZFS, and devicemapper must be pre-configured before they can even be used, and the details depend on how you configure them, but they would still generally be on an actual disk.
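A quick way to see what a given host is actually using (output varies by setup):

    docker info | grep 'Storage Driver'
    # e.g. "Storage Driver: aufs" (or overlay, devicemapper, btrfs, zfs...)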
I still don't understand this theory. As stated above, containers have the option to mount volumes on the host file system. Anything written while the container running is immediately persisted, and if the container dies you just re-mount the volume and continue as normal.
To harden this even further, you can run clustered DB nodes in Docker (+ <your_preferred_orchestration_tool>) quite easily. So with persisted data, multi-node replication, and server snapshots, I'd be interested to know as well why it's such a bad idea.
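For example (paths and image tag are illustrative), the usual pattern is to keep the data directory on a host mount so the container itself stays disposable:

    docker run -d --name pg \
      -v /srv/pgdata:/var/lib/postgresql/data \
      postgres:9.6

Kill and recreate the container and the data is still sitting in /srv/pgdata; whether that's wise for a production database is exactly what's being argued elsewhere in this thread.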
Kind of very basic issues IMO. The only one that's hard is logging. We are using K8s on AWS and I haven't found a good solution for logging centralization. Personally I don't like the Kibana interface for that kind of stuff.
My criticism of Kibana is that it runs all the queries for a dashboard in parallel. It overloads the Elasticsearch machine(s) even when there is only a little data. A small single ES server would be enough if the queries were run sequentially. I wonder if Elastic is in the business of selling multicore 32 GB servers that sit idle most of the time.
Good write-up considering that you had to manage the adventure of learning and scaling Docker along with the pressure of commitments from clients.
I would like to share some pointers on how I and my team deals with the issues that you mentioned in this post.
1. Orchestration :-
For us, Swarm never became a choice, as we started our journey adopting Docker in early 2015. At that time there was no Swarm. We resorted to Mesos and Marathon for our orchestration. Both worked out well for us in the long run. We have had production systems running this setup for the last few months. Swarm is more mature now, and with Docker 1.12 it's been made easier to use. The good thing with Swarm is that you can avoid adding another new system like Mesos, Kubernetes etc. to your infrastructure if you have reasonably simple requirements. We found that orchestration also established service discovery and routing capabilities for us. We use HAProxy and Mesos-DNS for our routing and discovery needs. The Marathon-lb project allows us to reconfigure HAProxy every time a new Docker container is deployed by the CD pipeline. Marathon manages our service ports across the cluster, and every new Docker container gets its own unique service port. This service port is then given to HAProxy, and a reload happens. This setup worked well for us, although we had some initial trouble. We also practice zero-downtime deployment with our stateless services using the ZDD script inside the Marathon-lb project.
2. Running out of disk space :-
This is a common problem, especially with the idea of rebuilding and deploying disposable containers with the CD pipeline. In our case, we use Monit to gather system-wide metrics at all times. We use a garbage collection script that we developed in house to remove old Docker images and containers periodically whenever Monit detects filesystem usage beyond the set thresholds. We do continuous production deployments as often as we need, so this keeps our Docker image diffs minimal. We avoid big-bang releases so that the latency for docker push on the build server and docker pull on the cluster is minimal. The Spotify docker-gc project is a good choice in my opinion.
3. Docker registry :-
We use a Docker Registry container that runs on the Mesos cluster via Marathon. The Docker Registry is backed by a shared volume on the Docker hosts. We share the same volume on all Docker hosts in our cluster. So, if the registry crashes on one host, Marathon is able to redeploy the Registry container on another host which has access to the shared registry volume. We tried moving our Registry backend to S3, but never in production. For the systems we manage, we need the Docker images in house due to compliance requirements. Therefore, we could not use the Gitlab Registry or Docker Hub for our production deployments. But I have heard good things about the Gitlab registry.
4. Logging :- We use Logspout on each Docker host. It forwards the logs to our managed Logstash and further to our Elasticsearch service. We use Kibana for the log dashboard. Logs are rolled over on each Docker container, so that we avoid storing the logs on the host for a long time. However, any distributed logging introduces log ordering and latency issues, so we are tackling those now through some optimisations.
5. Dependency and Base Images :-
We use a hierarchical model of managing Base Images: one top-level registry (global), and an isolated docker registry for each project.
Every Base image goes into our top-level Docker registry, which is curated, and the associated Dockerfile for that Base image is checked into our Git repository. We insist on using these Base Images from our registry for all projects.
Each project can then inherit the base image and customize it to the local needs of the project. We follow CI and CD for our Base Images as well. Each project gets a notification when the Base image changes. They are free to opt in or opt out. This model works for us, but may not be that interesting for smaller setups. I had written about it here:- http://thenewstack.io/bakery-foundation-container-images-mic...
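In Dockerfile terms the hierarchy is just chained FROM lines; a hypothetical project image (the registry name and paths are made up) looks like:

    # Curated top-level base, built by the platform team, Dockerfile tracked in git:
    #   registry.internal.example/base/python:3.5
    FROM registry.internal.example/base/python:3.5
    COPY . /app
    CMD ["python", "/app/app.py"]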
6. DB and Persistence :-
We avoid running stateful services in production in Docker containers, as we have qualms about the persistence support in Docker. However, we do use Docker volumes for all purposes in non-prod environments including CI, Elasticsearch and other services. We are very interested to pursue this further with ClusterHQ and Flocker-based offerings in the Docker ecosystem. I may blog about it in the coming days. But so far, I don't have any production experience with a database in Docker. I am optimistic that this will happen soon, though.
7. Longer build times :-
This is correct as per your assessment, but varies widely across deployments. As said earlier, we want the team to have faster build times, so we build and release as often as possible. This means we don't have to deal with build latency. We use lightweight Base images like Alpine, and prevent the use of configuration and package managers like Puppet inside the container. In our base images, we prevent bloat by avoiding installing irrelevant packages. This has backfired some times in production, but we have found ways to work around it most times.
Overall, I am aware of the challenges that you had, and can connect with all of them. Docker is not the panacea for all infrastructure woes, but it certainly gives me and our team a taste of how software development and delivery will change for good in the coming days.
A lot of the issues I see him describe in production are fixed by Kubernetes. Compose works fine for local dev orchestration but the pattern doesn't work for deployment. The ideal world would be that I can run my compose file on my cloud provider, but Swarm isn't there yet. I have to rewrite my compose file using kubernetes configs — it's not a 1:1 mapping but the high-level connections are there if you think of Kubernetes Pods as Docker Containers. He mentions orchestration across a cluster with Swarm is nasty, but it's elegant w/ Kubernetes.
Docker Registry:
Obviously, there is no constraint preventing him from using a 3rd-party service. Why not let Google Container Engine (GKE) or AWS ECR handle it for you?
Longer build times:
I think this is really where he is missing the mark. It sounds like he has a fundamental misunderstanding that if you mount the source code in dev you have to do it in prod too, and that you have to have 1 container. Not true: you can mount the source code in dev using compose, so you don't have to rebuild every time you change a line. Also, I think it's a pattern in docker to try to keep your containers as atomic units of your app architecture. It sounds like they are trying to bake all components of their app into 1 container (app + db + service, etc). Just break them up into containers, link them up w/ compose. This architecture then translates cleanly to one of the cloud providers for production: GKE or AWS.
DB and Persistence:
Yes, I think it is very clear that containers are stateless. So, yes, if you want to run a DB in a container you'd have to mount an external drive somewhere. The merits and risks of that are another discussion, but as he states it's generally frowned upon to containerize a DB. (not completely sure why, some argument about stability of the container and corruption of data…) I talk more about this here: https://news.ycombinator.com/item?id=12913198
Logging:
I think 12-factor-style containerized apps fit more smoothly into the docker compose style architecture. Accordingly, if all your containers are logging to stdout, it's all conveniently merged and printed out to the terminal if you run compose. Then on production, GKE handles it nicely too w/ the Logging system.
In conclusion, I think most of his problems would have been avoided if he hadn't skipped researching Kubernetes and if he hadn't made the mounting oversight. The other big oversight I think he made, or at least didn't mention, is that he has no deployment tool. I wouldn't be able to deploy effectively w/o a build tool like Jenkins. I talk about a lot of these issues and how to fix them here: https://news.ycombinator.com/item?id=12860519
Yet another guy who doesn't know how to use Docker properly and just brags about it.
There is a lot of misinformation in this blog post.
The conclusion from these is not that Docker sucks, but that YOU HAVE TO LEARN it. I agree that it's a very steep learning curve, but after the pieces come together, Docker solves quite a lot of problems and is actually very useful.
I think the article is very clear that this is about OUR experience of using docker. It is also stated right at the beginning that we have a business to run. If it's going to take 2 years to integrate it into our workflow (as it did for one of the commenters here) and hundreds of developer hours just to make it work, we have to decide whether it is viable for us. Other than that, I don't hear you refuting a single point that is made in the article.
>> Yet another guy ...
If there are a lot of people who are finding it difficult to grapple with it, maybe it's really not their fault?
You have to learn it — where? Documentation is sketchy and mostly obsolete, docker-swarm is buggy, genuine production issues are routinely closed with WONTFIX resolution.
Could you point me to good piece of documentation or success story of running Docker in production?
(I do use Docker, but it requires to learn Kubernetes or DCOS to do anything production-related. And those are separate projects).
Every one of these blog posts begins with almost exactly the same thing:
The documentation just plain sucks. The doc parts that don't suck are outdated, and therefore useless.
As long as that is the case, you are going to have people who go through the exact same troubles with Docker time and time again (one theme is databases in docker).
(Transferring ~/.docker wasn't an option as I already had a set of my own hosts on my computer, and manually editing undocumented configuration files is harder than it sounds — for some reason I forget, I couldn't partially transfer some hosts but not others. This is after I had spent 2 days just deciphering what had happened, as the error messages are uninformative and the Docker CLI just hangs in an endless loop in such a situation).
Indeed, it would definitely be cool to be able to export a specific machine config to a tarball that can be imported on another machine. I don't work on machine, but has anyone made a proposal for this (probably with warnings about copying ssh keys and such)?
I know this doesn't help with your current issue on docker-machine, but...
I think the issue here is docker-machine's primary intent is as a developer tool to spin up dev environments quickly and easily (zero-to-docker as we say), as such the datastore and security model is tailored to this.
We are working on production-level infrastructure management, the base-layer of which you can find here: https://github.com/docker/infrakit
In essence, it is impossible to work with an existing Docker-managed host from another computer. I often work from home, and when I wanted to manage my Docker hosts from my laptop, it turned out to be impossible and this issue was closed without resolution.
This was the day when I gave up on using native Docker.
I should mention Docker has a "Docker Captains" program, which is a group of people who write quality articles on using docker from intro level to advanced.