Show HN: Docker rollout – Zero Downtime Deployment for Docker-compose (github.com/wowu)
187 points by pankarol on Feb 7, 2023 | 88 comments
docker-compose is great for single-node Docker deployments, but it doesn't have a feature that would allow zero-downtime deployments. It's not possible to deploy often if your app goes down on every deploy, and using Kubernetes/Nomad/Swarm on a single node is overkill.

I created this Docker plugin to be a drop-in replacement for the restart command in typical docker-compose deployment scripts. It performs a simple rolling deployment of a single service.
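
For example, a deploy script might change like this (a rough sketch; `web` is a made-up service name, and you should check the README for the exact flags):

  # before: brief downtime while the container is recreated
  docker compose pull web
  docker compose up -d web

  # after: rolling restart of the single service
  docker compose pull web
  docker rollout web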




Dokku Maintainer here.

This is pretty neat. One of my gripes about docker-compose - and a major reason why I've been hesitant to add support for it - is that the updates are not zero-downtime. That makes it much more annoying to use for app deploys, as you either have to rewrite the compose command as a docker command (defeating part of the purpose of a compose file) or accept the downtime during a deploy. I'll definitely be including this tool (or something like it) with Dokku once I actually add compose support.

Combining this with either Caddy Docker Proxy[1] or Traefik[2] could be quite nice for a very simple app deployment system.

Would be super awesome for this functionality to land in the official `compose` plugin, but for now this is a great way to dip your toes into app deployments without too much overhead.

There is a small island of productivity tools around docker-compose that would be super nice to build, and it's nice to see something like this land :)

  - https://github.com/lucaslorentz/caddy-docker-proxy
  - https://doc.traefik.io/traefik/providers/docker/
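
For illustration, a minimal compose sketch using Traefik's docker provider (the image tag, hostname, and app port are made up): because Traefik discovers containers via labels, both the old and new containers receive traffic during a rolling restart.

  services:
    traefik:
      image: traefik:v2.9
      command:
        - --providers.docker=true
        - --providers.docker.exposedbydefault=false
        - --entrypoints.web.address=:80
      ports:
        - "80:80"
      volumes:
        - /var/run/docker.sock:/var/run/docker.sock:ro
    app:
      image: myapp:latest  # hypothetical app image
      labels:
        - traefik.enable=true
        - traefik.http.routers.app.rule=Host(`app.example.com`)
        - traefik.http.services.app.loadbalancer.server.port=8080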


This is slightly off-topic, but does Dokku take care of host-level security at all? Or do you have to secure the server yourself before using Dokku?


It doesn't, mostly because it's a fairly large area to have opinions about, and not everyone agrees on what is required (in fact, I know of at least two orgs using Dokku that would definitely not agree on each other's requirements).

Honestly I think Dokku is complex enough as is without me trying to hamfist my idea of security onto a user's host. I don't think I know enough about the topic to make good decisions that everyone will agree upon (and will be friendly enough to users to not make them upset with Dokku).


Ah, I see. That's a fair point. With Heroku you delegate both (1) the server management and (2) the deployment aspects of an application. Dokku seems to solve (2); I hope someone will come up with an automated solution to turn a Linux box into a secured application runtime.

I use Digital Ocean Apps and can't help but feel grifted: the price for what you're getting in specs is terrible compared to a VPS or bare metal. The same is true with Render or Fly. But I just don't have the time or confidence to run my own server.

I feel like there is an opportunity for huge savings here, I just need that fictional piece of software.


Yeah, that makes sense. I really liked Triton from Joyent, as it provided a neat host OS that could essentially be looked at as a multi-host Docker installation but didn't run much else. You could then run the Dokku docker image on top of that and only really expose `nginx` (for routing) and `openssh` (for builds and interacting with Dokku), limiting your surface area. You can do this today if you wish on top of Alpine Linux or some other small host OS, but that only takes care of half the problem (the other half being firewalls and detecting intrusion attempts, imo).


Any chance Dokku will ever run on OpenBSD? Its philosophy of "everything disabled unless you need it" would make it an excellent choice from a default-security perspective.


Dokku mostly uses existing "unix" tools or golang binaries, so it should in theory work (buildpacks are another matter, as they are mostly written for Linux, but a Dockerfile-based build should work). The big issue is actually interacting with the Docker daemon, as that is Linux only. If you could provide a `docker` binary on the host that was a wrapper around Jails for the commands we use, it should work just fine.

That said, I don't use BSD and wouldn't have time to investigate adding support for any of the BSDs.


What about zero downtime in batch processing services/cron jobs, instead of “only” services that serve HTTP requests?


That's a bit more difficult. I don't think there is a great way for cron jobs other than triggering them and reverting if there are errors, though I haven't seen a single deployment tool that would do this.

For batch processing, Dokku checks the process uptime - if the process stops/restarts within X seconds, it's likely unhealthy, and the deploy should be reverted. Some frameworks expose HTTP endpoints that can be used to perform healthchecks as well.
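
For HTTP services on plain compose, roughly the same idea can be expressed with a healthcheck (a sketch; the image and the /health endpoint are made up):

  services:
    app:
      image: myapp:latest  # hypothetical image
      healthcheck:
        test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
        interval: 10s
        timeout: 3s
        retries: 3
        start_period: 15s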

At the end of the day, monitoring for errors in an error tracking service - such as Sentry - and seeing if new errors start spewing for a given deployment is probably the only better thing folks can do, although reverting code isn't always straightforward.


God, I'd love compose support in Dokku.


What exactly is the problem with short downtime during updates/deployments? A lot of people don't care if their deployment requires a few seconds of downtime. And most users of $randomwebsite don't really care if they need to hit the reload button every now and then.


If you work on a site with production traffic that you care about, you won't want users to see intermittent errors.

Even if you are OK with the occasional error like that, if your deployment system DOES produce intermittent errors, you'll deploy less often... which can become a significant drag on your overall culture of shipping new code.

You'll end up wanting to bundle fixes together and deploy them once or twice a week, where the same team with zero-downtime deploys might happily ship new code to production several times an hour.

So I see zero-downtime deploys as a very impactful way to encourage a healthy culture around shipping code often.

They're also one of those things that's relatively easy to engineer into a system early on, but ferociously difficult to add to a system years down the line when it's already serving large amounts of production traffic and has lots of existing code already built for it.


Honestly, I have a slightly different answer than most: there shouldn't be any.

Docker already goes to great pains to achieve what it does, and it would be bizarre to go through the effort of using it and then not take advantage of some of the core things it can enable. Even without Docker, service managers like systemd can be used to implement zero-downtime deployments too.

At this point it'd feel weird and broken if a service management tool just had no support for a "gapless" deployment. I'd feel uneasy using it.

The way I see it, what Kubernetes does with health and liveness checks is actually just good hygiene irrespective of how many requests you can tolerate dropping, or if your container even serves requests to begin with. Tools that manage services should strive to provide similar functionality where possible.
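
For reference, this is roughly what those Kubernetes checks look like in a pod spec (a sketch; the paths and port are made up):

  livenessProbe:
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 10
    periodSeconds: 5
  readinessProbe:
    httpGet:
      path: /ready
      port: 8080
    periodSeconds: 5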


Dokku is in use in production by a large number of companies. Those companies might set up Dokku on a single, large host - perhaps colocated somewhere - and not want to have downtime for their product(s). Others host a swarm of Dokku installations and also want to avoid downtime during deployments.

I suspect that "most users" will actually bounce if a site is down for more than a few seconds, or at least become frustrated with the availability of the site they are accessing.

As an aside, I'd rather not field support requests from folks about how one of the main features Dokku promotes doesn't work when they configure it one way or another.


I guess if you have great ambitions about "web scale" and client retention, a few seconds of downtime on your website might mean losing a significant number of potential new customers.

As for me, I actually like downtime. We have data migrations to do anyway on most major updates (they run automatically when the new container boots, but the app is unavailable), which can sometimes take hours. That leaves us notifying our customers of updates and gives us time to do some sanity checks before handing control back to them.

Compose is a great tool for single-VM (and probably few-VM) deployments. I used to love Swarm for slightly bigger ones as well, since it's a lot easier to manage than kube and can easily be tailored to one's needs; still sad it's being deprecated.


Classic Swarm was deprecated and hardly anyone has been using it.

Swarm mode, which is what everyone usually means, is not being deprecated. Docker 23.0.0 shipped with new features for Docker Swarm.

https://docs.docker.com/engine/release-notes/23.0/


The apps that I build (and use dokku to deploy - thank you josegonzalez, it's a great tool) are critical to their users.

While ten, twenty, thirty seconds isn't much in the grand scheme of things, if it happens just as someone is doing an important update, it reduces trust in the software. If it happens a couple of times, they're going to report back to their boss that the system is unreliable and fails just when they need it most.

And with international users, I can't be sure that I'm picking a time to do a deployment where no-one is going to be using it.


Zero downtime deployments are just a side effect of doing any kind of A/B deployment.

The same features can help you avoid longer downtime. I use dokku at home to run a little web portal. I have a CHECKS file in the repo that has:

  WAIT=1
  TIMEOUT=1
  ATTEMPTS=10
  / Portal
  ...
If I try to update to a new version and the site doesn't come up at all (or a GET for / doesn't have "Portal" in the response), the update is aborted and requests never fail.


It depends on the site.

If there are a lot of users and you need to deploy an update during the middle of the day, even a little downtime could be disruptive.

If you have a global audience then maybe there's no safe "after hours" time to deploy.

Zero downtime deployments are fairly critical for a lot of Continuous Delivery / Deployment workflow, so more solutions the merrier as far as I'm concerned.


Why a few seconds? Some Java applications can easily take minutes to initialize.


But that's a bug that could be fixed.


In the Java world very long start-up times are a feature, not a bug


People might not care, but APIs do.


I'd say it's the other way around. Any sane API client would retry on a 503. But people might not.


Docker compose is great; I use it a lot for local development and testing of distributed systems. With a few tweaks you can simulate almost anything in containers, including systemd and low-level networking stuff, which e.g. makes simulating an entire Ansible-based setup trivial.

Too bad Docker doesn't seem to push this much; with a bit of extra work this could probably be the deployment platform for 95% of all software systems.


Docker compose is also fine for production environments, if you are okay with a few minutes of downtime during updates and your application is just running on one or two machines. Even 90% of the enterprise systems I've worked with so far were okay with those restrictions.

Sooo much simpler than k8s.


FWIW ECS supports rolling updates for docker compose deployments: https://docs.docker.com/cloud/ecs-integration/#rolling-updat...


I also really like Docker Compose. Where I think it struggles is when you have to get off the single node and on to network services. Then it starts to have less relevance. I know Swarm was meant to fix that, but I never tried it.

Conceptually, if I could just have an extra component to docker compose, where I install a little agent onto my nodes, name my nodes (or node groups) in my d-c file as to what should go where, have a master node that I do docker-compose up on, and represent an external service (e.g. AWS RDS) as a virtual service with an endpoint to connect to, and it all magically deploys, I think that would be incredibly powerful.

Maybe that was Docker Swarm, of course.


This still is Docker Swarm.

The news about its death is just plain wrong. There are new features in the most recent release of moby/moby, and development in general has been picking up again:

https://docs.docker.com/engine/release-notes/23.0/


I don't understand. Is this common usage or some unique setup you've discovered?


I think the parent comment is saying that it's a shame Docker isn't pushing docker-compose more in general, rather than pointing to a specific use case. Assuming that's the point, I have to agree. Docker-compose is excellent for so many use cases where Swarm, Nomad, or k8s would be too much but straight docker isn't enough. Seeing more investment in compose would be fantastic.


Well, the good news is compose v2 is a huge improvement and they've been releasing at a much higher cadence than v1.


FWIW I have discovered a simple technique that solves this problem without any additional software: I create two or more services that are identical in all but name, so instead of a `backend` service I'll have `backend1`, `backend2` etc.

When I need to restart the service, I do it in a staggered fashion: first stop and restart `backend1`, wait a few seconds, then stop and restart `backend2`, etc. I put this in a script and it works without any problem.
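
Roughly, such a script might look like this (a sketch; the service names and sleep time are made up, and it assumes a proxy that stops routing to a stopped container):

  #!/bin/sh
  set -e
  docker compose pull backend1 backend2
  for svc in backend1 backend2; do
    # recreate one copy at a time so the other keeps serving traffic
    docker compose up -d --no-deps "$svc"
    sleep 10  # give the new container time to boot and the proxy time to notice
  done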


Call them 'blue' and 'green', and you have blue-green deployments.


Unlike the sibling comments, I would call this a rolling deployment: you are removing an old host and adding a new one at the same time when you restart.

In a blue green deployment I would expect to see you add backend3 and backend4 with the updated code, and then rename services (or redirect traffic in some other way) to them. The difference is, if something goes wrong, backend1 and backend2 are hot standbys and we can revert the change by redirecting traffic back to them.


That's known as a blue-green deployment strategy and has been around since deployments began


Simple and ingenious. I like it!

Do they use separate ports? Is there a load balancer to point to both services?


This one has downtime, right?


It would depend on the program. If you've got a load balancer in front, or they're receiving requests through a database or message queue, then there won't be an outage.

Arguably anything where uptime is important should be set up this way. Upgrades can fail, and that will cause downtime anyway.


If you restart them one at a time, you always have at least one instance running.


Isn't that functionality already buried somewhere between Docker Stack and Docker Swarm?


Yeah, looks like a “poor man's” Docker Swarm. And I would just go with Swarm. It's very good, and very easy to move to from compose. Too bad k8s won the hype battle here.


People get scared away from swarm because they make it sound like you NEED 3 managers minimum. This simply isn't true and it runs great even on a single node. I've moved all of my production and stage environments to be swarm based.

I just wish there were an easier way to have file shares between nodes. It makes running a reverse proxy like traefik tricky.


You may like the new release of moby 23.0.0 then :)

see https://docs.docker.com/engine/release-notes/23.0/

the community is working on adding Swarm support to CSI plugins:

https://github.com/olljanat/csi-plugins-for-docker-swarm


TBH, to me Swarm always feels like something way more complex than Compose, but also way less flexible than Kubernetes. Most people, I guess, want either simplicity or a fully flexible cluster with a great ecosystem.


yeah, and docker swarm on a single node is not overkill at all. Plus, you don't need to install anything. Sure, you have just one manager, but with one node you don't need to worry about quorum: if it goes down, then that's that.


any chance you have a link to a good readme on setting up single-node docker swarm?


I love this set of guides: https://dockerswarm.rocks/

Shameless plug: I'm also developing a GUI for Docker Swarm, if you don't feel like fiddling with command line too much: https://lunni.dev/

The getting started guide will walk you through setting up a single-node Docker swarm from a clean Debian / Ubuntu install: https://lunni.dev/docs/install/


> I'm also developing a GUI for Docker Swarm, if you don't feel like fiddling with command line too much: https://lunni.dev/

This seems like a cool idea, I wish you the best of luck!

I will, however, also link Portainer, which I've used for both Docker Swarm and Kubernetes management: https://www.portainer.io/

It does basically everything you might want (it even gives you redeployment webhooks, which will make new containers be pulled, if you enable them), with the caveats that they're focusing a lot on adding paid functionality nowadays and infrequently you might have connectivity issues, which a redeploy of Portainer will fix.

I've also heard of and used Swarmpit a while back: https://swarmpit.io/

When I did use it, however, it did feel a little bit broken in places and the experience wasn't quite as smooth as Portainer was (also, for whatever reason, it seemed to give me back different YAML than the one that I had last deployed, maybe generating it on the fly?). No idea what was up there, but the UI was pleasant regardless.

So I think it'd be pretty cool if someone were to borrow the ideas that work from either those as inspiration for their own tool! :)

On the CLI, there's also ctop, which is nice for inspecting the current containers on a node, even if it doesn't really have much to do with Swarm: https://ctop.sh/


Thank you so much!

Lunni actually uses Portainer as a backend right now. Portainer is a very powerful tool, a Swiss army knife of sorts, but I find the UI a bit complicated. (That's actually one of the reasons I started Lunni!) E.g. to deploy a new stack, you've got to wait for the environment list to load, select an environment, go to Stacks, wait for the stack list to load, and only then click New stack. In Lunni that is one click away from the dashboard.

Swarmpit is also pretty nice, but yeah, broken in a couple of ways. The quirk you encountered is actually pretty easy to explain: I think old versions of Swarmpit didn't store the YAML, but reconstructed it from the current stack state. It's actually pretty neat: if you deploy something outside of Portainer, Portainer will complain that it doesn't know this stack and won't let you update it. Swarmpit, however, would show you what YAML could produce this stack, so you have at least something to work with. I might borrow this idea at some point, too :-)


As a satisfied Swarm user, and with positive signs regarding Docker Swarm's future (repo activity up, Mirantis talking about it and its future), I wondered if there would be an opportunity to propose managed hosting of Swarm deployments. Not having much time to work on it right now, I'm happy to see someone else doing it. Best of luck!


Nice, glad to see innovation in this space.

I built my own GUI along similar lines using ImGui/Wasm for personal use; figured that as K8s/complexity already won the mindshare there wasn't much point in releasing tooling for simple (aka unfashionable) tech.


Thank you!

I think simple tech is underrated nowadays. Kubernetes is nice and cool when you're Google or Amazon or a startup with enough investor cash to buy a whole team of devops engineers. When you're a small team trying to bootstrap something, or a single human being who just wants to run a thing or two for themselves, not so much.

Docker Swarm to me is the balance point: it's easy enough to learn to start small, but it's powerful enough to scale if you need it later.


1. Install regular old Docker

2. Type 'docker swarm init'

3. There is no 3, you're literally finished and now have a full-on Swarm node w/ all features.
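
From there, an existing compose file can usually be deployed as a stack with minor tweaks (a sketch; `myapp` is an arbitrary stack name):

  docker stack deploy -c docker-compose.yml myapp
  docker service ls  # watch the services come up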


The Docker docs are actually pretty good.

General intro: https://docs.docker.com/engine/swarm/

Step-by-step tutorial: https://docs.docker.com/engine/swarm/swarm-tutorial/

^ takes you through a 3-server setup and deploying services via the CLI rather than a docker "stack" file, which is basically a compose file with the ability to set additional deployment-specific properties.

just set up the one manager and skip anything about additional worker nodes or draining nodes.

For docker stack files, see the docker compose v3 reference, but take heed of any `docker stack deploy` caveats in the compose documentation. Anything about `docker stack deploy` is "stack file" territory.

https://docs.docker.com/compose/compose-file/compose-file-v3...

Compose file v3 is being deprecated this year, with a new specification aiming to unify the two. But it gives you an idea of the historical differences between the old `docker-compose up` and `docker stack deploy` (Swarm).
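
For context, the deployment-specific properties in question live under `deploy:`, and they are also where rolling-update behavior is configured. A sketch (the image name is made up):

  services:
    web:
      image: myapp:latest  # hypothetical image
      deploy:
        replicas: 2
        update_config:
          order: start-first  # start the new task before stopping the old one
          delay: 5s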


Swarm has some limitations, I outlined them here: https://www.reddit.com/r/docker/comments/10w7pfs/comment/j7l...

I’ve seen docker compose deployments where the only missing part was the ability to replace the container without downtime. Swarm would also solve this problem, but it won’t always be as simple as redeploying the stack in swarm mode, especially if you rely on `docker-compose run` in your deployment process.


For the case of `docker-compose run`, I do this: have a service that runs, does something, then terminates and isn't restarted (see the docs). We do migrations and other one-off things with that.
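
In stack-file terms that can look something like this (a sketch; the image and command are made up):

  services:
    migrate:
      image: myapp:latest           # hypothetical image
      command: ./manage.py migrate  # hypothetical one-off task
      deploy:
        restart_policy:
          condition: none  # run once and don't restart when it exits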



> Whenever you read here "Docker Swarm" we are actually talking about "Docker Swarm mode".

> Not the deprecated product called "Docker Swarm".

"Hey, Bob, what can we do to confuse people?" - "I don't know... hey, did you hear about that Perl 6 thing? That worked pretty well!"

I like it though, looks much more hands-on than the documentation which often feels just short of being useful.


They need a rebrand... maybe "Docker Flock". Users will flock to it and leave Nomad, k8s etc behind.


That's a pretty cool name, I like it!


Is docker swarm alive and well? I always get the feeling it is something abandoned. But I love the simplicity.


Docker Swarm is getting some more attention after having been neglected, with repo activity going up[1], and Mirantis even publishing a sponsored article on The New Stack[2]. Things look brighter for Swarm than a year ago.

Interest in Docker Swarm also persists, I think. My post on starting with Docker Swarm in 2022[3] on my very low-profile blog gets very regular visits.

[1]: https://github.com/moby/swarmkit/commits/master

[2]: https://thenewstack.io/docker-swarm-a-user-friendly-alternat...

[3]: https://www.yvesdennels.com/posts/docker-swarm-in-2022/


> Using container orchestration tools like Kubernetes or Nomad is usually an overkill for projects that will do fine with a single-server Docker Compose setup.

Couldn't agree more! Very nice!


I did something similar using git hooks (for Heroku-like deployments) and curl for health-checking the app [0]. In my case, instead of using replicas, I created multiple services in the docker-compose.yaml file. The services are the same but with different names (e.g. app1 and app2). Then, during deployment I can update only app1, run the health checks, then update app2 if everything is OK.

[0]: https://ricardoanderegg.com/posts/git-push-deployments-docke...
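
The deploy step in that setup might look roughly like this (a sketch; the service names, ports, and /health endpoint are made up, and `timeout` is the coreutils command):

  #!/bin/sh
  set -e
  deploy_one() {
    docker compose up -d --no-deps "$1"
    # abort the whole deploy if the health endpoint never comes up
    timeout 60 sh -c "until curl -fsS http://localhost:$2/health; do sleep 1; done"
  }
  deploy_one app1 8001
  deploy_one app2 8002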


Awesome! I might use this.

I think this approach should work fine with https://github.com/lucaslorentz/caddy-docker-proxy as well.


AFAIK nginx-proxy does not enable true zero downtime, so using this tool with nginx-proxy does not enable zero downtime either.

An example deployment goes like this:

1) 2 app versions are deployed - blue (vCurrent) and green (vNext); both are up and ready to handle connections

2) We are about to shut down blue and replace it with green

3) Blue is handling a long-running HTTP request; it receives the shutdown signal and keeps handling the long-running request, but does not accept new connections from now on

4) Because it is still active (nginx-proxy won't remove it), blue will still receive connections from nginx-proxy, and all of them will fail for the reasons stated in (3). Connections routed to green will succeed. This is where zero downtime fails.

5) Once the long-running request is handled and blue is removed from Docker, only then does nginx-proxy refresh its configuration and route all traffic to green.

If I am correct, this tool solves the issue where a non-working container is configured to receive traffic by nginx-proxy, but does not solve the issue where a partially shut-down old container still receives traffic and drops it.

I don't know about caddy-docker-proxy.


Awesome, I was looking for something just like this the other day. I’ll be checking this out!


This is great to see - fills the biggest hole in Docker Compose.

I built something similar, but it was a bit fragile. What I ended up using instead was a lot simpler, with only a small downside: Caddy as a reverse proxy will buffer requests when the downstream is down, unlike Nginx (which we used before). So during a deploy nobody gets any errors, just a brief 2s delay while the new service boots up. Seamless enough for me.
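
The behavior described above can be approximated with Caddy's load-balancer retry options (a sketch, not the commenter's actual config; the hostname and port are made up):

  app.example.com {
    reverse_proxy localhost:8080 {
      lb_try_duration 10s    # keep retrying the upstream for up to 10s
      lb_try_interval 250ms  # pause between attempts
    }
  }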


I just did a quick search for how to achieve a basic no-downtime deploy with Podman. It turns out it is not that hard [0]. Not sure how to do this if pods or containers are managed as systemd services, though.

[0] https://github.com/evolutics/zero-downtime-deployments-with-...


Docker has some serious flaws with respect to app health and doing something about it when it's unhealthy.

Just look at this issue that is 3 years old and counting: https://github.com/moby/moby/issues/28400


As you can see, Swarm is not a solution for taking proper action on a container health-check event.

If a fly lands on you, we don't need to shoot you and awaken the new clone of you.


What's wrong with Docker swarm mode? It already supports stacks and blue/green deployments.

Doesn't this represent reinventing the wheel?


Is a single node k3s overkill? I don't think so. Once you outgrow it, you can port your deployments to a real k8s cluster.


From what I've seen, even k3s has a reasonably heavy memory requirement for cheap VPS instances compared to docker-compose. I think they target different markets.

e: I went to re-check, and running the default get.k3s.io install instructions had k3s using 30-35% of 1GB of memory, without anything running. That's pretty meaningful on low-resource machines like an RPi, a NAS, a $5 VPS, etc.

Also FWIW, Hashicorp Nomad uses ~3.5% on the same machine (but provides less than k3s, more than compose).


I mean, if you're in the realm of 'no-downtime upgrades' because it will impact your application, maybe spending an extra $5/mo makes sense rather than relying on bespoke tooling.

There's also k0s, which aims to be even lighter than k3s.


Woah, alright! This is what I've been waiting for. Thanks for building this... I can't wait to test it out!


If you are only running on a single node, it doesn't sound like downtime is a big concern regardless?


There's a difference between a hardware or other catastrophic failure giving you serious downtime and wanting to deploy a new version several times per week.


Definitely true, but with zero redundancy it doesn't seem like uptime is a high concern.


There's other automation, like running the single machine in a scaling group with health checks. I think scaling groups are free on all the major cloud providers, so this offers a way to run automatically healing single machines.


>> as it's not possible to run multiple containers with the same name or port mapping.

I think you can :)


elaborate


I'm not sure what the parent means, but there's SO_REUSEPORT. I don't know if you can set that with Docker. There are also other tricks, like not forwarding ports with -p and manually adding a firewall rule to rewrite or forward to the correct socket (you can still access containers using their container IP).

Not sure what is meant by service names


Interestingly, it looks like DHH is working on something similar (for Rails): https://github.com/mrsked/mrsk

> MRSK deploys web apps in containers to servers running Docker with zero downtime. It uses the dynamic reverse-proxy Traefik to hold requests while the new application container is started and the old one is stopped. It works seamlessly across multiple hosts, using SSHKit to execute commands.


Thanks, I have been looking for a simple solution to this.


so a blue-green deployment for docker-compose, neat


Urgh



