Run More Stuff in Docker (bergknoff.com)
199 points by psxuaw on Dec 26, 2020 | 286 comments



Let's not. I don't want to install Chrome, which is already 73 MB, bloated up with a whole lotta bullshit into a 500 MB+ image. Imagine downloading every application as a docker container. WTF.

Docker is for distribution of applications when deploying them to servers. As a developer, it's amazing at that and has brought peace and joy to devops. Let's leave it there, shall we?


My disdain for docker is not so much disk usage (I consider that battle lost in modern Software Engineering) but all the other crud that it does - such as re-implementing an entire network device and routing and all. I’ve seen this cause teams countless hours of downtime with things like IP address collisions with VPCs etc, and I’ve never really been given a satisfactory answer as to why this is useful. So now we’ve solved dependency hell and have to faff around with networking and all. Great.

Also the engine runs as root and takes commands from normal users which I always thought was a no-no but I guess docker is ‘special’ - and should be able to do whatever it likes. Containerisation is a fine idea and concept - but I think there are still big caveats.

I’ve been using podman a lot and I hope it becomes more commonplace. I hear that there’s a lot of SV politics and drama going around surrounding the various companies and backers which I have exactly 0 interest in though...


Totally agree that Docker is made for deploying to servers. But the disk space critique doesn’t hit for me. Even very nice SSDs are cheap enough that 500 MB is negligible. My internet connection also makes downloading a large docker image no bigger a deal than downloading Chrome, YMMV.

I think the necessity of a VM when using Docker on Mac and Windows is the primary reason that running your “normal” apps in a container isn’t the right move.


The problem isn't the 500 MB disk footprint, it's all the RAM going to waste loading in redundant libraries. Chrome is already a memory hog on its own; imagine all applications suddenly bringing in their own versions of their libraries.


Also the network requirements to install and update such large containers are a problem. A lot of people don't appreciate how slow the internet is in most of the world compared to places with access to any kind of fiber endpoint. This extends to similar concepts like snap packages.


RAM is the same situation as disk for me — I have plenty to waste.

Again, not advocating Chrome in a container, I don’t even run Chrome outside of a container. I just think it’s odd to get hung up on these sorts of resource requirements given the state of computing.


You may have plenty of resources but there are many who have to "make do" with 8 GB, 4 GB or even less RAM.


Totally fair, but I think the Venn diagram of folks that use Docker and folks that have 4GB of RAM is pretty slim. If the resources are that limited, Docker might not be the best choice for running anything.

Related: Chrome on 4GB of RAM sounds painful. Thoughts and prayers to those folks.


> Even very nice SSD’s are cheap enough that 500 MB is negligible.

However, a nice SSD with a respectable TBW rating is still not cheap. An 860 Pro is almost twice the cost of an 860 Evo, and the Pro provides twice the TBW.

> My internet connection also makes downloading a large docker image no bigger of a deal than downloading Chrome, YMMV.

Not every one of us has pipes that fat at home, with sub-10ms pings and almost LAN-speed access to the rest of the world. My office workstation's network is limited by my network card, but my home has a much slower connection.

I wish the internet on this planet were a full fat-tree network, but we're not there yet.


> However, a nice SSD with a respectable TBW rating is still not cheap. An 860 Pro is almost twice the cost of an 860 Evo, and the Pro provides twice the TBW.

TBW is almost never a concern for desktop users. The Evo is rated for 600 TBW per TB of storage; that would be a full disk rewrite every day for nearly two years.

You will never download enough Docker images for personal use to burn out your 860 Evo before you would have replaced it anyway. (For those unfamiliar with Docker, spinning up a 500MB image ten times doesn't write 5GB to disk!)


> TBW is almost never a concern for desktop users.

You're right; however, most of the people who'll use this kind of setup are not ordinary desktop users.

My desktop has 4 disks (2 SSDs and 2 HDDs). My write rate for the "Home" SSD is 3 TB/yr. To keep that value low, I've moved VMs, big downloads and other stuff to one of the HDDs. The system is on another SSD and its cumulative writes were about 3 TB in 8 years, but I moved logs and other high-write portions to another HDD to keep that value low.

3-4 TB/year, on the other hand, is pretty much in line with a Windows 10 installation's behavior when used by a normal desktop user, as intended.

>You will never download enough Docker images for personal use to burn out your 860 Evo before you would have replaced it anyway.

Considering other stuff I do, I could easily double or triple the amount of writes on my Home SSD, but VMs and other stuff can already sit in RAM once running, so there's no speed problem.

While write amplification is not a big concern anymore, building software and other small-file operations can accumulate fast, so I still can't trust an SSD blindly.

On replacing drives: while a dd or rsync is pretty straightforward for a seasoned Linux user, I prefer not to change hardware just for the sake of it, or abuse it because it's cheap and can be replaced on a whim. At the end of the day, I'd rather use my system efficiently rather than recklessly, both in terms of resources and endurance, because being able to rely on your system is underrated imho.

Because of my job, we torture systems up to and beyond their design limits, and a little optimization can go a long way in these scenarios. I like to apply that knowledge to my own systems to extend their useful life.


So, are these the numbers?

TBW is 600 TB, but you do 3 TB/year across four disks, and I'll be generous and say you do 0.75 TB/SSD/year.

So in 800 years you'll hit the rated TBW?

If you did a full 3TB/year on the one SSD, you hit it in 200 years?


No, 600 TBW is for a 1 TB Evo. For a Pro, it would be 1,200 TBW.

Unfortunately, neither of my disks is that big. Home is on a 256 GB SSD, which boils down to 300 TBW. The system SSD is a somewhat older 120 GB OCZ Vertex 3. This model doesn't have a TBW rating.

As a result, if I don't do anything heavy, it'd last for a century in the best case. I'm not sure about the OCZ though. It reports 100% life remaining, but it's from the skunkworks era of SSDs, so I can't be sure of anything.

The numbers climb very fast when you start to develop stuff and enter the compile -> test -> debug cycle. So I'd rather have that endurance and use it while developing software rather than eating it while doing daily stuff.


You mean that era right after Intel released the first consumer SSD, when suddenly everyone was rebadging flash with who-knows-what controllers?

Anandtech was the best site for that era with their testing.


Almost, but not quite. The Vertex 3 was one of the best SSDs at the time. It had one of the better SandForce controllers with proper MLC flash, but had to be fed compressible data to reach its rated speeds.

It has no temperature sensor so it cannot compensate for temperature or understand its environment. It doesn't have a TBW rating (IIRC) and reports writes and reads in GiB. So it's somewhat limited when compared to today's drives.

OTOH, it's pretty dependable and stable so far.

[0]: https://www.anandtech.com/show/4256/the-ocz-vertex-3-review-...


In many cases you don't need 500 MB of base image.

Minimal "distrofull" images are in the 50mb range. But often you don't really need a real distribution inside your docker image.

See https://github.com/GoogleContainerTools/distroless. But distroless is not about one specific tool or base image, it's a paradigm that addresses precisely what you say (without throwing away the whale with the bathwater)
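
A minimal sketch of the pattern, in case it's not obvious (Go is just an example here; names and versions are arbitrary):

  # build stage: full toolchain
  FROM golang:1.15 AS build
  WORKDIR /src
  COPY . .
  RUN CGO_ENABLED=0 go build -o /app .

  # final stage: no shell, no package manager, just the binary
  FROM gcr.io/distroless/static
  COPY --from=build /app /app
  ENTRYPOINT ["/app"]

The final image is a few MB plus whatever your binary weighs.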


Docker has a layered file system. Meaning if you do it right, that Chrome container will share the same 500 MB base image layer with the Gimp container, or whatever, making it less bloated than it appears when looking only at the footprint of the first image.

I'm not saying that I believe it is a good idea to run desktop apps in Docker containers. It is not a good idea. But it is also not true that doing so would necessarily lead to a bloated filesystem.

Docker will use a lot of disk space when using multiple base images.
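
A rough sketch of what "doing it right" means here (image and package names are placeholders): two Dockerfiles starting from the same base only store the layers after FROM separately.

  # tool-a/Dockerfile
  FROM debian:buster-slim
  RUN apt-get update && apt-get install -y --no-install-recommends tool-a && rm -rf /var/lib/apt/lists/*

  # tool-b/Dockerfile
  FROM debian:buster-slim
  RUN apt-get update && apt-get install -y --no-install-recommends tool-b && rm -rf /var/lib/apt/lists/*

The debian layers are pulled and stored once; `docker system df -v` reports them as shared size.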


> that Chrome container will share the same 500 MB base image layer with the Gimp container, or whatever

Only if they’ve all chosen the same base image.

And if application developers could all agree on a fixed base image with fixed versions of dependencies, that’d be a Linux distro and we wouldn’t need docker to begin with :)


I fully agree with the original premise (to run more stuff in Docker). But say I maintain two tools, both using Docker, both built in Python; they'll each be in their own Git repository, and their build pipelines will come up with independent Docker images for whoever wants to use those tools. It seems like frivolous maintenance, if not an anti-pattern, to make sure that these tools are always using the same base image. And yes, if bandwidth and disk space were constraints for me, I'd probably reconsider (even on the original premise), but they are not.

By the way, while I think you should run more stuff in Docker, I also think it is still totally reasonable to have your "main language stuff" installed "normally" (not in Docker). If I'm a Java developer, I'm "happy" to deal with Java's dependency b.s. on my normal system. Meanwhile, if I ever have to touch Ruby, I do not want to deal with Ruby's dependency b.s. on my normal system (I'd rather run that in Docker). And vice versa, I'm sure.


As a Ruby, Node or Python language user, you may want to try rbenv, nodenv or pyenv respectively.


Or the ASDF version manager that covers all of them, and much more.


Both of your Python projects will likely have `python:latest` (or such) as their base image. Sure, they will diverge after that (with further layers for dependencies and your code), but at least the base system will be shared.


Even if they both used python:latest (as if that were the only flavor), they're only on the same "latest" if I do both builds at the same time. So now I have to coordinate builds and releases. Unlikely to happen, and, again, an anti-pattern.


> If I'm a Java developer, I'm "happy"

Oh yes, it would make me "happy", too.


We all know that this doesn't work in practice. Somebody will "urgently" need another dependency and build a new base image with this dependency inside. Or a security patch comes out and half of the developers update while the rest don't. Do this 4 or 5 times and you'll have the exact same fragmentation.


That is, if you are using one base image. Most likely you will be using Alpine, Ubuntu, Debian, CentOS, Red Hat, Oracle Linux and who knows what else in different containers, unless of course you have the patience to repackage all the versions you need for your favorite base image.


> Imagine downloading every application as a docker container. WTF.

Help me understand the WTF here, and also how that’s meaningfully different from an OS X app?


A MacOS App is just what you need to run the app on MacOS. A Docker Image is essentially a mini operating system in a can. Most Docker images contain shells, the entire Python install, an init sequence... piles and piles of redundant stuff.

If you are running a Docker image on MacOS or Windows, you first have to start Docker, which is itself a Linux virtual machine.

Docker is a great dev tool, but if you don't need it, it's a ton of bloat.


I think you are responding too literally to his comment, which is spot on.

A macos app is running in a sandbox and runs in a conceptually similar way to docker.

Go look in ~/Library/Containers

also look at the filesystem under <appname>.app


> I think you are responding too literally to his comment, which is spot on.

Should I respond metaphorically?

> A macos app is running in a sandbox and runs in a conceptually similar way to docker.

A Mac App fundamentally has access to system libraries and leverages those. A Docker container is designed to ignore the system and builds its own environment.

If you run a Docker container on a Mac or Windows, you are now running three operating systems: the host OS, the VM's Linux guest, and the Docker image's userland.

This is not the same as a Mac App. Literally, figuratively, or hypothetically.


Except Mac apps don’t each ship a libSystem.


Because unlike Linux they don't have to, since MacOS actually defines the concept of a stable base system that can be targeted by applications. Docker exists in part because the Linux world has no such concept.


That's the point here. The two aren't similar at all.

MacOS apps contain just the app. Mac apps leverage all of the OS functionality they can. They are strongly tied to MacOS and rely on it for most functionality. When I upgraded to Big Sur, the Mac apps on my computer adapted to the changed OS libraries and often presented differently.

Docker apps contain their own complete environment. They are deliberately engineered to disassociate from the base OS.

Electron is a sort of middle ground largely ignoring many system libraries, but using others.

The only thing Docker has in common with Mac apps is the fact that they keep associated files bundled together.


Jumping from 73 MB -----> 500 MB or whatever.

Have you been running Docker on your system? Have you tried running: `docker system prune -a`? It's gonna print something like Total reclaimed space: 31.2GB.
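
If you want to see where the space is going before you reclaim it, roughly:

  docker system df        # summary: images, containers, local volumes, build cache
  docker system df -v     # per-image breakdown, including shared layer sizes
  docker system prune -a  # remove stopped containers, unused networks, build cache, and images with no container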


It's worse than just the size jump. Every application would also need to run all its code inside a Linux VM. So that means:

- Dedicated RAM for all your docker apps (which you have to partition manually)

- Slow startup time for the first docker app you run each time you reboot, as it boots the linux VM.

- All syscalls run through the VM's emulation layer, which is way slower than native

- No access to the system's native UI toolkits. (Docker apps can't dynamically link to cocoa for obvious reasons.) We can probably work around this by wasting a huge amount of developer time making a networked bridge or something silly but ??

- No easy way to load and save files (the VM doesn't have filesystem access)

- All the other downsides of docker - like it gobbling up all your disk space with images that are no longer used

- Lower battery life, because the host can't easily sleep the VM's scheduler. The CPU can't enter low power state when nothing's running.

Please don't do this. We already have a special kind of container for "self-contained program on disk". It's called a statically linked executable. They work great. They're fast, small, and have access to everything the host system provides. You can SHA them if you want and you can easily host them yourself using a static web server. My computer's responsiveness is more important to me than your shiny docker-shaped toy.
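
A quick sketch of that workflow (file names are hypothetical):

  sha256sum ./mytool > mytool.sha256   # publish the checksum next to the binary
  python3 -m http.server 8000          # any static file server will do
  # consumers: download both files, then `sha256sum -c mytool.sha256` before running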


> Slow startup time for the first docker app you run each time you reboot, as it boots the linux VM.

This, and the other VM-related criticism, is true, but only on Mac & Windows. Running a Docker container on a native Linux system carries very little overhead.


Yep. My comment was mostly made in the context of running macos desktop apps in docker. Windows would be largely the same, though the existence of WSL might make some things easier.

Linux desktop apps running in docker would work better because there's no VM in between the native environment and the application. You'd still need to forward GUI calls across the container boundary. That used to be easy with X11 forwarding, but AFAIK Wayland removed that feature. I don't know enough to guess how difficult that would be to implement.

But, if you solved that it should be mostly smooth sailing. Linux desktop apps distributed via docker would just be the same apps, but unnecessarily bigger, with probably less access to the filesystem. They would be harder to patch for security errata. And they would leave extra junk in your docker images cache. Docker's sandboxing would be nice though.


WSL2 mostly solves the problem of manual memory partitioning.


But the 73 MB -> 500 MB is still true, right?


Depends entirely on how the image is built. You can have a Docker image that contains nothing but the application binaries, but then the question is why use Docker at all.


For all the other reasons the author notes: the main one being that the application running in the docker image has no access to the host system other than what the user explicitly gives it. It's a very minimal sandbox and often all the application needs.

You can't really reproduce that with any popular desktop operating system. Even if you could, the interesting thing about Docker is that it starts with a default-deny environment built on the principle of least privilege.


I'm not an expert in this area, but I've seen plenty of accounts of how Docker can be very insecure. Perhaps it's possible to configure Docker so that it is very secure, but even Google has had people break out of their containers, so these claims about container security should probably come with a disclaimer: "Docker is very secure as long as you are one of the top 0.1% in the field and never mess up". VMs seem to still be the proper tool for securely isolating processes, and should perhaps be the recommendation for the other 99.9%.

A few comments on how insecure the typical Docker recommendations are:

> Sandboxed - security claims about Docker have always been controversial. Simple Unix/BSD constructs like chroot/jails are far simpler and they are reliable

https://news.ycombinator.com/item?id=25547629

> However, I don’t agree with using it for sandboxing for security. Especially if you are giving it access to the X11 socket, as the author does in the examples. It might not be obvious, but the app will have access to all your other open windows.

https://news.ycombinator.com/item?id=25547551

> Fun fact, most docker hosts will allow access to all your files anyway! (especially true on Docker for Mac, which all the cool kids(tm) here are using). Even if you restrict container host-FS access to a source repo dir, mind rogue code changing your .git hook scripts in there, or you might run code outside of the container when committing ;)

> Another slightly relevant fun fact, USB is a bus. That means that any device can listen in on any other device. And USB access is given by default to some X-enabled docker (--tty something), and to most virtualbox machines (including the hidden one running the fake docker linux host on docker-for-mac), and more recently Google-Chrome. ;)

https://news.ycombinator.com/item?id=25547833


Podman and LXD have far more secure defaults. It's terrifying to see that Docker is still around. jail(8) from FreeBSD seems to be the best from a security standpoint; it's also the first container system.


I think the experts in docker would note that sandboxing and privilege isolation are technical features which have many uses, but that is not at all the same as making broad-based claims about the security of docker or the virtual machine and host operating system technology upon which it runs.


It's obvious that any system with faulty sandboxing or privilege isolation is quite likely to be insecure. If the default "sandboxing" of Docker allows access to nearly everything of importance on the host, then we can indeed make "broad-based claims" about the insecurity of the system as a whole.


> You can't really reproduce that with any popular desktop operating system.

If you’re running Docker, you’re running Linux, and you can use the same kernel features as Docker to sandbox applications yourself. This way, you only incur the overhead of a separate network namespace and filesystem if you actually need it. Services can be sandboxed with a few lines of configuration in a drop-in unit, and applications can be run with a shell script that calls `systemd-run`, just like the `docker run` shell scripts suggested in the article.
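
For example, something roughly like this (the properties shown are just a sampling, and the command is a placeholder):

  sudo systemd-run --pty \
    -p DynamicUser=yes \
    -p PrivateTmp=yes \
    -p ProtectSystem=strict \
    -p ProtectHome=yes \
    -p PrivateNetwork=yes \
    /usr/bin/some-tool

That gets you a throwaway user, a private /tmp, a read-only view of the OS, no access to home directories, and no network, with no image to download.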


Do you know of a tool that can do this for arbitrary programs? Maybe something like CARE [0] to find what the program should be accessing, then building the appropriate systemd configuration files?

[0]: https://proot-me.github.io/care/


> If you’re running Docker, you’re running Linux

As slow as it is, some people apparently enjoy running Docker on Windows / Mac.


Docker needs Linux to work. On Mac/Windows containers run in a Linux virtual machine.


I thought so too, but apparently support for native Windows containers has existed for a while now. e.g. https://poweruser.blog/lightweight-windows-containers-using-...


Neat! I had no idea this was a thing, thanks for sharing.


I wouldn't call "running an app in a linux VM on windows", "running linux". As a user, your main interface with your computer is windows, which is what "i'm running windows" means, and obviously "you can use the same kernel features as Docker to sandbox applications yourself" wouldn't work, e.g. you couldn't sandbox photoshop like that.


Isolation and reproducibility


At the end of the day, all the docker images depend on a lot of ./configure && make install, if not a lot of apt-get installs, which are often not reproducible. You don't get reproducibility unless you've done it all the way down the chain.


Not the most useful for software intended to be used on your local machine. You don’t get the isolation benefits because the whole point is to give the container broad privilege so you can use it as if it was a native utility. Docker also doesn’t make a non-reproducible build process reproducible.

At the end of the day if you have a precompiled self-contained artifact Docker is just a convoluted way of calling exec.


I am so glad I can read something like this in public.

When I say stuff like this in job interviews, I get eye rolls and don't end up getting a job.

The industry is so far up its ass lately, no one even dares state the facts in public. /rant


Now talk about deploying something NOT using Kubernetes. The double whammy. Now you're looking like an insane 100 year old graybeard fossil.


I had a conversation a couple months ago where some guy was telling me that I should run my personal website using Kubernetes. It’s a static website served with Nginx! I just about lost my mind trying to talk to this guy, and he kept trying to convince me to try Kubernetes for my personal website.


Why even run nginx? My personal website is served out of a private S3 bucket, with CloudFront sitting in front of it.

As a former sysadmin, the less actual admin I need to do the better.


You could have other, self-hosted services there. That's my setup: I have self hosted trilium and used to self host matrix. The personal nginx served website was just a cherry on top of the cake.


> I have self hosted trilium and used to self host matrix.

Wait, you’re self hosting multiple apps on one system, and not using K8?!? Your apps can’t autoscale or do blue/green upgrades or any of the cool stuff you are now mandated to do by the cargo cult. How do you sleep at night??

/s


>or any of the cool stuff you are now mandated to do by the cargo cult.

This is ultimately what drives me insane about the tech sector: so many choices are trend based and cult like when any sort of technical discipline shouldn't be.


S3 has no HTTPS (for static website hosting) and unsatisfactory performance.

My website used to be in an S3 bucket, I was unhappy with it.


If you run it behind CloudFront it does support HTTPS.


Anything “supports HTTPS” if you count putting an HTTPS proxy in front of it. In other words, S3 does not support HTTPS, but you can work around that limitation by combining it with other services.

My personal website is low-traffic, it’s not going to be very hot in any CDN caches. Based on my experiences with S3 performance, adding another layer of cache misses in front of it is probably just going to slow things down.


Fair enough.

The suggestion was more for the HTTPS part and not the performance part of your complaint. It's simple enough to set up, pay-as-you-go, only requires one vendor, and should be very scalable.

I'd just as soon throw up nginx on a Linode, myself.


Or overuse of microservices


^ this. Proliferation of many microservices and repos is a trend I’m trying (and failing) to buck at my current job.

The next time I have to copy/paste the same utility to a new microservice because [insert corporate reason] I’m going to scream.


Fossil here, happily using VMs.


At this rate we should make a fossil-of-deployments motto, where we claim there is nothing wrong with makefiles and bash scripts :)

like http://programming-motherfucker.com but against triple-buffered-virtualization.


I guess I'm fossilizing, then, because this is the direction I think we should really be heading.

When I stare at docker long enough I wind up at "why couldn't this be a static binary" or "this would be easier to secure if it was its own VM".


Many people are basically misusing Docker to work around Python and Ruby dependency problems.


^ this is a very good point. Someone else's mess, let me put it all together, in a nice pot plant.


9/10 times I see the problems getting solved the same way you could without docker, except now the application is also running as root and several more containers have been added to work around other issues created by running in docker :(


A Docker container is a lot simpler to maintain than a VM, for just about any task you care to mention.

A static binary is much better than a Docker container, I agree. Unfortunately a lot of useful apps can't or won't ship as a static binary (Java, Python, PHP apps), and Docker makes them behave much more like a static binary.


Java can certainly be shipped as a single static binary; AOT compilers have existed since around 2000.

Same applies to Python bundlers like py2exe.


Just want to call out: this is on Windows and macOS. So, these remarks are certainly true for probably a majority of Hacker News readers, but if you're on Linux these limitations do not exist. I'm also not sure how much these limitations hinder you in reality; I hear few complaints from the macOS users around me, and we use Docker a lot.


> Its called a statically linked executable. They work great.

Doesn't this make OpenGL, Vulkan, and parts of glibc break?


Yes. In the case of glibc, it's long past time that flaw was corrected by moving NSS modules out of process and using a pipe or socket to run queries. IIRC OpenBSD already does this. They already cause problems with pulling in their transitive dependencies into your process' address space and potentially causing nasty conflicts. Same goes for PAM modules.


That's absolutely terrifying. Thanks for the insight.


wtf are you doing to create such large Docker containers? Are you installing each app on top of a fully-fledged Ubuntu? Are you leaving the build tools in the final container?


I mean it could just be very many containers that add up. And in many cases you'll have loads of images close to 1GB unless you go to great lengths to try and shrink them (e.g. anything that uses Python).


I can get an Alpine Linux container running Perl with a chunk of CPAN on it to 60MB by just stripping the build tools once I've installed everything I want. Is Python really a GB bigger, or are people not taking the most basic of steps?
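
Roughly this pattern, from memory (the module is a placeholder, and the exact package names may differ between Alpine versions):

  FROM alpine:3.12
  RUN apk add --no-cache perl \
   && apk add --no-cache --virtual .build-deps build-base perl-dev perl-app-cpanminus \
   && cpanm --notest Some::CPAN::Module \
   && apk del .build-deps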


Basic steps. Welcome to the long September of Docker. Whereas most folks who use it regularly know basic maintenance patterns, new users (myself included) must wade through inscrutable documentation or (worse) poorly written blogfarm posts in order to bootstrap up to a level of proficiency that passes interview smell tests.


Do you have any documentation (or blogs) on a higher level that you would recommend? I too have been using Docker for a while, and recently also for certain programs on my desktop (mainly firefox, github cli, azure cli, aws cli, terraform).

I like to think I am somewhat proficient in using Docker correctly by now, but I am still discovering new tidbits of practical knowledge, tips and good practices every few weeks. And I always want more. :)


Itamar Turner-Trauring's blog has a number of good articles on Docker best practices:

https://pythonspeed.com/docker/


I do not. I'm a new user who has found resource identification difficult.


So maybe I was quick to comment, but I glanced at the images I have sitting around and python:3.7 is 876MB. However, python:3.7-slim is 112MB (still nearly twice the size of the image you describe, though). But it's conceivable that many people running into limitations with the slim image would just opt for the full Python image instead.

Maybe a better example is the Tensorflow image, which is close to 4GB for the GPU version. It's also the kind of thing that would be a pain to rebuild from scratch if you wanted to find a way to save space.


Do you also understand that once you have any image built on top of python:3.7 then that 876MB doesn't need to be downloaded again for subsequent images?

> the Tensorflow image is really big

This is kind of meaningless without comparing it to how large Tensorflow is with related dependencies if you just install it centrally?


Yes, I understand that Docker caches layers. Like I said, if docker system prune is clearing a huge amount of space, it's likely a large number of different images and containers that might not share layers. If you're building, running, starting, and stopping containers all day, you might not be terribly careful about exactly what you're doing each time. That's why system prune is so useful.


That's what I am trying to ask myself too! I am sorry that I am a developer and I need to run a bunch of docker compose files.


If you're building docker images FROM SCRATCH then there's no improvement over a zip file or statically-linked single binary (which is what packages and installers already use).

Meanwhile Docker adds many more restrictions and limitations, and it is not a good fit for consumer-focused interactive GUI applications, as opposed to the web and console apps that work best with Docker.


One improvement is that the container is sandboxed and dependencies on the host must be added explicitly, e.g. --volume, --env, --net.
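
e.g. something like this, where nothing crosses into the container unless you put it on the command line (image and command are placeholders):

  # no network, read-only access to the current directory only, one env var passed through
  docker run --rm -it \
    --net=none \
    --volume "$PWD":/work:ro \
    --env LANG="$LANG" \
    some-image some-command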

Sure, you could do the same with chroot manually, and there are many other tools that do the same thing, but somehow the Docker way of representing these concepts is more graspable for the average user, and in much more widespread use.


OS X apps share frameworks in user space, while docker images don't... and docker images have all the constraints everyone hates about sandboxing, only worse. E.g. the file open dialog in macOS will grant an app access to a file when the user chooses that file. In docker, I have to manually add a path whenever I want tools to have access.


That doesn’t match my mental model of how that works. Can you give me an example of a shared 3rd party framework that a Mac app bundle will find on the file system rather than just include in the bundle?


Sharing dynamically linked 3rd party frameworks on macos is relatively rare. But all the system built-in frameworks are shared via dynamic linking in userspace. (Eg Cocoa, UIKit, Foundation, etc). Native macos apps feel native because they dynamically link to the system's standard set of UI libraries. (And more recently font sets like SF Symbols.)

For many reasons, none of that is available via docker.

The GP is also referencing the fact that even if you somehow had access to cocoa from your linux binary inside your docker VM, the docker image doesn't / shouldn't have access to your host (macos's) filesystem. So the save / load dialog wouldn't work properly anyway.


> none of that is available via docker

I'd expect someone building many tools like the author of the original piece is using the same shared image as the base for most of them?


The base shared image can't embed macos's system & UI libraries because:

- They're proprietary. Copying them from macos onto docker hub violates the macos software license.

- They change with each version of MacOS. You can't mix & match them.

I guess you could mount them into docker's filesystem, but then what's the point of using docker? And even if you did that:

- They're macos mach executable files. Linux (and therefore docker) doesn't know how to run mach executable files.

- Even if you could somehow embed them and get them to run, the libraries wouldn't work because they expect to be making syscalls to Darwin. They can't do that from inside a linux virtual machine.

You could probably make a weird RPC proxy involving a native macos process receiving network commands. But it would take a herculean effort to make it work at all, and even if you got it working it would probably be buggy (since everything would suddenly become async) and slow.


It’s rare, but if apps did want to do this they would typically place things at /Library/Frameworks for this purpose.


Loading a 1GB image to run a 50MB executable can be trouble. I used to work while commuting on a train on a daily basis. Unexpectedly needing a big download could ruin my morning.


Why are you running such huge images for a 50MB executable?


We expect every app developer to suddenly become an expert in efficiently producing docker layers and not just lazily use an Ubuntu desktop image?


I expect anyone distributing their app via Docker to learn how to use their tooling properly, yes?


Agreed, there are many things to get right about using Docker. But if you're just deploying an app to somewhere, does it really matter if the image is a bit bloated (within reason)?


Probably because container authors learned from Docker's getting started video, which demonstrates a 1GB "hello world" image.


I disagree.

I think the beauty of docker - and this should spread elsewhere - is that everything starts with one text file.

The crufty part is all the crap that has to be added to the docker run command line.
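
For the sake of illustration, the "one text file" part is something like this (the tool name is a placeholder):

  FROM debian:buster-slim
  RUN apt-get update \
   && apt-get install -y --no-install-recommends some-tool \
   && rm -rf /var/lib/apt/lists/*
  ENTRYPOINT ["some-tool"]

...and the cruft is the eventual `docker run --rm -it --volume ... --env ... some-tool-image` incantation you end up hiding behind an alias or wrapper script.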


> one text file.

That! It is codified. If people had their main machines codified, Docker would maybe be less of a benefit. And I'm sure a lot of people here have that. But a lot of my colleagues don't. So I give them a Docker image... instead of explaining to the same Java developer for the umpteenth time how to make a virtual environment for my Python program.


Most package manager formats are simple text files as well.

Also, docker doesn’t solve the DLL hell problem, it just pushes it out of sight. The Nix package manager is something that actually solves it; it should be promoted, leaving docker to the things it is good at: containerization.


For end user devices, I much prefer Nix/NixOS [1] for this kind of thing.

With Flakes [2] (experimental feature), you get full reproducibility.

The documentation is spotty and there is a considerable learning curve, but I've switched to NixOS on my laptop and desktop early this year and am mostly very happy with it.

That doesn't cover sandboxing though. I would actually agree that sandboxing / restricting applications (Mac OS style) would be sorely needed on Linux desktop.

Flatpak and Snap are trying to do this. A much saner approach than trying to find or maintain up-to-date, trustworthy Dockerfiles for applications.

[1] https://nixos.org/

[2] https://nixos.wiki/wiki/Flakes


Absolutely. Most Dockerfiles I come across are very much not written in a robustly reproducible way. The stacking filesystems model is really not very flexible (if you want to compose two docker images with dissimilar bases to be used together you have to resort to manually specifying which files to copy across from each base). Use of OS features to achieve its ends means the docker daemon itself requires far more privileges than I'm comfortable with and makes weirdness like docker-in-docker necessary...


Yeah, any Docker file that starts with `apt update; apt dist-upgrade` is an instant fail.

You might be interested in this fascinating exploration of attempting to bend Docker into better caching and composition by injecting blocks of Nix packages as individual layers:

https://grahamc.com/blog/nix-and-layered-docker-images


I think the base images that are commonly used are just as bad; likely some random snapshot of upstream repositories at arbitrary times. You can refer to the sha256 of base images to avoid things changing, but I've never seen anyone do that. (The advice is "never use 'latest' because that could change out from under you", but other tags are just as mutable, and so that isn't real advice. You can follow the advice and have something just as bad happen.)
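
For reference, pinning by digest looks something like this (the digest is a placeholder, recorded from `docker images --digests` at the time you build):

  FROM python:3.7-slim@sha256:<digest-recorded-at-build-time>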

The thing the upstream Linux distributions are missing is a lockfile with the hashes of installed packages. Programming languages figured this out (go.sum, package-lock.json, etc.) but distributions have not. Thus, people are often running "whatever" in production, because they simply don't have the ability to lock dependencies properly.

I assume Nix solves this problem, and people should pay attention to how important it actually is.


In production, the companies I work with certainly do rely on SHA256 checksums to specify the base images.


Do they just build their own images for open source things? I have never seen a dockerfile that does that.


Yep.


Honestly, given how little reproducibility there is in like 80% of the Dockerfiles I've seen in the wild, it astounds me that this became a solution for packaging at all.

Are solutions like source2image, jib, or the nix dockerTools.buildImage method gaining steam?


I've examined this a number of times over the years for deploying a ROS (Robot Operating System) workspace to robots and laptops. Some of our competitors use snaps or various container technologies for this, but I've always felt that the isolation/sandboxing would be more of a barrier than a help— it's one more layer of udev rules and other indirection to have to punch through with another set of config files (and to no benefit as we're not trying to run multiple independent "apps" or anything).

So currently at my org we use basically the same scheme I advocated for at a developer conference in 2016 [1]: we just build it all into a gigantic bundled debian package, and deploy that to the host system with apt.

I keep revisiting these various technologies and all of them seem less mature than apt/dpkg, and mostly in service of features and capabilities which don't apply to my particular needs. Obviously I'm a bit of a niche case, and my needs aren't everyone's, but of all of them, nix is the one that seems to be doing the most that is truly interesting and different; being able to send a new nightly build out to users without requiring them to re-download an entirely new asset would be a major win.

[1]: https://vimeo.com/187705228


I also suspect OS package managers are underused when it comes to deployments. I am guilty of that too: some irrational wariness about whether end-user tools are fit to deal with developer workflows.

That aside, there is one slight possible pitfall in discarding one tech because it is less mature than the other: if we assume maturity only increases with time, the oldest product (e.g. apt/dpkg) will always be “best”. Made a note for myself to prefer “not mature enough for my needs” over “less mature than X”.


I think for apt/dpkg, the system definitely does benefit from decades of careful, thoughtful design— for example you're pretty unlikely to come up with a version constraint between your packages which isn't expressible using the existing Debian scheme. And when I look at various modernish packaging systems, I often see a lot of this thrown out as unneeded complexity, only to be cobbled back in later when it's discovered how essential it really is (see for example Homebrew's long transition from being a dirt simple wrapper around "configure; make; make install" to becoming a full binary package manager).

There's also the ecosystem benefit of having loads of helpers and supplementary tooling applicable to the formats— even stuff like having first-class support in proprietary binary stores like Artifactory, vs Nix where it's basically a shrug and "well... it works with any WebDAV server, so take your pick I guess?"

The main complaint I have overall with Apt is the reliance on postinstall scripts, which means that even if you download and extract your packages in parallelized blocks, you still have a long serialized step when every single package needs to spawn a shell and run arbitrary commands, even if in most cases, the commands actually originate from a semi-declarative format (debhelpers, either invoked explicitly from the rules file, or implicitly by the presence of a corresponding debian/xyz file in the metadata). Anyway, if it were possible to somehow flag packages as atomic or configure-less, it might be possible to significantly speed up these operations, especially in environments like CI where you have everything mirrored in-network or possibly even on-machine so the overall install time is dominated by the package configure step.


apt/.deb unfortunately has a much, much higher barrier to entry.

Docker is simple enough to get even the most inexperienced developer going in a very short time.


I also phased out Docker for local, non-isolated workloads in favour of nix. It's saved me many hours of figuring out how to install software whose official packages don't work out of the box / aren't compatible with my distribution.


It's worth mentioning my pet project, Darch.

https://godarch.com/


Snap provides a strong sandboxing layer, but IIRC Flatpak does not.



Just some counterarguments to @jbergknoff's well-put-together page!

Docker is the best medium for distributing - A static file is far easier to share / distribute.

Cross-platform - You need an arguably complex and unstable Linux interface to run Docker images, cgroups et al

Sandboxed - security claims about Docker have always been controversial. Simple Unix/BSD constructs like chroot/jails are far simpler and they are reliable

Version pinning - a binary can embed a version and you can stick with it

Reproducible - Everyone gets confused about Docker image checksums. `sha1sum static-binary` is far far simpler.

Minimizes global state - wouldn't be a problem if people built static binaries.


What's the alternative to docker? And by that, I mean a solution that a team can reasonably use across Linux, Windows, and Mac. The simple reality is that there is Docker and absolutely nothing else that comes close to working everywhere. Yes, it's not perfect, but everything else is far less perfect. Static binaries are far too limited; most software requires lots of files spread all around the file system.

I did a project last year at a company that had dockerized their build, CI, and CD infrastructure. They had dozens of git projects with makefiles that triggered actions using docker. It was great. No need to install anything complicated; it just works everywhere with a minimum of scripts installed from a single internal repository. They did some nice hacks to work around some of the things mentioned in the article, including using VirtualBox on Macs to work around the filesystem limitations. This really becomes a showstopper for large complicated builds that are very IO intensive.


> What's the alternative to docker? And by that, I mean a solution that a team can reasonably use across Linux, Windows, and Mac.

Virtual Machines. VirtualBox in combination with vagrant works reasonably well cross-platform.


Doesn't solve the same problem. You can use that to run docker of course.


Care to elaborate on your point? How is that not the same problem?


What problem does Docker solve that using a VM cannot solve?


The infrastructure for creating and sharing e.g. VirtualBox images does not exist. With docker I can run just about any server product you can name with a single command line invocation. Those commands don't exist for VMware, VirtualBox, or other common virtual machines. That infrastructure simply does not exist.


> The infrastructure for creating and sharing e.g. virtual box images does not exist.

vagrant [1] provides this feature based on my understanding.

[1] https://www.vagrantup.com/docs/cli/cloud


> Static binaries are far too limited; most software requires lots of files spread all around the file system.

In my experience that is entirely and ludicrously false. That is merely how a lot of unix software does things by convention, but there are a lot of ways to make even poorly-thought-out unix software behave as a self-contained entity.


Well, yes that's exactly what docker does.

In contrast, most linux package managers seem to make a big deal about doing all of this slightly different on just about every linux distribution and even between different versions of the same distributions. The fact package managers exist proves my point: deciding which files go where is a big deal and there seem to be an awful lot of opinionated package managers making different choices here. Whatever standards and conventions exist here seem to leave an awful lot of choice and wiggle room.


> The fact package managers exist proves my point: deciding which files go where is a big deal and there seem to be an awful lot of opinionated package managers making different choices here.

Consider that "package managers" only really existed in the UNIX world until relatively recently. Other OSs just didn't make everything so complicated to begin with.

"which files go where" just isn't really a problem if you don't make dependencies some third party's problem.


ssh to an Archlinux or *BSD box?


Static linking is nice, but there are some licenses (notably the GPL and LGPL, which glibc uses) which don't easily allow that for closed-source software. On the technical side, as soon as some code needs to 'dlopen' something (e.g. a plugin or a system driver), you can run into trouble due to multiple instances of glibc or other dependencies running together.

But if none of those are a requirement for your use case (or you have workarounds), I agree that statically linked binaries can be a nicer solution than Docker.


You can statically link whatever you like, you just can't distribute it as one work. One of those crazy single-executable docker containers strikes me as one work to whatever extent a statically linked binary is.


> You can statically link whatever you like, you just can't distribute it as one work.

How do you distribute it then? Let's assume your statically linked binary contains both closed-source code and GPL/LGPL code.

> One of the crazy one executable docker containers strikes me as one work to whatever extent a static linked binary is.

I'm not a lawyer, but that's not my understanding.

A docker image is a glorified collection of files with some metadata, just as a tar file is. I think it's broadly agreed that you're allowed to distribute a tar file with unmodified LGPL dynamic libraries in it without having to open-source all of your code, and I think docker images are treated the same way?


The distinction is meant to depend not on the way of linking, but how intimately the pieces are joined together:

https://softwareengineering.stackexchange.com/a/167781

> If the program dynamically links plug-ins, and they make function calls to each other and share data structures, we believe they form a single program, which must be treated as an extension of both the main program and the plug-ins.

To extend this to archives of independent programs, they would be loosely bound, and therefore not form a single program. A docker container that exists only to package up libraries some executable is using would be closer to a single program than a collection of independent components.


> How do you distribute it then?

You distribute a script and whoever runs the potential violation assembles it themselves, like with zfs on Linux.

I don't really get the demarcations typically made, since a proprietary medium could conceivably be as hard to pull apart as using linking tools to break apart sections of a static binary again.

I get the general sense that people work around examples of what one interpretation says isn't allowed without getting many opinions on the workaround.


Golang makes it pretty painless to build static binaries. go:embed, expected in 1.16, makes it even more so.


If you're building on Linux, does it usually embed glibc in them? If it does, you'll need to comply with the LGPL when you distribute your statically linked binary.


Doesn’t Go just make syscalls on Linux?


I think you can escape that with CGO_ENABLED=0
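
Something like this, assuming none of the dependencies force cgo back on:

  CGO_ENABLED=0 go build -ldflags='-s -w' -o app .
  ldd ./app    # "not a dynamic executable"
  file ./app   # "... statically linked ..."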


This is what we do. One binary output that has all native dependencies built-in. We also use SQLite so we don't have to waste time with administering hosted SQL instances.

Ramping new developers and environments is incredibly trivial with our stack, and we do not rely on any containerization tech. Just .NET Core, visual studio, Git[Hub] and SQLite.


> You need an arguably complex and unstable Linux interface to run Docker images, cgroups et al

What is unstable about it? As far as I can tell, only the Linux kernel interface is needed, and keeping that stable is an explicit goal of the kernel.


I believe there was a period of about a year where it wasn't possible to run Docker on Fedora 31 or higher because Docker did not support the cgroups v2 interface.


> A static file is far easier to share / distribute.

Does glibc work with static linking these days? My understanding was that even with statically linked glibc, things tend to break when the host system has a different libc / a sufficiently newer glibc.

Also, how do you do OpenGL/Vulkan/etc statically? x11docker handles them more-or-less fine, but I'm fairly certain the GPU gods send you to Tartarus if you start trying to statically link in various vendors' libGLs...

> Simple Unix/BSD constructs like chroot/jails are far simpler and they are reliable.

_Fully_ agree about jails, especially nice since they're persistent. Though, for an X11-using application, I think you're screwed any way it comes out, since afaik there's no permissions difference between being able to create a window, and being able to steal keystrokes + send keystrokes to a terminal. Maybe the Qubes people have something?


Qubes people do have this issue solved. Wayland too: you have to opt in to let an application steal all the keystrokes.

As far as security on end user machines goes, there's no reason to use Docker over Podman, except for cases where one needs to run docker in docker, which is a farce in and of itself.


The advantage of running apps in docker is that you avoid the risk of breaking your system by adding 3rd party package repos or even worse, curl | sudo bash. Cleanup/uninstall is very simple as well.

However, I don’t agree with using it for sandboxing for security. Especially if you are giving it access to the X11 socket, as the author does in the examples. It might not be obvious, but the app will have access to all your other open windows.
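
For context, the kind of invocation being discussed looks roughly like this (image name is a placeholder); handing over the X11 socket is what gives the app a view of the whole session:

  docker run --rm -it \
    -e DISPLAY="$DISPLAY" \
    -v /tmp/.X11-unix:/tmp/.X11-unix \
    some-gui-image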


> It might not be obvious, but the app will have access to all your other open windows.

Fun fact, most docker hosts will allow access to all your files anyway! (especially true on Docker for Mac, which all the cool kids(tm) here are using). Even if you restrict container host-FS access to a source repo dir, mind rogue code changing your .git hook scripts in there, or you might run code outside of the container when committing ;)

Another slightly relevant fun fact, USB is a bus. That means that any device can listen in on any other device. And USB access is given by default to some X-enabled docker (--tty something), and to most virtualbox machines (including the hidden one running the fake docker linux host on docker-for-mac), and more recently Google-Chrome. ;)


> and to most virtualbox machines (including the hidden one running the fake docker linux host on docker-for-mac)

docker-for-mac does not use virtualbox.


Docker for Mac does use a virtual machine to run the Docker daemon, though. It's complicated to access the automatically generated mount point on Docker for Mac when you create a volume but don't bind it anywhere.


But does Docker for Mac give unlimited access to the host’s usb bus like GP claims? I don’t see any evidence that it’s true, and the mistaken claim that d4m uses virtualbox increases my skepticism.


I have no knowledge on that matter, I’m sorry


Docker is intimately tied to Linux. It is only "cross platform" in that it can use a VM (Mac, WSL2) or flaky compat layers (FreeBSD). If software embraces Docker, it effectively excludes other OSes, like Windows, BSDs, Haiku, Fuchsia, etc.


Even though it doesn’t run cross platform natively, it does run cross platform acceptably. I have no issue developing, building, and shipping images on my Mac that are deployed to production.


in that case, why is it that docker is considered superior when you could virtualize the windows api and run native windows apps in linux instead?


And even then, trying to run Docker on Fedora will prompt you to turn off Security-Enhanced Linux features or use Podman instead. So it's really like, what, 1/2 or 2/3 of Linux support.


What? Why would it prompt this? Docker supports SELinux.


I don't know the specific issue, but RH's been trying to push Podman fairly hard...


Fedora is using cgroups v2 by default since last year. This version of cgroups was not supported by docker until a couple of weeks ago. So as a Fedora user you could either use podman or modify your installation to use cgroups v1 instead.


Thank you, I may have misspoken, but this is what I was referring to.


> Docker is intimately tied to Linux.

Linux and windows:

https://hub.docker.com/_/microsoft-windows


Have you used a windows only container? I've not seen one in the wild, but I might be terribly biased.


Yep, I've also seen several corporates deploying Windows containers to production. As more "traditional" windows focused companies move to containerization/cloud, there's an increasing use of Windows containers, either as part of a Lift & Shift effort to migrate workloads, or because their developers are more comfortable with Windows and so they target that platform.


No, not yet. But we might go that way in the future for windows software on azure, much along the lines of:

https://docs.microsoft.com/en-us/dotnet/architecture/moderni...


I tried to make one with VS build tools a couple years ago. It’s probably not too bad to do by now, but MS licensing is a massive drawback IMO. I’d rather focus on tech than reading license agreements that could be changed on a whim.


A few years ago I set a goal to minimize the number of applications installed bare-metal on my laptop. Everything is containerized, just as the blog describes. There is some initial overhead, and some extra work associated with maintaining the infrastructure, but overall I can report improved reproducibility (due to an isolated and static environment per app).

I use a bash alias for each common application. For example, "jup" launches a jupyter notebook. I have a container with the Python package "Black" which runs using a git hook to clean up my code prior to commits.
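
The aliases are nothing fancy; roughly like this (the image and mount point will differ depending on your setup):

  alias jup='docker run --rm -it -p 8888:8888 -v "$PWD":/home/jovyan/work jupyter/base-notebook'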


>For example, "jup" launches a jupyter notebook.

Would you be willing to share what you do with Jupyter notebooks, your workflow, how you collaborate with your team, your frustrations?


That just seems insane to be running black inside a container. Why don't you just configure your environment properly?


Serious question: why do you believe that to be insane? The Dockerfile for a simple application such as black must be very short (probably 3-4 lines), the alias is probably quite short, and the container overhead time is minimal for native docker (the story might be different for things like Docker for Mac).
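
Probably something along the lines of this (unpinned here just for brevity):

  FROM python:3.9-slim
  RUN pip install --no-cache-dir black
  ENTRYPOINT ["black"]

The alias then becomes something like `alias black='docker run --rm -v "$PWD":/src -w /src my-black'`, with my-black being whatever you tagged the image.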

On the other hand, you get some benefits from installing black through docker rather than through the system package manager: it is completely isolated from the host and the only way to break black is to update it; changing anything in the host will not break your black install.

I am not sure what you mean by "just configure your environment properly" either, but I am going to assume you mean installing black in a virtual env or equivalent? That is also annoying, for different reasons: you must reinstall it once for each project, updating Python to a new (major) version breaks your formatter, and you cannot move the env around, to name the ones that come to mind.


> you must reinstall it once for each project

You can install it either in a venv outside of all projects or even "pip install --user black".

> updating python to a new (major) version breaks your formatter

Uninstalling the old version breaks the formatter. Installing a new one does not. Either way, with asdf, pyenv, and others you can keep all relevant versions around.

> you cannot move the env around

Sure you can. Use "--relocatable"

I agree docker may be nicer if you're not working day to day in python... but in that case black is not a great example. You probably want to have a specific version bound to the project so everyone uses it (including the ci platform).


Did not know about --relocatable, thanks for the tip!

> But in that case black is not a great example. You probably want to have a specific version bound to the project so everyone uses it (including the ci platform).

That's what I do in my open source work, the CI will run the formatter and commit the result back in tree so it forces everyone on one version. At work this is handled by the developer tools' team.

> Uninstalling the old version breaks the formatter. Installing a new one does not. Either way, with asdf, pyenv, and others you can keep all relevant version around.

Yes, but at some point you are basically making ad hoc containers, right? So why not use the generic one?


> On the other hand, you get some benefits from installing black through Docker rather than through the system package manager: it is completely isolated from the host and the only way to break black is to update it; changing anything on the host will not break your black install.

Why would a system package, packaged by experienced maintainers, randomly break?


This requires you to give your normal user the ability to execute stuff via docker. This means that given the ability to execute code as the user, an attacker can trivially gain root access.



I think in this context that's not a real concern, as on most laptops the human user has root-equivalent permissions anyway (via sudo). In addition, Docker supports user namespaces, allowing root in the container to be mapped to a non-root user on the host.
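
Turning that on is roughly a one-line daemon setting plus a restart (a sketch; see the docs for the subordinate uid/gid details):

    # /etc/docker/daemon.json
    { "userns-remap": "default" }

    # then restart the daemon
    $ sudo systemctl restart docker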


But can you still trigger the containerized Black from VS Code?


Yes. The only caveat is that depending on how it's invoked you might need some --volume tweaking in your alias. If it's just being sent stdin then there shouldn't be any problem. Mounting the whole project is probably best anyway, so it can find tox.ini or whatever other config files it needs. Also make sure the image doesn't produce any extra output that uncontainerized black wouldn't print.


That just seems insane to be running black on any old system python. Why don't you just wrap it in a proper dockerfile?


Using that pattern, can you still easily run Black automatically on file save in IntelliJ or similar?

There were a few configuration steps the last time I configured regular black in IntelliJ, but the documentation wasn't too bad to follow.


> On a Mac, there is a major performance hit whenever you do disk IO in a bind mount (i.e. voluming a directory of the host system into the container). Working without bind mounts is extremely limiting.

In my opinion, this is a big enough issue to throw the whole idea of “Docker as cross-platform platform/target” into major question.

When running more than a few containers on macOS, the performance is so bad it becomes almost unusable.


The only reason it works so well on Linux is because containers are bog standard Linux processes running with no real overhead sans the kernel structures needed to maintain the namespacing. Entries in the VFS are cheap.

Once you need a hypervisor most of the benefits are gone. But if you need the hypervisor anyway which might be the case for software that doesn’t have Mac builds then it starts to look attractive.


Volume-to-macOS-host performance has nothing to do with bind mounts as such. It's the VM folder sharing that's crap, coupled with Docker for Mac's attempt to translate filesystem events from fseventsd to inotify and back.

Changing solutions (e.g Docker for Mac -> VirtualBox or Fusion via e.g docker-machine) makes a world of difference.


I am trying to go in the opposite direction, offering a read-only, mountable filesystem that provides a well-crafted setup of tools and utilities.

The idea is that you mount the filesystem and you get all the tools you need, and more, properly installed and lazily pulled from the network.

You win on the space side, but you need a bit more trust.

You can find more info here: http://packages.redbeardlab.com

And on the GitHub repo, where you can ask for more packages to be installed:

https://github.com/RedBeardLab/packages.redbeardlab.com/

And in this pair of articles for specific languages:

Golang: https://redbeardlab.com/2020/12/21/packages-redbeardlab-com-...

And for JavaScript/node: https://redbeardlab.com/2020/12/23/packages-redbeardlab-com-...


Docker for every application? If you create and maintain the Docker image or Dockerfile for every application yourself, you must have plenty of time. If you rely on public images from Docker Hub, you must have plenty of trust in the creators of those images.


Hmmm... what alternative does not either take time or trust?


It's all relative, of course. But getting a signed package from the repo of the distro I'm using for years is something different than using a random image from hub.docker.com.


Sure, but you are still relying on trust, and you are choosing to limit yourself to things released by your chosen distro. This is the same as if you were to pick a specific docker publisher that you trust, and only use their images.


It's arguable that it's not quite the same. It all comes down to consequences.

If a distro messes up the trustworthiness of an application, they, the big and important company, lose clout.

If the application developer messes up, they also lose clout - people may stop using their software.

Chances are, if you're using a third party's build of a third-party piece of software that isn't officially dockerized by the company that developed it, nor by a major distro, there's no real backlash if it doesn't work or if they get hacked, etc.: "it was a third-party image, so _of course_ it wasn't trustworthy" would be the statement everyone makes.

Debian messing up, or Cisco or Oracle, etc, is a much bigger deal.


Yep, the reality is that we all rely on hundreds of millions of lines of code of software (mostly OSS) that make up our OS, tool chains, libraries, etc. every day. Basically, it's not feasible to even review a meaningful fraction of a percent of that in a lifetime; assuming you even have the skill level to do such a review. In other words, mostly you are blindly trusting other people to have signed off on something and that those people who you don't know personally did a good job of that.


I'm running a similar setup, whereby I run most applications (even the browser I'm using to type this reply) in Docker or podman containers, purpose-built for each.

Judging from the Git repo containing my dockerfiles, I've been doing so since ~mid June 2018.

I've since automated:

* checking for new versions of Git repos and Alpine, plus short crawlers for tools (i.e. I run "perl latest.pl" and a bunch of stuff happens and eventually some Dockerfiles might get updated)

* auto-committing any change made by the above step (i.e. ./autocommit.sh) with a meaningful message based on the directory the Dockerfile resides in, as well as which version-holding environment variable changed

* I use https://github.com/crazy-max/diun/ running on my dokku server to keep up with base image updates (i.e. I get an email in the morning stating alpine:3.12 or debian:buster-slim or whatever has been updated); when a base image changes I have to manually "dp alpine:3.12" to "docker pull" and "podman pull" it; after that, I "make base-images" so my local base images (each adding a short line to enable a local apt-cacher-ng proxy) also get updated; then a simple "make" builds all of them (docker build -t .... and podman build -t ...)

* Quite a lot of (mostly small) bash scripts to run those images.

As an example, the Dockerfile I use to build hadolint:

    FROM local/mfontani/base:latest AS fetcher
    LABEL com.darkpan.github-check github.com/hadolint/hadolint HADOLINT_VERSION
    ENV HADOLINT_VERSION v1.19.0
    RUN curl -sSL "https://github.com/hadolint/hadolint/releases/download/$HADOLINT_VERSION/hadolint-Linux-x86_64" -o /usr/bin/hadolint
    RUN chmod +x /usr/bin/hadolint && \
        /usr/bin/hadolint --version
    FROM scratch
    COPY --from=fetcher /usr/bin/hadolint /usr/bin/hadolint
    ENTRYPOINT ["/usr/bin/hadolint"]
... and the shell script I use to run it:

    #!/bin/bash
    DOCKER_FLAGS=()
    [[ -t 0 ]] && DOCKER_FLAGS+=(-t)
    podman run --rm --init -i "${DOCKER_FLAGS[@]}" \
        --network none \
        -v "${PWD}:/usr/src:ro" \
        --workdir /usr/src \
        localhost/mfontani/hadolint "$@"
It's not that speedy doing this, but it's... okay:

    $ hadolint curl/Dockerfile
    Took: 0.837s (837ms)


You also have to have plenty of trust in the creators of the applications. Using Docker isn't really different, especially when the creators of the applications have also provided the Dockerfiles. What you had before Docker was just as much based on trust.


I have much more trust in the application developers. Thus, I'd trust first-party Docker images, but repackaged application images fall out of date more often and remain out of date for longer than most distro packages, IMO. And with distributions there's a community of maintainers that try to package everything to a standard; with Docker, I don't think that's the case.


How is writing a dockerfile any different from installing an application normally? You just write down the steps you'd otherwise have to take.


TLDR: 1. You already have that on Linux. 2. Make your builds portable.

If you are running a mainstream Linux (Debian-based, Arch-based, probably others), most of the benefits of Docker can be had with the already installed and configured systemd and your distro's package manager.

Sandboxed? systemd.

Simple, uniform interface? Your distro has packages, and most services that can and should be sandboxed already run under systemd after installation. You can tune the unit file if you want, and systemd has a security checker that shows you what a sandboxed application can and cannot do, without proxying things the Docker way.
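
For example (a rough sketch; "sometool" and the unit name are just placeholders), systemd-run can sandbox an ad-hoc command and systemd-analyze scores existing units:

    # run a command with some of systemd's sandboxing knobs
    $ sudo systemd-run --pty -p PrivateTmp=yes -p ProtectHome=read-only sometool
    # score how tightly an existing service is sandboxed
    $ systemd-analyze security nginx.service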

Version pinning? Pin versions with your package manager. Want multiple versions? Check out the DebianAlternatives system.
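
E.g. (package names are just examples):

    # keep a package at its current version across upgrades
    $ sudo apt-mark hold libssl1.1
    # pick between multiple installed alternatives
    $ sudo update-alternatives --config editor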

Reproducible? Fix your build/install configs, not the environment. If it builds on your machine but not on another, or runs flawlessly on one but not the other, you have an implicit dependency on the environment, or wrong dependency version constraints, which you likely don't know about. If you don't know your dependencies, you are definitely shooting blind.

Minimizes global state? Repo-based distros minimize global state by providing software with mutually compatible dependencies, so you can keep state minimal and still update your software. Meanwhile, with Docker you need all the dependencies and have to hope they match between images so you can save on layer reuse. And if someone decides to update the base image and others don't... well, too bad, now you have to keep both versions of the base image.


The easy option always wins in the end. You could do all that, or you could write one Dockerfile and not care. Obviously this is not the goal of Docker, but it fulfills a need that many big companies have and will fund its continued development, while also kind of fitting in as a general reproducible environment service. It's not ideal, but the ideal version isn't as easy.


But then you have to write aliases for mounts and so on, whereas repositories come without any additional cost, and systemd units aren't more complex than Dockerfiles and are usually already written by package maintainers. You are just one `apt install $packagename` away.


I want to add another point to this TL;DR: 3. Do your research before writing these blog posts.

I don't want to put down anyone for writing these blog posts. The idea is nice from a distance, but the reality doesn't work like that.

It's shoehorning something into a situation it's not suitable for. There's Singularity, which runs in non-root environments, but it's aimed at multi-tenant clusters, not desktop systems.

I really get frustrated when people advocate expensive abstractions for minimal gains. We can use our processing power much more efficiently while keeping almost the same properties without the costly abstractions.

Piling everything on top of each other to create impenetrable and immutable abstractions is not the way to achieve this. Docker already makes debugging very hard by being immutable and impenetrable as is.


I appreciate the sentiment, but I wouldn't be able to put up with the slow container boot time for regular use. The author mentions this at the bottom. He claims 1sec delay. Last month I benchmarked it on new-ish hardware at 560ms, or 290ms if you disable namespaces (and so disable isolation).

I wanted to also benchmark bocker[1] (docker written in bash) for a baseline comparison, but it no longer runs and I threw in the towel after ~30 mins of tinkering.

Anyways, if you run something like `curl | jq | grep | less` you could be waiting a fair bit for all those containers to start. Package managers are pretty good these days. I think I can trust it to install `jq` properly.

[1]: https://github.com/p8952/bocker
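
(For anyone who wants a quick sense of the overhead on their own machine, something like this gives a rough number:)

    # rough per-invocation container overhead, vs. the bare command
    $ time docker run --rm alpine:3.12 true
    $ time true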


> curl | jq | grep | less

OK I didn’t know about this. Does each one of the piped commands cause the creation of a new container? And why?


Assuming you've bound an alias to each of these commands, yes: they'd run individually, each in their own container.


Thank you. Why on earth would one wish to bind these commands to their own containers? Or was that just a trivial example for illustrative purposes that I’m obsessing over?


> Running a program in a container is a lot like running it normally, but the user doesn’t need to jump through hoops to configure the system, build and install.

Docker is itself a complex build tool which requires a bunch of install steps. If you are going to ship software to end users there is almost always a better way to bundle and ship than send someone a Docker container.

Docker is not a distribution tool, if you are expecting your end users to install Docker, you've already screwed up.

> Downloading a pre-compiled binary is almost like this, except with worse odds. Maybe there’s a build for your architecture. If it was statically linked, you’re golden. Otherwise, use ldd to reverse engineer the fact that you need to install libjpeg.

On Mac and Windows this is almost never an issue. Even on Linux, it's pretty straightforward to statically link your binary if you aren't sure about the environment it's going to run in. Statically linked binaries are a bit bloated... but not as bloated as a damned Docker image which contains entire dependency trees.
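
For example (toolchain availability varies, and the file names are placeholders): with Go it's the default once cgo is off, and with C you can link against musl:

    # Go: no cgo -> fully static binary
    $ CGO_ENABLED=0 go build -o mytool .
    # C: static link via musl (apt install musl-tools)
    $ musl-gcc -static -o mytool main.c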

In no case is "Making it into a Docker Image" a simpler/ better distribution mechanic.


> In no case is "Making it into a Docker Image" a simpler/ better distribution mechanic.

For my home server setup, I have docker containers for:

  * PiHole
  * NextCloud
  * Home Assistant
If I had to install each of those manually, I probably wouldn't have installed them. This is especially true of NextCloud, which almost certainly would have required me to learn how to run nginx on my own, install php or whatever application it uses as the middleware, etc.

Instead, I configured my DNS and ran a Docker command and was off to the races.
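
For a sense of what that command looks like (flags abridged and values illustrative; see the Pi-hole image docs for the full set):

    $ docker run -d --name pihole \
        -p 53:53/tcp -p 53:53/udp -p 80:80/tcp \
        -e TZ=Europe/London -e WEBPASSWORD=changeme \
        -v pihole_data:/etc/pihole \
        --restart unless-stopped \
        pihole/pihole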

In my experience, the installation process for server-side open source software is often so complicated, with so many dependencies, that if it's not something you can get from your distro's package manager, Docker is almost always the easiest option.

What beats a single command, plus maybe reading up on which ports to forward or how to set up your config? And then you can write a docker-compose file yourself if you want to get creative.

Unless you're already very skilled at dev-ops, docker is easier.


> This is especially true of NextCloud

I have a huge (personal) wiki page for installing and maintaining NextCloud from before they had a decent Docker image. Now I’m content to let it be a black box I don’t have to think about, so I run it in Docker. It saves a ton of time and hassle.

Docker adds a lot of value when devs ignore the best-practice advice of putting everything in a separate container. That's just a package manager with extra steps, IMO. It's the mini-distro-style containers like GitLab's that can save you a massive amount of time.


Only if you're already skilled at docker.


It's a fair point and a good example of why there are no absolutes, but hardly addresses the bigger point.

If you have an app that formats JSON files, are you shipping it as a Docker container? Or a linter? How about a text editor?

The number of applications where it makes sense to bundle them as VMs is quite small.


Sorry, but you're wrong. A vast majority of software ships both as a standalone application and as a Docker container so that it can be run in container environments.

A lot of CI systems natively support Docker because of the variety of tools and images available to run your CI steps in.

Docker is an excellent distribution tool because it only requires Docker.

You can send someone some python code and ask them to run it, only to find they're missing a bunch of C libs required to build and run the code which is a pain to help them figure out how to solve.

Docker solves many of these problems but has a drawback of being more 'bloated' than other distribution mechanisms. It doesn't make it a bad one though.


> You can send someone some python code and ask them to run it, only to find they're missing a bunch of C libs required to build and run the code which is a pain to help them figure out how to solve.

You can send someone a Dockerfile only to find out they've never heard of Docker, let alone have it installed, and they aren't interested in setting up a container system to run your 500-line Docker script.

All you've done is move your problem upstream. Unless your consumer is a web developer, you are out of luck.


The learning curve for docker is minimal. As I mentioned before it's very useful in CI systems where you want to be building and testing your code in a clean environment that is reproducible across machines.

Trying to debug why a core library or header dependency needed to build your code is missing requires far more skill, and it differs between operating systems and versions of operating systems.

The problem isn't moved upstream, it's tackled in a very clever and well packaged manner.


On linux just use the system package tool to declare your dependencies.

Yes, you may end up making a deb and an rpm but honestly, it's not an earth-shattering amount of work, lots of companies do it, and then the tool will tell the user "requires libjpeg".


The blog post links to Jessie Frazelle's GitHub repo [0].

Jessie also has a blog post about this [1] from back in 2015. If you prefer video format, Jessie also has a talk at DockerCon SF 2015 [2].

[0] https://github.com/jessfraz/dockerfiles

[1] https://blog.jessfraz.com/post/docker-containers-on-the-desk...

[2] https://www.youtube.com/watch?v=cYsVvV1aVss


I've actually gone the opposite direction in my home lab and run less in Docker: unless you're building your own images, it's hard to know what's in them, which ones are built on out-of-date OS images or dependencies, what else is running, etc.


Docker has its use cases. But running everything possible in a docker container just for the sake of "dockerizing" seems a little bit excessive.

One must basically maintain a (more or less) complete userland environment for every application. The idea of shared libraries is taken to absurdity this way; it would be better to build everything statically.

Then there is the waste of resources. I'm sure with plenty of GB of RAM, TB of SSD space and GBit of bandwidth available nowadays many people don't notice. But for what?


> Then there is the waste of resources. I'm sure with plenty of GB of RAM, TB of SSD space and GBit of bandwidth available nowadays many people don't notice. But for what?

Yeah, you definitely notice if you are trying to scale as cheaply as possible…


I prefer to just make one big "devbox" Docker image with all the utilities I need installed, and the home directory as a Docker volume.

The advantage is I don't have to fiddle with containers for every little program, there's no startup delay for each one, and my base Ubuntu install stays clean and stable (it also runs a VM and other services, so stability is important).

It does have a few warts, the main one being that you can't have the entire container root volume be a Docker volume, so to install new programs persistently inside the devbox I have to rebuild the image (if they don't live completely in the home volume). But logically it's an okay tradeoff, because the only way to make the whole environment reproducible is to specify things at build time.
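
Roughly, the pattern looks like this (image, volume, path, and user names are whatever you pick):

    # build the devbox image once, keep $HOME in a named volume
    $ docker build -t devbox ~/dockerfiles/devbox
    $ docker run -it --name dev -v devbox_home:/home/dev devbox bash
    # later, re-enter the same container
    $ docker start -ai dev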


Is it possible to run your IDE within these Docker environments? I'd love to set up a do-it-all Docker container for the front-end devs at my work.


I use the Remote Development extension for Visual Studio Code, which, among other things, allows you to attach an editor window to a running container.

https://code.visualstudio.com/docs/remote/containers


Were it not for Docker, I probably would never have learned about Linux. I'll share why. I was excited to find Docker. It was neat and could be set up to update your applications without manual interaction. So I had it run my media server apps on a seedbox I had rented. Fast forward a few years: I got a small form factor PC that came with the free version of Windows. Try as I might, I couldn't get Docker to work on it. It required the Pro version or something like that. I remember thinking, well that sure is stupid! That was the beautiful beginning of my affair with Linux as an OS for my desktop/server needs.


I find that the time taken to get everything working means I have less time to get work done. The busywork does give me a sense of achievement, but that is not the achievement that matters. It's premature optimization to sandbox an application by default unless there is a pressing reason to do so.

For example, zoom and hugo don't really change the filesystem or OS settings beyond the folders they output to. I don't see a reason to have them sandboxed, personally.


Isn't that precisely a use case for sandboxing? If I know zoom doesn't need the filesystem, I can deny it access and guarantee it doesn't use the filesystem without me knowing about it.


If you don't trust an application, sandboxing is one way to go. But that goes all the way down. If you don't trust docker and the OS, run them in a VM. If you don't trust the VM, run it on a spare computer. If you don't trust that computer's hardware, then use minecraft to make a PC with your own instruction set :) Same with networking. Use HTTPS. No? Also use a VPN. No? Make your own VPN. No? Use smoke signals with one-time pad encryption..

In the end you decide at what point are you willing to delegate responsibility for things working as they say they should.


I wish there were real, usable Windows containers out there. I'd love to spin up a quick sandbox to run MSVC / MSBuild, but the smallest "image" in Windows-land for that is 12 GB.

Still use Docker alongside WSL2 for purely-Linux stuff like some node.js scripts or python things that don't need GPU.


It’s not clear to me what your requirements are, but take a look at Sandboxie for Windows. [1]

[1]: https://github.com/sandboxie-plus/Sandboxie


I'm looking at startup ideas in this area. Would you be interested in describing your pain points and use case?


I just want Docker containers for Windows with a reasonable size. They do have a Server Core image (at about 1.5 GB) and a much slimmer NanoServer one (100 MB), but NanoServer is not capable of running the Visual C++ compiler, for example.

Just look at something like this, which tries to compile C++ for vcpkg: https://hub.docker.com/r/hripko/vcpkg/tags?page=1&ordering=l... (the Linux images are all < 300 MB, but the Windows one is 5 GB compressed).


> On a Mac, there is a major performance hit whenever you do disk IO in a bind mount (i.e. voluming a directory of the host system into the container). Working without bind mounts is extremely limiting. [..] If you’re using Docker on a Mac and you’ve never tried it on Linux, you owe it to yourself to try it on Linux.

Or use named volumes. I'm running a dockerized WordPress dev environment on my MacBook with an average TTFB of 40 ms.
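
Roughly the difference (image, port, and volume names are illustrative):

    # bind mount: every IO round-trips through macOS file sharing (slow)
    $ docker run -d -p 8080:80 -v "$PWD":/var/www/html wordpress
    # named volume: data stays inside the Linux VM (fast)
    $ docker volume create wp_data
    $ docker run -d -p 8080:80 -v wp_data:/var/www/html wordpress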


I had to investigate why our docker-ized dev environment was so slow on macOS at my last job and this was the root cause. One project's test suite ran in about 5-10 mins in our linux CI/CD environment, and about 50+ mins in macOS with docker.

There was a very in-depth thread on the docker forums where the devs explained why there was such a huge performance penalty. IIRC it was due to all the extra bookkeeping that had to be done to ensure strong consistency and correct propagation of file system events between the virtualized docker for mac environment and the host file system.

The test suite would run integration tests that performed a lot of npm/yarn operations which meant lots of disk IO.


The reason is that docker on mac runs in a Linux VM and VM mounts are slow.


> Or use named volumes

And how do you access these from your host with high-performance IO?


There’s no need to in my situation. 99% of the files are dependencies installed through Composer (PHP) while building the Dockerfile. The few directories I need to work on directly are bind mounts and don’t have much impact on overall runtime performance.


My personal computer doubles as my workstation. I manage my Python virtual environments with virtualenvwrapper, and I install all the other stuff -mostly OpenGL and gamedev related tools- with apt or from source.

Many developers would consider this Ubuntu 20.04 install that I run on a 9 year old computer to be an accident waiting to happen. But I'm finding this Ubuntu release to be extremely stable even at 3000+ packages.


I'm in the same situation: until I figure out (if ever) how to use NixOS and version-control my OS, I tend to use Docker for any application with lots of dependencies, especially Python-related ones.


Reach out to me if you want to learn Nix; I'm always happy to help people climb the steep learning curve.


Curious, does one need to use nixOS for an optimal experience?

I've been using debian for a long time and would prefer not to have to switch OSs but the idea of using nix for having full control over my package graph is very tempting.

I've been somewhat procrastinating on trying nix as I've heard GNU Guix has a similar feature set and haven't been able to decide on which one to dive into...

My ideal setup would be to just be able to run a single shell script that configures a new machine to the exact state of all my other dev machines. I have a shell script that somewhat does this but it's not completely unattended and still requires a lot of manual config for certain steps.

Aside from that my main use case is being able to easily share a dev/build environment with others for ensuring that they can compile a certain project exactly as I do. For now I just use docker but it's frustrating not having explicit control over the layer cache and being able to tell it what to cache and what not to.


> Curious, does one need to use nixOS for an optimal experience?

My experience: no. I've been very happily using Nix on Debian for years. Debian gives me my boring desktop apps (I have very, very boring tastes); almost all my tinkering and development starts with drawing the prerequisites from Nix.

The only times you get weirdness from using non-NixOS linux are things like running opengl apps, which simply require a wrapper script like `nixGL` to function properly.

And darwin Nix is just about the only thing that makes macos a bearable platform for me.


Thanks setheron. If I find the time (the eternal issue) I may take you up on your word :)


I've recently spent a few hours in NixOS land. I even built my own ISO and booted into it from an SD card. If you want a 30 minute intro to getting started then just ping me. I'm en route to something like what you want and I'm not far ahead of you.


Did you try the Docker-based approach after trying Python's virtualenv? Or did you go directly to Docker?


No, I've been using virtualenv for quite some time and I still use it when I'm the person in control of the dependencies, though I'm moving little by little to the Docker way. Why? Because sometimes the problem is not only in the dependencies, but in Python itself. As an example, as of today gcloud does not support Python 3.9.


Just want to link to Whalebrew, which achieves a lot of what this article mentions but IMO is more user-friendly: https://github.com/whalebrew/whalebrew


If you are on a Mac or Windows, this approach only works for utilities that can run on Linux. If you need to run a native Mac program for example, docker won't be able to run that.

Perhaps Nix might help here, but I've never used it, so can't say for sure.


Many comments here point out how difficult it is to manage a separate dependency stack for each container when you use Dockerfiles to build them. This problem is just as difficult, time-intensive, and security-critical for microservice apps running on K8s as it is for CLI tools and graphical apps.

Worth pointing out that there is an incubating CNCF project that tries to solve this problem by forgoing Dockerfiles entirely: Cloud Native Buildpacks (https://buildpacks.io)

CNB defines safe seams between OCI image layers so that they can be replaced out of order, directly on any Docker registry (only JSON requests), and en masse. This means you can, e.g., instantly update all of your OS packages for your 1000+ containers without running any builds, as long as you use an LTS distribution with strong ABI promises (e.g., Ubuntu 20.04). Most major cloud vendors have quietly adopted it, especially for function builds: https://github.com/buildpacks/community/blob/main/ADOPTERS.m...

You might recognize "buildpacks" from Heroku, and in fact the project was started several years ago in the CNCF by the folks who maintained the Heroku and Cloud Foundry buildpacks in the pre-Dockerfile era.

[Disclaimer: I'm one of the founders of the project, on the VMware (formerly Cloud Foundry) side.]


I hadn't heard of Buildpacks before, sounds very interesting.

In particular the out-of-order layer replacement. I'm interested in switching to Buildpacks for the images I maintain for my home cluster; it would make upgrading my base image so much simpler compared to rebuilding all the other images! I've read a bunch of docs/articles since reading your comment yesterday but couldn't find any mention of this, or better yet an example. Are there some docs I missed? (I didn't look into the spec.)


Nevermind, I realized that rebase is exactly that. I had misunderstood the docs.


Don't hesitate to reach out on Slack if you have more questions: https://slack.buildpacks.io

A few tips on rebase:

(1) If you want to rebase without pulling the images first (so there's no appreciable data transfer in either direction), you currently have to pass `--publish`.

(2) If you need to rebase against your own copy of the runtime base image (e.g., because you relocated the upstream copy to your own registry), you can pass `--run-image <ref>`.
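
Putting those together, a rebase looks roughly like this (image references are illustrative):

    $ pack rebase registry.example.com/my-app:latest \
        --run-image registry.example.com/my-run-image:latest \
        --publish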


I get that doing APIs right is hard, that when you're trying to release a proprietary product having to worry about what libraries every system comes with can be a struggle, and that if you're developing a webapp you may want everyone to have a consistent test environment regardless of what flavour of Arch Linux they're running. But I really do NOT want to spend 2 hours downloading 300 GiB of software updates when OpenSSL wets the bed and has a critical vulnerability. The nice thing about libraries and shared objects is that when I get an email saying "Critical CVE found in something important that may actually affect you", I can run an update command which fetches a few libraries, including the vulnerable one, reboot, and be back in business.

I also don't really care for the "we have so much space now" argument. I certainly don't, I don't put in expensive 2TB SSDs in my laptop because I don't need them, and I don't want half of my 500GiB disk to be taken up by giant blobs of unoptimized docker images for the same reason that I don't want to run 10 copies of chrome at the same time to use a text editor, an email client, a web browser, a media player, a debugger, the thing I'm writing, three chat clients, and a partridge in a pear tree. I have extra space and extra processing power on my computer so that a: I can use it for the things I actually want to use it for and b: so that I have a snappy machine which can take an unexpected load (be it disk load or processing load) without problems.

