Run More Stuff in Docker (bergknoff.com)
199 points by psxuaw on Dec 26, 2020 | 286 comments



Let's not. I don't want to install Chrome, which is already 73 MB, bloated up with a whole lotta bullshit into a 500 MB+ image. Imagine downloading every application as a docker container. WTF.

Docker is for distribution of applications when deploying them to servers. As a developer, it's amazing at that and has brought peace and joy to devops. Let's leave it there, shall we?


My disdain for docker is not so much disk usage (I consider that battle lost in modern Software Engineering) but all the other crud that it does - such as re-implementing an entire network device and routing and all. I’ve seen this cause teams countless hours of downtime with things like IP address collisions with VPCs etc, and I’ve never really been given a satisfactory answer as to why this is useful. So now we’ve solved dependency hell and have to faff around with networking and all. Great.

Also the engine runs as root and takes commands from normal users which I always thought was a no-no but I guess docker is ‘special’ - and should be able to do whatever it likes. Containerisation is a fine idea and concept - but I think there are still big caveats.

I’ve been using podman a lot and I hope it becomes more commonplace. I hear that there’s a lot of SV politics and drama going around surrounding the various companies and backers which I have exactly 0 interest in though...


Totally agree that Docker is made for deploying to servers. But the disk space critique doesn’t hit for me. Even very nice SSDs are cheap enough that 500 MB is negligible. My internet connection also makes downloading a large docker image no bigger a deal than downloading Chrome, YMMV.

I think the necessity of a VM when using Docker on Mac and Windows is the primary reason that running your “normal” apps in a container isn’t the right move.


The problem isn't the 500 MB disk footprint, it's all the RAM going to waste loading in redundant libraries. Chrome is already a memory hog on its own; imagine all applications suddenly bringing in their own versions of their libraries.


Also the network requirements to install and update such large containers are a problem. A lot of people don't appreciate how slow the internet is in most of the world compared to places with access to any kind of fiber endpoint. This extends to similar concepts like snap packages.


RAM is the same situation as disk for me — I have plenty to waste.

Again, not advocating Chrome in a container, I don’t even run Chrome outside of a container. I just think it’s odd to get hung up on these sorts of resource requirements given the state of computing.


You may have plenty of resources but there are many who have to "make do" with 8 GB, 4 GB or even less RAM.


Totally fair, but I think the Venn diagram of folks that use Docker and folks that have 4GB of RAM is pretty slim. If the resources are that limited, Docker might not be the best choice for running anything.

Related: Chrome on 4GB of RAM sounds painful. Thoughts and prayers to those folks.


> Even very nice SSD’s are cheap enough that 500 MB is negligible.

However, a nice SSD with a respectable TBW rating is still not cheap. An 860 Pro is almost twice the cost of an 860 Evo, and the Pro provides twice the TBW.

> My internet connection also makes downloading a large docker image no bigger of a deal than downloading Chrome, YMMV.

Not every one of us has pipes that fat at home, with sub-10ms pings and almost LAN-speed access to the rest of the world. My office workstation's network is limited by my network card, but my home has a much slower connection.

I wish the internet on this planet were a full fat-tree network, but we're not there yet.


> However, a nice SSD with a respectable TBW rating is still not cheap. An 860 Pro is almost twice the cost of an 860 Evo, and the Pro provides twice the TBW.

TBW is almost never a concern for desktop users. The Evo is rated for 600 TBW per TB of storage; that would be a full disk rewrite every day for nearly two years.

You will never download enough Docker images for personal use to burn out your 860 Evo before you would have replaced it anyway. (For those unfamiliar with Docker, spinning up a 500MB image ten times doesn't write 5GB to disk!)


> TBW is almost never a concern for desktop users.

You're right; however, most of the people who'll use this kind of setup are not ordinary desktop users.

My desktop has 4 disks (2 SSDs and 2 HDDs). My write rate for the "Home" SSD is 3 TB/yr. To keep that value low, I've moved VMs, big downloads and other stuff to one of the HDDs. The system is on another SSD and its cumulative writes were about 3 TB in 8 years, but I moved logs and other high-write portions to another HDD to keep that value low.

3-4 TB/year, on the other hand, is pretty much in line with a Windows 10 installation's behavior when used by a normal desktop user, as intended.

>You will never download enough Docker images for personal use to burn out your 860 Evo before you would have replaced it anyway.

Considering other stuff I do, I could easily double or triple the amount of writes on my Home SSD, but VMs and other stuff can already sit in RAM once running, so there's no speed problem.

While write amplification is not a big concern anymore, building software and other small-file operations can accumulate fast, so I still can't trust an SSD blindly.

On replacing drives: while a dd or rsync is pretty straightforward for a seasoned Linux user, I prefer not to change hardware just for the sake of it, or abuse it because it's cheap and can be replaced on a whim. At the end of the day, I'd rather use my system efficiently rather than recklessly, both in terms of resources and endurance, because being able to rely on your system is underrated imho.

Because of my job, we torture systems up to and beyond their design limits, and a little optimization can go a long way in these scenarios. I like to apply that knowledge to my own systems to extend their useful life.


So, are these the numbers?

TBW is 600 TB, but you do 3 TB/year across four disks, and I'll be generous and say you do 0.75 TB/SSD/year.

So in 800 years you'll hit the rated TBW?

If you did a full 3TB/year on the one SSD, you hit it in 200 years?


No, 600 TBW is for a 1 TB Evo. For a Pro, it would be 1,200 TBW.

Unfortunately, neither of my disks is that big. Home is on a 256 GB SSD, which boils down to 300 TBW. The system SSD is a somewhat older 120 GB OCZ Vertex 3. This model doesn't have a TBW rating.

As a result, if I don't do anything heavy, it'd last for a century in the best case. I'm not sure about the OCZ though. It reports 100% life remaining, but it's from the skunkworks era of SSDs, so I can't be sure of anything.

The numbers climb very fast when you start to develop stuff and enter the compile -> test -> debug cycle. So I'd rather have that endurance and use it while developing software rather than eating it while doing daily stuff.


You mean that era right after Intel released the first consumer SSD, when suddenly everyone was rebadging flash with who-knows-what controllers?

Anandtech was the best site for that era with their testing.


Almost, but not quite. The Vertex 3 was one of the best SSDs at the time. It had one of the better SandForce controllers with proper MLC flash, but had to be fed compressible data to reach its rated speeds.

It has no temperature sensor so it cannot compensate for temperature or understand its environment. It doesn't have a TBW rating (IIRC) and reports writes and reads in GiB. So it's somewhat limited when compared to today's drives.

OTOH, it's pretty dependable and stable so far.

[0]: https://www.anandtech.com/show/4256/the-ocz-vertex-3-review-...


In many cases you don't need 500 MB of base image.

Minimal "distrofull" images are in the 50mb range. But often you don't really need a real distribution inside your docker image.

See https://github.com/GoogleContainerTools/distroless. But distroless is not about one specific tool or base image, it's a paradigm that addresses precisely what you say (without throwing away the whale with the bathwater)
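
A minimal sketch of the pattern, in case it's not obvious (Go is just an example here; names and versions are arbitrary):

  # build stage: full toolchain
  FROM golang:1.15 AS build
  WORKDIR /src
  COPY . .
  RUN CGO_ENABLED=0 go build -o /app .

  # final stage: no shell, no package manager, just the binary
  FROM gcr.io/distroless/static
  COPY --from=build /app /app
  ENTRYPOINT ["/app"]

The final image is a few MB plus whatever your binary weighs.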


Docker has a layered file system. Meaning if you do it right, that Chrome container will share the same 500 MB base image layer with the Gimp container, or whatever, making it less bloated than it appears when looking only at the footprint of the first image.

I'm not saying that I believe it is a good idea to run desktop apps in Docker containers. It is not a good idea. But it is also not true that doing so would necessarily lead to a bloated filesystem.

Docker will use a lot of disk space when using multiple base images.
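
A rough sketch of what "doing it right" means here (image and package names are placeholders): two Dockerfiles starting from the same base only store the layers after FROM separately.

  # tool-a/Dockerfile
  FROM debian:buster-slim
  RUN apt-get update && apt-get install -y --no-install-recommends tool-a && rm -rf /var/lib/apt/lists/*

  # tool-b/Dockerfile
  FROM debian:buster-slim
  RUN apt-get update && apt-get install -y --no-install-recommends tool-b && rm -rf /var/lib/apt/lists/*

The debian layers are pulled and stored once; `docker system df -v` reports them as shared size.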


> that Chrome container will share the same 500 MB base image layer with the Gimp container, or whatever

Only if they’ve all chosen the same base image.

And if application developers could all agree on a fixed base image with fixed versions of dependencies, that’d be a Linux distro and we wouldn’t need docker to begin with :)


I fully agree with the original premise (to run more stuff in Docker). But say I maintain two tools, both using Docker, both built in Python; they'll each be in their own Git repository, and their build pipelines will come up with independent Docker images for whoever wants to use those tools. It seems like frivolous maintenance, if not an anti-pattern, to make sure that these tools are always using the same base image. And yes, if bandwidth and disk space were constraints for me, I'd probably reconsider (even on the original premise), but they are not.

By the way, while I think you should run more stuff in Docker, I also think it is still totally reasonable to have your "main language stuff" installed "normally" (not in Docker). If I'm a Java developer, I'm "happy" to deal with Java's dependency b.s. on my normal system. Meanwhile, if I ever have to touch Ruby, I do not want to deal with Ruby's dependency b.s. on my normal system (I'd rather run that in Docker). And vice versa, I'm sure.


As a Ruby, Node or Python language user, you may want to try rbenv, nodenv or pyenv respectively.


Or the ASDF version manager that covers all of them, and much more.


Both of your Python projects will likely have `python:latest` (or such) as their base image. Sure, they will diverge after that (with further layers for dependencies and your code), but at least the base system will be shared.


Even if they both used python:latest (as if that were the only flavor), they're only on the same "latest" if I do both builds at the same time. So now I have to coordinate builds and releases. Unlikely to happen, and, again, an anti-pattern.


> If I'm a Java developer, I'm "happy"

Oh yes, it would make me "happy", too.


We all know that this doesn't work in practice. Somebody will "urgently" need another dependency and build a new base image with this dependency inside. Or a security patch comes out and half of the developers update while the rest don't. Do this 4 or 5 times and you'll have the exact same fragmentation.


That is, if you are using one base image. Most likely you will be using Alpine, Ubuntu, Debian, CentOS, Red Hat, Oracle Linux and who knows what else in different containers, unless of course you have the patience to repackage all the versions you need for your favorite base image.


> Imagine downloading every application as a docker container. WTF.

Help me understand the WTF here, and also how that’s meaningfully different from an OS X app?


A MacOS App is just what you need to run the app on MacOS. A Docker Image is essentially a mini operating system in a can. Most Docker images contain shells, the entire Python install, an init sequence... piles and piles of redundant stuff.

If you are running a Docker image on MacOS or Windows, you first have to start Docker, which is itself a Linux virtual machine.

Docker is a great dev tool, but if you don't need it, it's a ton of bloat.


I think you are responding too literally to his comment, which is spot on.

A macos app is running in a sandbox and runs in a conceptually similar way to docker.

Go look in ~/Library/Containers

also look at the filesystem under <appname>.app


> I think you are responding too literally to his comment, which is spot on.

Should I respond metaphorically?

> A macos app is running in a sandbox and runs in a conceptually similar way to docker.

A Mac App fundamentally has access to system libraries and leverages those. A Docker container is designed to ignore the system and builds its own environment.

If you run a Docker container on a Mac or Windows, you are now running three operating systems: the host OS, the VM's Linux guest, and the Docker image's userland.

This is not the same as a Mac App. Literally, figuratively, or hypothetically.


Except Mac apps don’t each ship a libSystem.


Because unlike Linux they don't have to, since MacOS actually defines the concept of a stable base system that can be targeted by applications. Docker exists in part because the Linux world has no such concept.


That's the point here. The two aren't similar at all.

MacOS apps contain just the app. Mac apps leverage all of the OS functionality they can. They are strongly tied to MacOS and rely on it for most functionality. When I upgraded to Big Sur, the Mac apps on my computer adapted to the changed OS libraries and often presented differently.

Docker apps contain their own complete environment. They are deliberately engineered to disassociate from the base OS.

Electron is a sort of middle ground largely ignoring many system libraries, but using others.

The only thing Docker has in common with Mac apps is the fact that they keep associated files bundled together.


Jumping from 73 MB -----> 500 MB or whatever.

Have you been running Docker on your system? Have you tried running: `docker system prune -a`? It's gonna print something like Total reclaimed space: 31.2GB.
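
If you want to see where the space is going before you reclaim it, roughly:

  docker system df        # summary: images, containers, local volumes, build cache
  docker system df -v     # per-image breakdown, including shared layer sizes
  docker system prune -a  # remove stopped containers, unused networks, build cache, and images with no container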


It's worse than just the size jump. Every application would also need to run all its code inside a Linux VM. So that means:

- Dedicated RAM for all your docker apps (which you have to partition manually)

- Slow startup time for the first docker app you run each time you reboot, as it boots the linux VM.

- All syscalls run through the VM's emulation layer, which is way slower than native

- No access to the system's native UI toolkits. (Docker apps can't dynamically link to cocoa for obvious reasons.) We can probably work around this by wasting a huge amount of developer time making a networked bridge or something silly but ??

- No easy way to load and save files (the VM doesn't have filesystem access)

- All the other downsides of docker - like it gobbling up all your disk space with images that are no longer used

- Lower battery life, because the host can't easily sleep the VM's scheduler. The CPU can't enter low power state when nothing's running.

Please don't do this. We already have a special kind of container for "self-contained program on disk". It's called a statically linked executable. They work great. They're fast, small, and have access to everything the host system provides. You can SHA them if you want and you can easily host them yourself using a static web server. My computer's responsiveness is more important to me than your shiny docker-shaped toy.
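
A quick sketch of that workflow (file names are hypothetical):

  sha256sum ./mytool > mytool.sha256   # publish the checksum next to the binary
  python3 -m http.server 8000          # any static file server will do
  # consumers: download both files, then `sha256sum -c mytool.sha256` before running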


> Slow startup time for the first docker app you run each time you reboot, as it boots the linux VM.

This, and the other VM-related criticism, is true, but only on Mac & Windows. Running a Docker container on a native Linux system carries very little overhead.


Yep. My comment was mostly made in the context of running macos desktop apps in docker. Windows would be largely the same, though the existence of WSL might make some things easier.

Linux desktop apps running in docker would work better because there's no VM in between the native environment and the application. You'd still need to forward GUI calls across the container boundary. That used to be easy with X11 forwarding, but AFAIK Wayland removed that feature. I don't know enough to guess how difficult that would be to implement.

But, if you solved that it should be mostly smooth sailing. Linux desktop apps distributed via docker would just be the same apps, but unnecessarily bigger, with probably less access to the filesystem. They would be harder to patch for security errata. And they would leave extra junk in your docker images cache. Docker's sandboxing would be nice though.


WSL2 mostly solves the problem of manual memory partitioning.


But the 73 MB -> 500 MB is still true, right?


Depends entirely on how the image is built. You can have a Docker image that contains nothing but the application binaries, but then the question is why use Docker at all.


For all the other reasons the author notes: the main one being that the application running in the docker image has no access to the host system other than what the user explicitly gives it. It's a very minimal sandbox and often all the application needs.

You can't really reproduce that with any popular desktop operating system. Even if you could, the interesting thing about Docker is that it starts with a default-deny environment built on the principle of least privilege.


I'm not an expert in this area, but I've seen plenty of accounts of how Docker can be very insecure. Perhaps it's possible to configure Docker so that it is very secure, but even Google has had people break out of their containers, so these claims about container security should probably come with a disclaimer: "Docker is very secure as long as you are one of the top 0.1% in the field and never mess up". VMs seem to still be the proper tool for securely isolating processes, and should perhaps be the recommendation for the other 99.9%.

A few comments on how insecure the typical Docker recommendations are:

> Sandboxed - security claims about Docker have always been controversial. Simple Unix/BSD constructs like chroot/jails are far simpler and they are reliable

https://news.ycombinator.com/item?id=25547629

> However, I don’t agree with using it for sandboxing for security. Especially if you are giving it access to the X11 socket, as the author does in the examples. It might not be obvious, but the app will have access to all your other open windows.

https://news.ycombinator.com/item?id=25547551

> Fun fact, most docker hosts will allow access to all your files anyway! (especially true on Docker for Mac, which all the cool kids(tm) here are using). Even if you restrict container host-FS access to a source repo dir, mind rogue code changing your .git hook scripts in there, or you might run code outside of the container when committing ;)

> Another slightly relevant fun fact, USB is a bus. That means that any device can listen in on any other device. And USB access is given by default to some X-enabled docker (--tty something), and to most virtualbox machines (including the hidden one running the fake docker linux host on docker-for-mac), and more recently Google-Chrome. ;)

https://news.ycombinator.com/item?id=25547833


Podman and LXD have far more secure defaults. It's terrifying to see that Docker is still around. jail(8) from FreeBSD seems to be the best from a security standpoint; it's also the first container system.


I think the experts in docker would note that sandboxing and privilege isolation are technical features which have many uses, but that is not at all the same as making broad-based claims about the security of docker or the virtual machine and host operating system technology upon which it runs.


It's obvious that any system with faulty sandboxing or privilege isolation is quite likely to be insecure. If the default "sandboxing" of Docker allows access to nearly everything of importance on the host, then we can indeed make "broad-based claims" about the insecurity of the system as a whole.


> You can't really reproduce that with any popular desktop operating system.

If you’re running Docker, you’re running Linux, and you can use the same kernel features as Docker to sandbox applications yourself. This way, you only incur the overhead of a separate network namespace and filesystem if you actually need it. Services can be sandboxed with a few lines of configuration in a drop-in unit, and applications can be run with a shell script that calls `systemd-run`, just like the `docker run` shell scripts suggested in the article.
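
For example, something roughly like this (the properties shown are just a sampling, and the command is a placeholder):

  sudo systemd-run --pty \
    -p DynamicUser=yes \
    -p PrivateTmp=yes \
    -p ProtectSystem=strict \
    -p ProtectHome=yes \
    -p PrivateNetwork=yes \
    /usr/bin/some-tool

That gets you a throwaway user, a private /tmp, a read-only view of the OS, no access to home directories, and no network, with no image to download.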


Do you know of a tool that can do this for arbitrary programs? Maybe something like CARE [0] to find what the program should be accessing, then building the appropriate systemd configuration files?

[0]: https://proot-me.github.io/care/


> If you’re running Docker, you’re running Linux

As slow as it is, some people apparently enjoy running Docker on Windows / Mac.


Docker needs Linux to work. On Mac/Windows containers run in a Linux virtual machine.


I thought so too, but apparently support for native Windows containers has existed for a while now. e.g. https://poweruser.blog/lightweight-windows-containers-using-...


Neat! I had no idea this was a thing, thanks for sharing.


I wouldn't call "running an app in a linux VM on windows", "running linux". As a user, your main interface with your computer is windows, which is what "i'm running windows" means, and obviously "you can use the same kernel features as Docker to sandbox applications yourself" wouldn't work, e.g. you couldn't sandbox photoshop like that.


Isolation and reproducibility


At the end of the day, all the docker images depend on a lot of ./configure && make install, if not a lot of apt-get installs, which are often not reproducible. You don't get reproducibility unless you've done it all the way down the chain.


Not the most useful for software intended to be used on your local machine. You don’t get the isolation benefits because the whole point is to give the container broad privilege so you can use it as if it was a native utility. Docker also doesn’t make a non-reproducible build process reproducible.

At the end of the day if you have a precompiled self-contained artifact Docker is just a convoluted way of calling exec.


I am so glad I can read something like this in public.

When I say stuff like this in job interviews, I get eye rolls and don't end up getting a job.

The industry is so far up its ass lately, no one even dares state the facts in public. /rant


Now talk about deploying something NOT using Kubernetes. The double whammy. Now you're looking like an insane 100 year old graybeard fossil.


I had a conversation a couple months ago where some guy was telling me that I should run my personal website using Kubernetes. It’s a static website served with Nginx! I just about lost my mind trying to talk to this guy, and he kept trying to convince me to try Kubernetes for my personal website.


Why even run nginx? My personal website is served out of a private S3 bucket, with CloudFront sitting in front of it.

As a former sysadmin, the less actual admin I need to do the better.


You could have other, self-hosted services there. That's my setup: I have self hosted trilium and used to self host matrix. The personal nginx served website was just a cherry on top of the cake.


> I have self hosted trilium and used to self host matrix.

Wait, you’re self hosting multiple apps on one system, and not using K8?!? Your apps can’t autoscale or do blue/green upgrades or any of the cool stuff you are now mandated to do by the cargo cult. How do you sleep at night??

/s


>or any of the cool stuff you are now mandated to do by the cargo cult.

This is ultimately what drives me insane about the tech sector: so many choices are trend based and cult like when any sort of technical discipline shouldn't be.


S3 has no HTTPS (for static website hosting) and unsatisfactory performance.

My website used to be in an S3 bucket, I was unhappy with it.


If you run it behind CloudFront it does support HTTPS.


Anything “supports HTTPS” if you count putting an HTTPS proxy in front of it. In other words, S3 does not support HTTPS, but you can work around that limitation by combining it with other services.

My personal website is low-traffic, it’s not going to be very hot in any CDN caches. Based on my experiences with S3 performance, adding another layer of cache misses in front of it is probably just going to slow things down.


Fair enough.

The suggestion was more for the HTTPS part and not the performance part of your complaint. It's simple enough to set up, pay-as-you-go, only requires one vendor, and should be very scalable.

I'd just as soon throw up nginx on a Linode, myself.


Or overuse of microservices


^ this. Proliferation of many microservices and repos is a trend I’m trying (and failing) to buck at my current job.

The next time I have to copy/paste the same utility to a new microservice because [insert corporate reason] I’m going to scream.


Fossil here, happily using VMs.


At this rate we should make a fossil-of-deployments motto, where we claim there is nothing wrong with makefiles and bash scripts :)

like http://programming-motherfucker.com but against triple-buffered-virtualization.


I guess I'm fossilizing, then, because this is the direction I think we should really be heading.

When I stare at docker long enough I wind up at "why couldn't this be a static binary" or "this would be easier to secure if it was its own VM".


Many people are basically misusing Docker to work around Python and Ruby dependency problems.


^ this is a very good point. Someone else's mess, let me put it all together, in a nice pot plant.


9/10 times I see the problems getting solved the same way you could without docker, except now the application is also running as root and several more containers have been added to work around other issues created by running in docker :(


A Docker container is a lot simpler to maintain than a VM, for just about any task you care to mention.

A static binary is much better than a Docker container, I agree. Unfortunately a lot of useful apps can't or won't ship as a static binary (Java, Python, PHP apps), and Docker makes them behave much more like a static binary.


Java can certainly be shipped as a single static binary; AOT compilers have existed since around 2000.

Same applies to Python bundlers like py2exe.


Just want to call out: this is on Windows and macOS. So, these remarks are certainly true for probably a majority of Hacker News readers, but if you're on Linux these limitations do not exist. I'm also not sure how much these limitations hinder you in reality; I hear few complaints from the macOS users around me, and we use Docker a lot.


> Its called a statically linked executable. They work great.

Doesn't this make OpenGL, Vulkan, and parts of glibc break?


Yes. In the case of glibc, it's long past time that flaw was corrected by moving NSS modules out of process and using a pipe or socket to run queries. IIRC OpenBSD already does this. They already cause problems with pulling in their transitive dependencies into your process' address space and potentially causing nasty conflicts. Same goes for PAM modules.


That's absolutely terrifying. Thanks for the insight.


wtf are you doing to create such large Docker containers? Are you installing each app on top of a fully-fledged Ubuntu? Are you leaving the build tools in the final container?


I mean it could just be very many containers that add up. And in many cases you'll have loads of images close to 1GB unless you go to great lengths to try and shrink them (e.g. anything that uses Python).


I can get an Alpine Linux container running Perl with a chunk of CPAN on it to 60MB by just stripping the build tools once I've installed everything I want. Is Python really a GB bigger, or are people not taking the most basic of steps?
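
Roughly this pattern, from memory (the module is a placeholder, and the exact package names may differ between Alpine versions):

  FROM alpine:3.12
  RUN apk add --no-cache perl \
   && apk add --no-cache --virtual .build-deps build-base perl-dev perl-app-cpanminus \
   && cpanm --notest Some::CPAN::Module \
   && apk del .build-deps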


Basic steps. Welcome to the long September of Docker. Whereas most folks who use it regularly know basic maintenance patterns, new users (myself included) must wade through inscrutable documentation or (worse) poorly written blogfarm posts in order to bootstrap up to a level of proficiency that passes interview smell tests.


Do you have any documentation (or blogs) on a higher level that you would recommend? I too have been using Docker for a while, and recently also for certain programs on my desktop (mainly firefox, github cli, azure cli, aws cli, terraform).

I like to think I am somewhat proficient in using Docker correctly by now, but I am still discovering new tidbits of practical knowledge, tips and good practices every few weeks. And I always want more. :)


Itamar Turner-Trauring's blog has a number of good articles on Docker best practices:

https://pythonspeed.com/docker/


I do not. I'm a new user who has found resource identification difficult.


So maybe I was quick to comment, but I glanced at the images I have sitting around and python:3.7 is 876MB. However, python:3.7-slim is 112MB (still nearly twice the size of the image you describe, though). But it's conceivable that many people running into limitations with the slim image would just opt for the full Python image instead.

Maybe a better example is the Tensorflow image, which is close to 4GB for the GPU version. It's also the kind of thing that would be a pain to rebuild from scratch if you wanted to find a way to save space.


Do you also understand that once you have any image built on top of python:3.7 then that 876MB doesn't need to be downloaded again for subsequent images?

> the Tensorflow image is really big

This is kind of meaningless without comparing it to how large Tensorflow is with related dependencies if you just install it centrally?


Yes, I understand that Docker caches layers. Like I said, if docker system prune is clearing a huge amount of space, it's likely a large number of different images and containers that might not share layers. If you're building, running, starting, and stopping containers all day, you might not be terribly careful about exactly what you're doing each time. That's why system prune is so useful.


That's what I am trying to ask myself too! I am sorry that I am a developer and I need to run a bunch of docker compose files.


If you're building docker images FROM SCRATCH then there's no improvement over a zip file or statically-linked single binary (which is what packages and installers already use).

Meanwhile Docker adds many more restrictions and limitations, and it is not a good fit for consumer-focused interactive GUI applications, as opposed to the web and console apps that work best with Docker.


One improvement is that the container is sandboxed and dependencies on the host must be added explicitly, e.g. --volume, --env, --net.
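
e.g. something like this, where nothing crosses into the container unless you put it on the command line (image and command are placeholders):

  # no network, read-only access to the current directory only, one env var passed through
  docker run --rm -it \
    --net=none \
    --volume "$PWD":/work:ro \
    --env LANG="$LANG" \
    some-image some-command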

Sure, you could do the same with chroot manually, and there are many other tools that do the same thing, but somehow the Docker way of representing these concepts is more graspable for the average user, and in much more widespread use.


OS X apps share frameworks in user space, while docker images don't... and docker images have all the constraints everyone hates about sandboxing, only worse. E.g. the file open dialog in macOS will grant an app access to a file when the user chooses that file. In docker, I have to manually add a path whenever I want tools to have access.


That doesn’t match my mental model of how that works. Can you give me an example of a shared 3rd party framework that a Mac app bundle will find on the file system rather than just include in the bundle?


Sharing dynamically linked 3rd party frameworks on macos is relatively rare. But all the system built-in frameworks are shared via dynamic linking in userspace. (Eg Cocoa, UIKit, Foundation, etc). Native macos apps feel native because they dynamically link to the system's standard set of UI libraries. (And more recently font sets like SF Symbols.)

For many reasons, none of that is available via docker.

The GP is also referencing the fact that even if you somehow had access to cocoa from your linux binary inside your docker VM, the docker image doesn't / shouldn't have access to your host (macos's) filesystem. So the save / load dialog wouldn't work properly anyway.


> none of that is available via docker

I'd expect someone building many tools like the author of the original piece is using the same shared image as the base for most of them?


The base shared image can't embed macos's system & UI libraries because:

- They're proprietary. Copying them from macos onto docker hub violates the macos software license.

- They change with each version of MacOS. You can't mix & match them.

I guess you could mount them into docker's filesystem, but then what's the point of using docker? And even if you did that:

- They're macos mach executable files. Linux (and therefore docker) doesn't know how to run mach executable files.

- Even if you could somehow embed them and get them to run, the libraries wouldn't work because they expect to be making syscalls to Darwin. They can't do that from inside a linux virtual machine.

You could probably make a weird RPC proxy involving a native macos process receiving network commands. But it would take a herculean effort to make it work at all, and even if you got it working it would probably be buggy (since everything would suddenly become async) and slow.


It’s rare, but if apps did want to do this they would typically place things at /Library/Frameworks for this purpose.


Loading a 1GB image to run a 50MB executable can be trouble. I used to work while commuting on a train on a daily basis. Unexpectedly needing a big download could ruin my morning.


Why are you running such huge images for a 50MB executable?


We expect every app developer to suddenly become an expert in efficiently producing docker layers and not just lazily use an Ubuntu desktop image?


I expect anyone distributing their app via Docker to learn how to use their tooling properly, yes?


Agreed, there are many things to get right about using Docker. But if you're just deploying an app to somewhere, does it really matter if the image is a bit bloated (within reason)?


Probably because container authors learned from Docker's getting started video, which demonstrates a 1GB "hello world" image.


I disagree.

I think the beauty of docker - and this should spread elsewhere - is that everything starts with one text file.

The crufty part is all the crap that has to be added to the docker run command line.
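
For the sake of illustration, the "one text file" part is something like this (the tool name is a placeholder):

  FROM debian:buster-slim
  RUN apt-get update \
   && apt-get install -y --no-install-recommends some-tool \
   && rm -rf /var/lib/apt/lists/*
  ENTRYPOINT ["some-tool"]

...and the cruft is the eventual `docker run --rm -it --volume ... --env ... some-tool-image` incantation you end up hiding behind an alias or wrapper script.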


> one text file.

That! It is codified. If people had their main machines codified, Docker would maybe be less of a benefit. And I'm sure a lot of people here have that. But a lot of my colleagues don't. So I give them a Docker image... instead of explaining to the same Java developer for the umpteenth time how to make a virtual environment for my Python program.


Most package manager formats are simple text files as well.

Also, docker doesn’t solve the DLL hell problem, it just pushes it out of sight. The Nix package manager is something that actually solves it; it should be promoted, leaving docker to the things it is good at: containerization.


For end user devices, I much prefer Nix/NixOS [1] for this kind of thing.

With Flakes [2] (experimental feature), you get full reproducibility.

The documentation is spotty and there is a considerable learning curve, but I've switched to NixOS on my laptop and desktop early this year and am mostly very happy with it.

That doesn't cover sandboxing though. I would actually agree that sandboxing / restricting applications (Mac OS style) would be sorely needed on Linux desktop.

Flatpak and Snap are trying to do this. A much saner approach than trying to find or maintain up-to-date, trustworthy Dockerfiles for applications.

[1] https://nixos.org/

[2] https://nixos.wiki/wiki/Flakes


Absolutely. Most Dockerfiles I come across are very much not written in a robustly reproducible way. The stacking filesystems model is really not very flexible (if you want to compose two docker images with dissimilar bases to be used together you have to resort to manually specifying which files to copy across from each base). Use of OS features to achieve its ends means the docker daemon itself requires far more privileges than I'm comfortable with and makes weirdness like docker-in-docker necessary...


Yeah, any Docker file that starts with `apt update; apt dist-upgrade` is an instant fail.

You might be interested in this fascinating exploration of attempting to bend Docker into better caching and composition by injecting blocks of Nix packages as individual layers:

https://grahamc.com/blog/nix-and-layered-docker-images


I think the base images that are commonly used are just as bad; likely some random snapshot of upstream repositories at arbitrary times. You can refer to the sha256 of base images to avoid things changing, but I've never seen anyone do that. (The advice is "never use 'latest' because that could change out from under you", but other tags are just as mutable, and so that isn't real advice. You can follow the advice and have something just as bad happen.)
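
For reference, pinning by digest looks something like this (the digest is a placeholder, recorded from `docker images --digests` at the time you build):

  FROM python:3.7-slim@sha256:<digest-recorded-at-build-time>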

The thing the upstream Linux distributions are missing is a lockfile with the hashes of installed packages. Programming languages figured this out (go.sum, package-lock.json, etc.) but distributions have not. Thus, people are often running "whatever" in production, because they simply don't have the ability to lock dependencies properly.

I assume Nix solves this problem, and people should pay attention to how important it actually is.


In production, the companies I work with certainly do rely on SHA256 checksums to specify the base images.


Do they just build their own images for open source things? I have never seen a dockerfile that does that.


Yep.


Honestly, given how little reproducibility there is in like 80% of the Dockerfiles I've seen in the wild, it astounds me that this became a solution for packaging at all.

Are solutions like source2image, jib, or the nix dockerTools.buildImage method gaining steam?


I've examined this a number of times over the years for deploying a ROS (Robot Operating System) workspace to robots and laptops. Some of our competitors use snaps or various container technologies for this, but I've always felt that the isolation/sandboxing would be more of a barrier than a help— it's one more layer of udev rules and other indirection to have to punch through with another set of config files (and to no benefit as we're not trying to run multiple independent "apps" or anything).

So currently at my org we use basically the same scheme I advocated for at a developer conference in 2016 [1]: we just build it all into a gigantic bundled debian package, and deploy that to the host system with apt.

I keep revisiting these various technologies and all of them seem less mature than apt/dpkg, and mostly in service of features and capabilities which don't apply to my particular needs. Obviously I'm a bit of a niche case, and my needs aren't everyone's, but of all of them, nix is the one that seems to be doing the most that is truly interesting and different; being able to send a new nightly build out to users without requiring them to re-download an entirely new asset would be a major win.

[1]: https://vimeo.com/187705228


I also suspect OS package managers are underused when it comes to deployments. I am guilty of that too: some irrational wariness about whether end-user tools are fit to deal with developer workflows.

That aside, there is one slight possible pitfall in discarding one tech because it is less mature than the other: if we assume maturity only increases with time, the oldest product (e.g. apt/dpkg) will always be “best”. Made a note for myself to prefer “not mature enough for my needs” over “less mature than X”.


I think for apt/dpkg, the system definitely does benefit from decades of careful, thoughtful design— for example you're pretty unlikely to come up with a version constraint between your packages which isn't expressible using the existing Debian scheme. And when I look at various modernish packaging systems, I often see a lot of this thrown out as unneeded complexity, only to be cobbled back in later when it's discovered how essential it really is (see for example Homebrew's long transition from being a dirt simple wrapper around "configure; make; make install" to becoming a full binary package manager).

There's also the ecosystem benefit of having loads of helpers and supplementary tooling applicable to the formats— even stuff like having first-class support in proprietary binary stores like Artifactory, vs Nix where it's basically a shrug and "well... it works with any WebDAV server, so take your pick I guess?"

The main complaint I have overall with Apt is the reliance on postinstall scripts, which means that even if you download and extract your packages in parallelized blocks, you still have a long serialized step when every single package needs to spawn a shell and run arbitrary commands, even if in most cases, the commands actually originate from a semi-declarative format (debhelpers, either invoked explicitly from the rules file, or implicitly by the presence of a corresponding debian/xyz file in the metadata). Anyway, if it were possible to somehow flag packages as atomic or configure-less, it might be possible to significantly speed up these operations, especially in environments like CI where you have everything mirrored in-network or possibly even on-machine so the overall install time is dominated by the package configure step.


apt/.deb unfortunately has a much, much higher barrier to entry.

Docker is simple enough to get even the most inexperienced developer going in a very short time.


I also phased out Docker for local, non-isolated workloads in favour of nix. It's saved me many hours of figuring out how to install software whose official packages don't work out of the box / aren't compatible with my distribution.


It's worth mentioning my pet project, Darch.

https://godarch.com/


Snap provides a strong sandboxing layer, but IIRC Flatpak does not.



Just some counterarguments to @jbergknoff's well-put-together page!

Docker is the best medium for distributing - A static file is far easier to share / distribute.

Cross-platform - You need an arguably complex and unstable Linux interface to run Docker images, cgroups et al

Sandboxed - security claims about Docker have always been controversial. Simple Unix/BSD constructs like chroot/jails are far simpler and they are reliable

Version pinning - a binary can embed a version and you can stick with it

Reproducible - Everyone gets confused about Docker image checksums. `sha1sum static-binary` is far far simpler.

Minimizes global state - wouldn't be a problem if people built static binaries.


What's the alternative to docker? And by that, I mean a solution that a team can reasonably use across Linux, Windows, and Mac. The simple reality is that there is Docker and absolutely nothing else that comes close to working everywhere. Yes, it's not perfect, but everything else is far less perfect. Static binaries are far too limited; most software requires lots of files spread all around the file system.

I did a project last year at a company that had dockerized their build, CI, and CD infrastructure. They had dozens of git projects with makefiles that triggered actions using docker. It was great. No need to install anything complicated; it just works everywhere with a minimum of scripts installed from a single internal repository. They did some nice hacks to work around some of the things mentioned in the article, including using VirtualBox on Macs to work around the filesystem limitations. This really becomes a showstopper for large complicated builds that are very IO intensive.


> What's the alternative to docker? And by that, I mean a solution that a team can reasonably use across Linux, Windows, and Mac.

Virtual Machines. VirtualBox in combination with vagrant works reasonably well cross-platform.


Doesn't solve the same problem. You can use that to run docker of course.


Care to elaborate on your point? How is that not the same problem?


What problem does Docker solve that using a VM cannot solve?


The infrastructure for creating and sharing e.g. VirtualBox images does not exist. With docker I can run just about any server product you can name with a single command line invocation. Those commands don't exist for VMware, VirtualBox, or other common virtual machines. That infrastructure simply does not exist.


> The infrastructure for creating and sharing e.g. virtual box images does not exist.

vagrant [1] provides this feature based on my understanding.

[1] https://www.vagrantup.com/docs/cli/cloud


> Static binaries are far too limited; most software requires lots of files spread all around the file system.

In my experience that is entirely and ludicrously false. That is merely how a lot of unix software does things by convention, but there are a lot of ways to make even poorly-thought-out unix software behave as a self-contained entity.


Well, yes that's exactly what docker does.

In contrast, most linux package managers seem to make a big deal about doing all of this slightly different on just about every linux distribution and even between different versions of the same distributions. The fact package managers exist proves my point: deciding which files go where is a big deal and there seem to be an awful lot of opinionated package managers making different choices here. Whatever standards and conventions exist here seem to leave an awful lot of choice and wiggle room.


> The fact package managers exist proves my point: deciding which files go where is a big deal and there seem to be an awful lot of opinionated package managers making different choices here.

Consider that "package managers" only really existed in the UNIX world until relatively recently. Other OSs just didn't make everything so complicated to begin with.

"which files go where" just isn't really a problem if you don't make dependencies some third party's problem.


ssh to an Archlinux or *BSD box?


Static linking is nice, but there are some licenses (notably the GPL and LGPL, which glibc uses) which don't easily allow that for closed-source software. On the technical side, as soon as some code needs to 'dlopen' something (e.g. a plugin or a system driver), you can run into trouble due to multiple instances of glibc or other dependencies running together.

But if none of those are a requirement for your use case (or you have workarounds), I agree that statically linked binaries can be a nicer solution than Docker.


You can statically link whatever you like, you just can't distribute it as one work. One of those crazy single-executable docker containers strikes me as one work to whatever extent a statically linked binary is.


> You can statically link whatever you like, you just can't distribute it as one work.

How do you distribute it then? Let's assume your statically linked binary contains both closed-source code and GPL/LGPL code.

> One of the crazy one executable docker containers strikes me as one work to whatever extent a static linked binary is.

I'm not a lawyer, but that's not my understanding.

A docker image is a glorified collection of files with some metadata, just as a tar file is. I think it's broadly agreed that you're allowed to distribute a tar file with unmodified LGPL dynamic libraries in it without having to open-source all of your code, and I think docker images are treated the same way?


The distinction is meant to depend not on the way of linking, but how intimately the pieces are joined together:

https://softwareengineering.stackexchange.com/a/167781

> If the program dynamically links plug-ins, and they make function calls to each other and share data structures, we believe they form a single program, which must be treated as an extension of both the main program and the plug-ins.

To extend this to archives of independent programs, they would be loosely bound, and therefore not form a single program. A docker container that exists only to package up libraries some executable is using would be closer to a single program than a collection of independent components.


> How do you distribute it then?

You distribute a script and whoever runs the potential violation assembles it themselves, like with zfs on Linux.

I don't really get the demarcations typically made, since a proprietary medium could conceivably be as hard to pull apart as using linking tools to break apart sections of a static binary again.

I get the general sense that people work around examples of what one interpretation says isn't allowed without getting many opinions on the workaround.


Golang makes it pretty painless to build static binaries. go:embed, expected in 1.16, makes it even more so.


If you're building on Linux, does it usually embed glibc in them? If it does, you'll need to comply with the LGPL when you distribute your statically linked binary.


Doesn’t Go just make syscalls on Linux?


I think you can escape that with CGO_ENABLED=0
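
Something like this, assuming none of the dependencies force cgo back on:

  CGO_ENABLED=0 go build -ldflags='-s -w' -o app .
  ldd ./app    # "not a dynamic executable"
  file ./app   # "... statically linked ..."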


This is what we do. One binary output that has all native dependencies built-in. We also use SQLite so we don't have to waste time with administering hosted SQL instances.

Ramping new developers and environments is incredibly trivial with our stack, and we do not rely on any containerization tech. Just .NET Core, visual studio, Git[Hub] and SQLite.


> You need an arguably complex and unstable Linux interface to run Docker images, cgroups et al

What is unstable about it? As far as I can tell, only the Linux kernel interface is needed, and keeping that stable is an explicit goal of the kernel.


I believe there was a period of about a year where it wasn't possible to run Docker on Fedora 31 or higher because Docker did not support the cgroups v2 interface.


> A static file is far easier to share / distribute.

Does glibc work with static linking these days? My understanding was that even with statically linked glibc, things tend to break when the host system has a different libc / a sufficiently newer glibc.

Also, how do you do OpenGL/Vulkan/etc statically? x11docker handles them more-or-less fine, but I'm fairly certain the GPU gods send you to Tartarus if you start trying to statically link in various vendors' libGLs...

> Simple Unix/BSD constructs like chroot/jails are far simpler and they are reliable.

_Fully_ agree about jails, especially nice since they're persistent. Though, for an X11-using application, I think you're screwed any way it comes out, since afaik there's no permissions difference between being able to create a window, and being able to steal keystrokes + send keystrokes to a terminal. Maybe the Qubes people have something?


Qubes people do have this issue solved. Wayland too: you have to opt in to let an application steal all the keystrokes.

As far as security on end user machines goes, there's no reason to use Docker over Podman, except for cases where one needs to run docker in docker, which is a farce in and of itself.


The advantage of running apps in docker is that you avoid the risk of breaking your system by adding 3rd party package repos or even worse, curl | sudo bash. Cleanup/uninstall is very simple as well.

However, I don’t agree with using it for sandboxing for security. Especially if you are giving it access to the X11 socket, as the author does in the examples. It might not be obvious, but the app will have access to all your other open windows.
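
For context, the kind of invocation being discussed looks roughly like this (image name is a placeholder); handing over the X11 socket is what gives the app a view of the whole session:

  docker run --rm -it \
    -e DISPLAY="$DISPLAY" \
    -v /tmp/.X11-unix:/tmp/.X11-unix \
    some-gui-image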


> It might not be obvious, but the app will have access to all your other open windows.

Fun fact, most docker hosts will allow access to all your files anyway! (especially true on Docker for Mac, which all the cool kids(tm) here are using). Even if you restrict container host-FS access to a source repo dir, mind rogue code changing your .git hook scripts in there, or you might run code outside of the container when committing ;)

Another slightly relevant fun fact, USB is a bus. That means that any device can listen in on any other device. And USB access is given by default to some X-enabled docker (--tty something), and to most virtualbox machines (including the hidden one running the fake docker linux host on docker-for-mac), and more recently Google-Chrome. ;)


> and to most virtualbox machines (including the hidden one running the fake docker linux host on docker-for-mac)

docker-for-mac does not use virtualbox.


Docker for Mac does use a virtual machine to run the Docker daemon, though. It's complicated to access the automatically generated mount point on Docker for Mac when you create a volume but don't bind it anywhere.


But does Docker for Mac give unlimited access to the host’s usb bus like GP claims? I don’t see any evidence that it’s true, and the mistaken claim that d4m uses virtualbox increases my skepticism.


I have no knowledge on that matter, I’m sorry


Docker is intimately tied to Linux. It is only "cross platform" in that it can use a VM (Mac, WSL2) or flaky compat layers (FreeBSD). If software embraces Docker, it effectively excludes other OSes, like Windows, BSDs, Haiku, Fuchsia, etc.


Even though it doesn’t run cross platform natively, it does run cross platform acceptably. I have no issue developing, building, and shipping images on my Mac that are deployed to production.


in that case, why is it that docker is considered superior when you could virtualize the windows api and run native windows apps in linux instead?


And even then, trying to run Docker on Fedora will prompt you to turn off Security-Enhanced Linux features or use Podman instead. So it's really like, what, 1/2 or 2/3 of Linux support.


What? Why would it prompt this? Docker supports SELinux.


I don't know the specific issue, but RH's been trying to push Podman fairly hard...


Fedora is using cgroups v2 by default since last year. This version of cgroups was not supported by docker until a couple of weeks ago. So as a Fedora user you could either use podman or modify your installation to use cgroups v1 instead.


Thank you, I may have misspoken, but this is what I was referring to.


> Docker is intimately tied to Linux.

Linux and windows:

https://hub.docker.com/_/microsoft-windows


Have you used a windows only container? I've not seen one in the wild, but I might be terribly biased.


Yep, I've also seen several corporates deploying Windows containers to production. As more "traditional" windows focused companies move to containerization/cloud, there's an increasing use of Windows containers, either as part of a Lift & Shift effort to migrate workloads, or because their developers are more comfortable with Windows and so they target that platform.


No, not yet. But we might go that way in the future for windows software on azure, much along the lines of:

https://docs.microsoft.com/en-us/dotnet/architecture/moderni...


I tried to make one with VS build tools a couple years ago. It’s probably not too bad to do by now, but MS licensing is a massive drawback IMO. I’d rather focus on tech than reading license agreements that could be changed on a whim.


A few years ago I set a goal to minimize the number of applications installed bare-metal on my laptop. Everything is containerized, just as the blog describes. There is some initial overhead, and some extra work associated with maintaining the infrastructure, but overall I can report improved reproducibility (due to an isolated and static environment per app).

I use a bash alias for each common application. For example, "jup" launches a jupyter notebook. I have a container with the Python package "Black" which runs using a git hook to clean up my code prior to commits.
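
The aliases are nothing fancy; roughly like this (the image and mount point will differ depending on your setup):

  alias jup='docker run --rm -it -p 8888:8888 -v "$PWD":/home/jovyan/work jupyter/base-notebook'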


>For example, "jup" launches a jupyter notebook.

Would you be willing to share what you do with Jupyter notebooks, your workflow, how you collaborate with your team, your frustrations?


That just seems insane to be running black inside a container. Why don't you just configure your environment properly?


Serious question: why do you believe that to be insane? The Dockerfile for a simple application such as black must be very short (probably 3-4 lines), the alias is probably quite short, and the container overhead time is minimal for native docker (the story might be different for things like Docker for Mac).
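
Probably something along the lines of this (unpinned here just for brevity):

  FROM python:3.9-slim
  RUN pip install --no-cache-dir black
  ENTRYPOINT ["black"]

The alias then becomes something like `alias black='docker run --rm -v "$PWD":/src -w /src my-black'`, with my-black being whatever you tagged the image.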

On the other hand, you get some benefits from installing black through docker rather than through the system package manager: it is completely isolated from the host and the only way to break black is to update it; changing anything in the host will not break your black install.

I am not sure what you mean by "just configure your environment properly" either, but I am going to assume you mean installing black in a virtual env or equivalent? That is also annoying, for different reasons: you must reinstall it once for each project, updating Python to a new (major) version breaks your formatter, and you cannot move the env around, to name the ones that come to mind.


> you must reinstall it once for each project

You can install it either in a venv outside of all projects or even "pip install --user black".

> updating python to a new (major) version breaks your formatter

Uninstalling the old version breaks the formatter. Installing a new one does not. Either way, with asdf, pyenv, and others you can keep all relevant versions around.

> you cannot move the env around

Sure you can. Use "--relocatable"

I agree docker may be nicer if you're not working day to day in python... but in that case black is not a great example. You probably want to have a specific version bound to the project so everyone uses it (including the ci platform).


Did not know about --relocatable, thanks for the tip!

> But in that case black is not a great example. You probably want to have a specific version bound to the project so everyone uses it (including the ci platform).

That's what I do in my open source work, the CI will run the formatter and commit the result back in tree so it forces everyone on one version. At work this is handled by the developer tools' team.

> Uninstalling the old version breaks the formatter. Installing a new one does not. Either way, with asdf, pyenv, and others you can keep all relevant version around.

Yes, but at some point you are basically making ad hoc containers, right? So why not use the generic one?


> On the other hand, you get some benefits from installing black through Docker rather than through the system package manager: it is completely isolated from the host and the only way to break black is to update it; changing anything on the host will not break your black install.

Why would a system package, packaged by experienced maintainers, randomly break?


This requires you to give your normal user the ability to execute stuff via docker. This means that given the ability to execute code as the user, an attacker can trivially gain root access.



I think in this context that's not a real concern, as on most laptops the human user has root-equivalent permissions anyway (via sudo). In addition, Docker supports user namespaces, allowing root in the container to be mapped to a non-root user on the host.
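
Turning that on is roughly a one-line daemon setting plus a restart (a sketch; see the docs for the subordinate uid/gid details):

    # /etc/docker/daemon.json
    { "userns-remap": "default" }

    # then restart the daemon
    $ sudo systemctl restart docker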


But can you still trigger the containerized Black from VS Code?


Yes. The only caveat is that depending on how it's invoked you might need some --volume tweaking in your alias. If it's just being sent stdin then there shouldn't be any problem. Mounting the whole project is probably best anyway, so it can find tox.ini or whatever other config files it needs. Also make sure the image doesn't produce any extra output that uncontainerized black wouldn't print.


That just seems insane to be running black on any old system python. Why don't you just wrap it in a proper dockerfile?


Using that pattern, can you still easily run Black automatically on file save in IntelliJ or similar?

There were a few configuration steps the last time I configured regular black in IntelliJ, but the documentation wasn't too bad to follow.


> On a Mac, there is a major performance hit whenever you do disk IO in a bind mount (i.e. voluming a directory of the host system into the container). Working without bind mounts is extremely limiting.

In my opinion, this is a big enough issue to throw the whole idea of “Docker as cross-platform platform/target” into major question.

When running more than a few containers on macOS, the performance is so bad it becomes almost unusable.


The only reason it works so well on Linux is because containers are bog standard Linux processes running with no real overhead sans the kernel structures needed to maintain the namespacing. Entries in the VFS are cheap.

Once you need a hypervisor most of the benefits are gone. But if you need the hypervisor anyway which might be the case for software that doesn’t have Mac builds then it starts to look attractive.


Volume-to-macOS-host performance has nothing to do with bind mounts as such. It's the VM folder sharing that's crap, coupled with Docker for Mac's attempt to translate filesystem events from fseventsd to inotify and back.

Changing solutions (e.g Docker for Mac -> VirtualBox or Fusion via e.g docker-machine) makes a world of difference.


I am trying to go in the opposite direction, offering a read-only, mountable filesystem that provides a well-crafted setup of tools and utilities.

The idea is that you mount the filesystem and you get all the tools you need, and more, properly installed and lazily pulled from the network.

You win on the space side, but you need a bit more trust.

You can find more info here: http://packages.redbeardlab.com

And on the GitHub repo, where you can ask for more packages to be installed:

https://github.com/RedBeardLab/packages.redbeardlab.com/

And in this pair of articles for specific languages:

Golang: https://redbeardlab.com/2020/12/21/packages-redbeardlab-com-...

And for JavaScript/node: https://redbeardlab.com/2020/12/23/packages-redbeardlab-com-...


Docker for every application? If you create and maintain the Docker image or Dockerfile for every application yourself, you must have plenty of time. If you rely on public images from Docker Hub, you must have plenty of trust in the creators of those images.


Hmmm... what alternative does not either take time or trust?


It's all relative, of course. But getting a signed package from the repo of the distro I'm using for years is something different than using a random image from hub.docker.com.


Sure, but you are still relying on trust, and you are choosing to limit yourself to things released by your chosen distro. This is the same as if you were to pick a specific docker publisher that you trust, and only use their images.


It's arguable that it's not quite the same. It all comes down to consequences.

If a distro messes up the trustworthiness of an application, they, the big and important company, lose clout.

If the application developer messes up, they also lose clout - people may stop using their software.

Chances are, if you're using a third party's build of a third-party piece of software that isn't officially dockerized by the company that developed it, nor by a major distro, there's no real backlash if it doesn't work or if they get hacked, etc.: "it was a third-party image, so _of course_ it wasn't trustworthy" would be the statement everyone makes.

Debian messing up, or Cisco or Oracle, etc, is a much bigger deal.


Yep, the reality is that we all rely on hundreds of millions of lines of code of software (mostly OSS) that make up our OS, tool chains, libraries, etc. every day. Basically, it's not feasible to even review a meaningful fraction of a percent of that in a lifetime; assuming you even have the skill level to do such a review. In other words, mostly you are blindly trusting other people to have signed off on something and that those people who you don't know personally did a good job of that.


I'm running a similar setup, whereby I run most applications (even the browser I'm using to type this reply) in Docker or podman containers, purpose-built for each.

Judging from the Git repo containing my dockerfiles, I've been doing so since ~mid June 2018.

I've since automated:

* checking for new versions of Git repos and Alpine, plus short crawlers for tools (i.e. I run "perl latest.pl" and a bunch of stuff happens and eventually some Dockerfiles might get updated)

* auto-committing any change made by the above step (i.e. ./autocommit.sh) with a meaningful message based on the directory the Dockerfile resides in, as well as which version-holding environment variable changed

* I use https://github.com/crazy-max/diun/ running on my dokku server to keep up with base image updates (i.e. I get an email in the morning stating alpine:3.12 or debian:buster-slim or whatever has been updated); when a base image changes I have to manually "dp alpine:3.12" to "docker pull" and "podman pull" it; after that, I "make base-images" so my local base images (each adding a short line to enable a local apt-cacher-ng proxy) also get updated; then a simple "make" builds all of them (docker build -t .... and podman build -t ...)

* Quite a lot of (mostly small) bash scripts to run those images.

As an example, the Dockerfile I use to build hadolint:

    FROM local/mfontani/base:latest AS fetcher
    LABEL com.darkpan.github-check github.com/hadolint/hadolint HADOLINT_VERSION
    ENV HADOLINT_VERSION v1.19.0
    RUN curl -sSL "https://github.com/hadolint/hadolint/releases/download/$HADOLINT_VERSION/hadolint-Linux-x86_64" -o /usr/bin/hadolint
    RUN chmod +x /usr/bin/hadolint && \
        /usr/bin/hadolint --version
    FROM scratch
    COPY --from=fetcher /usr/bin/hadolint /usr/bin/hadolint
    ENTRYPOINT ["/usr/bin/hadolint"]
... and the shell script I use to run it:

    #!/bin/bash
    DOCKER_FLAGS=()
    [[ -t 0 ]] && DOCKER_FLAGS+=(-t)
    podman run --rm --init -i "${DOCKER_FLAGS[@]}" \
        --network none \
        -v "${PWD}:/usr/src:ro" \
        --workdir /usr/src \
        localhost/mfontani/hadolint "$@"
It's not that speedy doing this, but it's... okay:

    $ hadolint curl/Dockerfile
    Took: 0.837s (837ms)


You also have to have plenty of trust in the creators of the applications. Using Docker isn't really different, especially when the creators of the applications have also provided the Dockerfiles. What you had before Docker was just as much based on trust.


I have much more trust in the application developers. Thus, I'd trust first-party Docker images, but repackaged application images fall out of date more often and remain out of date for longer than most distro packages, IMO. And with distributions there's a community of maintainers that try to package everything to a standard; with Docker, I don't think that's the case.


How is writing a dockerfile any different from installing an application normally? You just write down the steps you'd otherwise have to take.


TLDR: 1. You already have that on Linux. 2. Make your builds portable.

If you are running a mainstream Linux (Debian-based, Arch-based, probably others), most of the benefits of Docker can be had with the already installed and configured systemd and your distro's package manager.

Sandboxed? systemd.

Simple, uniform interface? Your distro has packages, and most services that can and should be sandboxed already run under systemd after installation. You can tune the unit file if you want, and systemd has a security checker that shows you what a sandboxed application can and cannot do, without proxying things the Docker way.
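
For example (a rough sketch; "sometool" and the unit name are just placeholders), systemd-run can sandbox an ad-hoc command and systemd-analyze scores existing units:

    # run a command with some of systemd's sandboxing knobs
    $ sudo systemd-run --pty -p PrivateTmp=yes -p ProtectHome=read-only sometool
    # score how tightly an existing service is sandboxed
    $ systemd-analyze security nginx.service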

Version pinning? Pin versions with your package manager. Want multiple versions? Check out the DebianAlternatives system.
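
E.g. (package names are just examples):

    # keep a package at its current version across upgrades
    $ sudo apt-mark hold libssl1.1
    # pick between multiple installed alternatives
    $ sudo update-alternatives --config editor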

Reproducible? Fix your build/install configs, not the environment. If it builds on your machine but not on another, or runs flawlessly on one but not the other, you have an implicit dependency on the environment, or wrong dependency version constraints, which you likely don't know about. If you don't know your dependencies, you are definitely shooting blind.

Minimizes global state? Repo-based distros minimize global state by providing software with mutually compatible dependencies, so you can keep state minimal and still update your software. Meanwhile, with Docker you need all the dependencies and have to hope they match between images so you can save on layer reuse. And if someone decides to update the base image and others don't... well, too bad, now you have to keep both versions of the base image.


The easy option always wins in the end. You could do all that, or you could write one Dockerfile and not care. Obviously this is not the goal of Docker, but it fulfills a need that many big companies have and will fund its continued development, while also kind of fitting in as a general reproducible environment service. It's not ideal, but the ideal version isn't as easy.


But then you have to write aliases for mounts and so on, whereas repositories come without any additional cost, and systemd units aren't more complex than Dockerfiles and are usually already written by package maintainers. You are just one `apt install $packagename` away.


I want to add another point to this TL;DR: 3. Do your research before writing these blog posts.

I don't want to put down anyone for writing these blog posts. The idea is nice from a distance, but the reality doesn't work like that.

It's shoehorning something into a situation it's not suitable for. There's Singularity, which runs in non-root environments, but it's aimed at multi-tenant clusters, not desktop systems.

I really get frustrated when people advocate expensive abstractions for minimal gains. We can use our processing power much more efficiently while keeping almost the same properties without the costly abstractions.

Piling everything on top of each other to create impenetrable and immutable abstractions is not the way to achieve this. Docker already makes debugging very hard by being immutable and impenetrable as is.


I appreciate the sentiment, but I wouldn't be able to put up with the slow container boot time for regular use. The author mentions this at the bottom. He claims 1sec delay. Last month I benchmarked it on new-ish hardware at 560ms, or 290ms if you disable namespaces (and so disable isolation).

I wanted to also benchmark bocker[1] (docker written in bash) for a baseline comparison, but it no longer runs and I threw in the towel after ~30 mins of tinkering.

Anyways, if you run something like `curl | jq | grep | less` you could be waiting a fair bit for all those containers to start. Package managers are pretty good these days. I think I can trust it to install `jq` properly.

[1]: https://github.com/p8952/bocker
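
(For anyone who wants a quick sense of the overhead on their own machine, something like this gives a rough number:)

    # rough per-invocation container overhead, vs. the bare command
    $ time docker run --rm alpine:3.12 true
    $ time true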


> curl | jq | grep | less

OK I didn’t know about this. Does each one of the piped commands cause the creation of a new container? And why?


Assuming you've bound an alias to each of these commands, yes: they'd run individually, each in their own container.


Thank you. Why on earth would one wish to bind these commands to their own containers? Or was that just a trivial example for illustrative purposes that I’m obsessing over?


> Running a program in a container is a lot like running it normally, but the user doesn’t need to jump through hoops to configure the system, build and install.

Docker is itself a complex build tool which requires a bunch of install steps. If you are going to ship software to end users there is almost always a better way to bundle and ship than send someone a Docker container.

Docker is not a distribution tool, if you are expecting your end users to install Docker, you've already screwed up.

> Downloading a pre-compiled binary is almost like this, except with worse odds. Maybe there’s a build for your architecture. If it was statically linked, you’re golden. Otherwise, use ldd to reverse engineer the fact that you need to install libjpeg.

On Mac and Windows this is almost never an issue. Even on Linux, it's pretty straightforward to statically link your binary if you aren't sure about the environment it's going to run in. Statically linked binaries are a bit bloated... but not as bloated as a damned Docker image which contains entire dependency trees.
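
For example (toolchain availability varies, and the file names are placeholders): with Go it's the default once cgo is off, and with C you can link against musl:

    # Go: no cgo -> fully static binary
    $ CGO_ENABLED=0 go build -o mytool .
    # C: static link via musl (apt install musl-tools)
    $ musl-gcc -static -o mytool main.c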

In no case is "Making it into a Docker Image" a simpler/ better distribution mechanic.


> In no case is "Making it into a Docker Image" a simpler/ better distribution mechanic.

For my home server setup, I have docker containers for:

  * PiHole
  * NextCloud
  * Home Assistant
If I had to install each of those manually, I probably wouldn't have installed them. This is especially true of NextCloud, which almost certainly would have required me to learn how to run nginx on my own, install php or whatever application it uses as the middleware, etc.

Instead, I configured my DNS and ran a Docker command and was off to the races.
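
For a sense of what that command looks like (flags abridged and values illustrative; see the Pi-hole image docs for the full set):

    $ docker run -d --name pihole \
        -p 53:53/tcp -p 53:53/udp -p 80:80/tcp \
        -e TZ=Europe/London -e WEBPASSWORD=changeme \
        -v pihole_data:/etc/pihole \
        --restart unless-stopped \
        pihole/pihole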

In my experience, the installation process for server-side open source software is often so complicated, with so many dependencies, that if it's not something you can get from your distro's package manager, Docker is almost always the easiest option.

What beats a single command, plus maybe reading up on which ports to forward or how to set up your config? And then you can write a docker-compose file yourself if you want to get creative.

Unless you're already very skilled at dev-ops, docker is easier.


> This is especially true of NextCloud

I have a huge (personal) wiki page for installing and maintaining NextCloud from before they had a decent Docker image. Now I’m content to let it be a black box I don’t have to think about, so I run it in Docker. It saves a ton of time and hassle.

Docker adds a lot of value when devs ignore the best-practice advice of putting everything in a separate container. That's just a package manager with extra steps, IMO. It's the mini-distro-style containers like GitLab's that can save you a massive amount of time.


Only if you're already skilled at docker.


It's a fair point and a good example of why there are no absolutes, but hardly addresses the bigger point.

If you have an app that formats JSON files, are you shipping it as a Docker container? Or a linter? How about a text editor?

The number of applications where it makes sense to bundle them as VMs is quite small.


Sorry, but you're wrong. A vast majority of software ships both as a standalone application and as a Docker container so that it can be run in container environments.

A lot of CI systems natively support Docker because of the variety of tools and images available to run your CI steps in.

Docker is an excellent distribution tool because it only requires Docker.

You can send someone some python code and ask them to run it, only to find they're missing a bunch of C libs required to build and run the code which is a pain to help them figure out how to solve.

Docker solves many of these problems but has a drawback of being more 'bloated' than other distribution mechanisms. It doesn't make it a bad one though.


> You can send someone some python code and ask them to run it, only to find they're missing a bunch of C libs required to build and run the code which is a pain to help them figure out how to solve.

You can send someone a Dockerfile only to find out they've never heard of Docker, let alone have it installed, and they aren't interested in setting up a container system to run your 500-line Docker script.

All you've done is move your problem upstream. Unless your consumer is a web developer, you are out of luck.


The learning curve for docker is minimal. As I mentioned before it's very useful in CI systems where you want to be building and testing your code in a clean environment that is reproducible across machines.

Trying to debug why a core library or header dependency needed to build your code is missing requires far more skill, and it differs between operating systems and versions of operating systems.

The problem isn't moved upstream, it's tackled in a very clever and well packaged manner.


On linux just use the system package tool to declare your dependencies.

Yes, you may end up making a deb and an rpm but honestly, it's not an earth-shattering amount of work, lots of companies do it, and then the tool will tell the user "requires libjpeg".


The blog post links to Jessie Frazelle's GitHub repo [0].

Jessie also has a blog post about this [1] from back in 2015. If you prefer video format, Jessie also has a talk at DockerCon SF 2015 [2].

[0] https://github.com/jessfraz/dockerfiles

[1] https://blog.jessfraz.com/post/docker-containers-on-the-desk...

[2] https://www.youtube.com/watch?v=cYsVvV1aVss


I've actually gone the opposite direction in my home lab and run less in Docker: unless you're building your own images, it's hard to know what's in them, which ones are built on out-of-date OS images or dependencies, what else is running, etc.


Docker has its use cases. But running everything possible in a docker container just for the sake of "dockerizing" seems a little bit excessive.

One must basically maintain a (more or less) complete userland environment for every application. The idea of shared libraries is taken to absurdity this way; it would be better to build everything statically.

Then there is the waste of resources. I'm sure with plenty of GB of RAM, TB of SSD space and GBit of bandwidth available nowadays many people don't notice. But for what?


> Then there is the waste of resources. I'm sure with plenty of GB of RAM, TB of SSD space and GBit of bandwidth available nowadays many people don't notice. But for what?

Yeah, you definitely notice if you are trying to scale as cheaply as possible…


I prefer to just make one big "devbox" Docker image with all the utilities I need installed, and the home directory as a Docker volume.

The advantage is I don't have to fiddle with containers for every little program, there's no startup delay for each one, and my base Ubuntu install stays clean and stable (it also runs a VM and other services, so stability is important).

It does have a few warts, the main one being that you can't have the entire container root volume be a Docker volume, so to install new programs persistently inside the devbox I have to rebuild the image (if they don't live completely in the home volume). But logically it's an okay tradeoff, because the only way to make the whole environment reproducible is to specify things at build time.
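
Roughly, the pattern looks like this (image, volume, path, and user names are whatever you pick):

    # build the devbox image once, keep $HOME in a named volume
    $ docker build -t devbox ~/dockerfiles/devbox
    $ docker run -it --name dev -v devbox_home:/home/dev devbox bash
    # later, re-enter the same container
    $ docker start -ai dev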


Is it possible to run your IDE within these Docker environments? I'd love to set up a do-it-all Docker container for the front-end devs at my work.


I use the Remote Development extension for Visual Studio Code, which, among other things, allows you to attach an editor window to a running container.

https://code.visualstudio.com/docs/remote/containers


Were it not for Docker, I probably would never have learned about Linux. I'll share why. I was excited to find Docker. It was neat and could be set up to update your applications without manual interaction. So I had it run my media server apps on a seedbox I had rented. Fast forward a few years: I got a small form factor PC that came with the free version of Windows. Try as I might, I couldn't get Docker to work on it. It required the Pro version or something like that. I remember thinking, well that sure is stupid! That was the beautiful beginning of my affair with Linux as an OS for my desktop/server needs.


I find that the time taken to get everything working means I have less time to get work done. The busywork does give me a sense of achievement, but that is not the achievement that matters. It's premature optimization to sandbox an application by default unless there is a pressing reason to do so.

For example, zoom and hugo don't really change the filesystem or OS settings beyond the folders they output to. I don't see a reason to have them sandboxed, personally.


Isn't that precisely a use case for sandboxing? If I know zoom doesn't need the filesystem, I can deny it access and guarantee it doesn't use the filesystem without me knowing about it.


If you don't trust an application, sandboxing is one way to go. But that goes all the way down. If you don't trust docker and the OS, run them in a VM. If you don't trust the VM, run it on a spare computer. If you don't trust that computer's hardware, then use minecraft to make a PC with your own instruction set :) Same with networking. Use HTTPS. No? Also use a VPN. No? Make your own VPN. No? Use smoke signals with one-time pad encryption..

In the end you decide at what point are you willing to delegate responsibility for things working as they say they should.


I wish there were real, usable Windows containers out there. I'd love to spin up a quick sandbox to run MSVC / MSBuild, but the smallest "image" in Windows-land for that is 12 GB.

Still use Docker alongside WSL2 for purely-Linux stuff like some node.js scripts or python things that don't need GPU.


It’s not clear to me what your requirements are, but take a look at Sandboxie for Windows. [1]

[1]: https://github.com/sandboxie-plus/Sandboxie


I'm looking at startup ideas in this area. Would you be interested in describing your pain points and use case?


I just want Docker containers for Windows with a reasonable size. They do have a Server Core image (at about 1.5 GB) and a much slimmer NanoServer one (100 MB), but NanoServer is not capable of running the Visual C++ compiler, for example.

Just look at something like this, which tries to compile C++ for vcpkg: https://hub.docker.com/r/hripko/vcpkg/tags?page=1&ordering=l... (the Linux images are all < 300 MB, but the Windows one is 5 GB compressed).


> On a Mac, there is a major performance hit whenever you do disk IO in a bind mount (i.e. voluming a directory of the host system into the container). Working without bind mounts is extremely limiting. [..] If you’re using Docker on a Mac and you’ve never tried it on Linux, you owe it to yourself to try it on Linux.

Or use named volumes. I'm running a dockerized WordPress dev environment on my MacBook with an average TTFB of 40 ms.
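
Roughly the difference (image, port, and volume names are illustrative):

    # bind mount: every IO round-trips through macOS file sharing (slow)
    $ docker run -d -p 8080:80 -v "$PWD":/var/www/html wordpress
    # named volume: data stays inside the Linux VM (fast)
    $ docker volume create wp_data
    $ docker run -d -p 8080:80 -v wp_data:/var/www/html wordpress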


I had to investigate why our docker-ized dev environment was so slow on macOS at my last job and this was the root cause. One project's test suite ran in about 5-10 mins in our linux CI/CD environment, and about 50+ mins in macOS with docker.

There was a very in-depth thread on the docker forums where the devs explained why there was such a huge performance penalty. IIRC it was due to all the extra bookkeeping that had to be done to ensure strong consistency and correct propagation of file system events between the virtualized docker for mac environment and the host file system.

The test suite would run integration tests that performed a lot of npm/yarn operations which meant lots of disk IO.


The reason is that docker on mac runs in a Linux VM and VM mounts are slow.


> Or use named volumes

And how do you access these from your host with high-performance IO?


There’s no need to in my situation. 99% of the files are dependencies installed through Composer (PHP) while building the Dockerfile. The few directories I need to work on directly are bind mounts and don’t have much impact on overall runtime performance.


My personal computer doubles as my workstation. I manage my Python virtual environments with virtualenvwrapper, and I install all the other stuff -mostly OpenGL and gamedev related tools- with apt or from source.

Many developers would consider this Ubuntu 20.04 install that I run on a 9 year old computer to be an accident waiting to happen. But I'm finding this Ubuntu release to be extremely stable even at 3000+ packages.


I'm in the same situation: until I figure out (if ever) how to use NixOS and version-control my OS, I tend to use Docker for any application with lots of dependencies, especially Python-related ones.


Reach out to me if you want to learn Nix; I'm always happy to help people climb the steep learning curve.


Curious, does one need to use nixOS for an optimal experience?

I've been using debian for a long time and would prefer not to have to switch OSs but the idea of using nix for having full control over my package graph is very tempting.

I've been somewhat procrastinating on trying nix as I've heard GNU Guix has a similar feature set and haven't been able to decide on which one to dive into...

My ideal setup would be to just be able to run a single shell script that configures a new machine to the exact state of all my other dev machines. I have a shell script that somewhat does this but it's not completely unattended and still requires a lot of manual config for certain steps.

Aside from that my main use case is being able to easily share a dev/build environment with others for ensuring that they can compile a certain project exactly as I do. For now I just use docker but it's frustrating not having explicit control over the layer cache and being able to tell it what to cache and what not to.


> Curious, does one need to use nixOS for an optimal experience?

My experience: no. I've been very happily using Nix on Debian for years. Debian gives me my boring desktop apps (I have very, very boring tastes); almost all my tinkering and development starts with drawing the prerequisites from Nix.

The only times you get weirdness from using non-NixOS linux are things like running opengl apps, which simply require a wrapper script like `nixGL` to function properly.

And darwin Nix is just about the only thing that makes macos a bearable platform for me.


Thanks setheron. If I find the time (the eternal issue) I may take you up on your word :)


I've recently spent a few hours in NixOS land. I even built my own ISO and booted into it from an SD card. If you want a 30 minute intro to getting started then just ping me. I'm en route to something like what you want and I'm not far ahead of you.


Did you try the Docker-based approach after trying Python's virtualenv? Or did you go directly to Docker?


No, I've been using virtualenv for quite some time and I still use it when I'm the person in control of the dependencies, though I'm moving little by little to the Docker way. Why? Because sometimes the problem is not only in the dependencies, but in Python itself. As an example, as of today gcloud does not support Python 3.9.


Just want to link to Whalebrew, which achieves a lot of what this article mentions but IMO is more user-friendly: https://github.com/whalebrew/whalebrew


If you are on a Mac or Windows, this approach only works for utilities that can run on Linux. If you need to run a native Mac program for example, docker won't be able to run that.

Perhaps Nix might help here, but I've never used it, so can't say for sure.


Many comments here point out how difficult it is to manage a separate dependency stack for each container when you use Dockerfiles to build them. This problem is just as difficult, time-intensive, and security-critical for microservice apps running on K8s as it is for CLI tools and graphical apps.

Worth pointing out that there is an incubating CNCF project that tries to solve this problem by forgoing Dockerfiles entirely: Cloud Native Buildpacks (https://buildpacks.io)

CNB defines safe seams between OCI image layers so that they can be replaced out of order, directly on any Docker registry (only JSON requests), and en masse. This means you can, e.g., instantly update all of your OS packages for your 1000+ containers without running any builds, as long as you use an LTS distribution with strong ABI promises (e.g., Ubuntu 20.04). Most major cloud vendors have quietly adopted it, especially for function builds: https://github.com/buildpacks/community/blob/main/ADOPTERS.m...

You might recognize "buildpacks" from Heroku, and in fact the project was started several years ago in the CNCF by the folks who maintained the Heroku and Cloud Foundry buildpacks in the pre-Dockerfile era.

[Disclaimer: I'm one of the founders of the project, on the VMware (formerly Cloud Foundry) side.]


I hadn't heard of Buildpacks before, sounds very interesting.

In particular the out-of-order layer replacement. I'm interested in switching to Buildpacks for the images I maintain for my home cluster; it would make upgrading my base image so much simpler compared to rebuilding all the other images! I've read a bunch of docs/articles since reading your comment yesterday but couldn't find any mention of this, or better yet an example. Are there some docs I missed? (I didn't look into the spec.)


Nevermind, I realized that rebase is exactly that. I had misunderstood the docs.


Don't hesitate to reach out on Slack if you have more questions: https://slack.buildpacks.io

A few tips on rebase:

(1) If you want to rebase without pulling the images first (so there's no appreciable data transfer in either direction), you currently have to pass `--publish`.

(2) If you need to rebase against your own copy of the runtime base image (e.g., because you relocated the upstream copy to your own registry), you can pass `--run-image <ref>`.
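
Putting those together, a rebase looks roughly like this (image references are illustrative):

    $ pack rebase registry.example.com/my-app:latest \
        --run-image registry.example.com/my-run-image:latest \
        --publish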


I get that doing APIs right is hard, that when you're trying to release a proprietary product having to worry about what libraries every system comes with can be a struggle, and that if you're developing a webapp you may want everyone to have a consistent test environment regardless of what flavour of Arch Linux they're running. But I really do NOT want to spend 2 hours downloading 300 GiB of software updates when OpenSSL wets the bed and has a critical vulnerability. The nice thing about libraries and shared objects is that when I get an email saying "Critical CVE found in something important that may actually affect you", I can run an update command which fetches a few libraries, including the vulnerable one, reboot, and be back in business.

I also don't really care for the "we have so much space now" argument. I certainly don't, I don't put in expensive 2TB SSDs in my laptop because I don't need them, and I don't want half of my 500GiB disk to be taken up by giant blobs of unoptimized docker images for the same reason that I don't want to run 10 copies of chrome at the same time to use a text editor, an email client, a web browser, a media player, a debugger, the thing I'm writing, three chat clients, and a partridge in a pear tree. I have extra space and extra processing power on my computer so that a: I can use it for the things I actually want to use it for and b: so that I have a snappy machine which can take an unexpected load (be it disk load or processing load) without problems.

