It may be quipped that Docker or Kubernetes is remote code execution (RCE) as a service!
The vulnerability pointed out here is server operators who have exposed Docker's API to the public internet, allowing anyone to run a container. The use of a privileged container is just icing on the cake, and probably not necessary for a cryptominer.
A more subtle and interesting attack vector is genuine container images that become compromised, much like in any other package ecosystem, particularly images that search for the Docker API on the host's internal IP address, hoping only public network access is firewalled.
I know you can reference images via their hash, which no one ever does.
It'd be nice if container registries were immutable, so when you pull an image by tag it's the exact same image every time. As it stands, the image you pull today might not be the image you pulled yesterday (unless you pull by hash).
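For what it's worth, resolving a tag to its digest and pinning that is doable today; a minimal sketch (image and tag are just examples):

```
# Resolve a tag to its content-addressed digest, then reference the image
# by digest so what you pull can never change underneath you.
docker pull nginx:1.25
docker inspect --format '{{index .RepoDigests 0}}' nginx:1.25
# prints something like nginx@sha256:<digest>; pin that, e.g.
#   docker pull nginx@sha256:<digest>
```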
For private, non-public apps I'm a huge fan of tagging containers with the git short hash. Every build produces a brand new artifact; no previous image gets modified. It's easy to know precisely what is deployed by checking what was in the source repo for that commit. It means a bit more effort around deployment automation, but it generally seems less problematic than pulling the latest image for a mutable tag. Since it's only a convention, though, a malicious actor can still overwrite a previous tag, so it doesn't prevent poisoning.
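A minimal sketch of that flow (the registry name is made up):

```
# Every build gets a tag derived from the commit; nothing ever overwrites
# an existing artifact (by convention only, as noted above).
TAG="$(git rev-parse --short HEAD)"
docker build -t registry.example.com/myapp:"$TAG" .
docker push registry.example.com/myapp:"$TAG"
```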
Eh. This advice is less practical than it’s made to seem. Like it “works” but it’s not really usable for anything other than connecting two privileged apps over a hostile network.
* Docker doesn’t support CRLs so any compromised cert means reissuing everyone’s cert.
* Docker’s permissions are all or nothing without a plug-in. And if you’re going that route the plug-in probably has better authentication.
* Docker’s check is just “is the cert signed by the CA”, so you have to do one CA per machine / group of homogeneous machines.
* You either get access to the socket or not with no concept of users so you get zero auditing.
* Using SSH as transport helps, but then you also have to lock down SSH, which isn’t impossible but is more work and surface area to cover than feels necessary. Also, since your access is still via the Unix socket, it’s all-or-nothing permissions again (see the sketch below).
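For reference, the two transports being compared, roughly (host names and file names are placeholders):

```
# TLS socket: the daemon only checks "was this client cert signed by my CA".
dockerd -H tcp://0.0.0.0:2376 --tlsverify \
  --tlscacert=ca.pem --tlscert=server-cert.pem --tlskey=server-key.pem

# SSH transport: auth is whatever sshd allows, but the far end is still the
# all-or-nothing Unix socket.
export DOCKER_HOST=ssh://deploy@build-host.example.com
docker ps
```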
CT (Certificate Transparency) is another approach to validating certs, wherein X.509 cert logs are written to a consistent, available, append-only Merkle-tree log (or, in e.g. google/trillian, a centralized DB where one party has root and backup responsibilities, also with Merkle hashes for verifying data integrity).
https://certificate.transparency.dev/
https://github.com/google/trillian
Does docker ever make the docker socket available over the network, over an un-firewalled port by default?
Docker Swarm is one configuration where the Docker socket is made available over TLS.
> Run `docker swarm ca --rotate` to generate a new CA certificate and key. If you prefer, you can pass the --ca-cert and --external-ca flags to specify the root certificate and to use a root CA external to the swarm. Alternately, you can pass the --ca-cert and --ca-key flags to specify the exact certificate and key you would like the swarm to use.
Docker ("moby") and podman v3 socket security could be improved:
> This role does exactly that: it launches two containers, a traefik one and another to securely provide limited access to the docker socket. It also provides the necessary configuration.
This looks like the attacker is just using a publicly exposed Docker API (the honeypot) to run a new Docker container with `privileged: true`. I don't see why that's particularly interesting, given that they could just bind-mount the host's root filesystem with `pid: host` and do pretty much whatever they want?
It would be much more interesting to see a remote-code-execution attack against some vulnerable service deploying a container escape payload to escalate privileges from the Docker container to the host.
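For illustration, roughly what an unauthenticated, exposed endpoint already hands over, no `--privileged` needed (the host name is made up):

```
# Talk to the exposed daemon directly and land a shell on the host's root
# filesystem.
docker -H tcp://victim.example.com:2375 run --rm -it \
  --pid=host -v /:/host alpine chroot /host /bin/sh
```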
Exposed on the WAN is obviously bad, but how do you keep your own containers from calling those APIs? Yes, you don't mount the docker socket in the container, but what about the orchestrator APIs?
With kubernetes, you enable RBAC, and only allow each pod the absolute minimum of access required.
In my own setup, only five pods have any privileges: kubernetes-dashboard, nginx-ingress, prometheus, cert-manager, and the gitlab-runner. Of those, only kubernetes-dashboard can modify existing pods in all namespaces; cert-manager can only write secrets of its own custom type; and gitlab-runner can spawn non-privileged pods in its own namespace (but can't mount anything from the host, nor any resources from any other namespace).
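If it helps, the gitlab-runner piece can be expressed with imperative kubectl commands; a rough sketch (the namespace and service-account names here are made up):

```
# A role that can only manage pods in the CI namespace, bound to the
# runner's service account; nothing host-level, nothing cross-namespace.
kubectl create role runner-pods -n ci \
  --verb=get,list,watch,create,delete --resource=pods
kubectl create rolebinding runner-pods -n ci \
  --role=runner-pods --serviceaccount=ci:gitlab-runner

# Sanity-check what the account can (and can't) do:
kubectl auth can-i --list -n ci --as=system:serviceaccount:ci:gitlab-runner
kubectl auth can-i create pods -n kube-system \
  --as=system:serviceaccount:ci:gitlab-runner   # should print "no"
```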
And when building docker images in the CI, I use google’s kaniko to build docker images from within docker without any privileges (it unpacks docker images for building and runs them inside the existing container, basically just chroot).
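A rough example of a kaniko invocation, with no `--privileged` and no socket mount (the registry and tag are placeholders):

```
# kaniko builds the Dockerfile entirely in userspace inside this container;
# the mounted config.json only supplies registry push credentials.
docker run --rm \
  -v "$PWD":/workspace \
  -v "$HOME/.docker/config.json":/kaniko/.docker/config.json:ro \
  gcr.io/kaniko-project/executor:latest \
  --context dir:///workspace \
  --dockerfile /workspace/Dockerfile \
  --destination registry.example.com/myapp:abc1234
```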
All these APIs are marked very clearly and obviously as DO NOT MAKE AVAILABLE PUBLICLY. If you still make them public, well it’s pretty much your own fault.
Does RBAC not limit these by default? Does cert-manager not already give itself restricted permissions on install? Do I need to fix up my cluster right now? If so, do you have any example RBAC yamls? :D
> And when building docker images in the CI, I use google’s kaniko to build docker images from within docker without any privileges (it unpacks docker images for building and runs them inside the existing container, basically just chroot).
You can also use standalone BuildKit, which comes with the added benefit of being able to use the same builder natively on your local machine.
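Something like this works against a rootless buildkitd, whether it's running on your laptop or as a pod; only the address changes (the registry name is made up):

```
buildctl --addr unix://$XDG_RUNTIME_DIR/buildkit/buildkitd.sock build \
  --frontend dockerfile.v0 \
  --local context=. \
  --local dockerfile=. \
  --output type=image,name=registry.example.com/myapp:abc1234,push=true
```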
No, RBAC doesn't automatically do this. And many publicly available Helm charts are missing these basic security configurations. You should use Gatekeeper or similar to enforce these settings throughout your cluster.
I have, and it's great/you forget about it pretty quickly. I've actually gone (rootful?)docker->(rootless)podman->(rootless)docker, and I've written about it:
I still somewhat long for podman's lack of a daemon, but some of its other issues currently leave it completely out for me (now that docker has caught up) -- for example, lack of linking support.
Podman works great outside of those issues, though; you just have to be careful when something expects Docker semantics (usually the daemon or a unix socket on disk somewhere). You'll also find some cutting-edge tools like dive[0] have some support for podman[1], but it's not necessarily the usual case.
At $work, developers don't get root on their Linux workstations (for annoying legacy reasons that I'm sort of personally trying to fix, but the major one is NFS). We gave people rootless Docker and it seems to work pretty well for the goal of letting people try out software that's most easily packaged as a Docker container. If it seems promising we'll want to package it better, integrate it with our access control systems, etc., but rootless Docker gives users a way to evaluate whether that work is worth doing at all.
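For anyone curious, the per-user setup is roughly this (assuming the rootless extras package is installed):

```
# Per-user daemon, no root on the host required.
dockerd-rootless-setuptool.sh install
export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock
docker run --rm hello-world
```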
The default and oldest auth method for NFS, "sec=sys," was designed with the assumption that servers and clients are both trusted (sysadminned by the same people) and had the same set of user accounts. Servers enforce that client connections only come from privileged ports, and they trust whatever UID the client says it's using. This works in concert with the traditional UNIX restriction that only root can bind to ports under 1024, including initiating connections where the client port is under 1024.
In this model, giving users root would let them su to arbitrary UIDs on the client and impersonate other users to the server. (Alternatively, it would also let them run their own NFS client on a low port and do whatever they want, too.)
This does lend itself to a very simple and efficient design, since all you're doing is transmitting a single integer over the wire to identify yourself, and the whole connection is in plaintext, authenticated only by the source port. For the HPC / cluster computing use cases where NFS is popular, the efficiency and scalability of that scheme is important. There are better authentication methods (Kerberos, notably, which also adds optional encryption), and other ways to design your NFS architecture, but they're much more operationally complicated and commercial NAS devices tend to work best with the sec=sys approach. Also, public cloud NFS-as-a-service options tend to only support sec=sys (https://cloud.google.com/filestore/docs/access-control, https://docs.aws.amazon.com/efs/latest/ug/accessing-fs-nfs-p..., https://docs.microsoft.com/en-us/azure/storage/files/storage..., etc.).
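For comparison, the difference is just the mount-time security flavor (the server name is made up):

```
# AUTH_SYS: the server trusts the UID/GID the client claims.
mount -t nfs4 -o sec=sys   filer.example.com:/export/home /mnt/home
# Kerberos: per-user credentials; krb5i adds integrity, krb5p adds privacy.
mount -t nfs4 -o sec=krb5p filer.example.com:/export/home /mnt/home
```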
We are trying to figure out how to solve this, as I mentioned, but when dealing with an organization that has decades of workflows and code assuming a traditional UNIX environment with shared multi-user machines, there's no instant solution to it. (In many cases our solution is going to be to stop using NFS and use something more object-storage shaped, which will also help us move to idiomatic public-cloud designs.)
Has anyone told them yet that just plugging your own computer into the network lets you get root anyway?
And yes you are right, there is no solution other than to stop using NFS. Maybe Samba with Kerberos domain-joined hosts, but still probably not a great solution.
Yes. That's why the NFS servers have IP ACLs, why the office networks have 802.1x to get onto the corporate VLAN, why access to the datacenters is physically restricted, and why getting to our cloud accounts requires authenticating to a bastion.
Setting up an IP ACL to known NFS clients is pretty straightforward and doesn't impact the performance characteristics of sec=sys.
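E.g. something like this on the server keeps sec=sys but only answers known clients (the addresses and hostnames are made up):

```
# "secure" (privileged source ports only) is the default; shown for clarity.
cat >> /etc/exports <<'EOF'
/export/home     10.20.0.0/24(rw,sec=sys,secure,root_squash)
/export/scratch  hpc-*.example.com(rw,sec=sys,secure,root_squash)
EOF
exportfs -ra
```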
(And you should be doing the remainder of those anyway - are you really telling me that in a non-NFS environment, you wouldn't mind an interviewee or guest plugging in their laptop and seeing what they can get to? There are no unauthenticated resources at all on your network?)
There are no unauthenticated resources on my network because all the resources are in the cloud. The only thing that's local is the network gear. There's still some paranoia where security requires we make internal services not publicly routable, but I'm pushing for a zero-trust model (mainly because they have us using ZScaler, which is a piece of garbage).
You can use standalone rootless BuildKit to build locally/natively and in a cluster/container. You can use BuildKit through Docker too, but Docker packs in a whole runtime, so standalone BuildKit is the better fit in a cluster/container.
For normal web/console application development it isn't common to run containers with `--privileged`. What are the use cases where doing this is required?
I think if you run a container which in turn runs docker in a way that it can modify "sibling" containers on the host, it needs to be privileged. This can be practical for things like CI/CD, as you can use a docker container to build an updated image and then restart the service that uses that image.
There might be a way to do the same without --privileged, I'm not sure.
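One approach that generally avoids `--privileged` is mounting the host's socket, so the inner CLI drives the host daemon and builds/starts "sibling" containers (the socket itself is root-equivalent, of course); running a nested daemon (docker:dind) is the case that wants `--privileged`. A sketch:

```
# The inner "docker" talks to the host daemon via the mounted socket.
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker:cli docker ps
```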
So... None. If you need capabilities, you should add them back individually. Networking-related ones are pretty common (net admin and the raw socket one)
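Roughly like this; the capability names are the standard kernel ones, and the command is just an illustration:

```
# Drop everything, then opt back in to only the networking capabilities.
docker run --rm --cap-drop ALL --cap-add NET_ADMIN --cap-add NET_RAW \
  alpine ip addr add 192.0.2.2/24 dev lo
```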
Does anyone know if there is a website that hosts the major containers like Docker openly with public shell access to prove security? Perhaps they could be assigned a rating of the number of hours they lasted before someone escaped to host. It seems to me that we need a registry of ONLY containers and images that have stood the test of time.
The page would provide a shell which anyone could use to try to break out of the container. They'd have use of any tools that an exploited container would have, including apt-get and even root access. Ideally all ports would be closed and there would be no host network or volume access. But there could be tiers for each level of communication available.
This would give us one of two outcomes:
1) Secure containers and images would be found.
Or more likely (due to Spectre, Heartbleed, Meltdown, Rowhammer, etc, etc, etc, not to mention the countless kernel exploits and unpublicized hacks):
2) We'd find that there are currently no secure containers or images and we're all living in a fantasy.
My feeling is that #2 will be the outcome, but that there's nothing mathematically insecure about containers. It's just that we've lived under conceptually incorrect paradigms for so long that it's easier to buy into denial than decide to start over and rewrite the bad parts.
I agree strongly with number 2, and characterize it as, “there’s no such thing as a container”.
It clarifies security thinking if you pretend you don’t have a container, but instead, you’ve got a new kind of tar file, some namespacing, some niceness, iptables, and some convenience aliases:
None of it’s magic, and none of it brings new security guarantees. There’s just the stuff hosts had; that’s what they still have. So however you secured a process under that stuff, you still must do it to a process under this stuff, just now against an extra pile of abstractions and duplicate OS cruft.
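You can get surprisingly far re-creating "a container" with nothing but the image tarball and a couple of namespace flags; a rough sketch:

```
# Unpack an image's filesystem, then run a shell in fresh namespaces
# chrooted into it -- no Docker involved at runtime.
mkdir -p rootfs
docker export "$(docker create alpine)" | tar -C rootfs -x
mkdir -p rootfs/proc
sudo unshare --mount --uts --ipc --net --pid --fork \
  chroot rootfs /bin/sh -c 'mount -t proc proc /proc; ps'
```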
Devs are mad at the security team because they inexplicably want to pile on another dubious OS, with apt-getted software and dubious third-party libs, somehow expecting the whole thing to be safer; and the security team’s reaction is to want to “scan” the pile of nonsense.
Both groups, and the vendors exploiting them, desperately do not want to grapple with the implications of your point 2.
This position is not popular.
As for point 1, it could be argued Google is continuing this experiment, and learning from it, as seen in more than one CVE last year addressing obtaining root on GKE:
While gVisor and Firecracker are fantastic, I’d argue for a real “belt and suspenders” instead of just more belts. Most likely it’s better to actually get outside the host metal and stop counting on software.
Perhaps the best known commercialization of a custom hardware approach that’s readily available to end users is AWS Nitro, with an ok-if-markety backgrounder here:
You might be surprised by the number of developers who get really upset if their workplace doesn't allow "pull random containers off the Internet" as part of their workflow.
Well if you don't want to let devs run arbitrary code off the internet on their machines, that cuts off more than Docker Hub, it cuts off almost every package manager under the sun.
If I had to work under such a restriction, I would ask for a cheap spare machine, running on a guest network and hosting no sensitive code, where I could download and try random packages off the internet before I could submit them for audit, approval and vendoring.
That's good. But I don't think the major cloud providers make it very obvious either way. And when something's not clear, often the answer isn't good.