> If your "highly secure" environment relies on docker for isolation (a software...

kentonv · on Feb 21, 2018

As the author of Sandstorm, I'm going to say: "ehh... maybe."

There are still some big differences between Sandstorm and Docker's sandboxes -- not as big as there used to be, but non-trivial. For example, Docker mounts /proc, albeit read-only; Sandstorm doesn't mount it at all. Docker lets the container talk to the network; Sandstorm gives it only a loopback device and forces all external communication through one inherited socket speaking Cap'n Proto. In general, Sandstorm is willing to break more things to reduce attack surface; Docker puts more priority on compatibility.

So is it "secure"? That's not a yes/no question. Security is about risk management; you can always reduce your risk further by jumping through more hoops. There will certainly be more container breakout bugs found in the future, including bugs affecting Docker and affecting Sandstorm. There will probably be more that affect Docker, because its attack surface is larger.

There will also be VM breakouts. VM breakouts appear to be less common than container breakouts, but not by as much as some people seem to think. Anyone who makes a blanket statement that VMs are secure and Docker is not does not know what they are talking about.

The only thing I'd feel comfortable saying is "totally secure" is a computer that is physically unable to receive or transmit information (airgapped, no microphone/speaker, etc.), but that's hardly useful.

In general, if you are going to run possibly-malicious code, the better way to reduce your risk is not to use a better sandbox, but to use multiple layered sandboxes. The effect is multiplicative: now the attacker must simultaneously obtain zero-days in both layers. For example, Google Chrome employs the V8 sandbox as well as a secondary container-based sandbox.
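
To make the "multiplicative" point concrete, here is a minimal sketch. The probabilities are made-up numbers, and the function assumes breakouts in each layer are independent of one another, which is not guaranteed in practice (a single kernel bug can sometimes defeat more than one layer).

```python
def combined_breakout_probability(layer_probs):
    """Chance that an attacker defeats every layer, ASSUMING independence."""
    result = 1.0
    for p in layer_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must be between 0 and 1")
        result *= p
    return result


# Hypothetical figures, chosen only for illustration.
one_layer = combined_breakout_probability([0.01])
two_layers = combined_breakout_probability([0.01, 0.05])
print(one_layer, two_layers)
```

Under those assumptions, adding a second, fairly weak layer still cuts the combined risk far more than modestly hardening the first layer would.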

In Sandstorm's case, the "second layer" has two parts: application packages are signed, with the UI showing you the signer's public identity (e.g. GitHub profile) at install time, and most people only let relatively trustworthy users install apps on their Sandstorm servers. (For Sandstorm Oasis, the cloud version that anyone can sign up for, application code runs on physically separate machines from storage and other trusted code, which gives a different kind of second layer.)

bthornbury · on Feb 21, 2018

Thanks for commenting here!

I was inspired by Sandstorm's supervisor.c++ while looking into container security, which eventually led me to this issue.

> For example, Docker mounts /proc, albeit read-only;

I can't find much information on this. Does Docker mount /proc from the host by default in each of the containers?

> In general, if you are going to run possibly-malicious code, the better way to reduce your risk is not to use a better sandbox, but to use multiple layered sandboxes. The effect is multiplicative: now the attacker must simultaneously obtain zero-days in both layers. For example, Google Chrome employs the V8 sandbox as well as a secondary container-based sandbox.

I think that sandboxing the language is pretty tough in the general case, since every language takes a lot of effort. In non-managed, compiled languages, it will be even tougher.

What is your opinion on tools like SELinux as a secondary layer?

paulfurtado · on Feb 21, 2018

> I can't find much information on this. Does Docker mount /proc from the host by default in each of the containers?

I'm curious what kentonv has to say, but on modern kernels Docker can make use of PID namespaces, so /proc only shows PIDs from the container's PID namespace. That said, it does still leak some information: for example, /proc/self/mountinfo shows which host directories are mounted where in the container.
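
To illustrate the mountinfo leak, here is a rough sketch that parses the mountinfo format described in proc(5). The sample text is invented for illustration (the paths and IDs are hypothetical); on a real Linux system you would read `/proc/self/mountinfo` instead.

```python
# Invented sample in the /proc/<pid>/mountinfo format; not from a real host.
SAMPLE_MOUNTINFO = """\
1201 1100 0:52 / / rw,relatime - overlay overlay rw
1202 1201 0:55 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
1203 1201 8:1 /srv/app-data /data rw,relatime master:1 - ext4 /dev/sda1 rw
"""


def parse_mountinfo(text):
    """Return one dict per mount: root, mount_point, fstype, source."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        try:
            # Optional fields follow field 6; a lone "-" ends them.
            sep = fields.index("-", 6)
        except ValueError:
            continue  # not a well-formed mountinfo line
        if len(fields) < sep + 3:
            continue
        mounts.append({
            "root": fields[3],         # path inside the source filesystem
            "mount_point": fields[4],  # where it appears in this namespace
            "fstype": fields[sep + 1],
            "source": fields[sep + 2],
        })
    return mounts


# A root other than "/" usually means a bind mount of a subdirectory,
# which is where a host-side path becomes visible from inside.
for m in parse_mountinfo(SAMPLE_MOUNTINFO):
    if m["root"] != "/":
        print(m["root"], "->", m["mount_point"], "on", m["source"])
```

In the sample, the third entry reveals both the host-side directory and the backing device to anything running inside the container.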

In addition to PID namespaces, another isolation gotcha is users: Docker does not enable user namespaces by default, since it's a relatively new kernel feature and it breaks backwards compatibility (e.g. Kubernetes doesn't yet support them). A good example of this that many people hit in the past was ulimits: without user namespaces, UID 1000 in one container is the same kernel user as UID 1000 in every other container, so if it exceeds a per-user ulimit, UID 1000 in every other container is affected too. Docker solved this by setting ulimits to unlimited on the Docker daemon process, which containers then inherit (this also happens to be good for performance). User namespaces are one of the big recent improvements to container security.
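
As a rough sketch of what a user namespace changes, the snippet below translates a UID through a mapping in the `/proc/<pid>/uid_map` format (inside-start, outside-start, length). The helper names and the remapped range are hypothetical and chosen for illustration; they are not Docker's actual configuration.

```python
def parse_uid_map(text):
    """Parse uid_map content into (inside_start, outside_start, length) tuples."""
    mappings = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        inside, outside, length = (int(p) for p in parts)
        mappings.append((inside, outside, length))
    return mappings


def to_parent_uid(uid, mappings):
    """Translate a UID inside the namespace to the parent namespace's UID."""
    for inside, outside, length in mappings:
        if inside <= uid < inside + length:
            return outside + (uid - inside)
    return None  # unmapped UID


# Identity mapping, as seen when no user namespace remapping is in effect.
no_userns = parse_uid_map("0 0 4294967295")
# Hypothetical remapped range for a container using a user namespace.
with_userns = parse_uid_map("0 100000 65536")

print(to_parent_uid(1000, no_userns))
print(to_parent_uid(1000, with_userns))
```

With the identity mapping, UID 1000 in the container is UID 1000 on the host, so per-user accounting is shared with anything else running as that UID. With a remapped range, it becomes a different host UID.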

kentonv · on Feb 21, 2018

> I can't find much information on this. Does Docker mount /proc from the host by default in each of the containers?

Sorry, I should have clarified: It's /proc for the specific PID namespace. So in theory it doesn't leak anything bad. The problem is that it's a huge attack surface; there have been bugs in /proc before.

> I think that sandboxing the language is pretty tough in the general case, since every language takes a lot of effort. In non-managed, compiled languages, it will be even tougher.

WebAssembly sandboxes non-managed compiled languages pretty well. :)

> What is your opinion on tools like SELinux as a secondary layer?

IMO it doesn't help very much, because many of the kinds of kernel bugs that allow you to escape a container tend to allow you to escape SELinux as well.