LinuxKit: A Toolkit for Building Secure, Lean and Portable Linux Subsystems

jdub · on April 18, 2017

To filter out some of the buzz words: It's a new minimal Linux distro designed to run containers, with a build system that supports various image outputs (VMDK, ISO, cloud images, etc). The "packages" are Docker images. It's pretty easy to configure and build your own images with… you guessed it… YAML.

Very much in the same ballpark as Rancher, CoreOS, Atomic, etc.

(It's essentially a rebrand, rethink, and projectisation of "Moby", the distro that ships with Docker for Mac/Windows. Meanwhile, Moby has become the open source project around Docker or something... like Fedora / Red Hat.)

philips · on April 18, 2017

As the name suggests, LinuxKit is more of a developer kit than an end-user product like CoreOS Container Linux, Rancher or Atomic.

Importantly, updates are not handled by LinuxKit itself[1]; the concept is that a higher-level system or packager might take care of them via CloudFormation and an out-of-band re-provisioning method.

It is obviously early days for the LinuxKit project and we will see how it goes. Overall, though, I think there is a lot of interest in packaging up system-level services inside of containers, and I applaud the effort to keep exploring these ideas.

- CoreOS Container Linux has been continually pushing as many services as possible into Docker image containers, including etcd, kubelet, etc. https://coreos.com/blog/introducing-the-kubelet-in-coreos.ht...

- Systemd community has been exploring portable services https://lwn.net/Articles/706025/

- RancherOS runs as much as possible underneath the Docker Engine and packages up services in docker images

- The Linux Desktop folks at Red Hat have been exploring ideas like http://flatpak.org/

[1] https://github.com/linuxkit/linuxkit/blob/master/docs/faq.md...

bigmac · on April 18, 2017

> Importantly, updates are not handled by LinuxKit itself[1]; the concept is that a higher-level system or packager might take care of them via CloudFormation and an out-of-band re-provisioning method.

This was an explicit omission, at least for now. We left updates out of scope because they're better handled by the infrastructure provisioning system (in our case, infrakit). We'll use infrakit to supply updates (and the dm-verity hash, for that matter). Thus we treat the infrastructure provisioning system as the trusted 'bootloader' for a cluster of machines. Most datacenter clusters end up having an infrastructure provisioning system, so it makes more sense for those systems to have the OS update responsibility. This ends up meaning less attack surface on the host itself, and serves as a good separation of concerns and least-privilege design.

evand · on April 28, 2017

There's also snapcraft:

- https://www.youtube.com/watch?v=DLxqdf89hRo

- https://snapcraft.io

mintplant · on April 18, 2017

If this is a "distro", is it correct to assume that it would run in a VM on the host system?

xahrepap · on April 18, 2017

Yes. You use the new Moby to build your own distro using a YAML file, and you can choose to use LinuxKit as the base. Then you can install additional items on top of it, and it outputs various image types: IMG, ISO, Google Cloud images, etc.

jdub · on April 18, 2017

Yes. Or, as they mention, on metal. Though I imagine they're focused on being a good guest OS for the moment. :-)

justincormack · on April 18, 2017

We have packet.net support that we are planning to use for CI (and they have been very helpful), and some more bare metal in progress (HPE will demo bare metal later today). But yes, we started with VMs, as all the Docker Editions were VM based.

bigmac · on April 18, 2017

For those interested in security in particular, we've outlined the opinions and design decisions here: https://github.com/linuxkit/linuxkit/blob/master/docs/securi...

In short:

Kernel Security Incubator - We want to push Linux kernel security as much as possible. In service of that, we want LinuxKit to be a place where leading-edge Linux kernel security patches can land and incubate. Feature examples are Landlock, WireGuard, okernel, etc. We'll also incubate KSPP and container hardening improvements, like hardening the kernel eBPF JIT and namespacing the IMA subsystem.

Modern and Securely Configured Kernels - Latest kernel, following all KSPP recommendations.

Minimal Base - No extra dependencies, just what's needed to run containerd. Absolutely no package manager.

Type Safe, Containerized System Daemons - Many Linux privescs happen due to escalations using root system daemons. These daemons should be written in type-safe languages like OCaml and Rust. We have an OCaml dhcpcd and look to invite more. If you're convinced by https://tonyarcieri.com/it-s-time-for-a-memory-safety-interv..., LinuxKit is a place to contribute to the solution.

Built With Hardened Toolchains and Containers - Uses notary signing for all dependencies and intermediate builds, and uses musl libc as a hardened libc implementation, plus hardened compiler options for building system packages.

Immutable Infrastructure - LinuxKit follows the principle of immutable infrastructure. The filesystem contains a read-only root FS and boots with dm-verity. Trusted boot via infrakit + notary hash lookup is a next step.

All in all, this multi-pronged approach should lead to a much more secure linux base. As is our tradition, we will track progress here: https://github.com/linuxkit/linuxkit/blob/master/docs/securi..., where we'll catalog Linux CVEs and how LinuxKit holds up.
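
To illustrate the read-only root FS + dm-verity point above, here is a minimal sketch of the general idea: hash the image in fixed-size blocks, fold the block hashes into a single root hash, and compare that against a root hash obtained from a trusted source. This is not dm-verity's actual on-disk format or API (the real thing lives in the kernel and checks blocks as they are read); the function names, the 4096-byte block size, and the pairing scheme are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, an assumption for this sketch


def merkle_root(image: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Return a single root hash covering every block of the image."""
    # Leaf level: one SHA-256 digest per fixed-size block.
    level = [
        hashlib.sha256(image[i:i + block_size]).digest()
        for i in range(0, max(len(image), 1), block_size)
    ]
    # Hash adjacent pairs until only one digest (the root) remains.
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last digest on odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def verify(image: bytes, trusted_root: bytes) -> bool:
    """True only if the image still matches the externally supplied root hash."""
    return merkle_root(image) == trusted_root
```

The design point is that the host only needs one small trusted value (the root hash, supplied from outside the machine) to detect any change to the read-only filesystem.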

zx2c4 · on April 18, 2017

For those who don't know, WireGuard -- https://www.wireguard.io/ -- is a next generation secure network tunnel for the Linux kernel. It was designed with many of Docker's requirements in mind, so I think it's a great match.

hdhzy · on April 19, 2017

This sounds just like RancherOS with strong security and a focus on immutability. Is LinuxKit truly and completely immutable? Does that mean it's not a good fit for a database host OS?

cpuguy83 · on April 19, 2017

A database should not affect the OS in any way.

To run a database, attach non-ephemeral storage and write data there.

0xbadcafebee · on April 18, 2017

tl;dr they made a 35MB busybox initrd for containers.

Basically they wanted to ship containers everywhere in a "lightweight" way, so they threw away the OS. They use multiple justifications for this. Using containers means "you don't need packages anymore". Using immutable servers means "you don't need configuration management anymore". Using an initrd and containers means "you don't need an operating system anymore". Spoiler alert: systemd is gone. Oh, and they use both YAML and JSON for different things.

They say that for security reasons, you can't upgrade the live initrd, you have to boot a brand new initrd. They don't mention how they're going to stop memory-resident attacks, so basically this is just an annoying way to avoid having to provide a way for the user to upgrade their initrd in real time.

"It is encouraged to consider the notion of "reverse uptime" when deploying LinuxKit - because LinuxKit is immutable, it should be acceptable and encouraged to frequently redeploy LinuxKit nodes."

Expect your nodes to go down all the time. Got it.

They talk up kernel security a lot but I don't see anything about live patching (but that's not immutable, so I guess it's bad by default). Oh, and they want to rewrite the initrd daemons in Go and Rust, because dhcp gets owned all the time, and buffer overflows are the only exploits that ever happen, and C is, like, really hard, man. Spoiler alert: expect your system apps to be buggy.

StreamBright · on April 18, 2017

I think the biggest reason is the systemd-docker skirmish. Alpine already offers many of the things you need as a lightweight host os, even though it has some shortcomings. I was surprised to find out that Amazon Linux is also systemd-free. For me it is the perfect host os.

djsumdog · on April 19, 2017

I agree this seems overly complicated, however I am for anything that helps get away from systemd.

> Expect your nodes to go down all the time. Got it.

When I deploy containers, our system (marathon on DCOS) does rolling updates where it schedules new nodes and then gets rid of the old ones in a rolling/rotating fashion. That's how high availability works. If your nodes go down in a controlled fashion for updates, it shouldn't be a problem.
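
As a rough sketch of that rolling pattern (not Marathon's or DC/OS's actual API; the `Node` type and `rolling_replace` function are hypothetical names made up for illustration), each outdated node gets a replacement started before it is retired, so the number of nodes in service never drops below where it started:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    image: str  # version of the immutable image this node was booted from


def rolling_replace(nodes, new_image):
    """Replace outdated nodes one at a time.

    Returns the final fleet and the lowest in-service node count observed.
    """
    fleet = list(nodes)
    lowest = len(fleet)
    for old in list(nodes):
        if old.image == new_image:
            continue  # already up to date, leave it alone
        # Bring the replacement up first...
        fleet.append(Node(name=old.name + "-new", image=new_image))
        # ...and only then retire the old node.
        fleet.remove(old)
        lowest = min(lowest, len(fleet))
    return fleet, lowest
```

For example, calling `rolling_replace` on a three-node fleet where two nodes run an old image ends with three nodes on the new image, and the fleet never drops below three nodes along the way.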

0xbadcafebee · on April 19, 2017

HA was never about replacing whole nodes just to make minor changes. Rolling updates, sure, when the system isn't complex enough to do updates in place.

The thing is, initrds are always read-only anyway. But I'm annoyed that they're trying to justify this by saying it's good for security or more reliable, etc. It's just a legacy of the old design. And they have tools built into it to let it be managed remotely, but refuse to let those configure it live? So it can still be owned, just not in a persistent state (unless you find persistent storage).

sagichmal · on April 19, 2017

Rolling updates of immutable artifacts are, on balance, vastly preferable to updating software in-place with e.g. patches.

0xbadcafebee · on April 19, 2017

And is there some evidence-based research or white paper that justifies that claim? Because it sounds like the cargo cult of Google container networks.

gtirloni · on April 18, 2017

If I understood this right, it's playing in the same area as RancherOS, Container Linux, Project Atomic, etc., but with an even smaller footprint and more customizable. I like it.

As an end user, I would like to understand what the incentives are to keep this going in the long term (basically a community around it and security updates).

secure · on April 18, 2017

If I understand correctly, this is very similar to https://gokrazy.org/, except for a different unit of granularity and scope: gokrazy packages up Go packages into a bootable image for the Raspberry Pi 3, whereas linuxkit uses containers and targets anything that can boot it.

(disclaimer: I wrote gokrazy)

esMazer · on April 18, 2017

Off topic: do you know if gokrazy works well on the Raspberry Pi Zero?

DannyB2 · on April 18, 2017

The site says Pi 3 only.

secure · on April 18, 2017

That’s correct. See also https://www.reddit.com/r/golang/comments/5xgf8u/gokrazy_a_pu...

Notably, the Raspberry Pi Zero W has a different CPU (BCM2835, like the original Raspberry Pi) and hence architecture than the Raspberry Pi 3. See https://www.raspberrypi.org/magpi/pi-zero-w/ and https://en.wikipedia.org/wiki/ARM11

y2kenny · on April 19, 2017

I am not sure if it's apt to compare this to a distro. Isn't this more like the yocto project[0] or buildroot[1]? Can someone familiar with those two projects compare them with this one?

[0]https://www.yoctoproject.org/ [1]https://buildroot.org/

craig_peacock · on April 18, 2017

If I am understanding this right, this could be one of the most exciting developments for Linux in a very long time! The closest comparison I can draw is with one of my favourite OSes, Genode (except this could be even better!):

https://genode.org/about/index

thresh · on April 18, 2017

So will this replace Alpine, which is for some reason much loved in the Docker world?

u320 · on April 18, 2017

No, Alpine is mostly used as a guest OS, and this is meant to be used as the host OS.

vidarh · on April 19, 2017

The "some reason" is very simple: It's tiny. If you want to build a container image and want it to be as small as possible, but want a bit more flexibility than e.g. just busybox or just your app, it's very convenient.

StreamBright · on April 18, 2017

It is also loved in the security world, and in the systemd-free world.

djsumdog · on April 19, 2017

I switched to Void Linux to get back into the systemd free world. So far I'm pretty happy with it.

Jare · on April 18, 2017

Just went to try it and was somewhat disappointed that it requires Docker installed locally in order to run moby (I do not have it on my Win10 due to Hyper-V). Is that just for pulling the containers from the Docker registry?

justincormack · on April 18, 2017

Yes, the build currently requires Docker. We will remove the requirement for building the container root filesystems soon, by using the libraries for containerd. Some but not all of the output formats also need Docker at present, for example for building filesystem images.

Jare · on April 18, 2017

Makes sense, thanks!

bharatkhatri14 · on April 18, 2017

Is there a list of CPU architectures / SoCs supported?

BuuQu9hu · on April 18, 2017

Getting tired of this shit.

"Secure", as a word, does not make things secure. "Container", as a word, also does not make things secure.

Sandboxes come in two flavors: Correct by construction, and exploitable. Which flavor is this system?

What makes this better than NixOS? Than Genode? Than Qubes? Where is the actual security writeup? Where's the explanation of the security model for the system? How would I write code which takes advantage of structural security in the system?

Edit: There's a writeup here: https://github.com/linuxkit/linuxkit/blob/master/docs/securi... And the Mirage design is here: https://github.com/linuxkit/linuxkit/blob/master/projects/mi...

In short, to answer my earlier questions, the sandboxing is undocumented, comparisons to other security-oriented setups are omitted, the limit of structural security is OCaml's type system... This seems like an interesting effort but I am disappointed that it seems more like lip service than an attempt to actually improve on the state of security design.

CSDude · on April 18, 2017

I've really thought about this before: we ship packaged apps to our customers on a USB stick because they work on an intranet. I always wanted to make something easily installable from an ISO without the Ubuntu Kickstart/Preseed nightmare. It would really save us many hours.