I'm quite interested in seeing where slim VMs go. Personally I don't use Kubernetes; it just doesn't fit my client work, which is nearly all single-server, so it makes more sense to just run podman systemd units or docker-compose setups.
So from that perspective, when I've peeked at firecracker, kata containers, etc., the "small dev DX" isn't quite there yet, or maybe never will get there since the players target other spaces (AWS, fly.io, etc.). Stuff like a way to share volumes isn't supported, etc. Personally I find Docker's architecture a bit distasteful, and Podman's tooling isn't quite there yet (but very close).
Honestly I don't really care about containers vs VMs except that VMs allegedly offer better security, which is nice, and I guess I like poking at things, but they were a little too rough for weekend poking.
Is anyone doing "small scale" lightweight vm deployments - maybe just in your homelab or toy projects? Have you found the experience better than containers?
> So from that perspective, when I've peeked at firecracker, kata containers, etc., the "small dev DX" isn't quite there yet, or maybe never will get there since the players target other spaces (AWS, fly.io, etc.). Stuff like a way to share volumes isn't supported, etc. Personally I find Docker's architecture a bit distasteful, and Podman's tooling isn't quite there yet (but very close).
This is pretty much me and my homelab. I haven't visited it in a while, but Weave Ignite might be of interest here. https://github.com/weaveworks/ignite
They went in the trash because containers are more convenient to use, and saving a few MB of disk/memory is not something most users care about.
The whole idea was pretty much either use a custom kernel (which inevitably has way less info out there on how to debug anything in it) and re-do all of the network and storage plumbing that containers already get via the OS they are running on,
OR use just a very slim Linux kernel, which at least people know how to use, but which is STILL more complexity than "just a blob with some namespaces in it" and STILL requires a bunch of config and data juggling between hypervisor and VM just to share some host files with the guest.
Either way, to get to the level of "just a slim layer of code between hypervisor and your code" you need to do quite a lot of deep plumbing, and when anything goes wrong, debugging is harder. All to get some perceived security and no better performance than just... running the binary in a container.
It did percolate into the "slim containers" idea, where the container is just a statically compiled binary + a few configs, and while it does have the same problems with debuggability, you can just attach a sidecar to it.
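To make the "slim container" idea concrete, here's a minimal sketch (the service itself is hypothetical): a pure-Go program built with CGO_ENABLED=0 is fully static, so the image can be little more than the binary plus whatever config files it reads.

    // main.go - a tiny static service; built with:
    //   CGO_ENABLED=0 go build -o app .
    // The resulting binary has no libc dependency, so the container image
    // can be essentially empty apart from the binary and a config or two.
    package main

    import (
        "log"
        "net/http"
    )

    func main() {
        http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("ok\n"))
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }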
I guess the next big hype will be "VM bUt YoU RuN WebAsSeMbLy In CuStOm KeRnEl"
Virtualization is not just "perceived" security over containerization. From CPU rings on down, it offers dramatically more isolation for security than containerization does.
This isn't about 'what most users care about' either. Most users don't really care about 99% of what container orchestration platforms offer. The providers absolutely do care that malicious users cannot punch out to get a shell on an Azure AKS controller or go digging around inside /proc to figure out what other tenants are doing - which is what can happen unless the provider is on top of their configuration and regularly updates to match CVEs.
"most users" will end up using one of the frameworks written by a "big boy" for their stuff, and they'll end up using what's convenient for cloud providers.
The goal of microVMs is ultimately to remove everything you're talking about from the equation. Kata and other microVM frameworks aim to be basically just another CRI, which removes the "deep plumbing" you're talking about. The onus is on them to make this work, but there's an enormous financial payoff, and you'll end up with this whether you think it's worthwhile or not.
Running a VM is not necessarily more secure than running a container. By definition a VM allows running untrusted kernel code, which makes exploiting certain bugs more feasible. This is especially so with hardware bugs.
What is fundamentally more secure is running applications inside a VM which is what Amazon is doing. The attacker then has to first exploit the kernel before trying to escape the hypervisor.
in a related vein, most of the distinctions that are being brought up around containers vs vms (pricing, debuggability, tooling, overhead) are nothing fundamental at all. they are both executable formats that cut at different layers, and there is really no reason why features of one can't be easily brought to the other.
operating above these abstractions can save us time, but please stop confusing the artifacts of implementation with some kind of fundamental truth. it's really hindering our progress.
How would you compare the security of running in wasmer vs the other two options? I know it is a bit apples and oranges. Just curious if it would be harder to break out of a wasm sandbox, or a VM.
I'm not sure what you mean with regards to eBPF but the difference between a container and a VM is massive with regards to security. Incidentally, my company just published a writeup about Firecracker: https://news.ycombinator.com/item?id=32767784
Let me know when eBPF can probe into ring-1 hypercalls into a different kernel other than generically watching timing from vm_enter and vm_exit.
Yes, there is a difference between "eBPF can probe what is happening in L0 of the host kernel" and "you can probe what is happening in other kernels in privileged ring-1 calls".
It is pretty obviously not the case that eBPF means shared-kernel containers are comparably secure to VMs; there have been recent Linux kernel LPEs that no syscall-scrubbing BPF code would have caught without specifically knowing about the bug first.
Except stuff like Kata Containers exists exactly because containers alone are not enough.
WebAssembly on the cloud is only re-inventing what other bytecode-based platforms have been doing for decades, but VCs need ideas to invest in, apparently.
I've been using containers since 2007 for isolating workloads. I don't really like Docker for production either because of the network overhead with the "docker-way" of doing things.
LXC/LXD use the same kernel isolation/security features Docker does - namespaces, cgroups, capabilities etc.
After all, it is the kernel functionality that lets you run something as a container. Docker and LXC/LXD are different management / FS packaging layers on top of that.
It's the same stuff - namespaces, etc. But it doesn't shove greasy fingers into network config like docker. More a tooling question/approach than tech.
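To illustrate the "same kernel stuff" point, here's a minimal sketch in Go (not how LXC or Docker actually do it, just the raw primitives): clone a child process into new UTS/PID/mount namespaces and you already have the skeleton of a container.

    // namespaces.go - re-runs a command inside new UTS, PID and mount
    // namespaces using nothing but the kernel's clone flags (Linux only).
    // Needs root (or add syscall.CLONE_NEWUSER to run unprivileged).
    package main

    import (
        "fmt"
        "os"
        "os/exec"
        "syscall"
    )

    func main() {
        if len(os.Args) < 2 {
            fmt.Println("usage: namespaces <cmd> [args...]")
            os.Exit(1)
        }
        // e.g. `namespaces /bin/sh` drops you into a shell that sees itself
        // as PID 1 and whose hostname changes are invisible to the host.
        cmd := exec.Command(os.Args[1], os.Args[2:]...)
        cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
        cmd.SysProcAttr = &syscall.SysProcAttr{
            Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
        }
        if err := cmd.Run(); err != nil {
            panic(err)
        }
    }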
I do not think Docker is the end game. Firecracker/kata containers neither.
https://github.com/tensorchord/envd is my favorite container tool. It provides a build language based on Python, and the build is optimized for this scenario.
There will likely be more runtime and image-build tools for both containers and VMs, with different tools designed for different purposes. That's the future, I believe.
It's still clickbaity, but the title implies a comparison between a very lightweight VM and a heavy-weight container (presumably a container based on a full Linux distro). You could imagine an analogous article about a tiny house titled "my house is smaller than your apartment".
Not to mention, in the paper the LightVM only had an advantage on boot times. Memory usage was marginally worse than Docker, even with the unikernel, and Debian on LightVM was drastically worse for CPU usage than Docker (the unikernel's CPU usage was neck and neck with the Debian Docker container).
I could see it being an improvement over other VM control planes, but Docker still wins in performance for any equivalent comparison.
I would say that firecracker VMs are not more lightweight than Linux containers.
Linux containers are essentially the separation of Linux processes via various namespaces, e.g. mount, cgroup, PID, network, etc. Because this separation is done by Linux internally, there is not much overhead.
VMs provide a different kind of separation, one that is arguably more secure because it is backed by hardware -- each VM thinks it has the whole machine to itself. When you switch between the VM and the host there is quite a heavyweight context switch (VMEXIT/VMENTER in Intel parlance). It can take a long time compared to the usual context switch from one Linux container (process) to a host process or another Linux container (process).
But coming back to your point: no, Firecracker VMs are not lighter/more lightweight than a Linux container. They are quite heavyweight actually. But the Firecracker VMM is probably the most nimble of all VMMs.
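For a sense of what "nimble VMM, heavyweight guest" looks like in practice, here's a rough sketch of booting a single microVM with the firecracker-go-sdk. The Config field names are from memory and worth checking against the SDK docs; the kernel and rootfs paths are placeholders, and you still need /dev/kvm plus the firecracker binary on the host.

    // boot a single microVM with firecracker-go-sdk (assumes /dev/kvm,
    // a firecracker binary on PATH, an uncompressed vmlinux and an ext4 rootfs).
    package main

    import (
        "context"
        "log"

        firecracker "github.com/firecracker-microvm/firecracker-go-sdk"
        "github.com/firecracker-microvm/firecracker-go-sdk/client/models"
    )

    func i64(v int64) *int64 { return &v } // local helper for the pointer fields

    func main() {
        ctx := context.Background()
        cfg := firecracker.Config{
            SocketPath:      "/tmp/firecracker.sock",
            KernelImagePath: "vmlinux",      // placeholder path
            KernelArgs:      "console=ttyS0 reboot=k panic=1",
            Drives:          firecracker.NewDrivesBuilder("rootfs.ext4").Build(),
            MachineCfg: models.MachineConfiguration{
                VcpuCount:  i64(1),
                MemSizeMib: i64(128),
            },
        }
        m, err := firecracker.NewMachine(ctx, cfg)
        if err != nil {
            log.Fatal(err)
        }
        if err := m.Start(ctx); err != nil {
            log.Fatal(err)
        }
        defer m.StopVMM()
        m.Wait(ctx) // blocks until the guest shuts down
    }

Even this "minimal" path drags in a kernel image, a block device and an out-of-process VMM, which is the point being made above about relative weight.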
This reminds me: in 2015 I went to Dockercon and one booth that was fun was VMWare's. Basically they had implemented the Docker APIs on top of VMWare so that they could build and deploy VMs using Dockerfiles, etc.
I've casually searched for it in the past and it seems to not exist anymore. For me, one of the best parts of Docker is building a docker-image (and sharing how it was done via git). It would be cool to be able to take the same Dockerfiles and pivot them to VMs easily.
They might have built it into Google Anthos as part of their partnership. I recall seeing a demo where you could deploy & run any* VMWare image on Kubernetes without any changes
What is your theory for why Docker won and Vagrant didn't?
Mine is that all of the previous options were too Turing Complete, while the Dockerfile format more closely follows the Principle of Least Power.
Power users always complain about how their awesome tool gets ignored while 'lesser' tools become popular. And then they put so much energy into apologizing for problems with the tool or deflecting by denigrating the people who complain. Maybe the problem isn't with 'everyone'. Maybe Power Users have control issues, and pandering to them is not a successful strategy.
What turned me off from Vagrant was that Vagrant machines were never fully reproducible.
Docker took the approach of specifying images in terms of how to create them from scratch. Vagrant, on the other hand, took the approach of specifying certain details about a machine, then trying to apply changes to an existing machine to get it into the desired state. Since the Vagrantfile didn't (and couldn't) specify everything about that state, you'd inevitably end up with some drift as you applied changes to a machine over time -- a development team using Vagrant could often end up in situations where code behaved differently on two developers' machines because their respective Vagrant machines had gotten into different states.
It helped that Docker images can be used in production. Vagrant was only ever pitched as a solution for development; you'd be crazy to try to use it in production.
Docker is not fully reproducible either. Try building a Docker image from two different machines and then pushing it to a registry. It will always overwrite.
Ooooh, “it will always overwrite”: this is an indirect way of saying that your executable will behave exactly the same if it gets overwritten (by the same set of bytes).
I’ve read bug threads in the moby GitHub where they reject a feature because it’s not repeatable, while people point out that Dockerfiles start with apt-get so they aren’t repeatable from layer 2 anyway. The team members don’t seem to hear them and it’s frustrating to watch.
Images are repeatable. We like repeatable images. That’s enough for most of us. Don’t break that and we’re good. Just fix build time bullshit please.
I mean ... yes, Vagrant does offer that, but I would never consider Vagrant configuration anything approaching a replacement for Docker configuration.
In fairness to this paper, it was written and published before that Firecracker article (2017 vs 2018). From another paper on Firecracker providing a bit of history:
> When we first built AWS Lambda, we chose to use Linux containers to isolate functions, and virtualization to isolate between customer accounts. In other words, multiple functions for the same customer would run inside a single VM, but workloads for different customers always run in different VMs. We were unsatisfied with this approach for several reasons, including the necessity of trading off between security and compatibility that containers represent, and the difficulties of efficiently packing workloads onto fixed-size VMs.
And a bit about the timeline:
> Firecracker has been used in production in Lambda since 2018, where it powers millions of workloads and trillions of requests per month.
No, it's using a modified version of the Xen hypervisor, and the numbers they show are boot times and memory usage for both unikernels and pared-down Linux systems (via Tinyx). It's described in the abstract:
> We achieve lightweight VMs by using unikernels for specialized applications and with Tinyx, a tool that enables creating tailor-made, trimmed-down Linux virtual machines.
The issue with unikernels and things like Firecracker is that you can't run them on already-virtualized platforms.
I researched Firecracker when I was looking for an alternative to Docker for deploying FaaS functions on an OpenFaaS-like clone I was building
It would have worked great if the target deployment was bare metal, but if you're asking a user to deploy on, e.g., EC2 or Fargate or whatnot, you can't use these things, so all points are moot.
This is relevant if you're self-hosting or you ARE a service provider I guess.
(Yes, I know about Firecracker-in-Docker, but I mean real production use)
This is a limitation of whatever virtualized instance you're running on, not Firecracker itself. Firecracker depends on KVM, and AWS EC2 virtualized instances don't enable KVM. But not all virtualized instance services disable KVM.
Obviously, Firecracker being developed by AWS and AWS disabling KVM is not ideal :)
Google Cloud, for instance, allows nested virtualization, IIRC.
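A quick way to check whether a given instance exposes KVM at all is simply to try opening /dev/kvm; a minimal sketch:

    // kvmcheck.go - reports whether /dev/kvm is present and openable,
    // which is the basic prerequisite for Firecracker and friends.
    package main

    import (
        "fmt"
        "os"
    )

    func main() {
        f, err := os.OpenFile("/dev/kvm", os.O_RDWR, 0)
        if err != nil {
            fmt.Println("no usable KVM:", err)
            os.Exit(1)
        }
        defer f.Close()
        fmt.Println("/dev/kvm is available")
    }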
I've used GCP nested virtualization. You pay for that overhead in performance, so I wouldn't recommend it without more investigation. We were trying to simulate using LUKS and physical key insert/removal. Would have used it more if we could get GPU passthrough working.
Yeah but imagine trying to convince people to use an OSS tool where the catch is that you have to deploy it on special instances, only on providers that support nested virtualization
Not a great DX, haha
I wound up using GraalVM's "Polyglot" abilities alongside its WASM stuff.
eh, I think most of the reason we don't see NV support on most public cloud instances is that nobody has come up with a compelling OSS tool that needs it. afaik most providers disable it because they want to discourage you from subletting their instances lol.
> The issue with unikernels and things like Firecracker is that you can't run them on already-virtualized platforms
I'm not sure about Firecracker, and it has to be enabled at the platform level, but there are tricks you can use to run pseudo-nested VMs that look like they are nested without actually incurring any nested virtualization overhead. This is something that some of us (crosvm) are exploring, playing around with a virtio-vhost-user (VVU) [0] proxy to basically set up a "sibling" (not nested) VM that runs its virtio device drivers in another VM without incurring nested virtualization issues.
Firecracker, being originally based on a fork of crosvm, might even benefit from this if they ever decide to play around in this space, but again it'd need to be enabled at the platform level.
It's funny seeing stuff like a VVU implementation of vhost-vsock effectively run on a second VM and provide what seems to be a nested virtualization environment between two completely separate VMs. You get funky stuff like the other sibling VM's host address (CID = 2) becoming a non-parent VM, so it looks 100% like a nested VM without the nested virtualization performance hits.
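For context on the CID = 2 remark: that's the reserved vsock address a guest uses to reach "the host", which is why remapping what answers on it is enough to make a sibling VM look like a parent. A minimal guest-side sketch using raw AF_VSOCK sockets (the port number is made up):

    // vsock_dial.go - from inside a guest, connect to whatever answers on
    // the host CID (2); in the sibling-VM setup described above, that
    // endpoint is actually another VM rather than the real host.
    package main

    import (
        "fmt"

        "golang.org/x/sys/unix"
    )

    func main() {
        fd, err := unix.Socket(unix.AF_VSOCK, unix.SOCK_STREAM, 0)
        if err != nil {
            panic(err)
        }
        defer unix.Close(fd)

        sa := &unix.SockaddrVM{CID: unix.VMADDR_CID_HOST, Port: 5000} // arbitrary port
        if err := unix.Connect(fd, sa); err != nil {
            panic(err)
        }
        fmt.Println("connected to CID 2")
    }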
This is a very common misunderstanding in how these actually get deployed in real life.
Disclosure: I work with the OPS/Nanos toolchain, so I work with people who deploy unikernels in production.
When we deploy them to AWS/GCP/Azure/etc. we are not managing the networking/storage/etc. like a k8s would do - we push all that responsibility back onto the cloud layer itself. So when you spin up a Nanos instance, it spins up as its own EC2 instance with only your application - no Linux, no k8s, nothing. The networking used is the networking provided by the VPC. You can configure it all you want, but you aren't managing it. Now if you have your own infrastructure - knock yourselves out - but for those already in the public clouds this is the preferred route. We essentially treat the VM as the application and the cloud as the operating system.
This allows you to have a lot better performance/security and it removes a ton of devops/sysadmin work.
Containers should really be viewed as an extension of packages (like RPM) with a bit of extra sauce: the layered filesystem, a chroot/jail, and cgroups for some isolation between different software running on the same server.
Back in 2003 or so we tried doing this with microservices that didn't need an entire server, with multiple different software teams running apps on the same physical image, to try to avoid giving entire servers to teams that would only be using a few percent of the metal. This failed pretty quickly as software bugs would blow up the whole image and different software teams got really grouchy at each other. With containerization the chroot means that the software carries along all its own deps and the underlying server/metal image can be managed separately, and the cgroups mean that software groups are less likely to stomp on each other due to bugs (sketched below).
This isn't a cloud model of course, it was all on-prem. I don't know how kubernetes works in the cloud where you can conceivably be running containers on metal sharing with other customers. I would tend to assume that under the covers those cloud vendors are using Containers on VMs on Metal to provide better security guarantees than just containers can offer.
Containers really shouldn't be viewed as competing with VMs in a strict XOR sense.
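To make the cgroups point above concrete, here's a minimal sketch against the cgroup v2 filesystem interface (the group name and memory limit are made up; it needs root and a cgroup2 mount at /sys/fs/cgroup with the memory controller enabled):

    // cgroup_demo.go - create a cgroup, cap its memory, and move the
    // current process into it so a runaway allocation can't take the
    // whole box down with it.
    package main

    import (
        "fmt"
        "os"
        "path/filepath"
    )

    func main() {
        cg := "/sys/fs/cgroup/demo" // hypothetical group name
        if err := os.MkdirAll(cg, 0o755); err != nil {
            panic(err)
        }
        // cap memory for everything in this group at 256 MiB
        if err := os.WriteFile(filepath.Join(cg, "memory.max"), []byte("268435456"), 0o644); err != nil {
            panic(err)
        }
        // move ourselves (and hence our children) into the group
        if err := os.WriteFile(filepath.Join(cg, "cgroup.procs"), []byte(fmt.Sprint(os.Getpid())), 0o644); err != nil {
            panic(err)
        }
        fmt.Println("running inside", cg)
    }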
I don’t remember where I read it, but as far as I know, when using Fargate to run containers (with k8s or ECS), AWS will just allocate an EC2 instance for you. Your container will never run on the same VM as another customer's. That, I think, explains the latency you can see when starting a container. To improve that you need to manage your own EC2 cluster with an autoscaling group.
Also, Docker to me seems quite leaky in terms of abstractions (especially the networking side) compared to something like a FreeBSD jail.
why does docker networking need to be infinitely complex? freebsd solved this issue far more cleanly two decades ago without having to emulate entirely new networking spaces.
When I attended Infiltrate a few years ago, there was a talk about unikernels. The speaker showed off how incredibly insecure many of them were, not even offering support for basic modern security features like DEP and ASLR.
Have they changed? Or did the speaker likely just cherry-pick some especially bad ones?
In short - not a fundamental limitation - just that kernels (even if they are small) have a ton of work that goes into them. Nanos for instance has page protections, ASLR, virtio-rng (if on GCP), etc.
What I see happening now on the cloud is containers from different companies and different security domains running on the same VM. I have to think this is fundamentally insecure and that VMs are underrated.
I hear people advocate QubesOS, which is based on Xen, for security when it comes to running my client machine. They say my banking should be done in a different VM than my email, for instance. Well, if that's the case, why do we run many containers doing different security-sensitive functions on the same VM, when containers are not really considered a very good security boundary?
From a security design perspective, I imagine hardware being exclusive to a person/organization, VMs being exclusive to some security function, and containers existing on top of that. That makes more sense from a security standpoint, but we seem to be playing things more loosely on the server side.
It's not clear to me that VMs actually do offer better isolation than well-designed containers (i.e. not docker).
It's basically a question of: do you trust the safety of kernel-mode drivers (e.g. for PV network devices or emulated hardware) for VMs, or do you trust the safety of userland APIs plus the limited set of kernel APIs available to containers?
On my FreeBSD server, I kind of trust jails with strict device rules (i.e. there are only like 5 things in /dev/) over a VM with virtualized graphics, networking, etc.
I think it gets even more complicated with something like firecracker where they recommend you run firecracker in a jail (and provide a utility to set that up)
Not surprising that VMs running unikernels are as nimble as containers, but not quite useful either, at least in general. Much easier to just use a stock docker image.
The article is light on detail. Containers and VMs have different use cases. If you self-host, lightweight VMs are likely the better path; however, once you're in the cloud, most managed services only provide support for containers.
There is a huge difference between running on VMs that you have zero access to and actually owning your own VM infrastructure. Yes, AWS Lambda runs on Firecracker; however, it could just as well be running on a FireCheese VM platform and you would be none the wiser, unless AWS publishes this somewhere.
I am also not running on Kubernetes, because Kubernetes. AWS ECS and AWS Batch also only handle containerised applications. Even when deploying on EC2 I tend to use containers, as it ensures they keep working consistently if you apply patches to your EC2 environment.
Yes, when you custom engineer a specific, complex solution for a specific use case, it is generally more performant than a general-use solution that's simple.
Containers and VMs are totally not the same thing.
They serve a completely different purpose: multiple containers can be combined to create an application/service, VMs always use a complete OS, etc. etc. Anyway, the internet is full of explanations of the true purpose of containers - they were never meant to be used as a "VM". And about security... meh, everything is insecure until proven otherwise.
Maybe you should read the article? The article basically covers running a tiny kernel that has exactly what you need to run one single binary, which is your application. No printer drivers, no video drivers. Probably network drivers, as those would be quite handy. Kinda up to you what kernel you pick for this, and what drivers are included. Multiple VMs with networking can also be "combined"...
Containers are great because they are small and fast, low overhead etc. And that is why they replaced virtual machines. This article however proves that this is also achievable with virtual machines.
heck, when running something in production with multiple underlying servers, you need a system that builds an overlay network regardless of whether you use containers or VMs..
Why does anyone ever write "two orders of magnitude" when 100x is shorter?
Of course, this presumes 10 as the magnitude and the N orders to be the exponent, but I don't think I've ever, since the 90s, seen that stilted phrasing used for a base other than 10.
It's nothing to do with big-O; it's about logarithms. But really I think most people using it just think of it like: "which of these is it closest to? 10x, 100x or 1000x?"