Hacker News
My VM is lighter (and safer) than your container (2017) (acm.org)
309 points by gaocegege on Sept 8, 2022 | 105 comments



I'm quite interested in seeing where slim VMs go. Personally I don't use Kubernetes; it just doesn't fit my client work, which is nearly all single-server, and it makes more sense to just run Podman systemd units or docker-compose setups.

So from that perspective, when I've peeked at Firecracker, Kata Containers, etc., the "small dev DX" isn't quite there yet, or maybe never will be, since the players target other spaces (AWS, fly.io, etc.). Stuff like a way to share volumes isn't supported, etc. Personally I find Docker's architecture a bit distasteful, and Podman's tooling isn't quite there yet (but very close).

Honestly I don't really care about containers vs. VMs, except that VMs allegedly offer better security, which is nice, and I guess I like poking at things, but they were a little too rough for weekend poking.

Is anyone doing "small scale" lightweight vm deployments - maybe just in your homelab or toy projects? Have you found the experience better than containers?


> So from that perspective, when I've peeked at Firecracker, Kata Containers, etc., the "small dev DX" isn't quite there yet, or maybe never will be, since the players target other spaces (AWS, fly.io, etc.). Stuff like a way to share volumes isn't supported, etc. Personally I find Docker's architecture a bit distasteful, and Podman's tooling isn't quite there yet (but very close).

This is pretty much me and my homelab. I haven't visited it in a while, but Weave Ignite might be of interest here. https://github.com/weaveworks/ignite


They went in the trash because containers are more convenient to use, and saving a few MB of disk/memory is not something most users care about.

The whole idea was pretty much: either use a custom kernel (for which there is inevitably way less information on how to debug anything), and re-do all of the network and storage plumbing that containers already get from the OS they run on,

OR use a very slim Linux kernel, which at least people know how to use, but which is STILL more complexity than "just a blob with some namespaces in it" and STILL requires a bunch of config and data juggling between hypervisor and VM just to share some host files with the guest.

Either way, to get to the level of "just a slim layer of code between hypervisor and your code" you need to do quite a lot of deep plumbing, and when anything goes wrong, debugging is harder. All to get some perceived security and no better performance than just... running the binary in a container.

It did percolate into the "slim containers" idea, where the container is just a statically compiled binary + a few configs, and while it has the same problems with debuggability, you can just attach a sidecar to it.

I guess the next big hype will be "VM bUt YoU RuN WebAsSeMbLy In CuStOm KeRnEl".


Virtualization is not just "perceived" security over containerization. From CPU rings on down, it offers dramatically more isolation for security than containerization does.

This isn't about 'what most users care' about either. Most users don't really care about 99% of what container orchestration platforms offer. The providers do absolutely care that malicious users cannot punch out to get a shell on an Azure AKS controller or go digging around inside /proc to figure out what other tenants are doing unless the provider is on top of their configuration and regularly updates to match CVEs.

"most users" will end up using one of the frameworks written by a "big boy" for their stuff, and they'll end up using what's convenient for cloud providers.

The goal of microVMs is ultimately to remove everything you're talking about from the equation. Kata and other microVM frameworks aim to be basically just another CRI, which removes the "deep plumbing" you're talking about. The onus is on them to make this work, but there's an enormous financial payoff, and you'll end up with this whether you think it's worthwhile or not.


Running a VM is not necessarily more secure than running a container. By definition a VM allows running untrusted kernel code, which makes exploiting certain bugs more feasible. This is especially so with hardware bugs.

What is fundamentally more secure is running applications inside a VM, which is what Amazon is doing. The attacker then has to first exploit the kernel before trying to escape the hypervisor.


in a related vein, most of the distinctions being brought up around containers vs VMs (pricing, debuggability, tooling, overhead) are nothing fundamental at all. they are both executable formats that cut at different layers, and there is really no reason why features of one can't be easily brought to the other.

operating above these abstractions can save us time, but please stop confusing the artifacts of implementation with some kind of fundamental truth. it's really hindering our progress.


How would you compare the security of running in wasmer vs. the other two options? I know it is a bit apples and oranges. Just curious whether it would be harder to break out of a wasm sandbox or a VM.


Bringing the features of one to the other is exactly what microvms means.


With eBPF there is really not much to argue about in the security space.

You can do everything.

The new toolset for containers covers pretty much every possible use case you could imagine.

The trend will continue in favor of containers and k8s.


I'm not sure what you mean with regards to eBPF but the difference between a container and a VM is massive with regards to security. Incidentally, my company just published a writeup about Firecracker: https://news.ycombinator.com/item?id=32767784


Let me know when eBPF can probe into ring -1 hypercalls into a different kernel, beyond generically watching timing from vm_enter and vm_exit.

Yes, there is a difference between "eBPF can probe what is happening in L0 of the host kernel" and "you can probe what is happening in other kernels' privileged ring -1 calls".

No, this is not what you think it is.


It is pretty obviously not the case that eBPF makes shared-kernel containers as secure as VMs; there have been recent Linux kernel LPEs that no syscall-scrubbing BPF code would have caught without specifically knowing about the bug first.


Except that stuff like Kata Containers exists exactly because containers alone are not enough.

WebAssembly on the cloud is only re-inventing what other bytecode-based platforms have been doing for decades, but VCs need ideas to invest in, apparently.


I've been using containers since 2007 for isolating workloads. I don't really like Docker for production either, because of the network overhead of the "Docker way" of doing things.

LXD is definitely my favorite container tool.



How differently does LXD manage isolation compared to Docker?

I suppose both create netns, bridges, ifs?


LXC/LXD use the same kernel isolation/security features Docker does: namespaces, cgroups, capabilities, etc.

After all, it is the kernel functionality that lets you run something as a container. Docker and LXC/LXD are different management / FS packaging layers on top of that.


I assume it's not using seccomp, which Docker uses, although seccomp is not Docker-specific and you can just go grab their policy.


It's the same stuff - namespaces, etc. But it doesn't shove greasy fingers into the network config like Docker does. More a tooling question/approach than tech.


Have similar feelings about docker. LXD containers through a bridged interface fit my mental model/use case.


I was in the same boat as you and built something simple that I really like:

https://gitlab.com/stavros/harbormaster

It'll just pull some repos, make sure the containers are up, and make your configuration simple and discoverable. It really works great at that.


I do not think Docker is the end game. Neither are Firecracker/Kata Containers.

https://github.com/tensorchord/envd is my favorite container tool. It provides a build language based on Python, and the build is optimized for this scenario.

There may be more runtime and image build tools for both containers and VMs. And different tools may be designed for different purposes. That's the future I believe.


Title is kinda clickbaity (wha-? how can a VM be lighter than a container). It's about unikernels.


> how can a VM be lighter than a container

It's still clickbaity, but the title implies a comparison between a very lightweight VM and a heavy-weight container (presumably a container based on a full Linux distro). You could imagine an analogous article about a tiny house titled "my house is smaller than your apartment".


Not to mention that, in the paper, LightVM only had an advantage on boot times. Memory usage was marginally worse than Docker, even with the unikernel, and Debian on LightVM was drastically worse for CPU usage than Docker (the unikernel's CPU usage was neck and neck with the Debian Docker container).

I could see it being an improvement over other VM control planes, but Docker still wins in performance for any equivalent comparison.


It is still lighter only in memory. CPU is also a relevant thing to compare.


Exactly, unikernels are great for performance and isolation, but they can't be compared to a full application stack running in a container or VM.


Not all containers are a "full application stack"

A unikernel can definitely be compared.


Firecracker VMs are considered lighter than a container and are pretty old at this point.


I would say that firecracker VMs are not more lightweight than Linux containers.

Linux containers are essentially the separation of Linux processes via various namespaces, e.g. mount, cgroup, process, network, etc. Because this separation is done by Linux internally, there is not much overhead.

VMs provide a different kind of separation, one that is arguably more secure because it is backed by hardware -- each VM thinks it has the whole hardware to itself. When you switch between the VM and the host there is quite a heavyweight context switch (VMEXIT/VMENTER in Intel parlance). It can take a long time compared to the usual context switch from one Linux container (process) to another host process or another Linux container (process).

But coming back to your point: no, Firecracker VMs are not lighter than a Linux container. They are quite heavyweight, actually. But the Firecracker VMM is probably the most nimble of all VMMs.
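
To make the "containers are just namespaced processes" point concrete, here is a minimal sketch (assuming a Linux host and root or CAP_SYS_ADMIN) that drops a shell into fresh UTS and PID namespaces with nothing more than a couple of syscalls -- the cheap, in-kernel kind of separation described above, with no VMEXIT-style world switch involved:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        /* Detach from the parent's UTS and PID namespaces (needs CAP_SYS_ADMIN). */
        if (unshare(CLONE_NEWUTS | CLONE_NEWPID) != 0) {
            perror("unshare");
            return 1;
        }
        /* A new PID namespace only applies to children, so fork once. */
        pid_t child = fork();
        if (child == 0) {
            /* The hostname change is visible only inside the new UTS namespace. */
            sethostname("sandbox", strlen("sandbox"));
            execlp("sh", "sh", "-c", "hostname; echo \"my pid here: $$\"", (char *)NULL);
            perror("execlp");
            return 1;
        }
        waitpid(child, NULL, 0);
        return 0;
    }

Run as root, the shell reports the namespaced hostname and sees itself as PID 1; container runtimes (Docker, Podman, LXC/LXD) are essentially orchestrating this plus mount namespaces, cgroups, and seccomp, whereas entering a VM goes through the hardware VMENTER/VMEXIT path mentioned above.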


Which, in case anyone has failed to notice, is pretty much how serverless works out from a development point of view.

No one cares how the language runtime is actually running.


This reminds me: in 2015 I went to Dockercon and one booth that was fun was VMWare's. Basically they had implemented the Docker APIs on top of VMWare so that they could build and deploy VMs using Dockerfiles, etc.

I've casually searched for it in the past and it seems to not exist anymore. For me, one of the best parts of Docker is building a Docker image (and sharing how it was done via git). It would be cool to be able to take the same Dockerfiles and pivot them to VMs easily.


https://github.com/weaveworks/ignite can apparently do this. It looks exciting!


This is really cool! Thanks for sharing.

Though the naming is strange since Apache Ignite exists already. But I’m definitely going to play with this.


They might have built it into Google Anthos as part of their partnership. I recall seeing a demo where you could deploy & run any* VMWare image on Kubernetes without any changes


Isn't that essentially what Vagrant and Vagrantfiles do?


What is your theory for why Docker won and Vagrant didn't?

Mine is that all of the previous options were too Turing-complete, while the Dockerfile format more closely follows the Principle of Least Power.

Power users always complain about how their awesome tool gets ignored while 'lesser' tools become popular. And then they put so much energy into apologizing for problems with the tool or deflecting by denigrating the people who complain. Maybe the problem isn't with 'everyone'. Maybe Power Users have control issues, and pandering to them is not a successful strategy.


What turned me off from Vagrant was that Vagrant machines were never fully reproducible.

Docker took the approach of specifying images in terms of how to create them from scratch. Vagrant, on the other hand, took the approach of specifying certain details about a machine, then trying to apply changes to an existing machine to get it into the desired state. Since the Vagrantfile didn't (and couldn't) specify everything about that state, you'd inevitably end up with some drift as you applied changes to a machine over time -- a development team using Vagrant could often end up in situations where code behaved differently on two developers' machines because their respective Vagrant machines had gotten into different states.

It helped that Docker images can be used in production. Vagrant was only ever pitched as a solution for development; you'd be crazy to try to use it in production.


Docker is not fully reproducible either. Try building a Docker image on two different machines and then pushing it to a registry. It will always overwrite.


Ooooh, "it will always overwrite": this is an indirect way of saying that your executable will behave exactly the same if it gets overwritten (by the same set of bytes).

Good one.


Dockerfiles pervasively use things like `apt-get update`, and that makes them not reproducible.

Even if your image pins apt or yum packages, the images you depend on likely don't.

Docker is repeatable, not reproducible.


I've read bug threads on the Moby GitHub where they reject a feature because it's not repeatable, while people point out that Dockerfiles start with apt-get so they aren't repeatable from layer 2 anyway. The team members don't seem to hear them, and it's frustrating to watch.

Images are repeatable. We like repeatable images. That’s enough for most of us. Don’t break that and we’re good. Just fix build time bullshit please.


Yep


The easier it is for a technology to accidentally make it into production, the more popular it will be


Vagrant images are always huge, but Docker images are small. You can deliver and share Docker images with ease.


Vagrant never targeted production, so not much money/resources/interest was invested?


Yes, which is what I'm using now. But it doesn't use the Docker APIs to let you (mostly) reuse a Dockerfile to build a VM or a container.

Not sure if it would be better than Vagrant, but it was still very interesting.


You are talking about declarative configuration of VMs. Vagrant offers that, right?


eeeeeh.......

yes, but then again ... no.

I mean ... yes, Vagrant does offer that, but no, I would never consider Vagrant configuration anything approaching a replacement for Docker configuration.


github.com/cirruslabs/tart does this for macOS VMs. You push them to and pull them from OCI registries.


This paper consistently mischaracterizes AWS Lambda as a "Container as a Service" technology, when in fact it is exactly the sort of lightweight VM they are describing - https://aws.amazon.com/blogs/aws/firecracker-lightweight-vir...


In fairness to this paper, it was written and published before that Firecracker article (2017 vs 2018). From another paper on Firecracker providing a bit of history:

> When we first built AWS Lambda, we chose to use Linux containers to isolate functions, and virtualization to isolate between customer accounts. In other words, multiple functions for the same customer would run inside a single VM, but workloads for different customers always run in different VMs. We were unsatisfied with this approach for several reasons, including the necessity of trading off between security and compatibility that containers represent, and the difficulties of efficiently packing workloads onto fixed-size VMs.

And a bit about the timeline:

> Firecracker has been used in production in Lambda since 2018, where it powers millions of workloads and trillions of requests per month.

https://www.usenix.org/system/files/nsdi20-paper-agache.pdf


Thank you for this detail!


For what it’s worth, Google’s cloud functions are a container service. You can even download the final docker container.


gVisor on its KVM platform is a hybrid model in this context. It shares properties with both containers and lightweight VMs.


AWS "just" runs Linux, but this is using unikernels, though?


No, it's using a modified version of the Xen hypervisor, and the numbers they show are boot times and memory usage for both unikernels and pared-down Linux systems (via Tinyx). It's described in the abstract:

> We achieve lightweight VMs by using unikernels for specialized applications and with Tinyx, a tool that enables creating tailor-made, trimmed-down Linux virtual machines.


The issue with unikernels and things like Firecracker is that you can't run them on already-virtualized platforms.

I researched Firecracker when I was looking for an alternative to Docker for deploying FaaS functions on an OpenFaaS-like clone I was building.

It would have worked great if the target deployment was bare metal, but if you're asking a user to deploy on, e.g., EC2 or Fargate or whatnot, you can't use these things, so all points are moot.

This is relevant if you're self-hosting or you ARE a service provider I guess.

(Yes, I know about Firecracker-in-Docker, but I mean real production use)


This is a limitation of whatever virtualized instance you're running on, not Firecracker itself. Firecracker depends on KVM, and AWS EC2 virtualized instances don't enable KVM. But not all virtualized instance services disable KVM.

Obviously, Firecracker being developed by AWS and AWS disabling KVM is not ideal :)

Google Cloud, for instance, allows nested virtualization, IIRC.
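
For anyone checking whether a given (possibly already-virtualized) instance can host Firecracker at all, the gating requirement is a usable /dev/kvm. A minimal probe (just a sketch, nothing Firecracker-specific) looks like this:

    #include <fcntl.h>
    #include <linux/kvm.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void) {
        /* Firecracker (like any KVM-based VMM) needs /dev/kvm to exist and be usable. */
        int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
        if (kvm < 0) {
            perror("open /dev/kvm");  /* typical outcome on a non-metal EC2 instance */
            return 1;
        }
        int version = ioctl(kvm, KVM_GET_API_VERSION, 0);
        printf("KVM is available, API version %d\n", version);  /* 12 on modern kernels */
        close(kvm);
        return 0;
    }

On bare metal or on instances with nested virtualization enabled this succeeds; where the provider doesn't expose KVM, the open() fails and KVM-based microVMs are off the table.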


I've used GCP nested virtualization. You pay for that overhead in performance, so I wouldn't recommend it without more investigation. We were trying to simulate LUKS with physical key insertion/removal. Would have used it more if we could have gotten GPU passthrough working.


Yeah, but imagine trying to convince people to use an OSS tool where the catch is that you have to deploy it on special instances, only on providers that support nested virtualization.

Not a great DX, haha. I wound up using GraalVM's "polyglot" abilities alongside its WASM stuff.


eh, I think most of the reason we don't see NV support on most public cloud instances is that nobody has come up with a compelling OSS tool that needs it. afaik most providers disable it because they want to discourage you from subletting their instances lol.


Azure and DigitalOcean allow nested virt as well!


> The issue with unikernels and things like Firecracker is that you can't run them on already-virtualized platforms

I'm not sure about Firecracker, and it has to be enabled at the platform level, but there are tricks you can use to run pseudo-nested VMs that look like they are nested without actually incurring any nested virtualization overhead. This is something that some of us (crosvm) are exploring, playing around with a virtio-vhost-user (VVU) [0] proxy to basically set up a "sibling" (not nested) VM that runs its virtio device drivers in another VM without running into nested virtualization issues.

Firecracker, being originally based on a fork of crosvm, might even benefit from this if they ever decide to play around in this space, but it would also need to be enabled at the platform level.

It's funny seeing stuff like a VVU implementation of vhost-vsock effectively run on a second VM and provide what seems to be a nested virtualization environment between two completely separate VMs. You get funky stuff like the other sibling VM's host address (cid = 2) becoming a non-parent VM, so it looks 100% like a nested VM without the nested virtualization performance hits.

[0] - https://crosvm.dev/book/devices/vvu.html


Firecracker runs on hosts that support nested virtualization. GCP and GitHub Codespaces do, but unfortunately EC2 and Macs don't.


This is a very common misunderstanding in how these actually get deployed in real life.

Disclosure: I work with the OPS/Nanos toolchain so work with people that deploy unikernels in production.

When we deploy them to AWS/GCP/Azure/etc., we are not managing the networking/storage/etc. like a k8s would do - we push all that responsibility back onto the cloud layer itself. So when you spin up a Nanos instance, it spins up as its own EC2 instance with only your application - no Linux, no k8s, nothing. The networking used is the networking provided by the VPC. You can configure it all you want, but you aren't managing it. Now, if you have your own infrastructure, knock yourselves out, but for those already in the public clouds this is the preferred route. We essentially treat the VM as the application and the cloud as the operating system.

This allows you to have a lot better performance/security and it removes a ton of devops/sysadmin work.


Containers should really be viewed as an extension of packages (like RPM) with a bit of extra sauce: the layered filesystem, a chroot/jail, and cgroups for some isolation between different software running on the same server.

Back in 2003 or so we tried doing this with microservices that didn't need an entire server, with multiple different software teams running apps on the same physical image, to avoid giving entire servers to teams that would only be using a few percent of the metal. This failed pretty quickly, as software bugs would blow up the whole image and different software teams got really grouchy at each other. With containerization, the chroot means that the software carries along all its own deps and the underlying server/metal image can be managed separately, and the cgroups mean that software groups are less likely to stomp on each other due to bugs.

This isn't a cloud model of course, it was all on-prem. I don't know how Kubernetes works in the cloud, where you could conceivably be running containers on metal shared with other customers. I would tend to assume that under the covers those cloud vendors are running containers on VMs on metal to provide better security guarantees than containers alone can offer.

Containers really shouldn't be viewed as competing with VMs in a strict XOR sense.


I don't remember where I read it, but as far as I know, when using Fargate to run containers (with k8s or ECS), AWS will just allocate an EC2 instance for you. Your container will never run on the same VM as another customer's. This explains, I think, the latency you can see when starting a container. To improve that you need to manage your own EC2 cluster with an autoscaling group.


Back in 2000 we were doing this with HP-UX Vaults; IT keeps going in circles.


Also, Docker to me seems quite leaky in terms of abstractions (especially the networking side) compared to something like a FreeBSD jail.

Why does Docker networking need to be infinitely complex? FreeBSD solved this issue far more cleanly two decades ago without having to emulate entirely new networking spaces.


> We achieve lightweight VMs by using unikernels

When I attended Infiltrate a few years ago, there was a talk about unikernels. The speaker showed off how incredibly insecure many of them were, not even offering support for basic modern security features like DEP and ASLR.

Have they changed? Or did the speaker likely just cherry-pick some especially bad ones?


You are probably talking about this: https://research.nccgroup.com/wp-content/uploads/2020/07/ncc...

In short: not a fundamental limitation - just that kernels (even small ones) need a ton of work put into them. Nanos, for instance, has page protections, ASLR, virtio-rng (if on GCP), etc.


I thought that presentation was a little like looking at one hobby OS of a given type and then attempting to draw security conclusions about everything of that type.

The NanoVMs unikernel, for example, is pretty small, DoD supported, and has:

ASLR

    Stack Randomization
    Heap Randomization
    Library Randomization
    Executable Randomization
Page Protections

    Stack Execution off by Default
    Heap Execution off by Default
    Null Page is Not Mapped
    Stack Cookies/Canaries
    Rodata no execute
    Text no write
STIG


The headline reads like a Reddit post, so I'm going to assume the same still holds true.


What I see happening now in the cloud is containers from different companies and different security domains running on the same VM. I have to think this is fundamentally insecure and that VMs are underrated.

I hear people advocate QubesOS, which is based on Xen, for security when it comes to running my client machine. They say my banking should be done in a different VM than my email, for instance. Well, if that's the case, why do we run many containers doing different security-sensitive functions on the same VM, when containers are not really considered a very good security boundary?

From a security design perspective, I imagine hardware being exclusive to a person/organization, VMs being exclusive to some security function, and containers existing on top of that. That makes more sense from a security standpoint, but we seem to be playing things more loosely on the server side.


Doesn't AWS use Firecracker with its Fargate container service (and Lambda too)?


Sometimes the less strict separation is a feature, not a bug.

Without folder sharing, for example, Docker would be pretty useless.


While a flawed comparison, WSL does use a VM in conjunction with the 9p protocol to achieve folder sharing.


9p-based folder sharing is (or used to be?) possible with qemu, too.


It looks like it still is supported [1]. I noticed while reading the Lima documentation that they're planning on switching from SSHFS to 9P [2].

[1] https://wiki.qemu.org/Documentation/9psetup

[2] https://github.com/lima-vm/lima/blob/3401b97e602083cfc55b34e...


Yep, can confirm.

NixOS configurations can be built and run as a qemu VM.

I very recently was using one and found they automatically make a 9p mount to /tmp.


It's not clear to me that VMs actually do offer better isolation than well-designed containers (i.e. not docker).

It's basically a question of: do you trust the safety of kernel-mode drivers (e.g. for PV network devices or emulated hardware) used by VMs, or do you trust the safety of userland APIs plus the limited set of kernel APIs available to containers?

On my FreeBSD server, I kind of trust jails with strict device rules (i.e. there are only like 5 things in /dev/) over a VM with virtualized graphics, networking, etc.


I think it gets even more complicated with something like Firecracker, where they recommend you run Firecracker itself in a jail (and provide a utility to set that up).


This is 5 years old. What's the current state of the art?


[2017]


How does LightVM compare to Firecracker VMs? Could it be used for on-demand cloud VMs?


Not surprising that VMs running unikernels are as nimble as containers, but not all that useful either, at least in general. Much easier to just use a stock Docker image.


Kubernetes says no...

The article is light on detail. Containers and VMs have different use cases. If you self-host, lightweight VMs are likely the better path; however, once you're in the cloud, most managed services only provide support for containers.


You can also use a Firecracker runner in k8s to wrap each container in a VM for high isolation and security.


> once you're in the cloud, most managed services only provide support for containers.

Respectfully, comments like these are the reason for Kubernetes becoming a meme.


There is a huge difference between running on VMs that you have zero access to and actually owning your own VM infrastructure. Yes, AWS Lambda runs on Firecracker; however, it could just as well be running on a FireCheese VM platform and you would be none the wiser, unless AWS publishes this somewhere.

I am also not running on Kubernetes, because Kubernetes. AWS ECS and AWS Batch also only handle containerised applications. Even when deploying on EC2 I tend to use containers, as it ensures things keep working consistently when you apply patches to your EC2 environment.


Why is a 5-year-old article being posted now? If this were going to catch on, it would have. I just don't see it being used anywhere.

Having a full Linux kernel available is a major benefit that you lose, right?


Yes, when you custom-engineer a specific, complex solution for a specific use case, it is generally more performant than a simple, general-use solution.


(2017)


Tell me you don't understand containers without telling me you don't understand containers.


You don't understand VMs either. Ever used virtual network interfaces?


Containers and VMs are totally not the same thing. They serve completely different purposes: multiple containers can be combined to create an application/service, VMs always use a complete OS, etc. Anyway, the internet is full of the true purpose of containers; they were never meant to be used as a "VM". And about security... meh, everything is insecure until proven otherwise.


Maybe you should read the article? It basically covers running a tiny kernel that has exactly what you need to run one single binary, which is your application. No printer drivers, no video drivers. Probably network drivers, as those would be quite handy. It's kinda up to you which kernel you pick for this and which drivers are included. Multiple VMs with networking can also be "combined"...

Containers are great because they are small and fast, low overhead, etc., and that is why they replaced virtual machines. This article, however, shows that this is also achievable with virtual machines.


VMs don't need a full OS. You can run a single process directly on the kernel with no init system or other userland.


VMs can have private networks between each other just as containers do. That's pretty much what EC2 is about.


Heck, when running something in production across multiple underlying servers, you need a system that builds an overlay network regardless of whether you use containers or VMs...


"orders of magnitude" :

Why does anyone ever write "two orders of magnitude" when 100x is shorter?

Of course, this presumes 10 as the base and the N orders as the exponent, but I don't think I've ever, since the 90s, seen that stilted phrasing used for a base other than 10.


Because two orders of magnitude does not mean 100x. It means on the order of 100x.


Do you mean folks using the phrase know big-O, big-omega, big-theta, and are thinking along those lines?


It's nothing to do with big-O; it's about logarithms. But really, I think most people using it just think: "which of these is it closest to, 10x, 100x, or 1000x?"
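
A small arithmetic sketch of that reading (my own illustration, not from the thread): take the base-10 log of the ratio and round it, so "closest" is measured on a log scale, where the 100x/1000x boundary sits near 316x rather than 550x.

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* "N orders of magnitude" as commonly used: round the base-10 log of the ratio. */
        double ratios[] = {8.0, 100.0, 340.0, 3000.0};
        for (int i = 0; i < 4; i++)
            printf("%6gx is roughly %.0f orders of magnitude\n",
                   ratios[i], round(log10(ratios[i])));
        return 0;
    }

(Compile with -lm; 8x rounds to one order of magnitude, 340x to three, since 340 > sqrt(100 * 1000) ≈ 316.)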



