Firecracker internals: Inside the technology powering AWS Lambda (2021) (talhoffman.com)
300 points by mattrighetti on Feb 28, 2023 | 52 comments



Great overview!

> It is highly recommended to read the source code of this amazing project and explore it yourselves - https://github1s.com/firecracker-microvm/firecracker.

Yes! The Firecracker source is super readable, and a great way to learn about this stuff in detail. There's very little magic there, partially thanks to the efforts of the team to keep things accessible and well documented, and partially thanks to how Linux's KVM APIs abstract away some of the hard and hardware-dependent stuff.

For more on the context of how we use Firecracker at AWS, check out:

- Our NSDI'20 paper, on Firecracker and Lambda: https://www.usenix.org/conference/nsdi20/presentation/agache

- Chris and Julien talking about Lambda internals at re:Invent '22 (including details of the Firecracker-derived technology behind "SnapStart"): https://www.youtube.com/watch?v=EplOzQqgstA

- "Lightweight Virtualization, Opportunities and Challenges", a talk I gave in 2020 which covers some serverless/Firecracker topics: https://www.youtube.com/watch?v=ADOfX2LiEns

- The Security Overview of AWS Lambda, which looks at this stuff from the security perspective: https://docs.aws.amazon.com/whitepapers/latest/security-over...


Thanks for linking these additional talks!

Tangent, but that github1s link from the OP (to an in-browser VS Code instance) makes a clear callout that it's not officially from GitHub. Which is fine. But for completeness, GitHub does have an official feature for this: change the URL to `github.dev` or press '.' on any GitHub repo:

https://github.dev/firecracker-microvm/firecracker


While github.dev may be better in some circumstances, github1s works in an incognito window without being logged in to GitHub; the github.dev version appears briefly to boot up, then redirects to the GitHub login page.


Does making it open source mean you get a lot of competition? I believe Cloudflare's lambda offerings are pretty popular for example.

Then again, large companies that already invested in AWS are probably not likely to switch providers for specific workloads like lambdas.


IIRC it's based on crosvm, so AWS didn't so much "make it open source" as "adapt an existing piece of open source".


Cloudflare uses V8 for sandboxing, not virtual machines. It’s much lighter weight, but it’s also limited to just what V8 can run (JavaScript and web assembly).

Quite a different offering to lambda. It shares no source in common.


Is this from Dresden or from Seattle or some other kernel team?


Bucharest/Iași, Romania.

https://ro.linkedin.com/in/raduweiss

They have quite massive R&D offices in Romania, several thousand folks there.


Wow, impressive!

I visited the Iasi math university years ago. They were very good then. E.g., a lot of Prolog, where others still did Java or Pascal.


curious: for the blockIO engine can a qcow2 image be specified as a path? how do i run python in this? can i get to the BPF engine?


Questions for the firecracker users out there:

what version of the kernel do you use (the github page says 5.10 but isn't that quite old?)

what (extremely minimal, I imagine?) kernel configuration

What do you use to build the 'micro' images (I'm guessing many won't even have a classic pid-1 such as systemd, but put their software as pid-1?).

How do you keep timesync if you're not using a timesync daemon? Can you make one of these daemons work on AF_VSOCK (I know Firecracker gives a virtio-backed Ethernet device, but what if you only want AF_VSOCK?).

How do you handle kernel and app logs without adding a log daemon, and do the same through vsock, etc.?


At CodeSandbox we use Firecracker to run our VMs (more info here: https://codesandbox.io/blog/how-we-clone-a-running-vm-in-2-s...).

To answer the questions:

> what version of the kernel do you use (the github page says 5.10 but isn't that quite old?)

Right, they have tested with 5.10, but it also works with newer kernel versions. Our host currently runs 5.19 and we're planning to upgrade to 6.1 soon. The guest runs 5.15.63, with a config very similar to the one recommended by the FC team (it's in the FC repo). It's important to mention that we had to disable async page faults (a KVM feature) with more modern kernel versions, as VMs could get stuck waiting for a page fault to resolve.
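For anyone wondering where such a switch could live: one guest-side option is the `no-kvmapf` kernel parameter, which turns off paravirtualized async page faults. Whether that is the mechanism used here is an assumption on my part. A minimal sketch that builds a Firecracker-style JSON config with it appended to the boot args (the function name, file names and sizes are made up for illustration):

```python
import json


def vm_config(kernel, rootfs, extra_boot_args=(), vcpus=1, mem_mib=256):
    """Build a dict in the shape of Firecracker's JSON config file."""
    # Commonly used Firecracker boot args; extra ones are appended at the end.
    boot_args = ["console=ttyS0", "reboot=k", "panic=1", "pci=off", *extra_boot_args]
    return {
        "boot-source": {
            "kernel_image_path": kernel,
            "boot_args": " ".join(boot_args),
        },
        "drives": [{
            "drive_id": "rootfs",
            "path_on_host": rootfs,
            "is_root_device": True,
            "is_read_only": False,
        }],
        "machine-config": {"vcpu_count": vcpus, "mem_size_mib": mem_mib},
    }


# "no-kvmapf" disables paravirtualized async page faults in the guest.
# That this matches the setup described above is an assumption.
print(json.dumps(vm_config("vmlinux", "rootfs.ext4", ["no-kvmapf"]), indent=2))
```

The printed JSON could be saved to a file and passed to Firecracker's `--config-file` option.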

> What do you use to build the 'micro' images

We created a CLI that creates a rootfs from a Docker image. It pulls the image, creates a container and then extracts the fs from it to an ext4 disk. For the init, we forked the open sourced init from the Fly team (https://github.com/superfly/init-snapshot) and changed/added some functionality.

> How do you keep timesync if you're not using a timesync daemon?

IIRC we expose the time as a PTP device (handled by kvm) and run phc2sys to sync the time in an interval. Firecracker has some documentation on this, where it recommends chrony. It can also be done with vsock, but it would be more manual.

> How do you handle kernel and app logs without adding a log daemon, and do the same through vsock, etc.?

The init forwards stdout/stderr of the command it runs to its own stdout, which Firecracker then logs out by itself. A supervisor reads these and writes the logs to files.
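The forwarding idea is simple enough to sketch. This is not the actual init (that one is a fork of the Fly init); it is just a rough Python illustration of "run a command and copy its stdout/stderr to your own stdout", with a hypothetical function name:

```python
import subprocess
import sys


def run_and_forward(argv, out=sys.stdout):
    """Run argv, copy its combined stdout/stderr to `out`, return the exit code."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same pipe
        text=True,
    )
    for line in proc.stdout:
        out.write(line)
        out.flush()  # keep output timely for whoever reads the console
    return proc.wait()
```

Calling `run_and_forward(["my-app", "--serve"])` (a made-up command) from a process whose stdout is the serial console would put the app's output where a host-side supervisor can pick it up. Because the two streams are merged, the relative order of stdout and stderr lines depends on the child's buffering.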


> We created a CLI that creates a rootfs from a Docker image. It pulls the image, creates a container and then extracts the fs from it to an ext4 disk.

You guys don't happen to have a public writeup about how this works, do you? Maybe it's as simple as it sounds, but Fly and CodeSandbox both have some magic to turn Docker images into VM disks that I'd like to know how to build :)


Fly is doing fancy stuff to avoid using docker entirely, but with docker you can just run "docker export" to dump an image to a .tar file that contains the whole filesystem. Built-in feature. I use this as a convenient way to grab a foreign platform sysroot for clang cross-compilation; just pick a Docker image and rip the filesystem out.
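A rough sketch of how the "docker export" route could be turned into an ext4 disk, in the spirit of the CLI described upthread. This is my guess at the steps, not anyone's actual tool: the function name, container name, mount point and image tag are all made up, and the mount steps need root.

```python
import shlex


def rootfs_commands(image, out_img="rootfs.ext4", size_mib=1024,
                    container="rootfs-export", mnt="/mnt/rootfs"):
    """Return the commands for a Docker-image-to-ext4 conversion (illustrative)."""
    tar = f"{container}.tar"
    return [
        # Create (but do not start) a container so its filesystem can be exported.
        ["docker", "create", "--name", container, image],
        # Dump the container's whole filesystem to a tar archive.
        ["docker", "export", "-o", tar, container],
        ["docker", "rm", container],
        # Allocate a sparse file and format it as ext4.
        ["truncate", "-s", f"{size_mib}M", out_img],
        ["mkfs.ext4", "-F", out_img],
        # Loop-mount the image and unpack the archive into it (needs root).
        ["mkdir", "-p", mnt],
        ["mount", "-o", "loop", out_img, mnt],
        ["tar", "-xf", tar, "-C", mnt],
        ["umount", mnt],
    ]


# Print the plan instead of running it; feed each list to subprocess.run to execute.
for cmd in rootfs_commands("alpine:3.18"):
    print(shlex.join(cmd))
```

The resulting image still needs an init of some kind, since "docker export" gives you only the filesystem, not the image's entry point metadata.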


There's been a writeup on this topic by the Fly team -- https://fly.io/blog/docker-without-docker/


Oh thanks a lot. PTP! I need to try this out. And thanks for the init from fly.io too! And... I think I know now why my vms would get stuck thanks for the tip!


6.2 kernels work fine with Firecracker, and you can compile out a lot compared to a normal x86-64 defconfig: PCI, USB, SCSI, ATA, md, non-virtio ethernet, etc. You probably still want CONFIG_SERIAL_8250 and CONFIG_SERIAL_8250_CONSOLE, but everything else can be virtio net and block. (Firecracker uses memory-mapped virtio, not PCI, which enables you to compile out PCI altogether, unlike qemu.) 100ms boot time to exec()ing init is definitely realistic.
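If you want to check a kernel .config against that list, something like the following would do. It is a sketch only: the virtio symbol names (CONFIG_VIRTIO_MMIO, CONFIG_VIRTIO_BLK, CONFIG_VIRTIO_NET) are the usual Kconfig names rather than anything stated above, and the two lists are one reading of the comment, not a canonical minimal config.

```python
# Symbols to keep: serial console plus memory-mapped virtio with block and net.
WANTED = ["CONFIG_SERIAL_8250", "CONFIG_SERIAL_8250_CONSOLE",
          "CONFIG_VIRTIO_MMIO", "CONFIG_VIRTIO_BLK", "CONFIG_VIRTIO_NET"]
# Subsystems the comment suggests compiling out.
UNWANTED = ["CONFIG_PCI", "CONFIG_USB", "CONFIG_SCSI", "CONFIG_ATA", "CONFIG_MD"]


def enabled_symbols(config_text):
    """Return the set of symbols set to y or m in a kernel .config."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skips "# CONFIG_FOO is not set" and blank lines
        name, value = line.split("=", 1)
        if value in ("y", "m"):
            enabled.add(name)
    return enabled


def audit(config_text):
    """Return (missing, leftover): wanted symbols that are off, unwanted ones still on."""
    on = enabled_symbols(config_text)
    missing = [s for s in WANTED if s not in on]
    leftover = [s for s in UNWANTED if s in on]
    return missing, leftover
```

Usage would be along the lines of `audit(open(".config").read())` in the kernel build directory.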


Ah yes thanks, I tried going down the absolute slimming down of the kernel rabbit hole, but couldn't find a write-up on the actual best practice on this for firecracker and micro/lightweight VMs, I found some things in the embedded/microcontroller side of things, but it still has lots of devices and features. Is there a minimal, canonical conf somewhere for this?


Julia Evans has a nice write up discussed here on HN about your first three questions

https://news.ycombinator.com/item?id=25883253


That is actually a very nice post (and far more advanced than the usual Julia Evans post I see - not judging here, I really enjoy reading everything she publishes), and you posted the HN link with a comment by Thomas Ptacek too, with interesting links about performance and a discussion of actual isolation.

Thanks :-)


If you think Firecracker is interesting, check out Cloud Hypervisor [0]. The difference, according to the Cloud Hypervisor team:

> A large part of the Cloud Hypervisor code is based on either the Firecracker or the crosvm project's implementations. Both of these are VMMs written in Rust with a focus on safety and security, like Cloud Hypervisor.

> The goal of the Cloud Hypervisor project differs from the aforementioned projects in that it aims to be a general purpose VMM for Cloud Workloads and not limited to container/serverless or client workloads.

Firecracker is such a great piece of technology. I'm amazed that AWS actually open-sourced it. All kudos to them. We're using Firecracker at our company to allow API companies to build interactive demos like this one we built for Prisma [1].

[0] https://github.com/cloud-hypervisor/cloud-hypervisor

[1] https://playground.prisma.io


They're both built on the same rust-vmm crates, the code is very similar, as are the interfaces. Both very cool projects.


I discovered the other day that the LXC project CI produces rootfs for many interesting distros [1]. Makes it very convenient to spin up a firecracker VM.

[1] https://uk.lxd.images.canonical.com/images/


OG here. Thanks for posting! Can’t believe it got to 5th place on HN :)


Great write-up! One extra detail that might be worth including in the block device/storage section is that Firecracker doesn't yet pass through discard (trim) from guest to host (as fallocate(FALLOC_FL_PUNCH_HOLE) if file-backed) like qemu?
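For readers who haven't met it, this is roughly what the host-side call looks like for a file-backed disk. A sketch via ctypes, assuming 64-bit Linux and a filesystem that supports hole punching; `punch_hole` is a name I made up, and this says nothing about how a VMM would wire it to virtio discard requests.

```python
import ctypes
import os

# Values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02


def punch_hole(fd, offset, length):
    """Deallocate [offset, offset + length) in the file behind fd (Linux only)."""
    libc = ctypes.CDLL(None, use_errno=True)
    # Assumes a 64-bit off_t, as on 64-bit Linux.
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_longlong, ctypes.c_longlong]
    libc.fallocate.restype = ctypes.c_int
    # PUNCH_HOLE must be combined with KEEP_SIZE; the file length is unchanged.
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if libc.fallocate(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

After a successful call the punched range reads back as zeros and the blocks are returned to the host filesystem, which is the point of passing discard through.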

The fast/efficient asynchronous io_uring backend now in released Firecracker makes it a good alternative for full-sized VMs as well as the original short-lived micro-VM use.


I like this one about some of the design decisions and development challenges.

"[2019] Firecracker: Lessons from the Trenches by Andreea Florescu and Alexandra Iordache" - https://youtu.be/yULy6IFy49o

The camera image is out of focus for most of the presentation, but the slides are readable and the audio is good.


Legit wondering why, under the hood, Google Cloud Functions handles promises and promise chaining so much better than AWS Lambda.

I have to offload requests to state machines in AWS Lambdas which is a pain when prototyping


Can you expand on what you mean by this? You mean that Lambda forces you into an async style of programming where you queue events to handle completion of long running subrequests?


Actually what I’m saying is lambdas, when doing tens of thousands of invocations, will handle async requests much worse than GCP Cloud Functions or KNative

Basically the lambdas, even when you return promises, or set long timeouts, will conclude before completing a request

If you’re doing 5 lambda invocations you don’t really see this, but when you’re doing a lot of them you can see in the statistics a huge failure compared to GCloud Functions UNLESS you offload each fetch to its own state machines, which is technically the correct way to do it but is way over engineered compared to GCloud


Fly.io is also running Firecracker VMs.


We do very little interesting stuff on the VMM side; this is an Amazon project we're happy to draft off of.


That’s cool. As an old school unix sysadmin who’s aware that “containers don’t contain”, seeing people ditch proper isolation for containers everywhere for performance reasons has been alarming. Now with Firecracker we have isolation and performance.


I’ve always thought Firecracker would be great for CI runners.


This is basically what CodeBuild does.

The default Docker containers that CodeBuild uses (you can create your own) and the shell script it uses to parse the yaml configuration file (mostly a list of shell scripts) are all open source and the entire process can be run locally.

https://github.com/aws/aws-codebuild-docker-images

https://docs.aws.amazon.com/codebuild/latest/userguide/use-c...

Disclaimer: I work for AWS. But nowhere near the team that developed Firecracker


I agree. I prototyped integrating AWS Lambda into our CI pipeline last year, and was very impressed. Unfortunately, I hit a wall when it came to limiting concurrently executing lambdas (which isn't Firecracker's fault), but it was very promising up to then. I'm also curious about using fly.io's machines API [0] for the same thing.

[0] https://fly.io/docs/machines/


Great read! Thank you!

Does anyone know what open source competitors exist? For example is there a GCP version to check out and compare?

There are many companies offering such services, would be interesting to know who is using what!


An analogous project from Google with similar use cases is gvisor, which IIRC underlies Cloud Run: https://gvisor.dev/


Super interesting read!

I’ve been playing with Firecracker on a Raspberry Pi 4, but never could get Docker inside a Firecracker uVM to work. Should this be supported at all?


Depends how you're starting the VM. I’ve run Docker on Firecracker with a Raspberry Pi 4 before, but it needed some fixes.

One possibility is that if you're running directly from an initramfs without a block device, then Docker needs DOCKER_RAMDISK set as an environment variable.

Otherwise it’s possible the minimal kernel your Firecracker config uses doesn’t support it out of the box. You can use a regular kernel but you need to make sure modules can be loaded from somewhere.


Thanks for your reply! Do you have a guide or more information somewhere? I found that while Firecracker seems popular, it's sparsely documented.


Does anyone know of a similar explanation for S3?


Any tooling like Docker/Podman to make it as easy to kick off Firecracker VMs as it is to kick off containers?




Kata containers sound interesting, but the first thing that hits you when checking the "learn more" section is the fact that the latest entry in their "news" section dates back to January 2019. Is the project still maintained?




Not a drop-in replacement: the OCI image entry point is not automatically executed.

Issue opened in 2021: https://github.com/weaveworks/ignite/issues/874


Doesn't seem to be actively developed. The last commit was over 4 months ago.


It's pretty much unmaintained, but it does still work!


Yeah, I think it's strange that no one wants to improve on something like this. I can see a lot of use cases where you would want to just use one kubernetes cluster but have support for multi-tenancy with pods being isolated.


There is this project, which I have never used, but seems promising. https://github.com/firecracker-microvm/firecracker-container...



