I never really realized that a Firecracker VM is a full-blown machine and not just some sort of Linux container tech. At first it may sound like an inefficient approach, but if you take a closer look at a real-world usage example such as fly.io, you will be surprised: micro-VMs are very small and capable.
Thanks to KVM, and to the minimal hardware support (no PCI, no ACPI, etc), Firecracker's source is rather simple and even relatively readable for non-experts.
> Firecracker's source is rather simple and even relatively readable for non-experts.
... as long as they're experienced at writing Rust. As a Rust newbie it took me a long time to figure out simple things like "where is foo implemented", due to the twisty maze of crates and `use` directives.
I totally get why this code is written in Rust, but it would have made my life much easier if it were written in C. ;-)
> it took me a long time to figure out simple things like "where is foo implemented"
Out of curiosity, what development setup do you use?
I imagine that with vanilla Emacs or vanilla Vim you’d have to do quite a bit of spelunking to answer that sort of question.
With a full-blown IDE, such as JetBrains CLion with the Rust plug-in installed, it is most of the time a matter of right-click -> go to definition / go to type declaration. (Although heavy use of Rust macros can in some cases confuse the system and make it unable to resolve definitions/declarations.)
And with JetBrains CLion you still have Vim keybindings available as a plug-in.
I switched from Vim to CLion + plug-ins years ago and haven’t looked back since. (Vanilla Vim is still on my servers though so that when I ssh in and want to edit some config files or whatever I can do so in the terminal.)
My development environment for this was mostly "nano in an SSH session". (Among other reasons, I can't even build Firecracker locally.) For FreeBSD work that's just fine since grep can find things for me. It didn't work so well for Firecracker.
Figuring out how to get all the crate source code extracted was the first step, yes. But even after that, the object-oriented nature meant that there were often many different foo functions, so I had to dig through the code to figure out what type of object I was dealing with and which type each of the different foo implementations dealt with.
Whereas in FreeBSD I just grep for ^foo and the one and only line returned is where foo is implemented -- because if there's different versions of foo, they have different names.
Namespaces sound good in principle but they impose a mental load of "developers need to know what namespace they're currently in" -- which is fine for the original developer but much harder for someone jumping into the code for the first time.
If you go to the web site for the crate (or the standard library), and find the doco for the module / function / trait / ..., you find a handy "source" button. It will take you straight to the definition.
Yes! After a bit of playing around, I can follow Rust reasonably well... but only with an IDE, which I've never used for C or even really for the bits of Java I wrote. I understand that many more experienced Rust developers are similarly IDE-reliant.
I do think it's a deliberate tradeoff, having e.g. .push() do something useful for quite a few similar (Vec-like) data structures means you can often refactor Rust code to a similar data structure by changing one line... but it certainly doesn't make things as grep-friendly as C.
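To make that concrete, here's a toy sketch (not Firecracker's code; the `Push` trait and the types here are made up purely for illustration) of why grepping for "fn push" stops being conclusive once a method name is shared across Vec-like types via a trait:

    use std::collections::VecDeque;

    trait Push<T> {
        fn push(&mut self, item: T);
    }

    impl<T> Push<T> for Vec<T> {
        fn push(&mut self, item: T) {
            // Delegate to the inherent Vec::push.
            Vec::push(self, item);
        }
    }

    impl<T> Push<T> for VecDeque<T> {
        fn push(&mut self, item: T) {
            self.push_back(item);
        }
    }

    fn fill(buf: &mut impl Push<u32>) {
        // The call site is identical no matter which container is behind it.
        buf.push(42);
    }

    fn main() {
        let mut v: Vec<u32> = Vec::new();
        let mut d: VecDeque<u32> = VecDeque::new();
        fill(&mut v);
        fill(&mut d);
        println!("{:?} {:?}", v, d);
    }

Swapping the container is a one-line change at the binding, which is the refactoring upside; the flip side is that `grep -rn "fn push"` now returns several definitions, and you need the receiver's type to know which one a given call resolves to.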
In comparison, I remember spending much less time finding "where is foo implemented" in Rust than in C++, and also found Rust std to be much more readable than C's when I wasn't familiar with each language. But I can see how Rust, with all the procedural macros, crates and traits, could become a maze for people most familiar with C, and I probably don't feel that because of my C++ background.
There's no way an "enterprise grade" cloud vendor like AWS would allow co-tenancy of containers (for ECS, Lambda etc) from different customers within a single VM - it's the reason Firecracker exists.
> There's no way an "enterprise grade" cloud vendor like AWS would allow co-tenancy of containers (for ECS, Lambda etc) from different customers within a single VM - it's the reason Firecracker exists.
I won't speak for AWS, but your assumption about what "enterprise grade" cloud vendors do is dead wrong. I know, because I'm working on maintaining one of these systems.
The "process sandbox" wars are over. Everybody lost, hypervisors won. That's it. It feels incredibly wasteful after all. Hypervisors don't share mm, scheduler, etc. It's a lot of wasted resources. Google came in with gvisor at the last minute to try to say "no, sandboxes aren't dead. Look at our approach with gvisor". They lost too and are now moving away from it.
Really? Has gvisor ever been popped? Has there ever even been a single high-profile compromise caused by a container escape? Shared hosting was a thing and considered "safe enough" for decades and that's all process isolation.
Can't help but feel the security concerns are overblown. To support my claim: Google IS using gvisor as part of their GKE sandboxing security.
I don't know what "popped" means here, but so far as I know there's never been a major incident caused by a flaw in gvisor. But gvisor is a much more intricate and carefully controlled system than standard Linux containers. Obviously, there have been tons of container escape compromises.
It doesn’t look like they moved away from gVisor for security reasons.
“We were able to achieve these improvements because the second generation execution environment is based on a micro VM. This means that unlike the first generation execution environment, which uses gVisor, a container running in the second generation execution environment has access to a full Linux kernel.”
The reason you go with process isolation over VM isolation is performance. If you share a kernel, you share memory managers and pages, scheduler, limits, groups, etc. If you get better performance running VMs vs running processes, then what was even your isolation layer for?
But at the end of the day, there is a line in the sand around hypervisors vs proc/kernel isolation models. I challenge you to go to a financial or medical institution and tell their CTO "yeah, we have this super bulletproof shared-kernel-inproc isolation model"
The first question you'd get is "Why is this not just part of upstream linux?" Answer that question and realize why you should just use a hypervisor.
Obviously there might be many reasons for that, but as someone who worked on a similar gvisor tech for another company, it's dead in the water. No security expert or consultant will ever sign off on a process isolation model. Regardless of architecture, audits, reviews, etc. There is just too much surface area for anyone to feel comfortable signing off on hostile multi-tenants with process isolation regardless of the sandboxing tech.
Not saying that there are no bugs in hypervisors, but the surface area is so so much smaller.
The first sentence pretty much sums it up: "Cloud Run’s new execution environment provides increased CPU and network performance and lets you mount network file systems." It's not a secret that performance is slower under gvisor and there are compatibility issues: https://gvisor.dev/docs/architecture_guide/performance/
Disclaimer: I work on this product but wasn't involved in this decision.
gvisor isn't simply a process isolation model. Security experts will certainly sign off on gvisor for some multitenant workloads. The reason Google is moving from it, to the extent they are, is that hypervisors are more performant for more common workloads.
I read "we got tired of reimplementing Linux kernel syscalls and functionality" as the reason. Like network file systems. The Cloud Run client base kept asking for more and more features, and they punted to just running the Linux kernel.
I have seen zero evidence of this; but if it's true I would love to learn more. The real action is in side channel vulnerabilities bypassing all manner of protections.
But this is because the workloads they execute changed, right? HTTP-only before, to more general code today. I didn't see anything there that said gvisor was inferior, only that a new requirement was full kernel API access. For latency sensitive ephemeral and constrained workloads gvisor/seccomp can make a lot of sense and, in the case of Google, handle multi-tenancy.
Now if workloads become less ephemeral and more general purpose, tolerance for startup latency goes up, and the probability of bespoke needs goes up, making VMs more palatable.
gVisor uses KVM or ptrace as its sandbox layer, and there's some indications that Google's internal fork uses an unpublished kernel mechanism, perhaps by extending seccomp (EDIT: It seems this has made its way to the outside world since I last looked. `systrap` is now default: https://gvisor.dev/docs/architecture_guide/platforms/ ). It's fake-kernel-in-userspace then sandboxed by seccomp.
Saying gVisor is "ultimately enforced by a normal kernel" is about as misleading & accurate as "KVM is enforced by a normal kernel" -- it is, but it's a very narrow boundary, not the usual syscall ABI.
I think Bryan Cantrill founded a company (Joyent? or Triton?) to do just that several years ago. It may have been based on Solaris/SmartOS zones, which is that exact use case w/ very secure/isolated containers.
Although it came with Linux binary compat (of unknown quality), I think the Solaris thing was just too off-putting for most customers and the company did not do very well.
Triton is now being developed by MNX Solutions and seems to be doing quite well.
We run Triton and SmartOS in production and the Linux compatibility works via lx-zones just fine. Only some of the Linux-locked software, which usually means Docker, needs to go inside a bhyve VM.
Well, they're not "full-blown" machines, in that they do cut out a lot of things unnecessary for Lambda's (and incidentally, fly.io's) use case. ACPI is one example given in the article.
But yes, they do virtualize hardware not the kernel. I'm willing to bet you could swap out vanilla containerd with firecracker-containerd for most users and they wouldn't notice a difference given they initialize so fast.
The difference is mostly noticeable in that the guest kernel takes up some RAM[1]. If you were really packing things tight, wasting some megabytes per container could start to hurt.
[1]: And for the non-hyperscalers with less tuning, you may be buffering I/O pages both in the guest and the host.
I'm very surprised the standard isn't to build a microkernel that emulates Linux userspace (or *NIX userspace) and is tailored towards the subset of virtual hardware that Firecracker and QEMU provide. I don't get the impression that implementing a new target for a PL is all that difficult, so if you create a pseudo-OS like WASI/WASM and send PRs to the supported languages you could cut out most of the overhead.
The "hardest" part is probably sufficiently emulating Linux userspace accurately: it's a big surface area. That's why I think creating a pseudo-OS target is the best route.
No, I'm not. Gvisor is a security layer around Linux containers that emulates and constrains syscalls. It specifically runs on top of a container platform and kernel. What I'm suggesting is a stripped down Linux-like kernel that is really good at running exactly one process. I'm describing a microkernel.
gvisor emulates a big chunk of the Linux system call interface, and, remarkably, a further large chunk of Linux kernel state. It's architecturally similar to uml (though not as complete; like Firecracker, it's optimized to a specific set of workloads).
gvisor is not like a seccomp-bpf process sandbox that just ACLs system calls.
Ok, I oversimplified a bit. Regardless, I'm suggesting something that still runs in emulated hardware isolation and implements drivers for Firecracker/QEMU's subset of hardware.
gvisor does emulate some hardware. See, for instance, its network stack.
At any rate: why is this better than just using KVM and Firecracker? The big problem with gvisor is that the emulation you're talking about has pretty tough overhead.
Lets you get the best of both gvisor and Firecracker: efficient use of resources (i.e. not running a full Linux kernel + scheduler and, most importantly, network stack for every lambda) while getting the isolation that comes from virtualization. You can achieve this in one of two ways: make a new kernel and add support for targeting it in the supported languages, or strip the Linux kernel down and reimplement the parts that aren't optimized for your short-lived VM lifecycle (scheduler, network stack, etc.).
Stripping the Linux kernel down is what people do with Firecracker. I'm curious what savings you see in the Linux networking stack. You could compile it out and just rely on vsocks, but now you're breaking everyone's code and you're not winning anything on performance.
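To make the "breaking everyone's code" point concrete, here's a toy guest-side sketch (assuming Linux, the libc crate, and a hypothetical host-side agent listening on vsock port 5000) of what replacing `TcpStream::connect(("10.0.0.1", 5000))` with a vsock connection looks like: no IPs, no routing, just a CID and a port.

    use std::io::Error;
    use std::mem;

    fn main() -> Result<(), Error> {
        const HOST_CID: u32 = 2; // VMADDR_CID_HOST: the well-known CID of the host
        const PORT: u32 = 5000;  // hypothetical port the host agent listens on

        unsafe {
            let fd = libc::socket(libc::AF_VSOCK, libc::SOCK_STREAM, 0);
            if fd < 0 {
                return Err(Error::last_os_error());
            }

            // Address the host by context ID + port instead of IP + port.
            let mut addr: libc::sockaddr_vm = mem::zeroed();
            addr.svm_family = libc::AF_VSOCK as libc::sa_family_t;
            addr.svm_cid = HOST_CID;
            addr.svm_port = PORT;

            let rc = libc::connect(
                fd,
                &addr as *const _ as *const libc::sockaddr,
                mem::size_of::<libc::sockaddr_vm>() as libc::socklen_t,
            );
            if rc != 0 {
                return Err(Error::last_os_error());
            }

            let msg = b"hello from the guest\n";
            libc::write(fd, msg.as_ptr().cast(), msg.len());
            libc::close(fd);
        }
        Ok(())
    }

Nothing that expects BSD sockets over AF_INET (which is to say, almost every networked program and library) works unmodified against this, which is the "breaking everyone's code" part.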
Perhaps I'm off base (I'm not an expert in this area), but I recall reading that one of the major challenges with Lambda was the latency that initializing the network stack introduces. Perhaps that's been solved by now, but my naive idea is to have the guest not really run its own network stack (at least the MAC/IP portion of it) and instead delegate the entire stack (IP and all) to the virtual device, which can be implemented by Firecracker/QEMU/whatever. I guess at that point, the amount of mangling you'd need to do to the kernel probably isn't worth it and you should just use Gvisor... ah oh well.
Regardless, I'm still surprised microkernels aren't more popular in this space, but perhaps losing the ecosystem of Linux libs/applications is a non-starter.
Even if the idea wasn't fruitful, the conversation was fun. Thanks for engaging and challenging my bad ideas!
Edit: I've also realized I was thinking of Unikernels, not microkernels and I've been calling it the wrong thing all night. *sigh*
FWIW, Linux itself has plenty of support for TCP offload engines. I don't think Firecracker uses that at the moment, but there's no reason why it has to be that way if that's a true bottleneck in the system.
I think the Firecracker team has stated that PCIe bypass wasn't something they wanted to do, so I don't see how they'd open up to other accelerators' bypass method. But seems like building a vmm from the rust-vmm toolbox is 'not that hard' and there are some PCIe bypass crates already, so... Have fun?
I'm not saying an actual NIC with TCP offload, but instead something like adding TCP offload to the virtio nic if for some reason initialization of the network stack was a latency problem. If the VM is only running at layer 3/4, most init latency problems disappear.
This is exactly what I'm referring to. You have a pool of virtual NICs on the host in user-space created by the VM runtime that get assigned to the guest on provision-time, which just passes through any commands/syscalls (bind/connect/etc.) via a funky driver interface. You'd have to mangle the guest kernel syscalls or libc though, it might be really ugly.
> You'd have to mangle the guest kernel syscalls or libc though, it might be really ugly.
You wouldn't have to. There's patches for hardware TCP offload using the normal socket syscall ABI. The kernel net stack maintainer is pretty ideologically against them so they're not mainlined, but plenty of people have been running them in production for decades and they're quite mature.
> Linux itself has plenty of support for TCP offload engines
Could you link to any specific Linux kernel source that implements support for TCP offload? AFAIK networking subsystem maintainers were always opposed to accommodating TCP offload because it is A Bad Idea.
At its simplest, TCP offload can be just letting the hardware chunk a large packet into smaller ones on the wire. I don't think anything trying to offload more than that has really seen much daylight outside of very proprietary "smart NICs".
That's not what a microkernel is. A microkernel is a kernel that pushes services traditionally included in the kernel, such as networking, out into userspace.
The closest things to what you're describing are unikernels and NetBSD rump kernels.
It's a unikernel really only if you rip out any security boundaries inside the VM and link the kernel into the app. If you still maintain a syscall boundary inside the VM, it's just another kernel.
gVisor is roughly a sandwich of:

1. The Linux syscall ABI, trapped by its platform layer
2. A userspace reimplementation of parts of the Linux kernel
3. A narrow set of calls to the outside using Linux syscalls

This thing could be envisioned as:

1. The Linux syscall ABI as-is, same mechanism[a]
2. A reimplementation of parts of the Linux kernel
3. virtio hardware drivers for the calls to the outside
So the middle part of the sandwich could look the same.
Also, it's worth saying that I think the work of maintaining #2 there is exactly why Google Cloud Run migrated away from gVisor. People just kept asking for more and more kernel features.
[a]: Alternatively, #1 could be replaced with unikernel like linking directly with #2.
Me personally, I think HTTP/3's move away from TCP could be really interesting for this sort of stuff. The responsibilities of the kernel could be hugely simplified if all you had were UDP/IP directly hardcoded to virtio (no need for routing, address configuration, ARP, etc), no paging etc, and the only filesystems were EROFS & tmpfs. Of course, Cloud Run's move away from gVisor shows that Enterprise clients would hate it.
How does this compare to Linux on Firecracker? I can find some numbers with a basic internet search, but I'm not sure if these numbers are comparable for various reasons (they're a few years old, it's unclear if they use the same method to measure boot times, or even have the same definition of "boot time").
FWIW, this is basically the same material -- after my BSDCan talk the FreeBSD Journal said "hey, that was a great talk, can you turn it into an article for us", and after the FreeBSD Journal article was published ;login: asked if they could republish it.
It's funny how many one-second pauses turn out to be less than necessary. How many sysadmins took meaningful action because the system paused when they had an invalid machine UUID?
Probably a significant proportion of the sysadmins who experienced that one-second pause.
The "print a message telling the user that we're rebooting, then wait a second to let them read the console before we go ahead and reboot", on the other hand...
Maybe I'm just different. I've watched a great many openbsd boot sequences, which tend to have a great many pauses, and I've never paid any special attention to the lines that come before pauses vs lines that come after pauses.
I suspect that FreeBSD has fewer pauses than OpenBSD... especially after the work I've done over the past few years to speed it up.
If anyone in the OpenBSD world is interested in speeding up your boot process I'd be happy to share tips and code. It's a bit daunting to start with but with some good tools it becomes a lot of fun.
Thank you! That would be me. I am clueless on amd64 on how to speed up the boot process and maybe change a few fonts during boot.
I understand the reasons for no "how-tos". But sometimes they make sense for people like me. I wouldn't mind delving a bit deeper given some direction.
The first thing you'll want to do is port my TSLOG code to the OpenBSD kernel and start instrumenting things there. Send me an email, there's too much detail to get into for an HN comment.
if the machine boots in 20ms, I think that message is actually useful, because something would reboot your machine and you'd think you got logged out because you blinked
I don't want to sound snooty, but what use-cases do Firecracker instances and the like serve?
I use FreeBSD for everything from my colocated servers to my own PC. By no means am I a developer; seasoned Unix admin at best. Bare-metal forever, but welcome to the future. Especially anything that contributes to the OS.
However I hear buzz words like Lambda and Firecracker and really have no idea where the usage is. I get Docker, containers, barely understand k8s, but why do you need to spin up a VM only to tear it down, compared to just spinning up a VM and using it when you really need it? Always there, always ready.
Is it purely a cloud experience, cost saving exercise?
Instances of an application are created as part of the request/response lifecycle.
Allows you to build a compute plane where any node in the plane can service the traffic for any application.
Any one application can dynamically grow to consume the available free compute of the plane as needed in response to changes in traffic patterns.
Applications use no resources when they aren't handling traffic.
Growing the capacity of the compute plane means bringing more nodes online.
Can't come up with a use case for this beyond managing many large-scale deployments. If you aren't working "at scale" this is something that would sit below a vendor boundary for you.
The main use case is for seldom-used APIs. If I run a service where the API isn't used often, but I need it quick when it is, Lambdas or something like them are perfect.
As it turns out, a lot of APIs for phone apps fit this category. You don't want a machine sitting around idle 99% of the time to answer those API calls.
If you rarely need an API but set something like this up just to use it rarely, it seems one needs to write their own code for this functionality and not jump through hoops to run someone else's. That just sounds so bizarre.
Not someone else's API, your own. You make an app. It needs an API that you create (perhaps to sync up scores or something). You don't want to run a machine full time just to accept one score update a day.
Just about every single company can benefit from scaling as traffic is never consistent 24/7. Most don't bother as the effort outweighs the savings, but the potential is there. Things like lambda and firecracker make it much easier.
It's partly a cost saving exercise, but also: running "chroot /var/empty /some/shitty/code" or putting "chroot /var/empty /some/shitty/code" in inetd.conf is useful. On today's super-fast machines, Firecracker starts fast enough to support such interactive uses, while giving you the extra security of a VM (i.e. it greatly restricts what parts of the kernel and/or localhost the shitty code can talk to).
> why do you need to spin up a VM only to tear it down, compared to just spinning up a VM and using it when you really need it? Always there, always ready
Firecracker has much smaller overhead compared to regular VMs, which makes the (time and compute) costs of spinning up new VMs really low. This can be an advantage depending on how chunky your workloads are: the less chunky they are, the more they can take advantage of finer-grained scaling.
IoT devices can execute short-lived actions by calling remote functions. The provider wants complete isolation, wipes these micro-VMs after every few seconds, and lets the user pay for use. The response from these can be anything: voice, data or API responses.
FaaS, function as a service. Depending on how the software is packaged and what is expected, the richness a VM like Firecracker provides may be useful. Many of these tradeoffs are about velocity: I can run X easily on Y.
> However I hear buzz words like Lambda and Firecracker and really have no idea where the usage is.
Sometimes you just want to slap some lines of code together and run them from time to time, and don’t need a whole server (physical or virtual) for that.
Sometimes you have no idea if you’ll have to run a piece of code 100 times a day or 10’000’000 times a day.
Sometimes you don’t feel like paying a whole month for and maintaining a whole instance for a cronjob that lasts 20 seconds, and maybe it runs once a week.
It's a shame neither AWS nor macOS on ARM support nested virtualization. It would make it far easier to develop and deploy Firecracker based tech.
Toyed around with Firecracker a bit. It does what it promises on boot times, but it's still a pretty gnarly experience. E.g. after doing a victory dance for getting it to boot, I was rather deflated to find out that getting networking working takes another lengthy tutorial.
I think there is definitely room for someone to add a lot of value to this by creating some automation tools. It would be really nice to be able to download a single binary, fire it up, have both a web interface and an API available, be able to configure it quickly, have it download whatever it needs for you etc.
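For a flavour of what such a tool would wrap, here's a rough sketch of driving Firecracker's API socket by hand from Rust. The socket path, kernel image and rootfs paths are made up, and the exact JSON fields follow the API docs as I remember them, so treat it as an outline rather than gospel:

    use std::io::{Read, Write};
    use std::os::unix::net::UnixStream;

    // Send one PUT request over the API's Unix socket and return the raw response.
    fn put(sock: &str, path: &str, body: &str) -> std::io::Result<String> {
        let mut stream = UnixStream::connect(sock)?;
        let req = format!(
            "PUT {path} HTTP/1.1\r\nHost: localhost\r\nContent-Type: application/json\r\nContent-Length: {len}\r\n\r\n{body}",
            len = body.len()
        );
        stream.write_all(req.as_bytes())?;
        let mut buf = [0u8; 1024];
        let n = stream.read(&mut buf)?;
        Ok(String::from_utf8_lossy(&buf[..n]).into_owned())
    }

    fn main() -> std::io::Result<()> {
        // Assumes `firecracker --api-sock ./firecracker.sock` is already running
        // and ./vmlinux plus ./rootfs.ext4 exist.
        let sock = "./firecracker.sock";
        for (path, body) in [
            ("/boot-source",
             r#"{"kernel_image_path": "./vmlinux", "boot_args": "console=ttyS0 reboot=k panic=1"}"#),
            ("/drives/rootfs",
             r#"{"drive_id": "rootfs", "path_on_host": "./rootfs.ext4", "is_root_device": true, "is_read_only": false}"#),
            ("/actions", r#"{"action_type": "InstanceStart"}"#),
        ] {
            let resp = put(sock, path, body)?;
            // Print just the status line of each response.
            println!("{path}: {}", resp.lines().next().unwrap_or(""));
        }
        Ok(())
    }

Three PUTs and the VM is booting, which is also roughly what firecracker-containerd and the various wrappers do for you; the networking part (tap devices, routes) is the bit that's still left as an exercise.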
The loader+initrd+userspace time there sounds unreasonable.
I've had bog-standard libvirt qemu-kvm (not a microvm) creating a new Ubuntu VM from a disk image & booting to a login prompt in under 10 seconds for more than a decade. This is without fiddling with virtio, while still doing hardware scans for PCI, VGA, SATA and such, and booting via Grub (your "loader"). Those should be pretty comparable!
Ah, yes, you can ignore the loader + initrd. I'm multi-boot so the loader has a 2-second timeout, and the initrd is waiting for me to input my LUKS password.