FreeBSD on Firecracker (usenix.org)
346 points by jmmv on Aug 24, 2023 | 129 comments



I never really realized that a Firecracker VM is a full-blown virtual machine and not just some sort of Linux container tech. At first it may sound like an inefficient approach, but if you take a closer look at a real-world usage example such as fly.io, you will be surprised: micro-VMs are very small and capable.


If you're interested in learning more about that, check out our NSDI'20 paper on how we chose this direction (https://www.usenix.org/conference/nsdi20/presentation/agache) and the Firecracker source and docs (https://github.com/firecracker-microvm/firecracker).

Thanks to KVM, and to the minimal hardware support (no PCI, no ACPI, etc), Firecracker's source is rather simple and even relatively readable for non-experts.


Firecracker's source is rather simple and even relatively readable for non-experts.

... as long as they're experienced at writing Rust. As a Rust newbie it took me a long time to figure out simple things like "where is foo implemented", due to the twisty maze of crates and use directives.

I totally get why this code is written in Rust, but it would have made my life much easier if it were written in C. ;-)


> it took me a long time to figure out simple things like "where is foo implemented"

Out of curiosity, what development setup do you use?

I imagine that with vanilla Emacs or vanilla Vim you’d have to do quite a bit of spelunking to answer that sort of question.

With a full-blown IDE like for example JetBrains CLion with Rust plug-in installed, it is most of the time a matter of right-click -> go to definition / go to type declaration. (Although heavy use of Rust macros can in some cases confuse the system and make it unable to resolve definitions/declarations.)

And with JetBrains CLion you still have Vim keybindings available as a plug-in.

I switched from Vim to CLion + plug-ins years ago and haven’t looked back since. (Vanilla Vim is still on my servers though so that when I ssh in and want to edit some config files or whatever I can do so in the terminal.)


My development environment for this was mostly "nano in an SSH session". (Among other reasons, I can't even build Firecracker locally.) For FreeBSD work that's just fine since grep can find things for me. It didn't work so well for Firecracker.


I'm surprised `grep` didn't work for finding implementations. Did you find out why? Was it eg searching first-party source code but not crates?


Figuring out how to get all the crate source code extracted was the first step, yes. But even after that, the object-oriented nature meant that there were often many different foo functions, so I had to dig through the code to figure out what type of object I was dealing with and which type each of the different foo implementations dealt with.

Whereas in FreeBSD I just grep for ^foo and the one and only line returned is where foo is implemented -- because if there are different versions of foo, they have different names.

Namespaces sound good in principle but they impose a mental load of "developers need to know what namespace they're currently in" -- which is fine for the original developer but much harder for someone jumping into the code for the first time.
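
A contrived Rust sketch of the problem (made-up names, purely to illustrate): several types can each implement a method with the same name, so grep turns up every impl and you still have to work out which one your call site resolves to:

    trait Device {
        fn read(&self) -> u8;
    }

    struct Serial;
    struct Block;

    impl Device for Serial {
        fn read(&self) -> u8 { 0x20 }   // "fn read" grep hit #1
    }

    impl Device for Block {
        fn read(&self) -> u8 { 0x40 }   // "fn read" grep hit #2
    }

    fn poll(dev: &dyn Device) -> u8 {
        dev.read()   // which read()? depends on the concrete type
    }

    fn main() {
        println!("{} {}", poll(&Serial), poll(&Block));
    }

In the C style those would be serial_read() and block_read(), and grepping for ^serial_read gives you the single definition.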


If you go to the web site for the crate (or the standard library), and find the doco for the module / function / trait / ..., you find a handy "source" button. It will take you straight to the definition.

Eg, (me picking a random crate on crates.io): https://docs.rs/syn/2.0.29/syn/ or the standard library: https://doc.rust-lang.org/std/option/index.html

It's all generated by the same system from comments in the source. You can generate the same thing for your code.


Sure, but that doesn't help me when I want to go into that code and add extra debugging code so I can figure out where things are going wrong.


Yes! After a bit of playing around, I can follow Rust reasonably well... but only with an IDE, which I've never used for C or even really for the bits of Java I wrote. I understand that many more experienced Rust developers are similarly IDE-reliant.

I do think it's a deliberate tradeoff: having e.g. .push() do something useful for quite a few similar (Vec-like) data structures means you can often refactor Rust code to a similar data structure by changing one line... but it certainly doesn't make things as grep-friendly as C.
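
A toy sketch of what I mean (purely illustrative): swap the container type on one line and the .push()/.pop() call sites compile unchanged, which is great for refactoring but tells grep nothing about which implementation actually runs:

    use std::collections::BinaryHeap;

    fn main() {
        // Change this one line to `let mut items = Vec::new();`
        // and the rest compiles unchanged.
        let mut items = BinaryHeap::new();
        items.push(3);
        items.push(1);
        items.push(2);
        // Vec::pop would return the most recently pushed element;
        // BinaryHeap::pop returns the largest. Same call sites, different behaviour.
        while let Some(x) = items.pop() {
            println!("{x}");
        }
    }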


Etags makes answering that fairly simple in a C codebase, and Emacs can be an LSP client.


In comparison, I remember spending much less time finding "where is foo implemented" in Rust than in C++, and also found Rust std to be much more readable than C's when I wasn't familiar with each language. But I can see how rust with all the procedural macros, crates, traits could become a maze for people most familiar with C, and I probably don't feel that because of my C++ background.


If you use a language server, go-to-definition just works.


“Writing things in C to appease the C people” is reasoning that can’t possibly die soon enough. You’ve had a good run.


There's no way an "enterprise grade" cloud vendor like AWS would allow co-tenancy of containers (for ECS, Lambda etc) from different customers within a single VM - it's the reason Firecracker exists.


> There's no way an "enterprise grade" cloud vendor like AWS would allow co-tenancy of containers (for ECS, Lambda etc) from different customers within a single VM - it's the reason Firecracker exists.

I won't speak for AWS, but your assumption about what "enterprise grade" cloud vendors do is dead wrong. I know, because I'm working on maintaining one of these systems.


Lots of enterprise grade cloud vendors trust the Linux kernel boundary WAY too much...


“Enterprise grade” deserves scare quotes for those people of course!


I read it like "military grade", meaning it's on the side of over-provisioned/over-engineered and will not break in obvious ways


Does Google? I know they use gvisor in production, which is ultimately enforced by a normal kernel (with a ton of sandboxing on top of it).


Google is moving away from gvisor as well.

The "process sandbox" wars are over. Everybody lost, hypervisors won. That's it. It feels incredibly wasteful after all. Hypervisors don't share mm, scheduler, etc. It's a lot of wasted resources. Google came in with gvisor at the last minute to try to say "no, sandboxes aren't dead. Look at our approach with gvisor". They lost too and are now moving away from it.


Really? Has gvisor ever been popped? Has there ever even been a single high-profile compromise caused by a container escape? Shared hosting was a thing and considered "safe enough" for decades and that's all process isolation.

Can't help but feel the security concerns are overblown. To support my claim: Google IS using gvisor as part of their GKE sandboxing security.


I don't know what "popped" means here, but so far as I know there's never been a major incident caused by a flaw in gvisor. But gvisor is a much more intricate and carefully controlled system than standard Linux containers. Obviously, there have been tons of container escape compromises.


I shared this link in another reply, but Google moved away from gvisor to a hypervisor for Cloud Run. It won't be long before they do for GKE as well

https://cloud.google.com/blog/products/serverless/cloud-run-...


It doesn’t look like they moved away from gVisor for security reasons. “We were able to achieve these improvements because the second generation execution environment is based on a micro VM. This means that unlike the first generation execution environment, which uses gVisor, a container running in the second generation execution environment has access to a full Linux kernel.”


The reason you go with process isolation over VM isolation is performance. If you share a kernel, you share memory managers and pages, scheduler, limits, groups, etc. If you get better performance running VMs vs running processes, then what was even your isolation layer for?

But at the end of the day, there is a line in the sand around hypervisors vs proc/kernel isolation models. I challenge you to go to a financial or medical institute and tell their CTO "yeah, we have this super bullet proof shared-kernel-inproc isolation model"

The first question you'd get is "Why is this not just part of upstream linux?" Answer that question and realize why you should just use a hypervisor.


Note that GKE Sandbox allows GKE users to sandbox the workloads running on GKE nodes. The GKE nodes themselves are still GCE VMs.


Citation needed. gvisor seems to be under active development and just added support for the systrap platform, deprecating ptrace: https://gvisor.dev/blog/2023/04/28/systrap-release/


Cloud run has abandoned gvisor in their "second generation" execution environment for containers

https://cloud.google.com/blog/products/serverless/cloud-run-...

Obviously there might be many reasons for that, but as someone who worked on a similar gvisor tech for another company, it's dead in the water. No security expert or consultant will ever sign off on a process isolation model, regardless of architecture, audits, reviews, etc. There is just too much surface area for anyone to feel comfortable signing off on hostile multi-tenants with process isolation, regardless of the sandboxing tech.

Not saying that there are no bugs in hypervisors, but the surface area is so so much smaller.


The first sentence pretty much sums it up: "Cloud Run’s new execution environment provides increased CPU and network performance and lets you mount network file systems." It's not a secret that performance is slower under gvisor and there are compatibility issues: https://gvisor.dev/docs/architecture_guide/performance/

Disclaimer: I work on this product but wasn't involved in this decision.


gvisor isn't simply a process isolation model. Security experts will certainly sign off on gvisor for some multitenant workloads. The reason Google is moving from it, to the extent they are, is that hypervisors are more performant for more common workloads.


I read "we got tired of reimplementing Linux kernel syscalls and functionality" as the reason. Like network file systems. The Cloud Run client base kept asking for more and more features, and they punted to just running the Linux kernel.


> Google is moving away from gvisor as well.

I've been wondering about this - are they really?


I have seen zero evidence of this; but if it's true I would love to learn more. The real action is in side channel vulnerabilities bypassing all manner of protections.



But this is because the workloads they execute changed, right? HTTP-only before, more general code today. I didn't see anything there that said gvisor was inferior, only that a new requirement was full kernel API access. For latency-sensitive, ephemeral, and constrained workloads gvisor/seccomp can make a lot of sense, and in the case of Google, handle multi-tenancy.

Now if workloads become less ephemeral and more general purpose, tolerance for startup latency goes up, and the probability of bespoke needs goes up, making VMs more palatable.


In a way, it feels like a sweet revenge for microkernels.


Tbf gvisor was pretty much DOA by design. Hypervisors are alright, but nowadays security expectations go much lower than ring-1.


Can you expand on this? What do you mean "security expectations go lower than ring-1", and how does that relate to gvisor?


For example, what Microsoft is doing at the firmware level for Azure.


What is Microsoft doing at the firmware level for azure?



gVisor uses KVM or ptrace as its sandbox layer, and there's some indications that Google's internal fork uses an unpublished kernel mechanism, perhaps by extending seccomp (EDIT: It seems this has made its way to the outside world since I last looked. `systrap` is now default: https://gvisor.dev/docs/architecture_guide/platforms/ ). It's fake-kernel-in-userspace then sandboxed by seccomp.

Saying gVisor is "ultimately enforced by a normal kernel" is about as misleading & accurate as "KVM is enforced by a normal kernel" -- it is, but it's a very narrow boundary, not the usual syscall ABI.


I think Bryan Cantrill founded a company (Joyent? or Triton?) to do just that several years ago. It may have been based on Solaris/SmartOS zones, which is that exact use case with very secure/isolated containers.


Although it came with Linux binary compat (of unknown quality), I think the Solaris thing was just too off-putting for most customers and the company did not do very well


Triton is now being developed by MNX Solutions and seems to be doing quite well.

We run Triton and SmartOS in production and the Linux compatibility via lx-zones works just fine. Only some of the Linux-locked software, which usually means Docker, needs to go inside a bhyve VM.


Aren't Cloudflare Workers multitenant? Although, if you want to be cynical, that could be a reason they aren't 'enterprise grade™'.


They are using v8 isolates, which is maybe easier to do in a sound way than the whole broad space of containers. Previous discussion: https://news.ycombinator.com/item?id=31740885


> There's no way an "enterprise grade" cloud vendor like AWS would allow co-tenancy of containers (...)

I don't think your beliefs are well founded. AWS's EC2 by default only supports shared tenancy, and dedicated instances are a premium service.


I take them to mean shared kernels.


But the parent specifically called out co-tenancy of /containers/. EC2 instances are not containers.


Both Lambda Firecracker VMs and t2 instances are multi-tenant and oversubscribed.


I take them to mean "multiple tenants sharing a kernel"; I think everyone understands that AWS and GCP have multitenant hypervisor hosts.


Well they're not a "full-blown" machine, in that they do cut out a lot of things unnecessary for Lambda's (and incidentally, fly.io's) use case. ACPI is one example given in the article.

But yes, they do virtualize hardware not the kernel. I'm willing to bet you could swap out vanilla containerd with firecracker-containerd for most users and they wouldn't notice a difference given they initialize so fast.


The difference is mostly noticeable in that the guest kernel takes up some RAM[1]. If you were really packing things tight, wasting some megabytes per container could start to hurt.

[1]: And for the non-hyperscalers with less tuning, you may be buffering I/O pages both in the guest and the host.


KVM is amazing.

Besides Firecracker, there are all sorts of micro-VMs being developed right now, such as crosvm, cloud-hypervisor, and Kata's Dragonball, all on top of KVM.


Most new VMM projects seem to be based on rust-vmm; hopefully the big players keep contributing back to the project and its many satellites.
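
For a feel of what "based on rust-vmm" means, here's a rough, untested sketch using the kvm-ioctls crate from that project; a real VMM like Firecracker goes on from this to map guest memory, load a kernel, and wire up virtio devices:

    use kvm_ioctls::Kvm;

    fn main() {
        // Needs read/write access to /dev/kvm and the kvm-ioctls crate.
        let kvm = Kvm::new().expect("open /dev/kvm");
        println!("KVM API version: {}", kvm.get_api_version());

        let vm = kvm.create_vm().expect("create VM fd");
        let _vcpu = vm.create_vcpu(0).expect("create vCPU fd");
        // A real VMM would now register guest memory regions, load a kernel
        // image, set vCPU registers, and loop on vcpu.run(), handling
        // MMIO/PIO exits with its virtio device models.
    }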


I'm very surprised the standard isn't to build a microkernel that emulates Linux userspace (or *NIX userspace) and is tailored towards the subset of virtual hardware that Firecracker and QEMU provide. I don't get the impression that implementing a new target for a PL is all that difficult, so if you create a pseudo-OS like WASI/WASM and send PRs to the supported languages you could cut out most of the overhead.

The "hardest" part is probably sufficiently emulating Linux userspace accurately: it's a big surface area. That's why I think creating a pseudo-OS target is the best route.


You're describing gvisor.


No, I'm not. Gvisor is a security layer around Linux containers that emulates and constrains syscalls. It specifically runs on top of a container platform and kernel. What I'm suggesting is a stripped down Linux-like kernel that is really good at running exactly one process. I'm describing a microkernel.


gvisor emulates a big chunk of the Linux system call interface, and, remarkably, a further large chunk of Linux kernel state. It's architecturally similar to uml (though not as complete; like Firecracker, it's optimized to a specific set of workloads).

gvisor is not like a seccomp-bpf process sandbox that just ACLs system calls.


Ok, I oversimplified a bit. Regardless, I'm suggesting something that still runs in emulated hardware isolation and implements drivers for Firecracker/QEMU's subset of hardware.


gvisor does emulate some hardware. See, for instance, its network stack.

At any rate: why is this better than just using KVM and Firecracker? The big problem with gvisor is that the emulation you're talking about has pretty tough overhead.


Lets you get the best of both gvisor and Firecracker: efficient use of resources (i.e. not running a full Linux kernel + scheduler and, most importantly, a network stack for every lambda) while getting the isolation that comes from virtualization. You can achieve this in one of two ways: make a new kernel and add support for targeting it in the supported languages, or strip the Linux kernel down and reimplement the parts that aren't optimized for your short-lived VM lifecycle (scheduler, network stack, etc.).


Stripping the Linux kernel down is what people do with Firecracker. I'm curious what savings you see in the Linux networking stack. You could compile it out and just rely on vsocks, but now you're breaking everyone's code and you're not winning anything on performance.


Perhaps I'm off base (I'm not an expert in this area), but I recall reading that one of the major challenges with Lambda was the latency that initializing the network stack introduces. Perhaps that's been solved by now, but my naive idea is to have the guest not really run its own network stack (at least the MAC/IP portion of it) and instead delegate the entire stack (IP and all) to the virtual device, which can be implemented by Firecracker/QEMU/whatever. I guess at that point, the amount of mangling you'd need to do to the kernel probably isn't worth it and you should just use Gvisor... ah oh well.

Regardless, I'm still surprised microkernels aren't more popular in this space, but perhaps losing the ecosystem of Linux libs/applications is a non-starter.

Even if the idea wasn't fruitful, the conversation was fun. Thanks for engaging and challenging my bad ideas!

Edit: I've also realized I was thinking of Unikernels, not microkernels and I've been calling it the wrong thing all night. *sigh*


FWIW, Linux itself has plenty of support for TCP offload engines. I don't think Firecracker uses that at the moment, but there's no reason why it has to be that way if that's a true bottleneck in the system.


I think the Firecracker team has stated that PCIe bypass wasn't something they wanted to do, so I don't see how they'd open up to other accelerators' bypass method. But seems like building a vmm from the rust-vmm toolbox is 'not that hard' and there are some PCIe bypass crates already, so... Have fun?


I'm not saying an actual NIC with TCP offload, but instead something like adding TCP offload to the virtio nic if for some reason initialization of the network stack was a latency problem. If the VM is only running at layer 3/4, most init latency problems disappear.


This is exactly what I'm referring to. You have a pool of virtual NICs on the host in user-space created by the VM runtime that get assigned to the guest at provision time, which just passes through any commands/syscalls (bind/connect/etc.) via a funky driver interface. You'd have to mangle the guest kernel syscalls or libc though, it might be really ugly.


> You'd have to mangle the guest kernel syscalls or libc though, it might be really ugly.

You wouldn't have to. There's patches for hardware TCP offload using the normal socket syscall ABI. The kernel net stack maintainer is pretty ideologically against them so they're not mainlined, but plenty of people have been running them in production for decades and they're quite mature.


Oh neat!


> Linux itself has plenty of support for TCP offload engines

Could you link to any specific Linux kernel source that implements support for TCP offload? AFAIK networking subsystem maintainers were always opposed to accommodate TCP offload because it is A Bad Idea.


At its simplest, TCP offload can be just letting the hardware chunk a large packet into smaller ones on the wire. I don't think anything trying to offload more than that has really seen much daylight outside of very proprietary "smart NICs".

https://www.kernel.org/doc/Documentation/networking/segmenta...

https://wiki.linuxfoundation.org/networking/tso


I know there are a few userspace networking APIs such as DPDK [0]. Or maybe there's something in KVM code that implements it. If so, it's news to me!

[0]: https://www.linuxjournal.com/content/userspace-networking-dp...


That's not what a microkernel is. A microkernel is a kernel that pushes services traditionally included in the kernel, such as networking, out into userspace.

The closest things to what you're describing are unikernels and NetBSD rump kernels.


Yeah, further down I realized I was crossing wires and was thinking of a Unikernel. My bad!


I believe the term you are looking for is unikernel, not microkernel.


It's a unikernel really only if you rip out any security boundaries inside the VM and link the kernel into the app. If you still maintain a syscall boundary inside the VM, it's just another kernel.


Sort of. Let's say gVisor is three things:

1. Mechanism to capture syscalls (systrap)

2. Reimplementation of parts of the Linux kernel

3. Narrow set of calls to outside using Linux syscalls

This thing could be envisioned as

1. Linux syscall ABI as-is, same mechanism[a]

2. Reimplementation of parts of the Linux kernel

3. virt-io hardware drivers for calls to outside

So the middle part of the sandwich could look the same.

Also, I think it's worth saying that the work in maintaining #2 there is exactly why Google Cloud Run migrated away from gVisor. People just kept asking for more and more kernel features.

[a]: Alternatively, #1 could be replaced with unikernel like linking directly with #2.

Me personally, I think HTTP/3's move away from TCP could be really interesting for this sort of stuff. The responsibilities of the kernel could be hugely simplified if all you had were UDP/IP directly hardcoded to virtio (no need for routing, address configuration, ARP, etc), no paging etc, and the only filesystems were EROFS & tmpfs. Of course, Cloud Run's move away from gVisor shows that Enterprise clients would hate it.


It's not a full blown VM, it has limitations.


It is a full blown VM, it has limitations.


That's what powers Lambda on AWS


Once Colin's patches land on FreeBSD and Firecracker, the total boot time for the kernel is under 20ms. Absolutely incredible times that we live in.


How does this compare to Linux on Firecracker? I can find some numbers with a basic internet search, but I'm not sure if these numbers are comparable for various reasons (they're a few years old, it's unclear if they use the same method to measure boot times, or even have the same definition of "boot time").


Here's the recent BSDCan talk from Colin that was posted a couple of days ago.

https://youtu.be/MT3cdeuRTzs?si=l6baNriUjcvy0ZOE


FWIW, this is basically the same material -- after my BSDCan talk the FreeBSD Journal said "hey, that was a great talk, can you turn it into an article for us", and after the FreeBSD Journal article was published ;login: asked if they could republish it.



I wonder how many of these workarounds are needed with QEMU! Some of course will be needed because they are fixes for bugs in FreeBSD.


It's funny how many one second pauses turn out to be less than necessary. How many sysadmins took meaningful action because the system paused when they had an invalid machine uuid?


Probably a significant proportion of the sysadmins who experienced that one-second pause.

The "print a message telling the user that we're rebooting, then wait a second to let them read the console before we go ahead and reboot", on the other hand...


Maybe I'm just different. I've watched a great many openbsd boot sequences, which tend to have a great many pauses, and I've never paid any special attention to the lines that come before pauses vs lines that come after pauses.


I suspect that FreeBSD has fewer pauses than OpenBSD... especially after the work I've done over the past few years to speed it up.

If anyone in the OpenBSD world is interested in speeding up your boot process I'd be happy to share tips and code. It's a bit daunting to start with but with some good tools it becomes a lot of fun.


Thank you! That would be me. I am clueless on amd64 on how to speed up the boot process and maybe change a few fonts during boot.

I understand the reasons for no "how-tos". But sometimes they make sense for people like me. I wouldn't mind delving a bit deeper given some direction.


The first thing you'll want to do is port my TSLOG code to the OpenBSD kernel and start instrumenting things there. Send me an email, there's too much detail to get into for an HN comment.


If the machine boots in 20 ms, I think that message is actually useful: something could reboot your machine and you'd think you just got logged out, because you blinked


I'm not wanting to sound snoody. What use-cases do Firecracker instances and the like serve?

I use FreeBSD for everything from my colocated servers, to my own PC. By no means am I developer; seasoned Unix Admin at best. Bare-metal forever but welcome to the future. Especially anything that contributes to the OS.

However I hear buzz words like Lambda and Firecracker and really have no idea where the usage is. I get docker and containers, and barely understand k8s, but why do you need to spin up a VM only to tear it down, compared to just spinning up a VM and using it when you really need it? Always there, always on.

Is it purely a cloud experience, cost saving exercise?


Instances of an application are created as part of the request/response lifecycle.

Allows you to build a compute plane where any node in the plane can service the traffic for any application.

Any one application can dynamically grow to consume the available free compute of the plane as needed in response to changes in traffic patterns.

Applications use no resources when they aren't handling traffic.

Growing the capacity of the compute plane means bringing more nodes online.

Can't come up with a use case for this beyond managing many large-scale deployments. If you aren't working "at scale" this is something that would sit below a vendor boundary for you.



snotty


The main use case is for seldom-used APIs. If I run a service where the API isn't used often, but I need it quick when it is, Lambdas or something like them are perfect.

As it turns out, a lot of APIs for phone apps fit this category. You don't want a machine sitting around idle 99% of the time to answer those API calls.


Dumb question then: why not stick that API on a machine/service that does need to be used 99% of the time?


Because the API also needs some separation. The same reason you would want your services isolated in VMs instead of all running in one bare metal.


This sounds like a hack around a programming problem that needs to be fixed.


I don't even understand what that means. What programming problem?


If you rarely need an API but set something up like this just to rarely use it, it seems one needs to write their own code for this functionality and not go through hoops to run someone else's. That just sounds so bizarre.


Not someone else's API, your own. You make an app. It needs an API that you create (perhaps to sync up scores or something). You don't want to run a machine full time just to accept one score update a day.


Same issue. If one needs to spin this up just for a one-off usage, there's something wrong that needs to be fixed.


I’m not sure how to explain this so you’ll understand. It’s basically how all phone apps are written. It’s a common pattern.

If you made an iPhone game and wanted the high scores to sync to a global scoreboard when the game is over, how would you build that?

What if you only expected 10 players a day?


> just spin up a VM and use it when you really need to. Always there, always when

and always charging you :)


Just about every single company can benefit from scaling as traffic is never consistent 24/7. Most don't bother as the effort outweighs the savings, but the potential is there. Things like lambda and firecracker make it much easier.


It's partly a cost saving exercise, but also: running "chroot /var/empty /some/shitty/code" or putting "chroot /var/empty /some/shitty/code" in inetd.conf is useful. (On today's super-fast machines,) Firecracker starts fast enough to support such interactive uses, while giving you the extra security of a VM (i.e. greatly restricts what parts of the kernel and/or localhost the shitty code can talk to).


> why do you need to spin up a VM only to tear it down compared to where you could just spin up a VM and use it when you really need to. Always there, always when

Firecracker has a much smaller overhead compared to regular VMs, which makes the (time and compute) costs of spinning up new VMs really low. This can be an advantage, depending on how chunky your workloads are: the less chunky they are, the more they can take advantage of finer-grained scaling.


IoT devices can execute short-lived actions by calling remote Functions. The provider wants complete isolation, wipes these micro VMs after every few seconds, and lets the user pay for use. The response from these can be anything: voice, data, or API responses.


FaaS, function as a service. Depending on how software is packaged and the expectations, the richness a VM like Firecracker provides may be useful. Many of these tradeoffs are made for velocity: I can run X easily on Y.


> However I hear buzz words like Lambda and Firecracker and really have no idea where the usage is.

Sometimes you just want to slap some lines of code together and run them from time to time, and don’t need a whole server (physical or virtual) for that.

Sometimes you have no idea if you’ll have to run a piece of code 100 times a day or 10’000’000 times a day.

Sometimes you don’t feel like paying a whole month for and maintaining a whole instance for a cronjob that lasts 20 seconds, and maybe it runs once a week.


Instead of (or alongside) a CDN, you can deploy mini services around the world at the "edge"


It's a shame neither AWS nor macOS on ARM support nested virtualization. It would make it far easier to develop and deploy Firecracker-based tech.


Afaik you can do virtualisation on the .metal variants.

Actually you can do virtualisation on any instance type afaik, but only on .metal instances can you use hardware acceleration.


FWIW, AWS a1.metal instances are pretty small and thereby reasonably cost effective for working with virtualization tech.


Their metal offerings are puny in general though (as in, not a ton of options).


Firecracker is amazing, but has a lot of edge cases that need documentation.

A huge thank you to Colin Percival for sharing this.

Particularly love the "Once the low-hanging fruit was out of the way" line... which to Colin means custom bus_dma patch(es).

Now anyone can enjoy for free:

"with 1 CPU and 128 MB of RAM, the FreeBSD kernel can boot in under 20 ms"

If you're used to devops with k8s clusters or lots of docker, this is absolutely amazing.


Toyed around with Firecracker a bit. It does what it promises on boot times, but it's still a pretty gnarly experience. E.g. after doing a victory dance for getting it to boot, I was rather deflated to find out that getting networking going takes another lengthy tutorial


I think there is definitely room for someone to add a lot of value to this by creating some automation tools. It would be really nice to be able to download a single binary, fire it up, have both a web interface and an API available, be able to configure it quickly, have it download whatever it needs for you, etc.


> on a virtual machine with 1 CPU and 128 MB of RAM, the FreeBSD kernel can boot in under 20 ms;

Oh, my... how could I achieve the same on real hardware without VMs? ;)


Kernel boot is plenty fast on real hardware, usually under 1 second.

It's everything else that's slow. For example, this is my machine

Startup finished in 14.552s (firmware) + 2.885s (loader) + 741ms (kernel) + 23.116s (initrd) + 11.191s (userspace) = 52.488s


The loader+initrd+userspace time there sounds unreasonable.

I've had libvirt bog standard qemu-kvm (not a microvm) creating a new Ubuntu VM from a disk image & booting to a login prompt in under 10 seconds for more than a decade. This is without fiddling with virtio, doing hardware scans for PCI, VGA, SATA and such, and booting via Grub (your "loader"). Those should be pretty comparable!


Ah, yes, you can ignore the loader + initrd. I'm multi-boot so the loader has 2 seconds timeout, and initrd is waiting for me to input my LUKS password.


So firecracker vs v8 isolates if only doing js or wasm?



