Distri: 20x faster initramfs (initrd) from scratch (stapelberg.ch)
215 points by based2 on Jan 26, 2020 | 64 comments



For those who haven't dug that deep.

initramfs is essentially a tiny root filesystem image that the kernel unpacks into RAM and runs first, before it goes on to mount and boot the real root filesystem from disk.

You can actually configure Linux to never go on to that next step of booting from disk, and run the operating system purely from RAM. Indeed, there are distros such as Tinycore Linux that are designed exactly this way, which can make things very fast, compact, secure, and diskless.
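To peek inside one on a typical distro (assuming a gzip-compressed image without a prepended microcode archive; the path and compressor vary by distro, and dracut's lsinitrd handles the general case):

  zcat /boot/initrd.img-$(uname -r) | cpio -itv | head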


If there isn't enough memory to fully unpack the initramfs, Linux 4.x kernels will log a message and then proceed with a partial filesystem rather than panicking or halting. The resulting failures can take time to root-cause. Haven't tested on 5.x kernels, but similar code is still present:

https://github.com/torvalds/linux/blob/master/init/initramfs...


It's essential on some embedded boards as a first step because a well-crafted initramfs can fit in a few MB of flash, and it's sometimes the case that the onboard bootloader can read the flash but has no other capabilities (well, aside from usually TFTP).

So you read the kernel and initramfs from there, then load the real root FS from whatever storage you need to.


What happens when Tinycore runs out of memory?^1,2

1. On NetBSD, I could make RAM-only, no swap distributions that might thrash but would gracefully recover once some RAM was freed.

2. On Ubuntu, when running from a tmpfs root, I have found that running out of memory causes unrepairable filesystem errors.


You can also go in reverse, and escape the running distribution's mounted / by pivoting to a tmpfs /, killing all processes, and unmounting the old root. Figuring out how to do this was quite fun. systemd actually makes it fairly easy, because you can hijack PID 1 if it's systemd, which is necessary to be able to umount the old /.


systemd makes it the easiest: all you need is systemctl switch-root ROOT. on other inits, you usually need to replace /sbin/init, then use some non-portable (sysvinit is not the only competitor) method of telling init to re-exec itself. sometimes that method doesn't even stop the running services, so you need to manually stop or kill them.
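roughly, the systemd path looks like this (just a sketch; /newroot and the init path are placeholders, and you obviously need to populate the new root with a working userland first):

  # build a minimal root in RAM, then ask systemd to pivot into it
  mkdir -p /newroot && mount -t tmpfs tmpfs /newroot
  # ... copy in busybox or a minimal userland plus some init ...
  systemctl switch-root /newroot /sbin/init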


You can do it non-destructively by bind-mounting a binary from a ramfs over /sbin/init.

When init re-execs, the new binary can kill and umount everything and switch roots. If it fails somehow, you just reboot and everything is back to normal.
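A rough sketch of that (the /run/newinit path is hypothetical, and the re-exec mechanism depends on the init; telinit u works for sysvinit):

  # overlay the init binary in place, without touching the disk
  mount --bind /run/newinit /sbin/init
  # ask the running init to re-exec itself, picking up the bind-mounted binary
  telinit u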


In what situations would you expect a re-exec to stop the services? The normal case is that the reexec preserves state.


IIRC, some init implementations have this capability. I'm not saying that it's good in general, but just that it facilitates this particular case.


It is also useful for some other interesting use cases. For example I have done what I call "initram shims" to give me a way to enter FDE passwords on remote systems during boot.


IIRC Chrome OS was designed to do exactly this (in its early days, anyway).


I’m thoroughly convinced Google will never be able to produce lightweight consumer software. That was their second try at a “lightweight operating system” and they wildly missed the mark both times.


Well, remember what happens. Engineers get it working. It’s a success. Now let’s make some money. Happens with every job. I always seem to mosey on to another job at step 3 :)


I think they just failed to maintain what they accomplished (I remember when they really did have sub-one-second boot). Google's culture has been dying slowly and painfully over the decade; the less people care, the sadder the products will get.


A lot of its key people ended up at a startup which was acquired by another of the FAANGs.


> I remember when they really did have sub-one-second boot

Wow, as someone who has never owned a Chromebook, sub-1s boot times seem enviable, even if booting into Chrome is rather restrictive. What are boot times like nowadays?


They still advertise sub-8-second boot, but it seems like that's possible with any decently lightweight Linux system nowadays.


Yea, but most people are comparing Chromebook boot times to their Windows laptop or equivalent.

No idea what Windows boot time is these days, but I'd be shocked if it's under 20 seconds.


For my Windows systems, the majority of boot time seems to be spent in the pre-boot environment (BIOS/UEFI/whatever); once it hits the disk it's pretty fast. They're all SSDs though, because Windows 10 on spinning disks seems intolerable.

I'm pretty sure Microsoft cheats the boot process, and it's a special hibernation restore, but either way, it's pretty fast. Chrome devices benefit from also having a much faster firmware than most, so even if Windows booted as fast as Chrome OS, the whole device would still be much slower to boot.


And then you have your corporate-mandated image where Windows is bogged down by a metric crapton of garbage enterprise software. My work notebook takes about a minute to come to a state where I can actually enter my username and password on the login screen. Once on the desktop, it takes about 8-10 minutes to get settled (i.e. CPU load goes to baseline).


The way to win in the corporate world is to push for whatever OS they least manage. At Yahoo, getting a mac meant less corpware on your desktop; at Facebook, you would be better off with Windows (or maybe Linux), cause IT is proficient at running random garbage on macos.


Everyone "cheated" that by making going to sleep or hibernating the new norm.

My chromebook wakes back to playing the youtube video I was watching quicker than opening the lid.


My 9-year old desktop on Windows 10 boots in <15 seconds.

SSDs do make a huge difference.

But I'm unsure which class of hardware you were talking about.


On my HP Spectre dual-boot Windows 10 / Fedora Workstation 31, it's roughly 6s and 12s respectively.


If the article reads as a so-what, it is worth remembering that the major Linux distributions rebuild the ramdisk as a post-installation step on every kernel upgrade, which is semi-frequent. Worse, rebuilding is slow.


On Arch it's like ten seconds tops. How slow is it on others?
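For reference, you can time a rebuild directly (mkinitcpio on Arch; dracut-based distros would use dracut --force instead):

  time mkinitcpio -P    # regenerate the initramfs for all presets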


FWIW Arch does not use dracut, though it might start doing so in the future: https://lists.archlinux.org/pipermail/arch-dev-public/2019-M...


Same order of magnitude. But that’s kinda ridiculous, no?


Windows updates can take 10-30 minutes with the computer unusable for anything else. Ten seconds of background activity does not sound too bad. Why do you care? You should be able to keep working on your tasks during that time.


"It could be worse" is a bad argument against a possible improvement.


While Windows updates are nowhere near 10s, nowadays they are also nowhere near 30m.

Years ago, rarely, you'd get a Windows update that took up to 30m to complete. These days it's more like a few minutes - certainly noticeable, but let's not exaggerate.


If the upgrade fails it's very annoying though. First you have to wait 10 minutes for the upgrade, then it reboots, then it upgrades some more, then it discovers something is wrong, reboots, uninstalls the update (which takes a few more minutes), and reboots again. That can easily kill half an hour or more of your work day, as you are not able to use the computer while it's upgrading and rebooting. Compare that to Linux, where you rarely need to reboot after an update. You can also choose when you want to upgrade, e.g. you can shut off your computer without upgrading. And most importantly, Linux lets you start your computer without having to wait 30 minutes for upgrades.

I'm currently using Linux to run Windows apps that are no longer supported by Windows, e.g. they don't even work in compatibility mode. But they do work in Wine!


If the upgrade fails, that's a completely different issue, not a typical path. We need to separate those, otherwise we'll end up with "On <any> system, when the upgrade catastrophically fails, I have to reinstall the whole machine, so upgrades take hours."


Eh, Windows updates, especially when you're including .NET 4+, take a pretty considerable amount of time, especially on an HDD. Server 2016 updates are very memory hungry.


I get into the install-reboot-uninstall-reboot cycle every time I start Windows (yeah, it's broken, but I've tried nuking the upgrade cache, "repair", etc.).

I've also had a few upgrade failures on Linux, where I had to boot a live CD/USB and chroot in to fix the problem. Those took a few hours.

I've probably spent more time fixing stuff on Linux. But if you run a distro like Ubuntu and don't make any customizations, the upgrade experience is much better on Linux than on Windows.


I opted into the Insiders thing to try to help MS make Windows better. I had to opt out of the beta channel, because every update takes a ridiculous amount of time: 10+ minutes easily, with multiple reboots. No way.


You've never tried patching server 2016.


I’m having flashbacks of people comparing their gentoo boot times based off of various config settings.


We at croit.io use this approach to boot all systems in our environment. Debian's live-boot works better than the dracut-based setups most other distributions use (only half the RAM required).

After more than 10 years of PXE booting most systems, I can definitely say it is rock solid, reduces maintenance time, and makes scaling simple.


I really enjoyed this article. However, the following passage seemed a bit hand-wavy to me:

>"How will our userland program know which kernel modules to load? Linux kernel modules declare patterns for their supported hardware as an alias, e.g.:"

  initrd# grep virtio_pci lib/modules/5.4.6/modules.alias
  alias pci:v00001AF4d*sv*sd*bc*sc*i* virtio_pci
>"Devices in sysfs have a modalias file whose content can be matched against these declarations to identify the module to load:"

  initrd# cat /sys/devices/pci0000:00/*/modalias
  pci:v00001AF4d00001005sv00001AF4sd00000004bc00scFFi00
  pci:v00001AF4d00001004sv00001AF4sd00000008bc01sc00i00

>"Hence, for the initial round of module loading, it is sufficient to locate all modalias files within sysfs and load the responsible modules."

Could someone elaborate on this a bit? I'm familiar with udevd. Is /sysfs enumerated first and then a match is searched for in lib/modules/5.4.6/modules.alias? Or the other way around? Are these files parsed before normal kernel PCI bus enumeration then? Any other specifics or detail would be greatly appreciated.


> Is /sysfs enumerated first and then a match is searched for in lib/modules/5.4.6/modules.alias?

Yes :)
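Roughly, that first coldplug pass boils down to something like this (just a sketch; modprobe does the glob matching against modules.alias itself, so userland only has to feed it every modalias it finds):

  # load a module for every device the kernel has already discovered
  for f in /sys/bus/*/devices/*/modalias; do
      modprobe -q "$(cat "$f")"
  done

After that, newly appearing devices are handled via uevents rather than by rescanning sysfs.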


This is 20x faster build time to create the initramfs image, is that correct?

It's not 20x faster to boot?


Yes, a 20x faster build of the initramfs image. But boot is also a bit faster, I'd guess due to the faster Go-based initial init process:

> Measuring early boot time using qemu, I measured the dracut-generated initramfs taking 588ms to display the full disk encryption passphrase prompt, whereas minitrd took only 195ms.


Is that an interesting difference, though? At least on my machine, boot time is dominated by... GRUB loading the kernel and initramfs. Which is actually ridiculous when you consider it's a few dozen megabytes on an NVMe drive. I haven't investigated what's up yet.


On a VM, where you don't have to deal with the incredible slowness of pre-boot hardware, that's a big speedup.

On a physical computer, where you have to wait 15 seconds for the machine to POST anyway, yeah, it's not a big difference.


Certainly is to me! I build embedded Linux devices. Our current boot time from cold boot to fully functional is roughly 3 seconds. I would really like to get it lower, and shaving off almost half a second would be amazing.


fyi a quick way to see this is with systemd-analyze

  systemd-analyze plot > bootplot.svg    # SVG timeline of the whole boot

  systemd-analyze critical-chain         # units on the longest blocking chain

  systemd-analyze blame                  # per-unit initialization times


Systemd-analyze doesn't know about grub.


BIOS I/O calls are extremely slow. EFI boot-time services are significantly faster.


That's on a machine with EFI.


I want to like dracut, I really do. It's amazingly flexible and dynamic. Mostly, though, I find it a complete pain in the neck any time I go remotely outside the default behaviour.

It gets even worse when you enable debug. It's a series of deeply nested bash scripts, and enabling debug does "set -x" for everything. That spews out so much information over the serial console / screen that actually tracking down the bug is almost impossible. The sheer complexity of the embedding and sequencing of events means that even innocuous changes can have a far bigger impact than expected. Red Hat changed something between RHEL 7.6 and RHEL 7.7 and suddenly we were having all sorts of issues with CentOS 7 instances booting off iSCSI.

The one big positive about it (from my perspective at least) is that it _does_ make it pretty easy to inject additional functionality in to the initramfs. Just drop in a bash script with a number prefix relevant to where you want it to occur in the boot order.
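As a sketch, a minimal custom module looks roughly like this (the module name and hook script are made up; the layout is the usual modules.d one, though details vary across dracut versions):

  # /usr/lib/dracut/modules.d/95mything/module-setup.sh
  check()   { return 0; }
  depends() { return 0; }
  install() {
      # copy my-hook.sh into the image and run it during the pre-mount phase, priority 50
      inst_hook pre-mount 50 "$moddir/my-hook.sh"
  }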


Related work is being done in the u-root project (https://github.com/u-root/u-root).


This is an excellent article giving insight into the initrd pattern.


He cuts 390ms off the time to the passphrase prompt. Does that translate to a 390ms faster boot time?

I can’t remember the last time I sat in front of a Linux box during a reboot. Is that a pittance or a noticeable improvement?


A "Linux box" might very well be a consumer electronics device, a car, ... 400ms boot time is quite a nice gain in those domains (although there's a bunch of different factors at play, so one would have to try and see how the numbers play out)


With full disk encryption you can turn on a system and have a usable lightweight desktop up in 20 seconds.

Alternatively, you can suspend and resume in most cases and have it in 2.


If your DE is light enough, just put it and its dependencies in the initramfs, with your home folder (and /etc, /var?) encrypted.

I used to do this, although without the encryption.


So it's less than 3% for WDE. The other thing I wasn't clear on is whether this initramfs speeds up the boot of 'normal' Linux machines, sans encryption.


this article is oddly really knowledgeable and also really unknowledgeable at the same time. despite its length, it doesn't explain what an initramfs is or how it's created until basically the end (excluding the appendix). an initramfs is simply a cpio archive with a minimum of one file, an executable "/init", then compressed with a standard compressor. currently gzip, xz, and lz4 are the useful ones.
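for reference, building one by hand is just something like this (assuming ./root contains an executable /init):

  cd root
  find . -print0 | cpio --null -o --format=newc | gzip -9 > ../initramfs.img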

this means that an optimized initramfs creator ought to take as long as a tar file creator, because it basically is a glorified tar file creator. on my system, with a full cache, mkinitcpio takes about 1.5 seconds to run. my custom initramfs generator (not released yet) takes 0.03 seconds to run using "cat" as compressor. it is written entirely in POSIX shell and uses only the external commands ldd, gen_init_cpio, and the compressor. however, if gzip -9 is used for the compressor, then compressing the 12 MB takes 1.8 seconds. so, we can see that compression can significantly inflate (pun intended) the time consumed. as a matter of fact, looking at https://github.com/distr1/distri/blob/master/cmd/distri/init..., it appears to pass no options to gzip, so I assume the default is used, probably -6. http://man7.org/linux/man-pages/man8/dracut.8.html indicates that dracut uses gzip -9 by default, which would increase the time required. if dracut is configured in a non-default manner (e.g. lz4 -12) that would further increase the time.

the author credits Go, concurrency, no external dependencies, and threads (?!) for the improved performance. this is manifestly unnecessary: no threads are needed to improve the performance of "tar", firstly because it is I/O bound, and secondly because the compressor is usually the slowest part anyways. it might be possible to slightly improve the performance by using native code, but there is no substantial difference between running mkinitcpio in 1.0 vs 1.5 seconds, and there's definitely no difference between 10.0 and 11.5 seconds (lz4 -12 takes about 10 seconds to run, and I assume the remaining 1.5 seconds can be fully optimized to require 0 seconds).

regarding the rest of the post, it's... a little weird. I don't understand why anyone would manually trudge through modules.dep files when that is the main objective of the "modprobe" command, as opposed to "insmod", which does no dependency resolution. (modprobe also supports config files, aliases, and some other things, but the main and original objective is dependency resolution.) it's also a bad idea to reimplement libblkid: it supports a ton of filesystems, many of which one might actually want to use as a root filesystem, but are not supported by this basic implementation, including xfs, btrfs, or zfs. it also doesn't appear to support LUKS2. libblkid isn't even necessary: busybox findfs works perfectly well for most common filesystems, and runs very quickly (0.02 seconds on my machine). but, again, the main cost when booting with a necessarily cold cache is likely to be I/O, not spawning a process. I don't understand why someone would faff about with manually parsing ELF files when ldd works just fine with simple text parsing and handles interpreter, rpath, and transitive dependencies built-in.
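for example, modprobe already does the whole dependency resolution for you (using virtio_pci from the article as the illustration):

  # print the insmod sequence modprobe would perform, dependencies first
  modprobe --show-depends virtio_pci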

this all just seems overly complicated to me. my initramfs generator is 77 lines of code, including automatic ldd dependency generation, and init is 37 lines, including LUKS decryption, dropbear for remote password entry, and e2fsck with automatic reboot if requested by e2fsck, for a total of about 110 lines. I didn't count the lines of code for minitrd, but adding together the two files linked is already about 1000 lines of code. mkinitcpio has far more flexibility and is only about 1500 lines for the core code.


"I currently have no desire to make minitrd available outside of distri. While the technical challenges (such as extending the generator to not rely on distri's hermetic packages) are surmountable, I don't want to support people's initramfs remotely.

Also, I think that people's efforts should in general be spent on rallying behind dracut and making it work faster, thereby benefiting all Linux distributions that use dracut (increasingly more)."

As described on the website unixsheikh.com recently posted to HN^1 and making the front page^2, I think this illustrates one difference between "Linux" and BSD. Much of what unixsheikh describes has been said before however I think it still remains true today. I am currently using Linux and open to being persuaded otherwise. I want to learn "the Linux way" if there is such a thing.

That said, I always found booting to ramdisk, "root on mfs", "mfsroot", or whatever people might call it, very straightforward on BSD. They maintained and provided all the necessary tools, some of them have no Linux equivalent. Things stayed more or less the same so once the process was learnt, there seemed little chance of the goalposts moving. One did not need to be a contributor to understand these processes. Sometimes I get the feeling I am just not smart enough for Linux, and it seems like these processes are constantly subject to change. (I hope to find the truth is otherwise.)

As a BSD end user, I found it relatively easy to compile a custom kernel and a custom, statically-linked, multi-call binary as a userland, like busybox but using utilities "as is" not pared down with features missing, to be inserted into the kernel. I could boot from USB stick^3 to ramdisk root, pull out the USB stick, e.g., if I needed the USB port, then chroot into any userland. This was extremely flexible and allowed for easy experimentation. Sometimes I might have that userland prepared on internal media, the USB stick, or some other external media, or I might create it dynamically by downloading and extracting binary sets. This "targetroot"^4 userland might be mounted, usually r/o, on HDD or it might just be mfs/tmpfs.

Apparently, the author was inspired to make his own distribution because he found the package installation process too slow for large packages, e.g. qemu.^5 He tried using SquashFS images to speed up the process. I did the same thing in 2012 using cloop2 compressed filesystem images for large packages, e.g. ghostscript, on NetBSD. I would keep them on the USB stick and just mount them over an executable directory as necessary. I was not using these large programs too frequently and I was running entirely from tmpfs, no HDD, so I did not want them occupying space and depleting RAM.

1. https://news.ycombinator.com/from?site=unixsheikh.com

2. https://news.ycombinator.com/front?day=2020-01-20

3. Thanks to the excellent BSD bootloaders I could manually choose the kernel during boot. I would have several kernels on the USB stick. Of course the kernel did not have to reside on the USB stick, I could load it from any connected media. At one point I was able to boot NetBSD kernels using the FreeBSD bootloader.

4. TBH, I still do not know the true story behind targetroot. Things in BSD always have a history behind them. Surely there is a story behind the "targetroot" directory.

5 https://michael.stapelberg.ch/posts/2019-08-17-linux-package...

"How can minitrd be 20 times faster than dracut?

dracut is mainly written in shell, with a C helper program. It drives the generation process by spawning lots of external dependencies (e.g. ldd or the dracut-install helper program). I assume that the combination of using an interpreted language (shell) that spawns lots of processes and precludes a concurrent architecture is to blame for the poor performance."

Interesting that dracut uses bash, not dash for its scripts. Would dash be any faster. Are bash features a necessity for the simple tasks dracut performs.

Also, could dracut use readelf instead of the bash shell script ldd. According to Wikipedia, the Linux man page for ldd asserts it is a security risk.
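E.g., something like the following lists a binary's direct DT_NEEDED entries without executing anything, though unlike ldd it does not resolve transitive dependencies or search paths:

  readelf -d /bin/ls | grep NEEDED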

Does default Ubuntu include a manpage for ldd. Mine is missing.


The only thing in your post I really want in Linux is a better bootloader; the FreeBSD one is pretty neat, and I've seen bootloaders that include kexec to change Linux kernels in early boot. But I seldom see the need for a better pre-boot environment, and I think this is the general consensus. Also, the key to boot speed is to forgo the initramfs completely, at least if you look at those sub-500ms boots that are out there.


Seems there is an ever-growing list of Linux bootloaders. Which ones are not satisfactory and why.

I was very satisfied with the BSD bootloaders. Plus, the kernels comply with the Multiboot specification, so it was possible to use other bootloaders. This way I could boot kernels from different BSD projects from the same USB stick using a single bootloader. I do not use GRUB.


Since you seem to like to experiment, have you ever heard of

https://www.plop.at/en/bootmanagers.html ?

Though that seems to be for older hardware, mostly. The one for current systems is in development. Anyway, on older hardware it enabled otherwise impossible things for me.


I use this on all of my systems. It's great and supports unlocking disks with full disk encryption.

https://github.com/slashbeast/better-initramfs



