How modern Linux systems boot (utcc.utoronto.ca)
357 points by adamnemecek on June 18, 2018 | hide | past | favorite | 64 comments



One day, through the primeval wood, A calf walked home, as good calves should...

[1] http://holyjoe.org/poetry/foss3.htm


This is wonderful. By "S.W. Foss" as well...


For those interested: nowadays Linux has support for what is called the EFI Stub boot, which allows the kernel to be booted directly by UEFI. So in a sense steps 1-3 are rolled into the kernel, and the kernel acts as its own boot loader.


Wouldn't it rather just be UEFI taking the place of steps 1-3?

The kernel becomes a PE binary instead of an ELF one, but I get the sense that it's basically the same thing...?


Do you know of any major distributions using this mechanism? I’ve only ever seen GRUB on UEFI.


Arch supports it:

https://wiki.archlinux.org/index.php/EFISTUB

Being able to directly edit your motherboard's EFI boot order via command line is admittedly pretty neat. That said, using a traditional bootloader seems to be a popular choice still for simplicity and portability reasons. EFISTUB also can make certain crypto setups more difficult as far as I'm aware.
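Registering the kernel itself as a UEFI boot entry is a single efibootmgr call. A sketch, printed here as a dry run - the disk, partition number, and PARTUUID are hypothetical placeholders you'd substitute before running it for real (as root, on a UEFI system):

```shell
# Dry run: print the efibootmgr invocation that would register the kernel
# directly as a UEFI boot entry (EFISTUB). /dev/sda, partition 1 (the ESP),
# and the PARTUUID are placeholders; the loader path is relative to the ESP
# and uses backslashes, per the UEFI spec.
printf '%s ' efibootmgr --create \
    --disk /dev/sda --part 1 \
    --label "Linux (EFISTUB)" \
    --loader '\vmlinuz-linux' \
    --unicode 'root=PARTUUID=<your-partuuid> rw initrd=\initramfs-linux.img'
printf '\n'
```

Running `efibootmgr` with no arguments lists the current entries and boot order, and `efibootmgr --bootorder` rearranges them.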


On Gentoo, for a couple of years, I used an initramfs compiled into the kernel to boot via EFI_STUB with full disk encryption. It's possible. It's been a while. I don't remember it being very easy to wrangle. I don't use that laptop anymore, but I still boot via EFI_STUB on my desktop. It's super simple to set up when you're not using FDE.

I don't remember there being any specific problem that made some crypto setups more difficult than others, with the caveat that implementing FDE at all makes EFI_STUB somewhat more complicated no matter what.

Like I said, it's been a couple of years since I figured all that stuff out, so the passage of time may have softened the terror of configuration and clouded my memories.


>... the terror of configuration ...

That's an excellent way of phrasing it.


I have been booting an EFI_STUB kernel in Gentoo for several years. It's pretty easy to do. See, e.g., https://wiki.gentoo.org/wiki/EFI_stub_kernel


When I was first learning Linux (almost 20 years ago [yikes]), I installed it by hand using the Linux From Scratch guide. It was invaluable for learning a ton about how Linux works at a deep level. While a lot has changed since then (mainly due to UEFI), that knowledge is still valuable for understanding how a system functions.

http://linuxfromscratch.org/


A somewhat more lightweight version of that is https://wiki.archlinux.org/index.php/installation_guide - the first time I tried to install Arch I had many aha! moments.


Similarly, the Gentoo Installation Handbook: https://wiki.gentoo.org/wiki/Handbook:AMD64


Gentoo taught me Linux. The Handbooks are really good.

I think that many of the early Gentoo ideas - the OpenRC init system, the ports system, the handbooks - were inspired by BSD traditions. OpenBSD and FreeBSD each have good documentation. Working through the FreeBSD handbook, and then studying McKusick's BSD book [0], is a good way to get another perspective if you get into this sort of thing.

[0] http://www.worldcat.org/title/design-and-implementation-of-t...


When I was hiring junior sysadmins, I had them go through an install of Gentoo (some 15 years ago, so things may have changed).

It was always an eye-opening experience for them.


> How the initramfs /init pivots into running your real system's init daemon on your real system's root filesystem is beyond the scope of this entry. The commands may be simple (systemd just runs 'systemctl switch-root'), but how they work is complicated.

Darn, I was actually hoping this would be elaborated on. Does anyone know of other (just as understandable) sources?


For switching to the real root filesystem, you want to read about the pivot_root system call, for example http://man7.org/linux/man-pages/man2/pivot_root.2.html (the usage examples in the pivot_root(8) manpage will help).

When the initramfs PID 1 is something simple like a shell script, I believe it just exec's the real root filesystem /sbin/init after the pivot_root and that init starts from scratch. For Linuxes where systemd is the initramfs init, there's a magic internal protocol to let a running systemd re-exec a new systemd binary while preserving all of its runtime state; this is used, for example, during systemd package upgrades and can be done deliberately with 'systemctl daemon-reexec'.
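To make the simple-shell-script case concrete, a minimal initramfs /init might look like the sketch below (the device name is a hypothetical placeholder; the script is written to a scratch file here purely so it can be syntax-checked rather than booted):

```shell
# A minimal busybox-style initramfs /init sketch. /dev/sda2 is a
# hypothetical root device. Written to /tmp so we can syntax-check it;
# never run this outside an actual initramfs.
cat > /tmp/init-sketch <<'EOF'
#!/bin/sh
# Pseudo-filesystems that early-userspace tools need
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t devtmpfs devtmpfs /dev

# Mount the real root read-only; the real init remounts it read-write later
mount -o ro /dev/sda2 /newroot

# Move into the new root and hand PID 1 over to the real init.
# switch_root also deletes the initramfs contents to free the RAM.
exec switch_root /newroot /sbin/init
EOF
sh -n /tmp/init-sketch   # syntax check only
```

The `exec` matters: PID 1 must not exit, so the script replaces itself with switch_root, which in turn replaces itself with the real init.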

(I'm the author of the original entry.)


If you're interested in the userspace machinations, as opposed to what goes on in the kernel, there's a great rundown in this Stack Exchange answer: https://unix.stackexchange.com/questions/226872/how-to-shrin...

There's also https://github.com/marcan/takeover.sh (previously on HN: https://news.ycombinator.com/item?id=13622301)

Both of these share a similar premise - set up and transfer control to an in-memory root filesystem, in order to make major changes to the real root filesystem.


You can boot with two additional boot parameters:

    systemd.log_level=debug rd.debug
That will get you very verbose logging to the journal, it'll include all the dracut stuff, and systemd setting up jobs and going through all sorts of logic. To read the result:

    sudo journalctl -b -o short-monotonic
I prefer monotonic time: even though startup will be significantly slower with these two boot options, it gives you a better idea of the relative cost of everything than date/time does.


Some of this just came up on StackExchange, by coincidence.

* https://unix.stackexchange.com/questions/450223/


dracut does the work. The reason it is complicated is that it supports booting from btrfs, DM RAID, MD RAID, LVM2, device mapper multipath I/O, dm-crypt, cifs, FCoE, iSCSI, NBD and NFS.

https://en.wikipedia.org/wiki/Dracut_(software)


Hm, Dracut looks like a userspace framework for building an initramfs. I was wondering more about what's going on at the kernel level to switch the root. What depends on / ? Do any processes survive whose CWD is in the initramfs? Later in the article it's pointed out that systemd (from the initramfs) passes the torch to systemd (in the final root) while preserving state, so are any processes/resources from the initramfs left over, or is everything cleared?


A process spawned from the initramfs can certainly live on, but you wouldn't want that, because it would prevent the initramfs from being unmounted. By switching to the new root you can run real binaries. You could also skip switching the root and run real binaries from the new mount, but that's not very convenient, is it?

Pivoting root is not the complicated part. The complicated part is detecting and setting up the real root filesystem, which can involve anything from loading kernel modules to DHCP negotiation to HTTP requests and more. Dracut handles all of these.


What this article fails to mention is that having an initramfs is not required. It's only necessary if additional tools are needed to mount a particular filesystem, otherwise the kernel can run init directly from the root.


I think that was the way for Linux when I were a lad - everything could be compiled into the kernel, to the point that the normal / partition could be mounted directly. Assuming it was under 2GB?

Is it still the case, with UEFI and systemd and other modern things that the damn kids do nowadays?


Yes. I have a systemd-based embedded system that uses an initramfs so that I can execute some things out of RAM when I need to erase the root filesystem. The old mechanism was initrd, which was an ext2 filesystem image baked into the kernel. The new system, initramfs, uses a cpio archive instead. But it's basically the same idea.


Does anyone know why CPIO is used in particular? The commands for dealing with those archives are particularly arcane.


For dealing with cpio archives, just use pax.

* http://netbsd.gw.com/cgi-bin/man-cgi?pax

* https://manpages.debian.org/unstable/pax/pax.1.en.html

* http://pubs.opengroup.org/onlinepubs/9699919799/

The answer to Why? is given in the kernel doco itself, in Documentation/filesystems/ramfs-rootfs-initramfs.txt .


Because it is simple, much simpler than tar, and has a universally agreed serialization format, without variants, unlike tar.


True or false: An initrd (cf. initramfs) is still required, where "rd" stands for ramdisk.

As a BSD user, I prefer to use a mfs root (ramdisk) for personal reasons, but AFAIK BSD has never required a ramdisk in order to boot.

If I recall correctly Linux needed to use a ramdisk in its early days. Is this still true today?


Early on, Linux supported initrd, which was just a ramdisk with a FS image written to it and mounted. Later, initramfs came along, which is all anyone uses anymore. It is a cpio archive that is extracted onto a special form of tmpfs. One neat trick is that you can append multiple cpio archives, and the kernel will expand them one after another, so you can easily customize your distro-provided initramfs. Only the first one can be compressed, and you have to be careful about it getting overwritten by a kernel update, but it is handy for quick experiments. There is also pivot_root and switch_root, which have man pages if you're interested in the gory details.

For all of initrd/initramfs's history, you've only needed one if your monolithic kernel couldn't mount the root filesystem. So if you needed a module, or had some special config that needed to happen that the kernel couldn't do internally.


Linux has never needed a ramdisk. If it's built with drivers for the boot device and it probes in an unambiguous manner, you can pass root=/dev/whatever99. That's as true today as it was in 1992.


I pass root=PARTUUID=whatever to be safe, but yes this is how I boot an efi stub kernel directly with no rd/initramfs.

Just need to, for example, build the block device driver for the root disk into the kernel instead of a module.
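Concretely, the kind of .config settings involved look something like this sketch (CONFIG_ATA_PIIX is just an example controller driver - pick the one for your hardware; the point is `=y` rather than `=m`):

```
# Kernel .config fragment (illustrative): build the disk controller and
# root filesystem drivers into the kernel so no initramfs is needed.
CONFIG_BLK_DEV_SD=y        # SCSI/SATA disk support
CONFIG_ATA=y
CONFIG_ATA_PIIX=y          # example controller driver; substitute your own
CONFIG_EXT4_FS=y           # root filesystem driver
CONFIG_EFI_STUB=y          # allow UEFI to boot the kernel directly
```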


An initramfs image is not required if you can mount root directly and don't have any drivers or firmware that need to be up before the real init.

Some of these can also be bundled in the kernel image, but an initramfs makes things so much more flexible.


No, neither are required. initrd is obsolete and has been replaced by initramfs.


It is not necessary, but it is enabled by default. False?


If anyone is curious what it looked like in 2004: http://www.scott-a-s.com/the-linux-boot-process-of-2004/


Was a much better time imho.


Back in the day, I seem to recall having to binary patch LILO every time I installed a new kernel to tell it what partition to boot. I suppose that dates me...


It's not really that different from how things were 20 years ago. There are a few bells and whistles added now, but for the most part it's still the old sequence.

(added this comment because of the "modern" qualifier in the title)


How much things have changed depends on what level you look at the boot process at. For example, before 2.3.41 introduced the pivot_root system call, the starting process in the initial ramdisk wasn't run as PID 1 (as far as I remember and can tell from the remaining kernel code for this). The kernel also used to be far more willing to do things itself, such as assemble software RAID devices; modern Linux pushes all of that into the initial ramdisk user level code, or even later in boot.

At the broad level, though, yes absolutely. Unixes with System V init have been drawing a distinction between 'single user' boot activities like fsck'ing the filesystems and 'multi-user' ones like starting daemons for a long time, so that's a two stage boot. Linux booting became three stage once it added initial ramdisks, so that the core kernel didn't have to build in all of the pieces necessary to get the root filesystem.

(I'm the author of the linked-to entry.)


You did, however, proffer a discussion of the Linux boot process, and Linux operating systems do not and did not have AT&T System 5 init. They have/had something resembling it, written by Miquel van Smoorenburg in the 1990s. But it isn't AT&T System 5 init, and it actively diverged from it early on.

Ironically, 20 years ago was almost three years after van Smoorenburg init+rc had diverged from the old idea of one "single-user" mode and had instead taken the route of two modes, emergency and rescue. In fact, we only need to wait a year or so until it has been 20 years since those names came into widespread use. It has already been more than 20 years since the name "emergency" gained traction.

* http://jdebp.info./FGA/emergency-and-rescue-mode-bootstrap.h...

So whilst things are like they were 20 years ago, how things were 20 years ago isn't in fact the old AT&T System 5 world that people nowadays tell one another it was. That was the 1980s, and it wasn't Linux-based. Indeed, by 25 years ago the AT&T world itself had already introduced ideas changing the original model of multi-user login, such as the Service Access Facility.


Belatedly: that was a very interesting read on the history of this in both Linux and Unix more generally. Thank you for the link.


Brief overview on how ChromeOS boots (8 years ago): https://www.youtube.com/watch?v=mTFfl7AjNfI

Would love to watch an updated version today, if anything has changed.


Contrast a different explanation discussed at https://news.ycombinator.com/item?id=17326691


I notice that I and M. Siebenmann reacted to this in similar ways. (-:


Related:

http://iam.tj/prototype/guides/boot/

There's something to be said for flow charts when trying to explain a complex topic such as the boot process.


Google terms: "Early Userspace"


I have a full disk encryption setup for my Ubuntu. It uses LVM, LUKS, and cryptsetup in GRUB.

The problem is that it takes forever to boot (5 min, not kidding). Apparently the whole boot partition is decrypted to a RAM disk or something.


I use LVM and LUKS too. While I haven't encrypted /boot (due to using UEFI) I do have full disk encryption on the remaining disks.

The longest delay is mounting the spinning disks but overall after I enter the password boot takes less than 30 seconds...

I would recommend checking the output of the following commands:

    systemd-analyze

    systemd-analyze critical-chain
These will output the boot times (firmware, bootloader, kernel, initrd, userspace) and the service chain that took the longest during boot.


I have my whole disk encrypted, including swap, and use LVM with BTRFS on top. My key is included in the initramfs. After I type my password at the Grub prompt, it only takes about 60 to 90 seconds to do a full boot. The disk is an SSD, though; I'm sure that without an SSD the boot process would take a very long time.


> 60 to 90 seconds

That seems pretty long, what's your setup?

I noticed mine taking much longer than it should, and the reason seemed to be a lot of useless PBKDF time. For every disk I had to decrypt, my system went through the LUKS table and tried the keyfile in order. By putting the keyfile entry first in the table and configuring it for minimum PBKDF time (just for the keyfile; still millions of sha512 iterations for passwords), my disks all decrypted in a couple of seconds.
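With LUKS1-era cryptsetup, that reordering looks roughly like this - shown as a dry run, since the device and keyfile paths are hypothetical placeholders (and you'd want another working key slot before killing slot 0):

```shell
# Dry run: print the cryptsetup commands that would put a fast keyfile
# slot first. /dev/sda2 and /root/luks.key are placeholders; remove the
# echos to actually run. --iter-time is in milliseconds, so 100 makes
# the keyfile slot nearly instant while password slots stay expensive.
echo cryptsetup luksKillSlot /dev/sda2 0
echo cryptsetup luksAddKey /dev/sda2 /root/luks.key \
    --key-slot 0 --iter-time 100
```

cryptsetup tries slots in order at unlock time, which is why moving the cheap keyfile slot to position 0 avoids burning PBKDF time on slots that will never match it.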


I probably overestimated a bit, and I only reboot about once a week or so.

I only have the one disk to decrypt, and one key. When I was setting it all up I did run cryptsetup benchmark to choose the best algorithm for my architecture. It ended up being aes-xts with sha256, with enough iterations for about 2.5 seconds.


I had similarly included my key in the initramfs, until I noticed the initramfs file in /boot is world-readable. (On Linux Mint.) I'm a bit over-paranoid maybe, but I don't want a compromise of my user account turning into a compromise of my keys. Have you had that issue, and fixed it cleanly? (I could just chmod, but I don't want to have to remember.)

For now, I just enter my password twice -- once for GRUB to load Linux, then again in Linux's early boot. I've been tempted to implement some way for GRUB to pass the key to Linux in RAM, avoiding the need for a key file.


My initramfs is only readable by root (0600). The key is also in a file only readable by root (0600). The key file is added into the initramfs image when it's built.

edit: I'm on Arch Linux


What's to stop someone with a live cd from reading it by mounting the file system?


/boot is also fully encrypted on my system - which includes the initramfs, the backup initramfs, kernel, grub config, etc. /boot/efi is not encrypted, but only has the EFI modules.

The first stage of Grub (not sure on the terminology here) knows about LUKS-encrypted partitions and how to decrypt them. Only version 1 of the LUKS header, though - I found that out the hard way.


Then - since you aren't claiming to be Mike Cardwell - how do you know that your unencrypted LUKS drivers and EFI modules haven't been tampered-with?


Not the person you're asking, but see this reddit comment[0]

>If you are using a proper private secure boot setup (not the Microsoft keys, signing keys not accessible to an attacker that remotely compromises your system), you trust that your motherboard implements it correctly, you can safely assume nobody is physically tampering with your motherboard and you're using a variant of GRUB that fully implements secure boot, ESP access gives your attacker nothing. They can't change anything there without breaking the system.

[0] https://www.reddit.com/r/linux/comments/7n92ip/linux_boot_pa...


If you’re letting GRUB handle the disk decryption then that’s part of your problem. The key derivation function in GRUB is much slower than in Linux.

If possible, switch to an unencrypted boot partition (and rely on Secure Boot to ensure integrity of GRUB and kernels), and only decrypt the encrypted partitions within Linux (more specifically in the initramfs).
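That arrangement, sketched as a crypttab entry (the mapper name and UUID are hypothetical placeholders; "none" means prompt for the passphrase at boot, and on Debian-style systems the initramfs tooling picks this up when the image is rebuilt):

```
# /etc/crypttab (sketch): GRUB and the kernel live on a plaintext /boot,
# and early userspace unlocks the LUKS container using Linux's fast KDF.
# <name>     <device>                        <keyfile>  <options>
cryptroot    UUID=<luks-container-uuid>      none       luks,discard
```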


You might perhaps want to try another hashing algorithm for your /boot FS.

https://unix.stackexchange.com/questions/369414/grub-takes-t...


I just boot using full root mount encryption (but the /boot & initrd is not encrypted). So it's lame, but it boots super fast (10 seconds including password typing), the main waiting time is to type in the password.


There are also a number of known problems with Ubuntu installs -- I was having a similar problem, and this fixed it for me: https://askubuntu.com/questions/1029050/long-boot-times-on-1...


This depends heavily on your security model... Are you trying to use just a literal 'usb key' as a fob for booting the computer?

Decrypting the LUKS key will take some time, but you can ship the key for decrypting that in the initramfs that you'd store with the kernel on the USB drive.


This is not normal.



