It could kexec other kernels but probably won't be able to jump to other OS boot...

derefr · 2024-07-08T22:07:44 1720476464

The sibling comments that think you need to jump back to EFI to solve this are thinking in layer-ossified terms. This is Red Hat proposing this, and they're perfectly confident in upstreaming kernel patches to make this happen.

I would assume that in their proposed solution, the kernel would have logic to check for a CMDLINE flag (or rather, lack of any CMDLINE flags!) to indicate that it's operating in bootloader mode; and if it decides that it is, then it never calls ExitBootServices. All the EFI stuff stays mapped for the whole lifetime of the kernel.

(Also, given that they call this a "unified kernel image", I presume that in the case where the kernel decides to boot the same kernel image that's already loaded in memory as the bootloader, then nothing like a kexec needs to occur — rather, that's the point at which the kernel calls ExitBootServices (basically to say "I'm done with caring about being able to potentially boot into something else now"), and transitions from "phase 1 initramfs for running bootload-time logic" into "phase 2 initramfs to bootstrap a multi-user userland.")

garaetjjte · 2024-07-08T22:23:50 1720477430

>and if it decides that it is, then it never calls ExitBootServices

That's unlikely; I think that would mean you cannot use native drivers, at which point you're just writing another bootloader. I suspect they are only planning to kexec into the target kernel, not to chainload other EFI bootloaders.

drewdevault · 2024-07-09T07:21:40 1720509700

Something that hasn't been addressed by comments here yet is that you could implement EFI boot services in the Linux kernel and essentially turn Linux into a firmware interface. Though note that I generally shy away from any attempts to make the kernel into a really fat bootloader.

derefr · 2024-07-09T18:23:34 1720549414

I mean, you can and you can't.

AFAIK, the UEFI spec imposes no requirement that (non-hotplug) devices be re-initializable after you've already initialized them once. Devices are free to take the "ExitBootServices has been called" signal from EFI and use it to latch a mask over their ACPI initialization endpoints, and then depend on the device's physical reset line going low to unmask these (as the device would start off in this unmasked state on first power-on.)

Devices are also free to have an "EFI-app support mode" they enter on power-on, and which they can't enter again once they are told to leave that mode (except by being physically reset.) For example, a USB controller's PS/2 legacy keyboard emulation, or a modern GPU's VGA emulation, could both be one-way transitions like this, as only EFI apps (like BIOS setup programs) use these modes any more.

Of course, presuming we're talking about a device that exists on a bus that was designed to support hotplug, the ability to "logically" power the device off and on — essentially, a software-controlled reset line — is part of the abstraction, something the OS kernel necessarily has access to. So devices on such busses can be put back in whatever their power-on state is quite easily.

But for non-hotplug busses (e.g. the bus between the CPU and DRAM), bringing the bus's reset line low is something that the board itself can do; and something that the CPU can do in "System Management Mode", using special board-specific knowledge burned into the board's EFI firmware (which is how EFI bring-up and EFI ResetSystem manage to do it); but which the OS kernel has no access to.

So while a Linux kernel could in theory call ExitBootServices and then virtualize the API of EFI boot services, the kernel wouldn't be guaranteed to be able to actually do what EFI boot services does, in terms of getting the hardware back into its on-boot EFI-support state.

The kernel could emulate these states, by having its native drivers for these devices configure the hardware into states approximating their on-boot EFI-support states; but it would just be an emulation at best. And some devices wouldn't have any kind of runtime state approximating their on-boot state (e.g. the CPU in protected mode doesn't have any state it can enter that approximates real mode.)

derefr · 2024-07-08T23:05:54 1720479954

You're right (I saw another comment cite the primary source for this), but I'm still curious whether there'd be a way to pull this off.

> I think that would mean you cannot use native drivers

Yes, that's right.

> at which point you're just writing another bootloader

But that's not necessarily true.

Even if you could only use EFI boot+runtime services until you call ExitBootServices, in theory, an OS kernel could have a HAL for which many different pieces of hardware have an "EFI boot services driver" as well as a native driver; and where the active driver for a given piece of discovered hardware could be hotswapped "under" the HAL abstraction, atomically, without live HAL-intermediated kernel handles going bad — as long as the kernel includes a driver-to-driver state-translation function for the two implementations.

So you could "bring up" a kernel and userland while riding on EFI boot services; and then the kernel would snap its fingers at some critical point, and it'd suddenly all be native drivers.
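A minimal sketch of that hot-swap idea, in Python for brevity. Every name here (`Hal`, `EfiBlockDriver`, `NativeBlockDriver`, `translate_state`) is hypothetical; nothing like this exists in Linux, and a real kernel would have to make the swap safe against in-flight I/O and interrupts, which a single lock only hints at.

```python
import threading
from dataclasses import dataclass


@dataclass
class EfiBlockDriver:
    """Hypothetical driver that would go through EFI boot services."""
    position: int = 0  # next block to read

    def read_block(self) -> str:
        block = f"efi:{self.position}"
        self.position += 1
        return block


@dataclass
class NativeBlockDriver:
    """Hypothetical native driver for the same device."""
    next_lba: int = 0  # same concept as position, different representation

    def read_block(self) -> str:
        block = f"native:{self.next_lba}"
        self.next_lba += 1
        return block


def translate_state(old: EfiBlockDriver) -> NativeBlockDriver:
    """Driver-to-driver state translation: carry progress across the swap."""
    return NativeBlockDriver(next_lba=old.position)


class Hal:
    """Handle that callers keep; the driver underneath it can be replaced."""

    def __init__(self, driver):
        self._driver = driver
        self._lock = threading.Lock()

    def read_block(self) -> str:
        with self._lock:
            return self._driver.read_block()

    def swap_driver(self, translate) -> None:
        # Holding the lock makes the swap atomic relative to read_block().
        with self._lock:
            self._driver = translate(self._driver)


hal = Hal(EfiBlockDriver())
first = hal.read_block()          # served by the EFI-backed driver
hal.swap_driver(translate_state)  # the "snap of the fingers"
second = hal.read_block()         # same handle, now served natively
```

The caller holds one `hal` handle throughout; only the translation function knows about both driver representations.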

Of course, Linux is not architected in a way that even comes close to allowing something like this. (Windows might be, maybe?)

---

I think a more interesting idea, though, would come from slightly extending the UEFI spec. Imagine two calls: PauseBootServices and ResumeBootServices.

PauseBootServices would stop all management of devices by the EFI (so, as with ExitBootServices, you'd have to be ready to take over such management) — but crucially, it would leave all the stuff that EFI had discovered+computed+mapped into memory during early boot, mapped into memory (and these pages would be read-only and would be locked at ring-negative-3 or something, so the kernel wouldn't have permission to unmap them.)

If this existed, then at any time (even in the middle of running a multi-user OS!), the running kernel that had previously called PauseBootServices, could call ResumeBootServices — basically "relinquishing back" control over the hardware to EFI.

EFI would then go about reinitializing all hardware other than the CPU and memory, taking over the CPU for a while the same way peripheral bring-up logic does at early boot. But when it's done with getting all the peripherals into known-good states, it would then return control to the caller[1] of ResumeBootServices, with the kernel now having transitioned into being an EFI app again.

[1] ...through a vector of the caller's choice. To get those drivers back into being EFI boot services drivers before the kernel tries using them again, naturally.

It's a dumb idea, mostly useless, thoroughly impractical to implement given how ossified EFI already is — but it'd "work" ;)

Joker_vD · 2024-07-09T10:40:46 1720521646

Giving "the control of hardware back" is going to be extremely difficult. Just look at the mess that ACPI is: there are lots of notebooks that Linux cannot put into or bring back from hibernation, and there we're talking simply about pausing/resuming the devices themselves. What you are proposing means that an OS would have to revert the hardware to a state compatible with its state at the moment of booting, so that UEFI could manage it correctly. I don't think that's gonna happen.

yjftsjthsd-h · 2024-07-08T22:01:21 1720476081

This is being discussed more extensively in other comment threads but it sounds like maybe there's a way for it to just reboot but set a flag so the firmware boots into a different .efi next time (once).

p_l · 2024-07-09T07:33:19 1720510399

You can set the BootNext variable to the number of the BootXXXX variable you want to use once for the next boot.
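On Linux this can be done through efivarfs. A rough Python sketch, untested against real firmware, assuming efivarfs is mounted at `/sys/firmware/efi/efivars`, the process runs as root, and an entry `Boot0003` exists (the entry number is only an example). An efivarfs file is a 4-byte little-endian attribute mask followed by the variable data, and BootNext holds a 16-bit entry number. The helper names are made up for illustration.

```python
import struct
from pathlib import Path

# GUID of the UEFI global variable namespace (EFI_GLOBAL_VARIABLE).
EFI_GLOBAL_GUID = "8be4df61-93ca-11d2-aa0d-00e098032b8c"
BOOTNEXT_PATH = Path("/sys/firmware/efi/efivars") / f"BootNext-{EFI_GLOBAL_GUID}"

# NON_VOLATILE | BOOTSERVICE_ACCESS | RUNTIME_ACCESS
ATTRIBUTES = 0x1 | 0x2 | 0x4


def bootnext_payload(entry: int) -> bytes:
    """Build the efivarfs file contents selecting Boot<entry> for one boot."""
    if not 0 <= entry <= 0xFFFF:
        raise ValueError("boot entry number must fit in 16 bits")
    return struct.pack("<IH", ATTRIBUTES, entry)


def set_bootnext(entry: int, path: Path = BOOTNEXT_PATH) -> None:
    """Write attributes and data together; needs root and a UEFI-booted system."""
    path.write_bytes(bootnext_payload(entry))


payload = bootnext_payload(0x0003)  # would select Boot0003
```

In practice `efibootmgr --bootnext 0003` does the same job; the sketch only shows what ends up in the variable.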

TylerE · 2024-07-08T21:59:03 1720475943

Theoretically, couldn't it just write to a "boot this image next time" field (is the legacy MBR area available?) and trigger a reboot?

adtac · 2024-07-08T22:25:49 1720477549

The target image would need to reset that field so that a second reboot puts you back into the bootloader because otherwise you'll be stuck booting that image forever.

rcxdude · 2024-07-09T00:24:38 1720484678

The image doesn't need to do it; that's how UEFI BootNext works: the firmware resets the flag before it loads the image.
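A toy model of that behavior, in Python. The dictionary stands in for the firmware's variable store and `pick_boot_entry` is a made-up name; the only thing taken from the UEFI spec is the order of operations, namely that BootNext is deleted before control passes to the image it names.

```python
def pick_boot_entry(variables: dict) -> int:
    """Return the boot entry the firmware would try first on this boot."""
    if "BootNext" in variables:
        # Consume the one-shot selection before handing over control.
        return variables.pop("BootNext")
    # Otherwise fall back to the first entry in the normal boot order.
    return variables["BootOrder"][0]


nvram = {"BootOrder": [0, 1], "BootNext": 3}
first_boot = pick_boot_entry(nvram)   # the one-shot entry
second_boot = pick_boot_entry(nvram)  # back to the default entry
```

Because the variable is gone before the target image runs, a crash or reboot inside that image lands back on the default entry.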

garaetjjte · 2024-07-08T22:27:19 1720477639

Well, you could change the default boot entry in efivars, but if you're relying on firmware for that, why not use the firmware-provided boot menu anyway?

Arch-TK · 2024-07-09T16:12:23 1720541543

The boot disk isn't guaranteed to be writable.

TylerE · 2024-07-09T17:47:36 1720547256

Even after you’ve already installed a custom bootloader to it? I mean, I agree with you in principle, but we already have the chicken; can’t the existence of the egg be assumed?

Arch-TK · 2024-07-10T18:47:38 1720637258

Aside from the DVD issue mentioned in the other person's comment, I have a design for a SED OPAL based encryption setup where the system boots with a read-only boot partition and it only becomes RW as part of the initramfs running (although optionally you can just keep it RO until you need to write to it, but this requires buy-in from the package manager).

I think network booting with EFI would also suffer from a similar problem.

yjftsjthsd-h · 2024-07-10T18:05:18 1720634718

Consider a DVD that's EFI bootable; we can have whatever bootloader we want on the disc, but it is not physically writable.