Nice little article, but the common wisdom in "hobby" OS development is: if you want to write an operating system, do not write a bootloader.
The arcane details of initializing an x86 PC are not a good use of brain space or programming time (things like the A20 line). The OP says he spent weeks writing this small snippet. There is some appeal in doing it "from the ground up" in assembly, but that's very slow and tiresome. And there's still the PC BIOS code blob running before your OS loads, so it's not like you're in control of the CPU right from the first instruction it executes.
It takes roughly the same amount of trouble as this article (or slightly less) to build a bootable ELF file using the Multiboot protocol, which you can boot directly using QEMU or Bochs (e.g. qemu-system-i386 -kernel mykernel.elf) or on real hardware using GRUB.
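To make the Multiboot part concrete, here is a minimal sketch of a Multiboot 1 header written in C. The magic value and the two flag bits come from the Multiboot specification; the section name ".multiboot" and the variable name mb_header are my own placeholders, and it assumes your linker script places that section within the first 8 KiB of the kernel image, 4-byte aligned.

```c
#include <stdint.h>

/* Values from the Multiboot 1 specification. */
#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u
#define MULTIBOOT_PAGE_ALIGN   (1u << 0) /* load boot modules on 4 KiB boundaries */
#define MULTIBOOT_MEMORY_INFO  (1u << 1) /* ask the loader for memory information */

#define MB_FLAGS (MULTIBOOT_PAGE_ALIGN | MULTIBOOT_MEMORY_INFO)

struct multiboot_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum; /* magic + flags + checksum must be 0 modulo 2^32 */
};

/* ".multiboot" is a hypothetical section name; the linker script (not shown)
 * is assumed to put it near the start of the image. */
#if defined(__ELF__)
#define MB_SECTION __attribute__((section(".multiboot"), used))
#else
#define MB_SECTION
#endif

MB_SECTION
const struct multiboot_header mb_header = {
    MULTIBOOT_HEADER_MAGIC,
    MB_FLAGS,
    0u - (MULTIBOOT_HEADER_MAGIC + MB_FLAGS)
};
```

The loader scans the start of the file for the magic value and checks that the three words sum to zero, which is why the checksum is computed rather than hard-coded.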
There are lots of advantages to this: the bootloader will initialize the CPU for you, you'll enter directly into 32-bit protected mode, and the CPU will be in a well-defined state. Additionally, you can have the bootloader set up a graphics mode and hand you the physical memory address of the VRAM (doing this yourself would require dropping back to 16-bit mode for some BIOS interrupts and then returning to 32-bit mode).
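Part of that well-defined state is what the loader hands you on entry: per the Multiboot 1 specification, EAX holds the magic value 0x2BADB002 and EBX points to an information structure. A rough sketch of checking that in C follows. The struct here is deliberately truncated to its first three fields, and boot_reported_kib is a hypothetical helper name, so treat it as an illustration rather than a complete definition.

```c
#include <stdint.h>

#define MULTIBOOT_BOOTLOADER_MAGIC 0x2BADB002u /* value the loader leaves in EAX */
#define MULTIBOOT_INFO_MEMORY      (1u << 0)   /* mem_lower/mem_upper are valid */

/* Only the first three fields of the Multiboot 1 info structure;
 * the real structure continues after these. */
struct multiboot_info_head {
    uint32_t flags;
    uint32_t mem_lower; /* KiB of memory below 1 MiB */
    uint32_t mem_upper; /* KiB of memory starting at 1 MiB */
};

/* Hypothetical helper: returns the memory reported by the loader in KiB,
 * or 0 if we were not booted by a Multiboot loader or the fields are absent. */
uint32_t boot_reported_kib(uint32_t magic, const struct multiboot_info_head *info)
{
    if (magic != MULTIBOOT_BOOTLOADER_MAGIC || info == 0)
        return 0;
    if (!(info->flags & MULTIBOOT_INFO_MEMORY))
        return 0;
    return info->mem_lower + info->mem_upper;
}
```

Your early boot assembly would push EAX and EBX as arguments before calling into C, so a function like this is the first thing the C side can run.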
You'll also get a proper ELF binary file with all the section information (useful for setting up your page tables) and debugging symbols, so you can debug your kernel in GDB, which is a lot more productive than using the built-in monitor facilities of QEMU or Bochs. And you can actually use a linker, so you don't have to put all your code in one file (you will need a linker script).
It also makes sense to boot to C code as soon as possible (Rust is another option, C++ too but it's a bit more trouble). I actually did write a small OS prototype in "raw" assembly (with interrupt handling and simple threading), but it wasn't very productive and I quickly reverted to writing C code instead and got a lot more done in less time.
You shouldn't need a lot of assembly code: only the early boot code (a few hundred lines), the interrupt handler entry point, and trampoline code for booting the other cores of your CPU. In addition, there will be some inline assembly one-liners for accessing privileged instructions (like lgdt).
Here are the beginnings of my hobby OS project from years ago. I made the mistake of using 64-bit long mode, which is a lot more work to get booted and not a very good use of time. Stick to 32-bit mode if you want to get things done.
As another alternative, bootstrap your OS using EFI. You get to boot directly in 32-bit or 64-bit mode, as with multiboot, and you get a handful of OS-like facilities to bootstrap yourself with, such as files, input, output, graphics, and even networking. (You'll want to replace those with your own facilities eventually, but until then you can focus on more interesting problems.)
KVM/QEMU with the OVMF firmware image provides a full emulation environment for testing and debugging. You can build EFI binaries using entirely Open Source tools.
Also try BITS (biosbits.org) for experimentation with BIOS, ACPI, and EFI; it's a port of Python to run in GRUB2. (I'm one of the upstream authors.)
Firmware? No, the OS will have to explicitly choose the mode the CPU operates in.
The PC, under "normal" legacy BIOS, always boots to 16 bit mode (like in OP). From there on, you manually have to change the processor operating mode to 32 bits (protected mode) or 64 bits (long mode), by setting some privileged registers and doing a jump to 32/64 bit code.
In the case of UEFI, it may be possible to explicitly ask for a certain operating mode at boot, but I am not familiar with that (yet).
True, the portion of your OS that actually uses EFI services must work in both 32-bit and 64-bit mode, so you'll have to write it portably. That's somewhat easier since you don't need to write any assembly; you can write in any high-level language with an FFI to C. However, you can run a 64-bit OS with 32-bit EFI or vice versa; you just have to switch modes yourself. And you can defer that until later in the development of your OS.
Once you switch modes, you'd better stick to Runtime Services, so networking etc. is out.
But yes, in the beginning all that might not matter too much, and you can get by running your "OS" as a BDS application and incrementally migrate away from that.
Even switching back and forth at the right times seems somewhat risky to me before ExitBootServices(), while Runtime Services are restricted to a subset that's much more resilient to external state changes (and doesn't run background stuff).
NetBSD rump kernels may provide a better base for OS experiments where you can use working code until you have replacements ready - based on a real OS, not some API that accidentally became one.
Link to the hobby OS project mentioned above: https://github.com/rikusalminen/danjeros