Writing a Bootloader (3zanders.co.uk)
391 points by ingve on Oct 20, 2017 | 52 comments



This is deep nostalgia for me. A bootloader and a toy ... kernel ... were the first things I ever wrote after leaving Apple II BASIC behind.

The local library had no books on anything other than BASIC and assembler. I didn't know what a higher-level language was - never even heard of C or Pascal or anything - I just knew that I didn't seem to be able to do a lot of stuff in BASIC. After a few hours at the library, I thought that assembler was the only option.

And that's how I spent that entire summer. Going over books on assembler that for some reason were in the library of a tiny Mississippi town.

Ralf Brown's Interrupt List made me giddy once I understood exactly what it meant. I still remember that day. It was in some TSR program that I could view while in EDIT.

Next year, I found out about Pascal, pirated a copy of TP(3? 5?) from my friend's dad and didn't touch assembler again until college. But I found the knowledge of how x86 worked useful for many years, even into my first real-world job.

God, I'm old.


You sound like me, except your kernel worked :) Good times.


There's even more from Alex: Part 2 [0] covers getting into 32-bit protected mode, and part 3 [1] deals with compiling, linking and running C++ code. There is also a very nice presentation available [2]:

[0] http://3zanders.co.uk/2017/10/16/writing-a-bootloader2/

[1] http://3zanders.co.uk/2017/10/18/writing-a-bootloader3/

[2] http://3zanders.co.uk/2017/10/13/writing-a-bootloader/writin...


Great to see someone is doing this in 2017. I was writing my own bootloader in 2003, together with a little command interpreter, in pure assembler. It taught me a lot about computers; it changed my thinking about what software really is. I think for instructional purposes, getting back to this basic level is a great thing.


> QEMU is great because you don’t have to worry about accidentally destroying your hardware with badly written OS code

Is that actually possible?


Yes. Not as easily as twenty years ago, but still fairly easily.

In fact, even well-written code can trigger hardware bugs that burn out your hardware. Modern OSes have a ton of hacks to work around limitations of certain hardware.

Example A: Intel's Atom C2000 family. [0] There are quite a few things there, SATA voltages being too high etc., but if we're looking for ultimate destruction then we can't go past AVR54 from the errata.

> The SoC LPC_CLKOUT0 and/or LPC_CLKOUT1 signals (Low Pin Count bus clock outputs) may stop functioning.

If you lose the LPC clock... The system won't boot anymore.

The whole story isn't in the link, but apparently the cause for this is that the clock can suddenly stop functioning if you check it "too often", usually failing after 18 months of use.

Cisco has the same problem with some of their routers, which use the same hardware.

It's a hardware bug, but with a quick BIOS fix, Cisco and Intel have worked around it, and their devices keep working without a recall... But your own code doesn't have that workaround, and will eventually brick your device.

Oh, and notice the date on Intel's link. April, 2017.

[0] https://www-ssl.intel.com/content/dam/www/public/us/en/docum...


As recently as 4 years ago there were Samsung laptops that could be bricked by installing Linux. Apparently there was a special physical memory address that you weren't supposed to poke, which Linux did, and that bricked the laptop.

https://arstechnica.com/information-technology/2013/02/booti...


Not quite right. The documented interface that was used for firmware control on BIOS systems generated errors if you poked it on UEFI systems, and the kernel logged those errors into UEFI variables. If the UEFI variable storage filled up, the machine stopped booting. It was possible to trigger the failure by filling the variable store, even under Windows.

The workaround was to reserve some variable storage space at all times, but this was made difficult due to the way some UEFI implementations do garbage collection - deleting variables wouldn't free the space. The trick is to try to create a variable that's larger than the reported free space in order to trigger garbage collection, but not all implementations will do garbage collection at runtime. So the kernel will check how much free space there is during early boot and try to create an oversized variable in order to trigger gc before it transitions to runtime mode. Last time I checked, Windows did nothing to stop you killing your system.
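A rough userspace illustration of that oversized-variable trick, assuming an efivarfs mount at /sys/firmware/efi/efivars (the variable name and GUID below are made up, and the kernel's real remediation runs from early-boot code, not userspace):

    /* Rough sketch: try to create a UEFI variable larger than the reported
     * free space, then delete it, hoping to trigger the firmware's
     * variable-store garbage collection. efivarfs format: a 4-byte
     * attribute word followed by the payload. The name/GUID are made up. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define DUMMY_VAR "/sys/firmware/efi/efivars/" \
                      "GcTrigger-12345678-1234-1234-1234-123456789abc"

    int main(void)
    {
        uint32_t attrs = 0x7;        /* NON_VOLATILE | BOOTSERVICE | RUNTIME */
        size_t payload = 8 * 1024;   /* stand-in for "more than free space" */
        size_t len = sizeof attrs + payload;
        uint8_t *buf = calloc(1, len);
        if (!buf)
            return 1;
        memcpy(buf, &attrs, sizeof attrs);

        int fd = open(DUMMY_VAR, O_WRONLY | O_CREAT, 0644);
        if (fd >= 0) {
            /* The write may fail with ENOSPC; on firmware that collects
             * garbage at runtime, the attempt reclaims space held by
             * deleted-but-not-freed variables. Clean up either way. */
            (void)write(fd, buf, len);
            close(fd);
            unlink(DUMMY_VAR);
        }
        free(buf);
        return 0;
    }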

Computers are bad.

(source: I debugged this and wrote most of the remediation code)


I love this comment, in part because it's totally outside my area of knowledge but I can still picture the battles and dead ends you must have worked through to make it stable.

Did you kill a lot of hardware figuring it out? What was the purpose of poking that caused the underlying issue?


I only killed one laptop in the end, the rest of the testing was done with VMs and a system that had a hardware EFI variable reset button. The issue that was triggering it on Samsungs was the driver that supported backlight control - it worked by writing a value to an address which triggered some system management mode code in the firmware, but that code only worked properly in BIOS mode. In EFI mode it generated machine check exceptions, which in turn caused the kernel to dump the backtrace to EFI variables.


> a system that had a hardware EFI variable reset button

Ooh, that sounds useful for hardware hackers. What was this?


The only times I've heard of destroying hardware with software have been: 1) stopping the beam in a CRT monitor through special-purpose registers and using it to burn through the phosphor. 2) Early floppy drives where you could position the head to an impossible position, causing the servo to burn out.

Haven't heard of anything like what he is describing in the last 20 years. Perhaps you can overheat some stuff - but most likely it'll shut down before anything bad happens. There are however some worrying notes on OSDev about causing potential damage when probing for memory; perhaps the author read this and got worried. It isn't detailed or likely, imho.


On computers where 'efivars' is mounted read-write, you can possibly brick your UEFI by deleting the contents of that "file system": https://github.com/systemd/systemd/issues/2402


Tangent, but a good story: Early in my career, in the CRT era, I was working late at a client's office. One of the executives stopped by to chat; he was telling me that his house burned down the past weekend (!) due to a wiring fault in an SUV in the attached garage that somehow ignited something ... whatever caused it, holy shit. His family all got out ok - in the middle of the night - but everything they owned was lost. As we were walking through the darkened office on our way out - the last two to leave - he said, 'do you smell something burning?'

Poor guy; he must smell it everywhere. For all I know there was still smoke residue in his sinuses. Anyway, I have to humor him - I'm tired and really want to get home, but there's no question of trying to talk sense to someone in his state. So I make a show of looking around the office with him - and sure enough, there was a CRT monitor, plugged in, off (or maybe asleep), and literally smoking. I never saw anything like it before or since, in over 20 years in IT. But he must still smell smoke everywhere he goes.


> The only times I've heard of destroying hardware with software have been

I was trying to find some information to back up my story, but I can't find anything that does. So I'll describe what I experienced, and maybe someone will have an idea.

Around 1999, my father gave me the first computer that was "mine" (previous ones having been "family computers"). I was inexperienced and 15 years old, with access to filesharing platforms, and learned the hard way about *.jpg.exe files.

The hard drive started making rhythmic sounds as soon as the OS was booted. A couple days later, the OS wouldn't boot. A reinstall worked for a short time (but the drive still made its odd sound). We had some bootable disk scanning utilities from the drive vendor. They identified the drive as having 100% bad sectors.

I've always assumed that a virus was crashing or misaligning the read heads somehow. That was reinforced when the second drive that I got met the same fate. Although, I guess it's more likely that they were 2 drives from the same shipment that met early deaths due to manufacturing defects.


I have never heard of anything like the following, but here is a reasonable yet very much theoretical explanation for what you described:

This virus loaded itself somehow at early bootup (maybe even launched via an altered bootloader) and then sequentially accessed every single sector on the disk and deliberately marked it as bad at either the FAT32 or ATA (hardware) level.

The busywork involved in actually issuing tons of such ATA commands could explain the thrashing.

Ref/inspiration for this theory: ^F for "--make-bad-sector" in https://linux.die.net/man/8/hdparm

(Just to be redundantly, obsessively clear, this parameter is several orders of magnitude more dangerous than "rm -rf --no-preserve-root", as hdparm will use ATA/SCSI commands whose effects are preserved by the hardware across any number of reboots until exactly the right --repair-sector command is issued.)

And FWIW, I do see a lot of holes in this (very simplistic) interpretation, and would be genuinely stunned if this is what actually happened.


Given the year, that sounds suspiciously like the IBM 75gxp Deathstar fiasco. It probably wasn’t your fault at all.


Was about to say the same when I refreshed and saw your comment. For GP: https://en.wikipedia.org/wiki/Deskstar#IBM_Deskstar_75GXP_fa...


1999, maybe 2000, with 6.4GB and 8.4GB drives (I suspect they were WDs, based on disk utility floppies I've got around from that era). It looks like the affected Deskstars were 15-75GB, and probably at least a year later, right?


On the Commodore PET you could write a short BASIC program to rapidly change the direction of the tape drive motor, which would fry the transformer that ran it.


I had a coworker who used to be a C64 enthusiast. One day, a very long time ago, he discovered a C64 program on Usenet called "drive music".

He downloaded it and ran it... and his floppy drive started playing music. After playing with it for a while, he heard that using it too much could throw your drive heads out of alignment, so he got rid of it.


For me, alignment problems on the 1541 were fairly common (there was a lot of head thrashing when trying to read "protected" disks), which ended up with the 1541 cover being easily removable for me, and always a screwdriver nearby to recalibrate.


I've always been told that there is a risk of irreversible hardware damage. I haven't any idea where this claim comes from, and I've never been offered an explanation. It's one of the things that has always made me wonder, and why I've never tested any of my own code on real hardware. I'd be curious to see this claim expanded on, or debunked.


Yep.

Back in the early days of GNU/Linux, getting X to run could kill your monitor.

Another example: many hard drive controllers used to blindly move their heads to whatever position they were told, even if it was an incorrect position. In the old days you also had to use something like PC Tools to park the heads before moving the computer.


You can brick laptops with rm -rf /

https://m.slashdot.org/story/306621


I'd say yes. Especially if you mess with ACPI and, say, turn off the fans...


Modern processors just thermal throttle to the extreme if the heat can’t be dissipated, no?


Experience with an incorrectly heatsunk laptop GPU has demonstrated to me that the BIOS will yank power to the system if it hits 100°C.

Yeah.

Well, in all honesty I'm not sure if it was Linux, the BIOS, or some low-level thermal monitoring circuitry, but yeah, one minute I was messing around with WebGL and the next minute I got confused about whether I'd forgotten the power cable. (Then I noticed I hadn't heard any beeping (this was a ThinkPad and I didn't have the beeper muted) and realized it had overheated.)


They will shut down the whole computer.


Adding to the already existing examples: Back in 1998, the CIH virus https://en.wikipedia.org/wiki/CIH_(computer_virus) bricked quite a few computers by overwriting the BIOS.


The possibility is small, but with voltages and frequencies under software control, yes. AFAIK even ordinary "non-overclocker" hardware has this capability, despite not being exposed in the BIOS settings, for power management purposes.

This is especially problematic with certain laptops --- I remember hearing about some that would overheat when booted into BIOS or even an alternative OS for an extended period of time, since they depended entirely on a driver to turn on and control the CPU fan...


Yes, in the 80s PC monitors had CGA and MONOCHROME modes, and if you switched fast between them in DOS (you could do mode mono and mode co80 in a loop) it fried my GPU :)


Apparently hackers can turn your home computer into a bomb

...& blow your family to smithereens!


The last line in this article mentions a Part 2, which will cover getting into Protected Mode, which implies that x86 boxen still, to this day, POST in 16-bit real mode.

What I'm wondering is whether this is because of the design of the firmware, the hardware, or both. Back when protected mode was the new hotness, it made sense for the CPU to power on in real mode, for backwards compatibility. But back-compat with DOS is less of a concern today than it was 20 years ago. Is it still required for back-compat with older versions of Windows, since UEFI wasn't commonplace in the WinXP or Win7 days? Does UEFI have to lift the CPU from real mode to protected mode, or does it leave that to the OS?

Another thing I wonder about is whether it's possible to have the CPU come online directly in protected mode or long mode after POWER_OK has settled and the motherboard releases the reset line. I recall reading various datasheets for tiny specialized controller chips (fan controllers, et al.), wherein by leaving various pins floating, or asserting them high or low with pull-up/down resistors, one set the power-on value of the register(s) which controlled start-up behavior. It'd be cool if you could do that with a modern mobo/CPU. But even if it were possible, I suspect it would be a mod reserved for those brave enough to take a soldering iron to their motherboard. I doubt such is a common enough need that mobo manufacturers are exposing it through jumpers or firmware config.


UEFI boot handles the real->protected (and ->long) transitions, so it's no longer necessary for the OS to handle it. EFI executables run in protected mode, with a memory mapping set up by the runtime.

> Another thing I wonder about, is if it's possible to have the CPU come online directly in protected mode or long mode after POWER_OK has settled and the motherboard releases the reset line.

No. The BIOS needs to perform some touchy, hardware-specific initialization -- like detecting and calibrating RAM -- before releasing control to user code. It's not something you'd want the OS to be responsible for.


>> Another thing I wonder about, is if it's possible to have the CPU come online directly in protected mode or long mode after POWER_OK has settled and the motherboard releases the reset line.

> No. The BIOS needs to perform some touchy, hardware-specific initialization -- like detecting and calibrating RAM -- before releasing control to user code. It's not something you'd want the OS to be responsible for.

I think you misunderstood me. I didn't mention, and wasn't even thinking about, the OS yet. As you said, UEFI handles the real->protected->long transitions. While the OS isn't loaded yet, and hardware initialization is being done, the CPU is still executing. What I'm talking about is setting the CPU's default power-on state, how it is before it begins executing even the boot firmware to initialize hardware and prepare to hand over execution to the OS.

By my understanding, when the power comes on, the motherboard waits for the PSU to assert POWER_OK. Once the PSU has done so, and POWER_OK has settled, the mobo releases the CPU's RESET line, allowing the CPU to begin executing. At this point, the CPU is in real mode. What I am wondering is whether the hardware can be configured so that once the motherboard releases RESET, the CPU is already in long mode, and begins executing the boot firmware. Is there something about the nature of pre-OS hardware initialization that requires the CPU to be in real mode to do this?

If CPUs could be coming online directly in long-mode, then perhaps the UEFI firmware could be simplified, since it doesn't need to handle the real->protected->long transitions anymore.


> No. The BIOS needs to perform some touchy, hardware-specific initialization -- like detecting and calibrating RAM -- before releasing control to user code. It's not something you'd want the OS to be responsible for.

I don't feel like the person you're responding to suggested that the OS should be responsible for it; he/she just wondered if it is possible to have the CPU not have to jump through 16-bit real mode and 32-bit protected mode just to get to 64-bit.

I see no fundamental reason why it should be impossible for the BIOS to do its initialization work in 64-bit mode. There's the issue of paging and the page-table perhaps, but it seems trivial enough to suggest that the CPU could initialize with a very basic 1:1 mapping between virtual memory and RAM.

That said, I don't know if such a thing would ever actually happen, since I imagine that it would suggest re-writing a lot of code.
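For a sense of scale, a rough sketch of such a basic 1:1 mapping, assuming 2 MiB pages covering the first GiB; this is illustrative only, not anyone's actual firmware code:

    /* Sketch of an identity (1:1) long-mode mapping for the first 1 GiB,
     * using 2 MiB pages. Enabling PAE/LME/paging and loading CR3 with the
     * address of pml4 are omitted; this only builds the tables. */
    #include <stdint.h>

    #define PAGE_PRESENT  (1ull << 0)
    #define PAGE_WRITABLE (1ull << 1)
    #define PAGE_LARGE    (1ull << 7)   /* 2 MiB page in a PD entry */

    /* Paging structures must be 4 KiB aligned. */
    static uint64_t pml4[512] __attribute__((aligned(4096)));
    static uint64_t pdpt[512] __attribute__((aligned(4096)));
    static uint64_t pd[512]   __attribute__((aligned(4096)));

    void build_identity_map(void)
    {
        /* 512 entries x 2 MiB = 1 GiB, mapped so virtual == physical. */
        for (uint64_t i = 0; i < 512; i++)
            pd[i] = (i * 0x200000ull) | PAGE_PRESENT | PAGE_WRITABLE | PAGE_LARGE;

        pdpt[0] = (uint64_t)(uintptr_t)pd   | PAGE_PRESENT | PAGE_WRITABLE;
        pml4[0] = (uint64_t)(uintptr_t)pdpt | PAGE_PRESENT | PAGE_WRITABLE;
        /* CR3 would then point at pml4 before setting EFER.LME and CR0.PG. */
    }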


Meh. Since the memory controllers are part of the CPU proper this is a pita, but it's not clearly super magic that only contractors of motherboard manufacturers should touch either.

I worked someplace where we ran AMD chips with no BIOS. Someone who had better uses for his time had to probe the RAM modules and configure the controller. It wasn't the end of the world.

Same with PCI tree discovery (waste of time, not the end of the world).

Interacting with EFI is a little more sane than the old BIOSes, and less proprietary - but I don't think there's as clear a line as you imply.


It's a great write-up (follow the next parts if you've only seen the 1st one)

I wonder how this changes by booting from UEFI (and not using any 'emulation mode')


> I wonder how this changes by booting from UEFI (and not using any 'emulation mode')

It's pretty damn easy:

1. You don't write a bootloader since UEFI is that bootloader.

2. You write a portable executable which gets executed by the UEFI bootloader. This executable runs directly in long-mode without any nonsense.

3. You have to link against some different C-libraries, not the standard ones, but that's about it.

Here's an example: http://x86asm.net/articles/uefi-programming-first-steps/
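For point 2, a minimal sketch of such an executable, assuming the gnu-efi headers (toolchain and linking details vary):

    /* Minimal UEFI "hello world" sketch, assuming the gnu-efi headers.
     * It is built as a PE executable that the firmware runs directly in
     * long mode - no boot sector, no real mode, no BIOS interrupts. */
    #include <efi.h>
    #include <efilib.h>

    EFI_STATUS EFIAPI efi_main(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable)
    {
        InitializeLib(ImageHandle, SystemTable);  /* gnu-efi library setup */

        /* Print through the firmware's text-output protocol. */
        SystemTable->ConOut->OutputString(SystemTable->ConOut,
                                          L"Hello from long mode!\r\n");

        /* Returning hands control back to the firmware / boot manager. */
        return EFI_SUCCESS;
    }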


Yeah, I really want to write a hobbyist OS atop UEFI & amd64, but … the learning curve is daunting. BIOS is too simple, but UEFI is hyper-complex. There's probably a reason most hobbyist OSes seem to be using BIOS …


> Yeah, I really want to write a hobbyist OS atop UEFI & amd64, but … the learning curve is daunting. BIOS is too simple, but UEFI is hyper-complex.

That doesn't really seem very representative of reality. UEFI basically takes care of all the terrible legacy stuff for you, so you don't have to.

You can just focus on the OS, built on a modern architecture.

See my other comment regarding BIOS vs UEFI: https://news.ycombinator.com/item?id=15517300

If you also try to compare UEFI vs BIOS on a deeper technical level, UEFI also seems to come out in a favourable way: https://www.happyassassin.net/2014/01/25/uefi-boot-how-does-...

The only "complex" part about it, is that you already know and have come to terms with all those terrible & complex things which booting a OS from legacy BIOS-mode entails, but UEFI while simplifying a million things is still different and something you haven't learned yet.


> That doesn't really seem very representative of reality … You can just focus on the OS, built on a modern architecture.

But that means complexity. E.g. BIOS just loads the first 512 bytes from a volume into memory; UEFI requires FAT filesystems, with paths &c. BIOS routines can easily be called from assembly; I don't know if UEFI routines can, or if the EFI Development Toolkit is required (I could find out, of course — but that's part of the learning curve).

I have no doubt that once I learn it all I'll prefer UEFI. But, as I said, the learning curve is daunting.


UEFI itself might be tricky but, actually, nothing stops you from burning those 512 bytes onto a flash drive and booting from it in Legacy mode.


> It’s probably easiest to download boot3.asm directly.

This is kind of funny, since this is the only link that's broken. http://3zanders.co.uk/2017/10/13/writing-a-bootloader/boot3....

I tried to email the author, but I couldn't find his email anywhere.


Hello - author here :) A silly mistake; I've fixed it now! http://3zanders.co.uk/2017/10/13/writing-a-bootloader/boot3....


The challenge is to fit the FAT or EXT2 filesystem in the single-block bootloader itself. Also, it should read a track at a time so that booting from old floppy drives is fast:

https://github.com/jhallen/joes-sandbox/tree/master/boot


Nobody knows how to do this anymore. About half the projects I got when I freelanced were projects that were finished but "just needed" a boot loader.


I've just finished studying the bootloader and the JOS kernel, and it's nice to see that people are actually working on this in 2017 :)


Any good articles about doing this in Forth?


I guess Open Firmware is the canonical answer. This of course compiles for x86. But it's massive and sprawling and not really designed for "can be easily grasped while leaving mental room to actually learn something" but rather for wide-spectrum support.

--

You of course already know about ColorForth (confident guess), which fits the bill here. Not quite "build an OS" and more "finished thing", but certainly hits the mark of "rapidly bring up new hardware functionality".

--

I just found 4os (article: https://news.ycombinator.com/item?id=12709802)

--

I found http://forthos.org/drive.html a little while ago but completely failed to get it going in QEMU (the CD image boots GRUB, but promptly hangs on loading). I haven't yet tested it with any other emulator, and haven't fed it to any actual hardware yet either.

HN article: https://news.ycombinator.com/item?id=2973134

--

There's also gfxboot, SuSE's approach to bootloader management.

This is a scary pile of assembly language (https://github.com/openSUSE/gfxboot/blob/master/bincode.asm - warning: 16k lines, large webpage) that parses an equally scary script grammar (see the .bc files in https://github.com/openSUSE/gfxboot/blob/master/themes/openS...) that is heavily inspired by Forth (the stack/RPN grammar is right there) but also reminds me of Tcl (it uses a { } block syntax).

AFAIK this ships on the install media, and I also vaguely recall building it from scratch being very easy (I used SYSLINUX to bootstrap it).

--

This isn't quite an OS-dev thing, but I think it's fun: http://phrack.org/issues/53/9.html


Simple and nice article!



