Hacker News
Booting Linux using UEFI can brick Samsung laptops (h-online.com)
103 points by fuzzix on Jan 30, 2013 | hide | past | favorite | 54 comments



So, a random academic question - should it even be possible for an OS to "brick" hardware, even if the OS was intentionally designed to do just that?

Now I know in this particular case Samsung wrote both the driver and the firmware, so it is easy to point blame here.

But more broadly, should hardware be built/designed so it has a "fail safe" mode where it just won't allow itself to be damaged by OS/software instructions?

PS - Let's just assume BIOS/uEFI firmware updates are off the table for this discussion, since many modern uEFIs allow you to disable user updates entirely.


Yes - hardware should be designed to always provide a way to recover from any failure mode that can result from software that's running on that hardware. Even bad firmware updates should be recoverable - what happens if you lose power mid-update?


Bad firmware updates are (almost always) recoverable, just not always (easily) by the end user. What happens when you lose power is that you need to use the hardware-level debugging/testing functionality of whatever programmable device is in question--JTAG, for example--to load a running firmware image and then reprogram the half-programmed flash storage.


I expect that you also would like "hardware should be designed to always provide a way to update all of its software" to be true.

I don't think you can expect both to be true nowadays. For example, chances are that your battery-charging 'hardware' runs some software. That software, if replaced with faulty software, can destroy your batteries and with them maybe even your motherboard (through fire, acid leaks, and the like).

This applies elsewhere, too. Historically, we had the 'killer poke' (http://www.6502.org/users/andre/petindex/poke/index.html; variants at http://en.wikipedia.org/wiki/Killer_poke).

Nowadays, it is rumored that buggy baseband firmware for mobile phones can fry the hardware.

I do not think it is feasible to prevent all of these in hardware. Because of that, you must accept "if you can update all software through software, you can brick your device through software".

So, to get to "provide a way to recover from any failure mode that can result from software", you will need to have some unmodifiable software on the device. You also will need that software to allow updating of some firmware and to be free of bugs. I think that is possible, but not economically feasible. Why would anyone spend even a week on bug-checking that earliest running code on a device that will be sold for only six months? That would be giving up the bestselling 4% of the sales cycle.


> That software, if replaced with faulty software, can destroy your batteries and with it, maybe even your motherboard (through fire, acid leaks, and the like)

I'd put that in the same category as writing robot arm control software that throws the arm off the table, for instance.


"Should"? In some academic sense, of course everything should be engineered perfectly. But in the real world there are tradeoffs, and people pay more for flexibility and functionality than for safety. I don't see that changing in the near term.


The word is bugs. It's very hard, almost impossible, to fix all bugs. Code with no bugs is really code with no known bugs. Especially when dealing with C and assembly, different parts of the code can affect other parts in ways a programmer may not anticipate. You really need a 'safer' language to prevent certain types of bugs, especially given the "data and code share the same segments" model of C.


It isn't /just/ bugs. It is also the way we think about the relationship between the underlying hardware and the kernel/OS.

There is an inherent level of "trust" there. Which makes sense. But the question is: should that "trust" extend to permanent damage to the hardware/firmware?

For example, a lot of GPUs allow the OS to control fan speed. You can literally set the fan speed so low that the GPU will overheat and damage itself. That isn't a "bug"; that is a "feature."

Again, we come back to the expectations/relationship.


It's always said "hardware is expensive, software is cheap; so do as much as you can in software." When you also remember that we give an OS full control of all hardware, the natural conclusion is that things like this can happen. Although this is changing, subtly. Now you have hard drives and SSDs that run their own software that can't be controlled by the OS. The firmware may even lie to the OS; for example, a hard drive might tell the OS "I've written this data" when it is actually caching it for more efficient writing. Unfortunately, computers are nearly infinitely complex these days, with so many layers developed by so many people.


Yes, bugs happen.

So what can be done?

There are systems with firmware that can automatically detect failed updates or corrupted firmware, or where a failsafe firmware loader can be triggered by a jumper or related request, and that can then perform a reset and (re)load of replacement firmware.

Without requiring a test harness or JTAG access or other equipment.

In several of these designs, there are two copies of the firmware, meaning the old firmware can be immediately accessed, or — pending successful completion of an update — a second copy of working firmware can be generated.

In one case, a system had its firmware mostly in ROM, and had NVRAM that could hot-patch routines via an NVRAM-based vector table, and with space for replacement routines in the NVRAM. This meant that the box would always boot, and bad vectors could be detected by checksum, and firmware bugs could still be patched up to the limit of the available NVRAM.

Put another way, we know how to avoid this mess. It just costs some time and effort and money, and that can get this capability cut.

This stuff is not rocket science.


If you can brick the hardware with a bogus driver it looks more to me like a hardware bug rather than a software one.

Why exactly do you blame C's memory model for this issue? The article is short on specifics. Do you have a more technical source? I'd be curious to learn how exactly this driver can completely brick the mobo.


I don't have any technical sources for this bug, but I'm quite certain it would be a bug in either the UEFI firmware or the kernel driver. Firmware can be quite complicated pieces of software, and if you accidentally change the wrong memory values, you might trigger a firmware update or change configurations of the hardware present in the firmware. All this stuff can be changed because it needs to be configurable, and doing it in software is much cheaper.

There will usually be some kind of firmware reset in the hardware, but it might involve processes that only the manufacturer can do.

I would say C's memory model can contribute towards this because if you have a buffer overflow, or change the wrong bits of memory, you might accidentally change the code, or execute data. Executing data is why buffer overflows are so serious, and allow malicious code to essentially do anything.


Seems like the best solution is that there should always be a way to reset the firmware to a hard-coded factory default which represents the state that existed when the machine came off the assembly line. In this way, a bad firmware update can always be rolled back.

EDIT: looks like somebody reporting the bug did this the old-fashioned way: "Just to add, on UEFI machines that got bricked like this I removed the battery and disconnected the CMOS NVRAM battery and this restored the machine to the factory default and fixed the issue for me."


It appears he didn't do that with a Samsung laptop but other machines in the past and someone replied saying it didn't fix it for them: https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557/comme...


With various laptops I've owned over the years, this almost always fixes the problem when the laptop refuses to boot(except when there's actual hardware failure). It's usually pretty straightforward to do too.


Or just learn how to do it properly

Getting Code Right And Secure, The OpenBSD Way http://www.bsdcan.org/2010/schedule/events/172.en.html


It's worth pointing out that Samsung does have a history of issues with "secure" bootloaders.

https://plus.google.com/u/0/101093310520661581786/posts/Drq9...


Are you describing a Denial of Service attack? One where the attacker would get the desktop or server to try to install and run Arch Linux, then let it brick itself? If a critical Windows server or workstation was hit, this could cause a project to grind to a halt.


I think finding a kernel exploit in the server's OS that let you run the firmware-bricking code would probably be more viable than installing a new operating system.


I was about to pick one of these laptops up late last year until I happened to stumble across that very bug report myself: https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557?comme... Everyone seems to have it working fine with UEFI disabled, and several people have noted the vendor or Samsung has provided a replacement in the cases where it was bricked.

At least in their darkest hours, Linux users can still put on a smile:

"[...] they changed the motherboard and it's working again now. I won't try to install Ubuntu again though. The whole process took about 2 weeks."


I wish UEFI had been designed more with the user in mind and less around politics and corporate decisions.

To start, it should not be possible to brick the hardware in any way...

Interestingly, UEFI looks to me sufficiently complex to control the device's graphics and input before the OS boots. I wonder if that could be abused to do some "on the metal" coding again for high-performance stuff (i.e. games... and scientific things).

I think maybe that can't be done, because probably no UEFI comes with drivers for hardware-accelerated video.


Say what you will about the old BIOS systems but at least it worked and everyone understood it. EFI/uEFI seems to be a big clusterfuck.


I am completely and utterly baffled by this statement. Do you realize how many man-years have been spent working around BIOS bugs over the years? Get any kernel developer a drink, then just say the word 'BIOS'; your opinion of UEFI will change pretty rapidly.

I've dealt with kernel dev for BIOS systems, CSM development for UEFI, etc etc. I'll stick with UEFI, even if it does still have some growing pains.


daeken is absolutely right. BIOS is a fucking nightmare.

Real mode (or the lack of it in Intel VT-x) still wakes me up in a cold sweat at night.


> Do you realize how many man-years have been spent working around BIOS bugs over the years? Get any kernel developer a drink, then just say the word 'BIOS'; your opinion of UEFI will change pretty rapidly.

Isn't it the case that once the kernel is fully booted, up and running, it bypasses BIOS entirely and talks directly to the hardware? Have all those bugs you mention been related to the booting process itself (constituting a relatively tiny part of the kernel)?


I do not know to what extent the OS bypasses the BIOS, but it is not completely. If you look in the Linux kernel config, you will see an option to control how much RAM is reserved for the BIOS. Also, on (many?) Dells, Fn+Shift+15324 followed by Fn+r brings up BIOS thermal controls [1]. I have verified this on an Inspiron 1420, in Windows 7 and Ubuntu 12.10 (kernel 3.5.0-21-generic).

[1] http://ubuntuforums.org/showthread.php?t=1684657


That's because it's a lot harder to fuck up a few kilobytes of assembly than it is to fuck up several MB of bootloaders, device drivers, filesystem drivers, code signature verification, etc, etc, all of which are required for a complete EFI implementation.

What we are seeing here is solid evidence of the fact that software is hard - and a lot of companies just don't have the chops to do a good job. Somehow I doubt the Surface Pro would have bugs anything like this, for example.


I guess Surface Pro will be locked down via Secure Boot and won't ever boot Linux unless jailbroken. Whether that's inherently better is probably another question.


According to Microsoft’s own “Windows Hardware Certification Requirements for Client and Server Systems”,[0] on non-ARM systems it is obligatory to offer the ability to disable Secure Boot:

QUOTATION START

18. Mandatory. Enable/Disable Secure Boot. On non-ARM systems, it is required to implement the ability to disable Secure Boot via firmware setup. A physically present user must be allowed to disable Secure Boot via firmware setup without possession of PKpriv. A Windows Server may also disable Secure Boot remotely using a strongly authenticated (preferably public-key based) out-of-band management connection, such as to a baseboard management controller or service processor. Programmatic disabling of Secure Boot either during Boot Services or after exiting EFI Boot Services MUST NOT be possible. Disabling Secure Boot must not be possible on ARM systems.

QUOTATION END

0. http://msdn.microsoft.com/en-US/library/windows/hardware/jj1...


This requirement is only for other vendors to get the certification from Microsoft. Microsoft itself does not need to conform to it, so it is still possible that their products will be locked.


It's sad that everybody wants to lock everything down. It's like no one learned anything from the success of the PC.


The PC was only a success for the market as a whole. The setters of the standards did not profit from it at all, and some even went (close to) bankrupt because of it.

Contrast this with the closed hardware architectures that Nintendo and Apple produce. No consumer freedom, but incredible profit margins.

Note that modern Macs are in fact PCs with just minor modifications, so Apple profits from PC economies of scale while still locking its customers into its hardware platform.


What if lockdown worked in such a way that you could lock it down to only run your Linux kernels? E.g. load up your own certificate in the TPM and use that for signing when you build.

In that case it would be a security feature -- another line of defense against bootloader malware and/or adversaries in physical possession of your machine.

(I don't know how technically feasible that is; I know Canonical and others are looking at having their own cert so at least their unmodified kernels can run, but I don't know the mechanism for how that interacts with already-released UEFI machines.)

The point is that technologies like this are a double-edged sword, not evil in themselves. A similar argument is made by Linus himself for sticking with GPL v2 instead of moving to GPL v3, which outlaws certain DRM-related uses; he's more interested in providing a functioning mechanism, and leaving the policy-setting to others.


Sadly, the success of locked-down Apple devices stands orthogonal to it.


The PC was a success mostly for Microsoft. With the commoditization of hardware around a single software option, PC makers were squeezed and their margins are exceedingly thin. It's the "thin slice of a larger pie" metaphor. It's just that Microsoft has the whole OS pie, while every hardware maker has a vanishingly thin slice of it.


Quite likely it will be locked down. Even if it's not, however, I don't think Microsoft's EFI implementation will be so crap that drivers can brick the device.


Hmm. Almost right, until they introduced ACPI, which in itself is much more of a clusterfuck, because most of the vendors actually do it all wrong.


Everyone understood BIOS? Really? I thought one of the issues with BIOS was that almost nobody understood how it actually worked any more. Large chunks of it are decades old, and the people who created them have moved on...


The biggest problem with UEFI was that someone decided to mess it up and complicate it by adding DRM and signing-requirements.

It's the result of computing following designs mandated by the MPAA/RIAA. Who on earth thought that could turn out OK?


UEFI has _NOTHING_ to do with DRM.


None of this is accurate.


BIOS is outdated, but UEFI is even worse and has become such a mess. It did not fix the BIOS's issues and is unnecessarily complicated. The effort was regrettably wasted on UI.


Sorry, but the BIOS had many issues, and UEFI provided a solution to them, even if it isn't an inherently good system.

1. The BIOS isn't portable; UEFI can be compiled for any platform by porting a small base module. Everything else uses this module, so the compiler will take care of the rest.

2. BIOS was a heap of 16-bit assembly code with a small memory space. It was quite hard to add any kind of complex functionality.

3. You couldn't use a boot volume greater than 2 TB due to MBR, and a replacement couldn't be added because of number 2.

4. UEFI is more like a microkernel than a generic BIOS, and provides the functionality for vendors to write 'better' interfaces. The interfaces aren't actually provided in the official UEFI sample implementation, and most could arguably be called worse than an average BIOS interface. However, you have to acknowledge the capability is there.

5. It added a framework for kernel verification (contrary to what some people think, it only verifies the UEFI firmware and the kernel/boot loader it loads directly). The direction Microsoft is taking it is quite unfortunate, but it's actually a good feature.

6. It has a limited capability as a boot loader, allowing multiple operating systems to be started directly.

These are all features that aren't present in the BIOS and that UEFI has fixed. Unfortunately the current implementation has many bugs, and many features are arguably implemented badly. However, it was created to fix real problems, and it has been undeniably successful at that. At a Linux conference there was a talk about all this from someone quite involved with the UEFI creation process; if you are interested, the recording will probably be released in a few weeks. There is also a talk from the previous year by Matthew Garrett (the guy who does UEFI stuff in Linux) about all the bugs present in UEFI, which is an entertaining watch: https://www.youtube.com/watch?v=V2aq5M3Q76U.


It sounds like UEFI is over-engineered. All I want is for something to initialize the hardware and hand over for booting.


"over-engineered" is the right word for UEFI.


I do so miss OpenFirmware (forth and all).


Looks like a roadmap for some truly nasty malware. This should not be possible in any circumstance. It's like having a history eraser button on your spaceship.


Just disabling the driver doesn't seem like the ultimate solution... Have they even identified the problem itself? Why aren't they just fixing it? Is it on the hardware side or in the driver software? They should fix that.


As stories go, this one would be a lot better if it actually linked to some technical information about the bug. Does anyone have a reference for what the samsung_laptop driver's doing that is so bad? The kernel bugzilla link (#47121) that someone has speculated is related is a boot panic, not a complete bricking, so while it _may_ be the same thing ...

https://bugzilla.kernel.org/buglist.cgi?quicksearch=samsung-... # but nothing else in there looks to be any closer either


U/EFI is an interesting subject. Yes, of course better than the primitive BIOS it replaced. Interesting in that you can play with a new OS.

But Intel went out and invented another operating system? Why, when embedded Linux (and coreboot) already existed, with a full stack of software? It would be nice to have a web browser available from the firmware for downloading drivers, etc. I ran Linux on an Itanium at work circa 2003, so it was definitely doable.

Is the answer that MS wouldn't allow it? Or is there another reason?


In a few weeks, "Don't buy Samsung laptops" is what I will remember from this.


Why can't they just use Google's method from the Chromebooks? I think it lets you easily install any other OS once you physically disable the bootloader protection with a switch on your laptop.


Because they're not as smart as Google. Most hardware vendors budget as little as possible for firmware, and the result is predictable.


Just fixing the driver doesn't seem like a safe solution, because malware could still do the same thing to brick the hardware, couldn't it? Sure, nowadays most malware doesn't want to brick your hardware, because ads are more profitable, but it can still happen. It's understandable that new technologies have more serious bugs than decades-old ones, and hopefully this will be fixed in UEFI soon, but it shows why you have to be careful with new technologies if you are an early adopter.




