'x86 virtualization is about basically placing another nearly full
kernel, full of new bugs, on top of a nasty x86 architecture which
barely has correct page protection. Then running your operating
system on the other side of this brand new pile of shit.
You are absolutely deluded, if not stupid, if you think that a worldwide collection of software engineers who can't write operating systems or applications without security holes, can then turn around and suddenly write virtualization layers without security holes.'
This is just FUD and in the usual insulting De Raadt style of communicating.
'barely has correct page protection' is just a way of saying 'has correct page protection, but I want to be really snotty about it'. So, highlighting a non-problem.
No-one is claiming that virtualization makes a system magically completely secure, but do people actually believe that it makes it less secure? (Compared to running the same software on the same hardware using a single OS.) I don't think so.
Not really. He's right. The memory, paging and protection model for X86 is fugly at best. It's very easy to hang yourself as demonstrated.
You can be as insulting as you like if you compare X86 (and X86-64) to SPARC and POWER, which is what I assume he is doing here, considering he provides operating systems for multiple architectures. We're talking about an architecture that started with the 8086 and, despite changes to the underlying microarchitecture, still has a front-end ISA and system interface plagued with poorly designed, hacked-on extensions.
Regarding virtualization, any sharing of resources, particularly at a hardware level, is an attack vector if not implemented correctly. Whether or not it is implemented correctly, or is exploitable, is merely a matter of time and effort, as demonstrated here. That is, unless it is mathematically verified, which it isn't (and which, given the evolved x86 architecture, probably isn't possible). So virtualization can't be more secure, and is unlikely to be as secure; that leaves only less secure.
I implement high-performance software systems in C++ as my day job. The software has to compile and run on Linux, Solaris, and AIX. The same code is 2x slower on AIX (Power) and 3x-5x slower on Solaris (Sparc) than on Linux (x86). So say whatever you want about theoretical differences in architectures, but in the real world Sparc and Power systems are absolutely not competitive, both on price (absolute $$, and per CPU) and performance (per CPU - they do have more cpu cores, usually).
That's just a circular way of saying "x86 is more popular, therefore better." Which doesn't address the point made above that x86 is inferior in terms of its design.
Of course x86 is going to be faster per dollar spent. One is mass market (x86-64) and the other two are hugely niche (Sparc and Power). Plus the Linux kernel has by far the most human-hours spent on its development relative to every other operating system in the world.
There's also a reason why some of x86's market share has been eaten up by ARM. Moving from x86 to ARM was hugely expensive by all measures, but it was worthwhile because x86 was so wasteful.
It's not just "x86 is more popular, therefore better." It's that the performance of x86 was better than SPARC or Power. Regardless of the cost of the chip, performance is what is really important here. In some instances, performance per watt is more important, but either way... it's performance that's key, not market forces driving cost savings.
I haven't had much experience with SPARC, but I've done some work on Power systems (long ago). Back then (10-ish years ago), Power chips were more powerful than their x86 contemporaries. But at some point, that relationship switched.
However, I wonder how much of this is the chip, and how much is the tooling. It's been a while since I've needed to think about C/C++ compilation, but from what I remember, the Intel compiler produced (slightly) faster binaries than gcc. Now this is where popularity could prove decisive: if the compiler the OP uses works for x86, SPARC, and Power, how much do you suspect it has been optimized for each of those architectures? Even if the non-x86 chip itself is capable of running faster than x86, if the toolchain isn't similarly optimized, it could end up delivering worse performance.
It might well be fugly but it works. That's the key point.
I'm sure Intel (or anyone else), if they could develop x86 again from scratch, and with the benefit of hindsight, would create a much nicer mechanism. But that's just speculation and wishful thinking.
Haven't they? I thought x86 was now basically just a legacy compatibility layer on top of significantly more streamlined and optimized RISC-like operations.
> (Compared to running the same software on the same hardware using a single OS.) I don't think so.
If you had been running an OpenBSD instance on hardware as a single OS, you would not be vulnerable to having your system's memory read by this hypervisor bug. So... yes.
I'll take the world where we have the odd hypervisor vulnerability over the world where we have to increase power output by multiple orders of magnitude and pave the planet with datacentres to run every service as a single-instance, non-virtualised server.
I guess Theo prefers runaway global warming to the odd data breach. Which makes him the idiot, to use his own language.
Every kernel (not named seL4) has vulnerabilities. The question is not whether OpenBSD would have been vulnerable to this specific issue, but whether Xen generally has fewer bugs than the OpenBSD kernel - or, for that matter, the Linux kernel, since the grandparent mentioned a bunch of Linux sandboxing technologies.
edit: To be absolutely clear, per the grandparent, I'm assuming an environment where unrelated people are running software on the same hardware, either using their own kernel under Xen or user processes under jails.
And if anyone thinks that the awesome perfection of OpenBSD's authors outweighs the vastly smaller attack surface of something like Xen, I think they're deluding themselves.
Flipside is that with hardware virtualization, a lot of that behavior is protected in hardware which, for whatever reason, seems to be extremely secure in practice. You don't see a lot of erratum-based exploits... the recent SYSRET bug was severe but only counts somewhat ("instruction does something different than what it does on another vendor's processors, and is technically documented to do so" is bad, but it's not like there was some sequence of instructions that would just get you arbitrary memory access without interacting with the hypervisor).
The attack surface of incorrect use of the admittedly complicated x86 privilege transition and protection mechanisms is shared in its entirety by all x86 operating systems (except, to a limited extent, by those that turn off some of these mechanisms, which AFAIK none do).
You have an incoherent mental threat model of this. Those two systems are functionally identical to the end user to which they are being sold, but one is vulnerable (in this specific way) to the actions of other customers with the same service provider.
Usecases with multiple users on one piece of hardware make no sense? Is there a reason (besides the previous question) you are ignoring the ability to use containers as an alternative to virtualization for all of the perks they provide?
I'm not ignoring containers at all. Linux, FreeBSD, and OpenBSD all have some form of user mode containers. All POSIX systems also have user ids. Linux containers are probably the most functional (no citation here -- I know lots about Linux containers and very little about FreeBSD jails) and are also probably the least secure, because of the aforementioned functionality and because they're rather new. (On the other hand, a really well designed Linux seccomp sandbox is probably the most secure option of all.)
Linux on Xen also allows you to have multiple Linuxes on the same Xen machine. This is the most functional of all and probably also the most secure of all, XSA-108 notwithstanding.
(Also, I find this all rather odd. If you want to compare Linux+Xen to OpenBSD, XSA-105 and XSA-106 are much bigger deals. They allow code in a Linux container or other sandbox to break out by exploiting a Xen bug to take control of its Linux host.)
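To make the seccomp point above concrete, here is a minimal strict-mode sketch, assuming Linux and calling prctl(2) through ctypes. The fork-and-observe harness and the "killed"/"unavailable" return values are my own invention for illustration, not any particular sandbox's design:

```python
import ctypes
import os
import signal
import sys

PR_SET_SECCOMP = 22        # prctl option number on Linux
SECCOMP_MODE_STRICT = 1    # only read, write, _exit, sigreturn allowed

def strict_seccomp_demo():
    """Fork a child that enters strict seccomp mode and then attempts a
    forbidden syscall; report how the kernel reacted."""
    if not sys.platform.startswith("linux"):
        return "unavailable"
    pid = os.fork()
    if pid == 0:
        # Child: enter strict mode, then violate it.
        libc = ctypes.CDLL(None, use_errno=True)
        if libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0:
            os._exit(42)          # seccomp not available in this environment
        os.write(1, b"")          # write(2) is still permitted
        os.getuid()               # any other syscall -> SIGKILL by the kernel
        os._exit(0)               # never reached
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL:
        return "killed"
    if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 42:
        return "unavailable"
    return "escaped"

if __name__ == "__main__":
    print(strict_seccomp_demo())
```

The point of the "well designed" qualifier above is visible here: strict mode is trivially secure but nearly useless, so real sandboxes use SECCOMP_MODE_FILTER with a BPF policy, and the security lives or dies in that policy.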
In which AMD defines a new instruction, and Intel copies it with a subtle difference which trips up everyone (AMD's way is better IMO, and it came first).
He's just very dramatically pointing out that, oh, there are some changes we now need to account for, and Intel didn't tell us poor open source developers about them, and that's (supposedly) totally unreasonable. Besides the fact that Intel does not owe De Raadt anything (other software makers pay a hefty sum to be partners, while OpenBSD developers insult anyone who doesn't give them free shit), these bugs are a given in any production process. I don't care if you're Ikea or Exxon or Apple, you don't adopt new shit into your product and not expect shit to break. So his outrage is both presumptuous and facile.
No, you're moving the goalposts. De Raadt pointed out that x86 "barely" has a working paging system. A commenter on HN said there was no basis for that statement, that he was picking on something that wasn't broken, that it was just FUD. It was not FUD. That claim has been refuted.
I wasn't addressing the paging system issue, but that was definitely FUD too. FUD does not have to be disinformation per se. Its main property is the spread of a generally negative viewpoint that is intended to persuade the recipient to side with the negative actor.
Even if the paging system is broken, that's no reason to simply stop using VMs on it, or to say it's impossible to have a secure VM on a system with broken paging. It's perfectly possible to have a VM on a broken-paging machine that's more secure than a working-paging machine's OS, with or without a VM.
De Raadt was not trying to make a rational argument about the validity of VMs on faulty hardware. He was literally saying you are stupid if you put a VM on x86 and expect it to be secure. Which is a stupid thing to say without knowing anything about the OSes, or what the alternative might be, either platform or OS-wise, to say nothing of hardening.
De Raadt has a bone to pick with Intel and specifically x86-based machines, and is simply interested in convincing people not to use it by insinuating you're innately not capable of doing secure computing on it. Which is basically untrue. That's why it's FUD.
One thing does not follow from the other. Core 2 has had paging bugs, ergo x86 has a barely working paging system => Pentium had the FDIV bug, ergo x86 can't be trusted with arithmetic.
For the claim to be properly refuted, the claimer would have to show systemic problems in x86 paging. I believe such a claim can be made, but it simply wasn't yet.
You're litigating a different claim than I am. The claim I'm refuting is:
> 'barely has correct page protection' is just a way of saying 'has correct page protection, but I want to be really snotty about it'
I don't have to demonstrate systemic flaws in x86 paging to refute that.
I don't think paging system security is a good basis on which to choose processor architectures. I do sort of agree with Theo's point about virtualization, which really is a petri dish for terrible vulnerabilities. But either way: my point is just that Theo isn't just making things up here.
Inherently, adding more systems creates more risk vectors. When an application is installed on an operating system, both the OS and the application now have to be protected. An example would be Flash on top of an OS: you have to defend, patch, and architect based on whether your systems have Flash or not.
With virtualization you have the hypervisor that has applications running along with it (ie: ssh, a cli, syslog, bash etc) and then you install a guest in a VM on top of the hypervisor. The OS on the VM is another vector which has applications on it (DB, web, ftp, etc).
If I have a bare metal server with just an OS installed on it and its applications on top of that, I only have to worry about that set of OS and applications and their associated risk vectors.
If I have a bare metal server with a hypervisor and then the above OS and set of applications, I have increased the number of risk vectors by however many applications are running along with the hypervisor.
“but do people actually believe that it makes it less secure?”
Just bringing a hypervisor into an environment does not of itself immediately make it less secure. I agree with you; I don't think it makes it less secure. It does increase the risk of the environment, though, and appropriate architecture and action must be taken to keep it from becoming less secure. A large number of environments are not architected and managed properly.
Another very realistic, and happening today, example:
Bare metal server with Windows OS installed.
Bare metal server with a hypervisor which just happens to have bash on it (or is susceptible to this memory issue). The same Windows OS is installed as a VM.
In the second instance with the hypervisor I would have, indirectly and out of my immediate control, made my environment less secure.
An increase in complexity or increase in components will increase the risk of an environment.
"No-one is claiming that virtualization makes a system magically completely secure, but do people actually believe that it makes it less secure? (Compared to running the same software on the same hardware using a single OS.) I don't think so."
However, common virtualized platforms such as EC2 encourage you to run your whole server setup in an environment where a (possibly malicious) neighbour could be running arbitrary x86 code in an instance on the same physical machine. This is not an attack vector that is remotely possible in the traditional, non-virtualized setup.
They aren't encouraging you; it is being demanded (by you, by everybody). This is how cheap, reliable, redundant computing is offered, and it will always be cheaper than paying for and maintaining entire physical machines.
There is a general assumption amongst virtualised environment administrators that guests are securely separated. And yes, more code to run means more vulnerabilities.
From the perspective of a public cloud host etc., it's not more code to run; any fault of the guest kernel is not their problem, so they likely have less code to run compared to jail-based solutions that run a full Unix kernel in ring 0.
Thanks to Jan Beulich, the SUSE Xen maintainer in Germany who is credited with finding this x86 HVM vulnerability.
It would be helpful if errata announcements included documentation of the static analysis tools, code review process or automated testing techniques which identified the weakness, along with a postmortem of previous audits of relevant code paths.
What made it possible for this issue to be identified now, when the issue escaped previous analysis, audits and tests? Such process improvement knowledge is possibly more valuable to the worldwide technical community than any point fix.
Heartbleed was discovered by an external party, but this issue, which affects the data of millions of users, was found by the originating open-source project. Kudos to Jan for finding this cross-domain escalation.
I haven't checked after the reboot, but I hope the MSRs I'm using can still be accessed: IA32_MPERF and IA32_APERF (to calculate real CPU MHz); IA32_THERM_STATUS and MSR_TEMPERATURE_TARGET (to calculate CPU temperatures); and MSR_TURBO_RATIO_LIMIT and MSR_TURBO_RATIO_LIMIT1 (to see turbo ratios).
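For anyone curious how those MSRs are read from userspace: on Linux the msr driver exposes them as /dev/cpu/<n>/msr, where the file offset selects the register number. A rough sketch follows; the register numbers are the standard IA32_MPERF/IA32_APERF ones, the return-None-on-failure convention is my own, and a real read needs root plus the msr kernel module loaded:

```python
import os

IA32_MPERF = 0xE7   # counts at the base (TSC) frequency while the core is in C0
IA32_APERF = 0xE8   # counts at the actual frequency while the core is in C0

def read_msr(cpu, reg):
    """Read one 64-bit MSR via /dev/cpu/<cpu>/msr, or return None if the
    msr driver is absent or we lack permission (i.e. not root)."""
    path = "/dev/cpu/%d/msr" % cpu
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return None
    try:
        data = os.pread(fd, 8, reg)   # the file offset selects the MSR number
    except OSError:                   # e.g. an unsupported MSR faults the read
        return None
    finally:
        os.close(fd)
    return int.from_bytes(data, "little")

if __name__ == "__main__":
    mperf, aperf = read_msr(0, IA32_MPERF), read_msr(0, IA32_APERF)
    if mperf and aperf:
        # The APERF/MPERF ratio approximates the real-to-base clock ratio.
        print("APERF/MPERF = %.3f" % (aperf / mperf))
    else:
        print("msr device unavailable (need root and the msr module)")
```

Whether a given hypervisor still exposes these after the update is exactly the question above; the sketch only shows the userspace side of the read.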
What is the "..." operator? I have never seen that before. I can't find any references to it. Is that a macro specific to this project? [I checked the post above, but it doesn't match this source code exactly (and doesn't have ... as an operator).]
Can somebody please confirm that it is impossible to boot an HVM system on Linode? The hypervisor on my Linode host certainly supports HVM (according to cat /sys/hypervisor/properties/capabilities). The host Xen is 4.1 and therefore vulnerable in the case that another user could be running HVM guests.
Huh, I have a tiny machine at one of those smaller places, and they are on the list. Good to know the smaller players can build up a reputation for embargoing, too.
Interesting that over half of the companies on the list were added within the last week, if the dates in the page changelog accurately reflect when they were added. If so, perhaps they all suddenly bundled in so they could find out what the embargoed vuln. was.
"""
Yesterday we started notifying some of our customers of a timely security and operational update we need to perform on a small percentage (less than 10%) of our EC2 fleet globally.
AWS customers know that security and operational excellence are our top two priorities. These updates must be completed by October 1st before the issue is made public as part of an upcoming Xen Security Announcement (XSA). Following security best practices, the details of this update are embargoed until then. The issue in that notice affects many Xen environments, and is not specific to AWS.
"""
I wonder if Amazon or someone will take the time to make a ksplice-like system for Xen so that future security upgrades probably won't have to go through such disruptive reboot events.
(Or, for that matter, whether they considered making an ad-hoc machine code patch - based on the source patch, it looks like it would probably be doable just by changing a few bytes. I guess it's a bit risky...)
Sigh. That's obnoxious - yet another example of software patents confusing proving that an idea is commercially valuable with inventing it in the first place. Anyone with the requisite skills in reverse engineering, compilers, etc. could have told you that hot patching functions in memory is possible and would take at most a few minutes to notice that this may be unsafe if some suspended thread is sitting in a function prolog. Yet "identifying a portion of executable code to be updated in a running computer program; and determining whether it is safe to modify the executable code of the running computer program without having to restart the running computer program" (an actual claim, not the abstract or title quotes that people tend to misconstrue) is now locked out for the next decade or so.
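The prolog hazard described above is easy to state in code: before rewriting a function's first bytes, check that no suspended thread's saved instruction pointer sits inside the range being replaced. A toy model (all addresses and the data model are invented for illustration; this is obviously not the patented mechanism itself):

```python
def safe_to_patch(patch_start, patch_len, thread_ips):
    """Return True only if no suspended thread is executing inside the byte
    range [patch_start, patch_start + patch_len) that we intend to rewrite,
    i.e. the function-prolog hazard described above."""
    patch_end = patch_start + patch_len
    return all(not (patch_start <= ip < patch_end) for ip in thread_ips)

# Patching the 5-byte prolog of a function at (hypothetical) address 0x401000:
PROLOG, PROLOG_LEN = 0x401000, 5

# One thread is suspended mid-prolog at 0x401003: patching now could resume
# it into a half-old, half-new instruction stream.
assert not safe_to_patch(PROLOG, PROLOG_LEN, [0x401003, 0x7fff0000])

# Once every thread's saved IP is outside the range, the patch can be applied.
assert safe_to_patch(PROLOG, PROLOG_LEN, [0x40100a, 0x7fff0000])
```

Real hot-patchers (ksplice and friends) do essentially this check in a loop, retrying until no thread is parked in the affected range, which is exactly the "a few minutes to notice" observation above.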
I had "dedicated" AWS instances that were rebooted. A dedicated instance means there is only one guest per box, right? So I'm curious why those had to be rebooted if there is no network-facing vector to this vulnerability. I guess because we could have read from the hypervisor's memory?
This looks like a serious problem at first glance, but if we think a little bit about the practical impact, the conclusion might be quite different.
First, there are really no secrets or keys in the hypervisor memory that might make a good target for an exploit here. The Xen hypervisor does not do encryption, nor does it deal with any storage subsystems. Also, there is no explicit guest memory content intermixed with the hypervisor code and data.
But one place where pieces of potentially sensitive data can be seen is the Xen internal structures where the guest _registers_ are stored whenever guest execution is interrupted (e.g. because of a trap). These registers might contain, e.g., (parts of) keys or other secrets if the guest was executing some sensitive crypto operation just before it was interrupted.
The vulnerability allows reading only a few kB of hypervisor memory, with only relative addressing from the emulated APIC registers page, whose address is not known to the attacker. Still, for exactly the same systems (same binaries running, same ACPI tables, etc.) it's likely that the attacker would be able to guess the address of the APIC page. However, it is much less probable that she would be able to predict what Xen structures are located in the adjacent memory. It is even less likely that the attacker could control what structures are located there, as there don't seem to be many ways in which a malicious HVM could significantly affect the layout of the hypervisor heap (e.g. force arch_vcpu structures of interesting domains to appear nearby).
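A toy model of the read primitive (pure simulation; the sizes, offsets, and "secret" contents are invented) shows why the leak is confined to whatever the heap allocator happened to place next to the emulated APIC page:

```python
PAGE = 4096

def build_heap():
    """Simulated hypervisor heap: the virtual APIC page with some other
    allocation -- maybe an arch_vcpu, maybe padding -- placed right after it."""
    apic_page = bytes(PAGE)                               # page the guest may touch
    neighbour = b"SECRET-REGISTER-SNAPSHOT".ljust(PAGE, b"\x00")
    return apic_page + neighbour, 0                       # heap, APIC page offset

def leaky_read(heap, apic_base, offset, length):
    """The XSA-108-style primitive: a read relative to the APIC page with no
    bounds check, so offsets past PAGE spill into adjacent memory."""
    start = apic_base + offset
    return heap[start:start + length]

heap, apic_base = build_heap()
# Reads inside the page are what the emulator intended...
assert leaky_read(heap, apic_base, 0x80, 4) == b"\x00\x00\x00\x00"
# ...but nothing stops an offset beyond it: adjacent heap data leaks.
assert leaky_read(heap, apic_base, PAGE, 6) == b"SECRET"
```

The attacker controls only the offset, never what the allocator placed in the neighbouring bytes, which is the whole argument of the paragraphs above.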
Nevertheless, it might happen, by pure coincidence, that an arch_vcpu structure with the content of an interesting VM will just happen to be located adjacent to the emulated APIC page.
In that case, the next problem for the attacker would be lack of control
and knowledge over the target VM execution: even if the attacker were
somehow lucky to find the other VM's register-holding-structure adjacent
to the APIC page, it would still be unclear what the target VM was
executing at the time it was suspended and so, whether the registers
stored in the structure are worthwhile or not.
It is conceivable that the attacker might attempt to use some form of heuristic, such as "if RIP == X, then RAX likely contains (parts of) the important key", hoping that this specific RIP would signify a specific interesting instruction (e.g. part of some crypto library) being executed while the VM was interrupted, and so the key is to be found in one of the registers.
But the attacker's memory-reading exploit doesn't offer the comfort of synchronization, so even if the attacker were so extremely lucky as to find out that *(apic_page + guessed_offset_to_rip) == X (the attacker here assumes 'guessed_offset_to_rip' is the distance between the APIC page and the address where RIP is stored in the presumed arch_vcpu structure, which is presumably located adjacently), there is still no guarantee that the next read of *(apic_page + guessed_offset_to_rax) will return the content of RAX from the same moment at which RIP was snapshotted (and which the attacker considered interesting).
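The synchronization problem can be illustrated with a toy model; the snapshot values and the deterministic alternation are invented purely to make the point:

```python
import itertools

# Simulated sequence of register snapshots the hypervisor stores each time
# the target VM traps; the values are invented for illustration.
snapshots = itertools.cycle([
    {"rip": 0x1000, "rax": 0xDEAD},   # the "interesting" crypto instruction
    {"rip": 0x2000, "rax": 0x0000},   # some unrelated code path
])

def read_field(name):
    """Each probe lands on whatever snapshot is current at that moment:
    there is no way to read rip and rax from the same trap atomically."""
    return next(snapshots)[name]

# The attacker's heuristic: "if RIP == 0x1000, RAX holds the key material".
rip = read_field("rip")   # sees the interesting snapshot...
rax = read_field("rax")   # ...but the VM has trapped again in the meantime
assert rip == 0x1000 and rax != 0xDEAD   # the two reads don't correspond
```

Any real attack faces this with nondeterministic timing rather than a neat alternation, which only makes correlating the two reads harder.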
Arguably the attacker might try to fire up the attack continuously, thus increasing the chances of success. Assuming this won't cause the system to crash due to accessing non-mapped memory, this might sound like a somewhat good strategy.
However, in the case of a desktop system like Qubes OS, the attacker has very limited control over other domains. Unlike the case of attacking a VM playing the role of a Web server, for instance, the attacker probably won't be able to force the target VMs to do lots of repeated crypto operations, nor choose the moments when the target VM traps.
It seems like exploiting this bug in an IaaS scenario might be more practical, though, as the attacker also has some control over domain creation/termination, and so can affect the Xen heap to some extent. But on a system like Qubes OS, it seems unlikely.
So, are we doomed? We likely are, but probably not because of this bug.
Yep, it seems only to change a small memory range. I'm guessing this means this isn't highly exploitable, but contains at least some risk of leaking private information. Can anyone with more experience with this specific APIC stuff comment? Could you get different data each time or only the same small range?
> While the write path change appears to be purely cosmetic (but still gets done here for consistency), the read side mistake permitted accesses beyond the virtual APIC page.
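So the class of fix is straightforward to state: refuse any access whose range extends past the 4 KiB virtual APIC page, on the read path as well as the write path. A schematic version in illustrative Python, not Xen's actual code:

```python
PAGE_SIZE = 4096  # size of the virtual APIC page

def vlapic_read_checked(apic_page, offset, length):
    """Bounds-checked read from the emulated APIC page: any range that would
    extend past the page is refused instead of spilling into whatever the
    heap happened to place next to it."""
    if offset < 0 or length < 0 or offset + length > PAGE_SIZE:
        raise ValueError("access beyond the virtual APIC page")
    return apic_page[offset:offset + length]

page = bytes(PAGE_SIZE)
assert vlapic_read_checked(page, 0x80, 4) == b"\x00" * 4   # in bounds: fine
try:
    vlapic_read_checked(page, PAGE_SIZE - 2, 4)            # crosses the edge
except ValueError:
    pass                                                   # refused, as intended
```

Per the quoted changelog, the write path already behaved this way in effect, which is why that half of the patch was cosmetic while the read side was the actual vulnerability.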
'x86 virtualization is about basically placing another nearly full kernel, full of new bugs, on top of a nasty x86 architecture which barely has correct page protection. Then running your operating system on the other side of this brand new pile of shit.
You are absolutely deluded, if not stupid, if you think that a worldwide collection of software engineers who can't write operating systems or applications without security holes, can then turn around and suddenly write virtualization layers without security holes.'
Source: http://marc.info/?l=openbsd-misc&m=119318909016582
Personally, I have hope for things like cgroups/jails and MAC/SELinux over virtualization.