What I find odd is that after the initial Spectre attacks, there has been a long string of these attacks discovered by outside researchers and then patched by the chipmakers.
In principle it seems like the chipmakers should hold all the cards when it comes to discovery: they are experts in speculative execution, know exactly how their chips work and have massive existing validation suites, simulators and internal machine-readable specifications for the low-level operations of these chips.
Outside researchers need to reverse-engineer all this by probing a black box (plus a few much-worse-than-insider sources like patents).
Yet years after the initial disclosures it's still random individuals or groups who are discovering these? Perhaps pre-Spectre this attack vector wasn't even considered, but once the general mechanism was obvious, did the chip-makers not simply sit their biggest brains down and say "go through this with a fine-toothed comb looking for other Spectre attacks"?
Maybe they did and are well aware of all these attacks, but to save face and avoid performance hits they simply hold on to them, hoping nobody makes them public?
I considered this, but we have pretty good evidence that the chipmakers have not been busily secretly patching Spectre attacks:
1) Microcode updates are visible and Spectre fixes are hard to hide: most have performance impacts and most require coordination from the kernel to enable or make effective (which is visible on Linux; see the sketch after this list). There have been a large number of microcode changes tied to published attacks and corresponding fixes, but to my knowledge no corresponding "mystery" updates or hidden kernel fixes with a similar shape to Spectre fixes.
It's possible they could wait to try to bundle these fixes into a microcode update that arrives for another reason, but the performance impacts and kernel-side changes are harder to hide.
2) If this were the case, we'd expect independent researchers to be at least in part re-discovering these attacks, rather than finding completely new ones. That would lead to cases where an attack was already resolved in a released microcode version. To my knowledge this hasn't really happened.
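For concreteness, here is a minimal sketch (mine, not from the thread) of how visible the kernel-side state is on Linux: the kernel exports a per-vulnerability mitigation status under /sys/devices/system/cpu/vulnerabilities/, so a "mystery" mitigation arriving alongside a microcode update would be easy to spot by diffing this output across kernel and microcode versions.

```c
/* Sketch: dump the kernel-reported mitigation status for known CPU issues. */
#include <dirent.h>
#include <stdio.h>

int main(void)
{
    const char *base = "/sys/devices/system/cpu/vulnerabilities";
    DIR *dir = opendir(base);
    if (!dir) {
        perror("opendir");   /* older kernels may not expose this directory */
        return 1;
    }
    struct dirent *e;
    while ((e = readdir(dir)) != NULL) {
        if (e->d_name[0] == '.')
            continue;
        char path[512], line[256];
        snprintf(path, sizeof path, "%s/%s", base, e->d_name);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;
        if (fgets(line, sizeof line, f))
            printf("%-24s %s", e->d_name, line);   /* status line ends in '\n' */
        fclose(f);
    }
    closedir(dir);
    return 0;
}
```

On a typical patched system this prints one status line per known issue; the point is simply that this surface is public and easy to compare over time.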
That's true, but it leads to the odd assumption that the vendor managed to fix N side-channel attacks before release but 0 thereafter, while random individuals found M thereafter over a period of years, with N >> M.
This seems much less likely than the conclusion that vendors are not in fact fixing many prior to release and then stopping "cold turkey" after that. Especially since these attacks tend to cross chip versions, in many cases 6+ generations of chips: if vendors had substantial and increasing efforts on new chip versions, they'd also be catching issues that apply to older, already-released chips. We don't see that happening.
This. Downfall affects CPUs released almost a decade ago. The argument of "Intel caught a ton of these and Downfall is the one that got away" doesn't add up. Surely Intel would find at least one issue during development of a new chip that has the property that it _also_ affects older chips. They can hardware-fix it before ever releasing their new chip, but unless they just decided to ignore the vulnerability in their older chips, they'd have started work on a patch or at least on contacting OS vendors.
None of which is easy to do without the public at large figuring out what happened.
Conclusion: that is highly unlikely to have ever happened. Therefore, this isn't survivorship bias, and it really is bizarre that chip vendors (or, at least, Intel) don't look for this stuff, or at least didn't find this.
FWIW. Spectre/meltdown really caught the hardware development world off guard. Speculative execution was well-trodden ground and we thought it was fine. After these attacks, we had legacy designs that we needed to patch in a hurry, and the performance costs of not speculating were unfathomable. A lot of work went into mitigations like new kinds of barriers. But hardware designs are enormous and there is state lurking absolutely everywhere. It’s just a really hard problem.
Dan Bernstein had pointed out the risk that speculative execution would lead to side-channel attacks years before, but there's a common pattern where security people point out a risk and then vendors dismiss it as impractical until it's demonstrated in practice.
In computer engineering grad school, I mentioned to my PI that there was an upcoming hardware security flaw that I had heard about from … sources… and he quickly guessed it was speculative execution related without knowing anything more. I think people knew it was possibly dangerous, but the performance gains from speculative execution were huge enough nobody in Intel/AMD red-teamed the design basically.
If a bug is not known, most of the incentives to the vendor are to not bother investigating, I suspect.
You could spend arbitrary amounts of time looking for these bugs and find nothing. Simpler and easier to offer a bounty or something and fix it then. If no one publicly finds the bug it doesn't matter to the mfg (and it wouldn't surprise me if there's truth to your supposition that they know about the bug but wait to fix until someone reports it - no public backlash so long as the bug is unknown)
Fixing bugs prior to release seems easy and free, though, especially since many more eyes would also be on the "new" in progress architecture, and proper hardware mitigations that don't cost a lot of performance can be made.
What about the incentive to release "the most secure chips on the market", are you discounting that a bit too much?
Granted that human nature tends to mean these factors don't have a high enough weight, e.g. it's not the safest airplanes that sell the most, it's the cheapest ones that meet the regulations, and the regulations drive safety improvements, for the most part
I guess there's probably some margin in it - if both parties seem about equally vulnerable, there's not much lost. You could expend a lot of effort into security, but the nature of these bugs is still that they are fairly rare, often require pretty significant hurdles, etc. The mfg. could probably spend a lot more money and find a few extra bugs, but who knows if they would have turned into "real" exploits?
Remember that this particular bug isn't actually present on the newest chips either - and 12th/13th gen were shipping before Intel was informed of this bug - so it was fixed eventually, probably incidentally as a result of design changes.
The unknown factor is how much additional money you'd have to invest to gain additional security, given how esoteric many of these bugs are.
Or the problem could be with methodology, and the wrong people are in charge or the right people left, and so the mindset for testing is just wrong.
Also you’re dealing with a company that has been running to stand still for a long time. There’s been a lot of pressure to meet numbers that they simply cannot keep up with. At some point people cheat, unconsciously or consciously, to achieve the impossible.
There are probably thousands if not more. The way I always imagined this working was: agency and company work together to leave gaps in our hardware and software under the ruse of "it must be secure enough for our use." Agencies get a preemptive six months or so to find enough zero-days to do what they want. The engineers at the company screaming about the issues are ignored.
Eventually a foreign adversary or domestic hacker finds one that can cause a lot of harm. As soon as they find one a DOD funded student simultaneously discovers it. Alternatively, if documents leak showing how these exploits could happen, same scenario.
Not to say all bugs are known, but I'd imagine a fair deal of them certainly are.
The logic in this comment rubs me the wrong way. You could use the same train of thought to postulate programmers that have made 2 memory safety errors are nefarious instead of simply human.
When billions use something I expect them to find more problems, flaws, and exploits in it than the creator/manufacturer did. The presence of this does nothing to indicate (or refute) any further conclusion about why.
I think the comparison between CPU and software exploits holds at a very high level, but in the case of software the gap between internal and external researchers seems smaller. Much software is open source, in which case the playing field is almost level, and even closed-source software is available as assembly, which exposes the entire attack surface in a reasonably consumable form.
Software reverse-engineering is a hugely popular, fairly accessible field with good tools. Hardware not so much.
> When billions use something I expect them to find more problems, flaws, and exploits in it than the creator/manufacturer did. The presence of this does nothing to indicate (or refute) any further conclusion about why.
To be very clear, none of these errors have been found by billions of random users but by a few interested third parties, many of them working as students with microscopic funding levels and no apparent inside information.
I'm not actually suggesting that the nefarious explanation holds: I'm genuinely curious.
I'm not sure I agree that significant gaps in the playing field are really there. By significant I mean something that would explain it being, say, 10x harder, to the point that the ratio is supposed to be suspicious or indicative of something off. Sure, you don't get to see how they laid out the transistors of the CPU, but that's not how these attacks work: it's by some oversight in memory handling, not that different from software. They analyze how individual assembly instructions behave in certain scenarios versus how they are designed to behave. Compare this to the process of attacking closed-source software like Windows (sift through a bunch of assembly in a debugger and see what gets left behind or compared incorrectly) and it's not glaringly different just because hardware is involved. Difficult, sure, but far from anything to suggest it should be uncommon. More importantly, by definition you don't really get to see all of the things they do catch. Maybe they are getting 90% of other related vulnerabilities with the patches, but you only hear about the 10% that weren't covered by them, because that's the only thing someone is going to publish, get a CVE for, or make the front page of HN with.
The point isn't that billions of users all actively try to exploit software; it's that if you have billions of users, then even if 0.01% try to, that's still a hell of a lot more external bug finders than internal bug finders.
Yeah, nothing against you or genuine curiosity; it's just that when a comment sets up a chain of logic and concludes with a leading question, the conversation is damned to largely revolve around the leading question instead of genuine answers.
I dabble in this space (hardware reverse-engineering) and write software for a living and in my opinion the gaps are huge.
I should disclose that I have been paid by a chip-maker for a blog post that I wrote which "disclosed" an optimization which could be used for a side-channel attack (though I did not even suggest that aspect) and which was subsequently patched away via a microcode update. The whole process was very surprising to me in that there must have been several people inside the chip-maker who knew about the optimization I described in much deeper detail ... after all, they conceived and implemented it.
So by what path does a blog post mentioning it get treated as the disclosure that results in it being removed, when they knew about it all along?
> that's not how these attacks work: it's by some oversight in memory handling, not that different from software.
I think it is very different. Assembly is merely a somewhat less convenient form of the original program that still embeds all the semantics relevant to the attack surface, even though the original source has been "erased". Many analysis tools such as fuzzers operate directly on assembly with little loss in functionality.
These attacks are against completely unspecified aspects of the instruction execution and lean heavily on the actual hardware implementation (almost at the level of "how the transistors are laid out") such as what hidden buffers are used, when they are filled, how they are shared with sibling threads, etc.
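To make the "probing a black box" part concrete, here is a minimal sketch of the timing primitive most of these attacks bottom out in: flush a cache line, then time a load to see whether something else (speculation, a hidden buffer, a sibling thread) touched it. This is a generic flush+reload-style measurement, not any specific exploit; it assumes an x86 machine and a compiler that provides x86intrin.h.

```c
/* Minimal flush+reload timing probe (sketch, x86 only, GCC/Clang). */
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_mfence, _mm_lfence, __rdtscp */

static uint8_t probe[4096];

/* Time a single load from addr, in TSC cycles. */
static uint64_t time_access(volatile uint8_t *addr)
{
    unsigned int aux;
    _mm_mfence();
    uint64_t start = __rdtscp(&aux);
    (void)*addr;                      /* the load being timed */
    uint64_t end = __rdtscp(&aux);
    _mm_lfence();
    return end - start;
}

int main(void)
{
    volatile uint8_t *p = probe;

    (void)*p;                         /* warm the cache line */
    uint64_t hit = time_access(p);

    _mm_clflush(probe);               /* evict the line */
    _mm_mfence();
    uint64_t miss = time_access(p);

    printf("cached: ~%llu cycles, flushed: ~%llu cycles\n",
           (unsigned long long)hit, (unsigned long long)miss);
    return 0;
}
```

The attacker-relevant part is not this snippet but knowing which undocumented structure to aim it at, which is exactly the reverse-engineering gap being argued about here.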
In my experience there are very few people interested in these details outside of the vendors themselves and these folks and the ones creating the exploits would fit in a modestly sized lecture hall. The scope has increased a bit lately (see Tavis's fuzzer work) but it was originally a small group with little or no funding.
Do you have a link for the blog post? I'd love to read more about that.
Since we disagree on how big the gap is, and neither of us is going to get a satisfactory answer out of a chip maker any time soon, perhaps a different argument: there are plenty of microcode updates all the time, doing more than just fix security bugs. There are also security bugs like M1racles which have nothing to do with performance incentives. If these can all be explained by a lecture hall's worth of people finding things most wouldn't, post-release of the chip, then why does the same situation for security issues require a unique explanation?
This reminds me of lock manufacturers vs The Lockpicking Lawyer. He is able to pick nearly every damn lock out there, yet they have all the design resources and money to hire people like him to make better locks.
I had a failed startup where we warned our customers they were using suppliers without sufficient qualifications, insurance, etc. Nobody wanted the product, even for free. At a system level, they chose to use cowboy tradespeople and cover the risk with "plausible deniability", because the market wouldn't pay them to only use quality.
It's like a Gresham's law - eventually the lowest quality dominates the market, because that's what maximises profit.
One possibility nobody mentioned yet: the chip vendors don't invest a ton of time looking for them because they don't actually matter that much.
Bear in mind, security researchers are incentivized to find things to build their reputation. It's very often the case that they claim something is a world-shaking security vulnerability when in reality it doesn't matter much for real world attackers. Has anyone ever found a speculation attack in the wild? I think the answer might be no. In which case, why would chip vendors invest tons of money into this? Real customers aren't being hurt by it except in the sense that when an external researcher forces action, they're made to release new microcode that slows things down. Note how all their mitigations for these attacks always have off switches: not something you usually see in security fixes. It's because in many, many cases, these attacks just don't matter. All software running on the same physical core or even the same physical CPU is either running at the same trust level, or sandboxed so heavily it can't mount the attack.
> Has anyone ever found a speculation attack in the wild? I think the answer might be no.
This is known as the Y2K paradox.
The Y2K bug had the potential to be very dangerous, but due to a wide-reaching campaign and tonnes of investment in prevention, when the millennium came, it did so with very few issues (though there were still some); leading many to speculate that the issues were overblown
Yeah, but we have practically usable variants of meltdown and spectre.
The only reason it's not useful to deploy them is a massive amount of herd immunity (and the fact that you consume a lot of CPU when trying the attack, making it conspicuous: a bad combo for an attacker).
> Note how all their mitigations for these attacks always have off switches: not something you usually see in security fixes. It's because in many, many cases, these attacks just don't matter.
They have off switches because they can have severe performance costs, which most security fixes don't have.
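As a concrete illustration of how selectively these mitigations get applied because of their cost: on Linux, individual processes can tighten speculation-related mitigations for themselves via prctl. This is a hedged sketch assuming a reasonably recent kernel and libc; the fallback constants are meant to mirror linux/prctl.h.

```c
/* Sketch: query / tighten this process's speculation mitigations on Linux. */
#include <stdio.h>
#include <sys/prctl.h>

/* Fallbacks if the libc headers predate these; values mirror linux/prctl.h. */
#ifndef PR_GET_SPECULATION_CTRL
#define PR_GET_SPECULATION_CTRL 52
#define PR_SET_SPECULATION_CTRL 53
#endif
#ifndef PR_SPEC_STORE_BYPASS
#define PR_SPEC_STORE_BYPASS    0
#define PR_SPEC_DISABLE         (1UL << 2)
#endif

int main(void)
{
    /* Bitmask describing the current speculative-store-bypass state. */
    int status = prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, 0, 0, 0);
    printf("speculative store bypass status: %d\n", status);

    /* Opt this one process into the mitigation (turn the "off switch"
       back on for ourselves), accepting the performance cost locally. */
    if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS,
              PR_SPEC_DISABLE, 0, 0) != 0)
        perror("prctl(PR_SET_SPECULATION_CTRL)");
    return 0;
}
```

Whether a given process is protected by default depends on the boot-time mitigation settings, which is exactly the kind of performance/security knob being discussed.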
> All software running on the same physical core or even the same physical CPU is either running at the same trust level, or sandboxed so heavily it can't mount the attack.
The former is simply not true. For example, EC2 avoids mixing tenants on the same core, but even the AWS serverless stuff doesn't do that, for cost reasons. For most end-user computers, this is blatantly not true.
Operating system level sandboxing is largely irrelevant if the attacker can simply read secrets from other processes running on the same core at the CPU level. Most processes will have a way to smuggle the stolen goods out.
By "sandboxed so heavily" I meant browsers, which don't allow shared memory multi-threading, tight control over CPU instructions or high resolution timers, so it's very hard to mount specex attacks there. I've seen claims it can be done but very few demos, and the only demo I remember trying didn't actually work.
By "running on the same physical core" I meant simultaneously. You can usually wipe uarch state when switching between tenants pretty well. If AWS aren't doing this then that's something for them to solve but I'm pretty sure major cloud users don't have to worry about these attacks as so many only affect hyperthreads that are running concurrently and it's not cost prohibitive to avoid mixing tenants that way.
Maybe they're simply victims of Kernighan's Law of Debugging: "Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?"
There is no doubt that Intel make chips "as clever as they can". Hence, by definition, they can't fully debug them.
As a CPU engineer, I can say that spectre highlighted channels of information escape that weren't previously considered as vulnerable. That's why it kicked off a new batch of exploits. There was a new idea at the heart of it and others built on that idea.
It's also important to say that these are not bugs. The design is behaving as intended. That the performance differs based on the previous code that the CPU has executed was understood and deemed acceptable, because the cost of the alternative was considered too high (either in power, performance or area). That's what it means to be an engineer. You weigh up alternatives and make a choice.
In this case, CPUs became fast enough that the fractional part of a bit per iteration became high enough bandwidth to be exploitable, but it needed someone to demonstrate it for it to be understood in the industry. That changed the engineering decision.
The problem is that nobody wants to admit that the old, stodgy mainframe guys were right 30 years ago and that sharing anything always results in an exfiltration surface.
Nobody wants to be the first to take proper steps because they have to either:
1) Partition hardware properly so that users are genuinely isolated. This costs silicon area that nobody wants to pay for.
2) Stop all the speculative bullshit. This throws performance into rewind and will put chip performance back a decade or two that nobody wants to pay for.
Until people are willing to value security enough to put real money behind it, this will continue ad nauseam.
I'd add that the status quo has done pretty well, and many of these exploits are fixed. It's also worth noting that a lot of the exploits in question may be known, but the people working on them couldn't theorize a practical exploit. How many web browser sandbox breaches have there been over the years? Far fewer than the CPU exploits of the past several years. The latter can have a much bigger impact though.
The biggest risk target seems to be shared servers, and you often don't know who you're sharing with, so is it worth trying? It seems to be usually, no... in a specific target, maybe.
No, 1993 is about right. The mainframe guys were screaming their heads off as these dumbass, insecure, non-ECC x86 processors were eating up more and more computing--before that the mainframe guys kind of dismissed x86 as simply toys.
A fundamental problem is that the attack surface is so, so huge. Even if their security researchers are doing blue-sky research on both very small and very broad areas of processor functionality, they're going to miss a lot.
And in line with that, and
> Maybe they _did_ and are well aware of all these attacks, but to save face and avoid performance hits they simply hold on to them, hoping nobody makes them public?
... maybe they have patched a number of issues and just never announced them.
> A fundamental problem is that the attack surface is so, so huge. Even if their security researchers are doing blue-sky research on both very small and very broad areas of processor functionality, they're going to miss a lot.
Sure. If they had patched a bunch of Spectre vulnerabilities and independent researchers had discovered a few more, that would be one thing, but as far as I can tell they have patched _zero_ while independent researchers have found many, and it has been years since the initial attack. Many of these follow very similar patterns, and "in what cases is protected data exposed via speculative execution" is something that an architect or engineer could definitely assess.
Generally all these workarounds have a measurable slowdown associated with them. This mitigation apparently has an up to 50% cost. It's unlikely many of them have been silently fixed without people noticing.
It's the same as any product, the product team wants a faster, cheaper product, yesterday. Security and trust is secondary, because if you're lucky enough, that will fall on the next product team.
Beyond that, processors contain billions (or trillions?) of possible outcomes from a set of inputs. Testing all of these just to verify reliability and stamp out logic bugs is really hard due to the combinatorial explosion. Putting security testing on top just complicates matters further. The best they can probably do is map out potential ways in which their general-purpose processors could be used for specific nefarious purposes.
The chipmakers don't have an incentive to look too hard for speculation security issues beyond a bit of PR. If they succeed, they lose money and market share, while their "insecure" competitors gain and at most patch later. And in fairness, a lot of these bugs are rather theoretical. Until buyers take these bugs much more seriously, this isn't going to change.
Clouds take these vulns seriously, and have a lot to lose, and have deep wallets. I'd be surprised if this topic didn't come up when large purchases are discussed.
They have workarounds. If you prevent multiple tenants from sharing threads on the same core, that eliminates the most desirable goal for an attacker.
However, it does not eliminate the vulnerability within a single tenant's own threads.
You also have to think about all the web shops out there that are just running Proxmox, or a cloud reseller who reintroduces the vulnerability in their multicore VPS setup.
Big clouds have the exact same incentive issue - if a customer is really paranoid, the customer can pay the cloud extra and ensure their own exclusive infra. For regular users, the clouds can mitigate a bit at scale (but not care much about this in practice, it's not as important as cheaper faster processors).
>In principle it seems like the chipmakers should hold all the cards when it comes to discovery: they are experts in speculative execution, know exactly how their chips work and have massive existing validation suites, simulators and internal machine-readable specifications for the low-level operations of these chips.
I hope you're not in charge of hiring QA. Bugs are often found by people who AREN'T thinking like the developers, who start wearing blinders about how their stuff should work and stop trying stupid things.
I think this sort of excuses the initial blindness to Spectre-style attacks in the first place, but once the basic pattern was clear, it doesn't excuse the failure to discover any of the subsequent issues.
It is as if someone found a bug by examining the assembly language of your process caused by unsafe inlined `strcpy` calls (though they could not see the source, so they had no idea strcpy was the problem), and then over the subsequent 6 years other people slowly found more strcpy issues in your application using brute-force black-box reverse engineering, and meanwhile you never just grepped your (closed) source for strcpy uses and audited them, or used any of the compiler or runtime mitigations against this issue.
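To stretch the analogy: the kind of audit being described is roughly "grep the closed source for the risky pattern and fix each hit", something only the vendor can do. A purely hypothetical before/after, just to illustrate the shape of the fix:

```c
/* Hypothetical example of the strcpy analogy; not from any real codebase. */
#include <stdio.h>
#include <string.h>

/* The kind of call an internal `grep -rn strcpy` audit would flag. */
void greet_unsafe(const char *name)
{
    char buf[16];
    strcpy(buf, name);          /* no bounds check: overflows buf if name is 16+ bytes */
    printf("hello, %s\n", buf);
}

/* The audited replacement: bounded copy, truncates instead of overflowing. */
void greet_fixed(const char *name)
{
    char buf[16];
    snprintf(buf, sizeof buf, "%s", name);
    printf("hello, %s\n", buf);
}

int main(void)
{
    greet_fixed("a name that is much too long for the buffer");
    return 0;
}
```

An outsider working from the compiled binary has to rediscover each such call site the hard way; the vendor could enumerate them in an afternoon, which is the asymmetry the comment is pointing at.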
That still seems like a job for QA and you do have a good point. If there's an attack using one technique, there should be an audit to make sure that technique can't be modified to be used in other nasty ways, then have those tests part of the test suite.
I'm in the camp that developers shouldn't be responsible for having a transcendental ability to predict future security holes (like this example). If one appears, it's really QA's job to document it and experiment with other similar vectors of attack. The developers are, you know: developing. Bring them back in when the security issue needs to be fixed, once it's found just how big the problem is.
Shouldn't the functions of "development" and "QA" both reside under the umbrella of the chip-maker though?
In fact, chip-makers famously invest an insane amount of money into "QA" (aka "validation") and many features or lack thereof are often put down to the cost of QA rather than the cost of development.
>Shouldn't the functions of "development" and "QA" both reside under the umbrella of the chip-maker though?
Sure, umbrella, but different teams. What I'm trying to get across is that it's a good thing that the developers are not wearing the QA hat, so that the QA people are thinking in ways that aren't exactly parallel to the developers'. You get close to your work, and you sometimes forget that you're not looking at the larger picture.
Think of it like if I was building a big secure wall in front of my house, but I neglect that it's easy to just walk to the opposite side to get in. Oh, whoops, I was too focused on the solution in front of me.
Possibly. When I worked at a place that designed a "simple" chip that was just a more energy efficient and very parallel version of its FPGA, the fab that was contracted to make it insisted on validating it themselves. It consisted primarily of a lot of copy-paste of the primary logic to build as many paths as would fit on the die. They described it as "unusually dense" and, I heard they later said that they'd never seen a design that dense. The validation process was partly described as people manually driving a car through a 3D model, making sure there were no unexpected junctions or other divergences. This process took months longer than their initial estimate, allegedly due to the density.
I guess you could say this was "under the umbrella of the chip-maker", though we had little say in it aside from pressing them for progress as our final product's shipdates came and left. When we finally got the first samples, power consumption was, I think, an order of magnitude higher than expected. Our lead engineer struggled to get it down without going to a smaller process that we could barely afford (and given the delays already, could probably not have afforded to wait for). We thus thought we had working logic, but our case designs were scuttled. After enlarging the cases to accommodate extra cooling, our base unit was more than ten times taller, and our next size up, while the same height, was three times longer. Highest units had a water cooling system[1].
Our QA was able to find other sorts of flaws, like misprinted unpopulated circuit boards, software faults on the host, or when we received shoddy interlink cables that either melted under test[2] or other cables that scrambled communications[3].
All of which is to say that many other pressing issues can interfere with doing what you feel you ought to be doing. At least at our scale. I can't speak for the likes of big guys like Intel or AMD, but it's possible that unfound faults or known unpatched flaws can ship because resources were committed elsewhere or fabrication lead times preclude waiting. This is not to say that shipping a security flaw is okay, but rather that sometimes you think you've done your due diligence, or sometimes your choices seem to be "Ship, ship late, or never ship." The answer you pick can be existential, so you hope you've picked the least bad option.
[1] Misbegotten, because "beauty of the promo images".
[2] Conductors much thinner than spec, not initially observed because both ends were fitted with moulded plugs.
[3] Longer than spec, initially received with enthusiasm by assembly staff, before an engineer investigating a difficulty saw them and exclaimed, "No-no-no! That's longer than I am tall! Stray capacitance alone will kill the communication." (He uncharacteristically exaggerated here. While they were three times longer than expected, this was at most 0.40 times his height.)
Generalizing: QA/Test and dev people just think differently.
I served as QA manager for a while. I was fortunate to have worked with some really, really good QA/Test people. They're more rare than good devs.
Some individuals can do well in both worlds.
I've had great devs who were pretty good at test. It seems to me like these devs came from outside of software and CS. Like from aerospace or ballet or history.
I've mentored QA/Test people so they could better automate and manage their work. Then they could pick up tasks like CI/CD, testing scripts, scrub data, etc.
But, in my experience, devs are bad at testing, and just terrible at QA.
Getting enough QA/Test support on a team in the 90s was a tough sell, even though everyone gave lip service to quality.
It's been a long time since I've worked with an actual Test person, dedicated to that role. And that was just 1 person vs 8 devs. So ridiculous.
These days, any kind of "test" is done by "business analysts", whatever that means. And I can't recall the last QA person I've worked with (since my stint as QA Manager in the '90s).
FWIW, I wholly agree with Dan Luu's observations about today's QA/Test standard practice in software vs hardware.
And maybe, just maybe, when the Snowden revelations started to come out, some people woke up and realised that the companies who design the processors used in the vast majority of computers are from the US.
Since Applied Cryptography, and every day since publication, it's been well known and well understood that the NSA's never-ending efforts to weaken systems to make their own work easier wreak havoc and cost everyone else billions. Luckily their attempt to get everyone to adopt a PRNG that was broken on purpose was thwarted.
But who's to say that a chipmaker being bribed to backdoor their design to allow reading any page of RAM from any protection level doesn't happen? It would make their job super easy.
Some people might say “don’t attribute to malice what can be attributed to incompetence” but introducing bugs via bribing an insider or even getting one of your people a job at Intel or AMD would be a very clever way to give yourself the keys to (nearly) all the castles.
Just don’t forget to patch the microcode on your own systems.
The intersection of probabilistic optimization and timing based side channels is a gift that will never ever fully go away.
Everything _really fast_, which is approximately all very mature systems gear, has probabilistic optimization in it now, and that's where a great deal of modern performance comes from.
Even thermals and power draw produce side channels. Eradicating every side channel is untenable; research is required to understand which side channels are tenable, and then we have to patch them up as best we can.
I'm waiting for the one where we find that binning is involved in an integrated way, and that some arbitrary sub-population of popular chips turns out to be much more exploitable than others due to particular paths being disabled and leaking extra timing info. That'll be a really fun and awful day.
You are describing a creator bias (there is probably a better name), where you think the ones who created something know it best. For example, you could create a programming language, a game, or anything, and think that you know it better than someone who uses it for several hours every day. The larger the user base, the less likely you are to know it better than all users.
I think your last statement is half right. They probably don't bother looking for them that hard, because if they look for them they might find them, and then have to make their CPUs slower before launch.
> Maybe they did and are well aware of all these attacks, but to save face and avoid performance hits they simply hold on to them, hoping nobody makes them public?
Or they chose not to look for these types of bugs in the first place, for those reasons.
There's more money to be made in performance and efficiency than security. If chipmakers could design and build the perfect processor, they would. But like anything else complex, there are compromises everywhere.
Granted, but the vendors accept that these are serious problems, given that they are immediately patched and the mitigations are all enabled by default, even at significant performance cost (most chip generations are down double-digit percentages versus "zero mitigations" at this point).
So they don't need to build the "perfect processor" but why aren't they discovering any of these issues themselves?
It's probably that they are aware of several vulnerabilities; such is life. But they are unable to prioritize them and assess which to solve first, so they need external auditors using black-box techniques to help them identify which ones are exploitable by external attackers, so they can fix those and not the other non-issues.
They know at release about many of these bugs, but between incentivizing upgrades and selling the resulting bugs as backdoors to "three letter agencies", there's simply too much money to be made by not disclosing/patching the problems before third parties publish their discovery.
My guess: There are a lot more researchers on the outside than inside. And also incentives - you would need a red team at Intel who tries to attack their own chips, and those people would be even fewer. And the best people may want to stay independent.
Having worked in the industry, my gut feeling is that chipmakers don't invest all that much in looking for and preventing these sorts of attacks.
When working on a new feature, you are desperately trying to deliver on time something that adds value in the sorts of scenarios that it was designed for. And that is already hard enough.
An easier explanation is that modern chipsets are incredibly complex, side-channel attacks like these are hard to reason about, and traditionally processors themselves have not been the target of these kinds of attacks, so I don't think the engineers working on them are accustomed to thinking about them as attack surfaces in this way.
You didn't have to think about this when you ran your code on your own chips; so it's your fault for backdooring the front end into the datacenter.
But now we have the 21st-century mainframe we like to call the cloud, where everything is shared. So I upload a container image with the vampire vuln, intending to read all the activity on the host: other customers' jobs, the OS itself, even steal internal keys used at Amazon.
The motivation to do this kind of attack now is incalculable.
I kid. The entire Zen class of AMD processors are vuln to Inception so you're not safe anywhere.
There is the "just use OS/2" strategy of the 1990s (referring to the period from 1995-2002 where you could sit happy in your niche OS that researchers and BHs were not targeting): in this case s390x probably has the same problem but no one is paying attention to it (other than a 3 letter agency, of course), so the chances of a payload reaching that arch are very close to zero.
There are security teams and/or methodologies inside all major CPU designers today that look at speculation and other side channels, although these might still not be up to quite the kind of rigor that you see in traditional verification. That is to say, their unit tests and fuzzers and formal analysis and proving methodologies are all well set up to verify that architectural results are correct; my guess is that they don't all verify that intermediate results or side effects can't be observed by different privilege contexts.
In many ways it is a much more difficult problem too. Going back to first principles, what execution in one context could have an effect on a subsequent context? The answer is just about everything. If you really wanted to be sure you couldn't leak information between contexts, you could only ever run one privilege level on the machine at any time. Not just core, entire machine. When switching contexts, you would have to stop everything, stop all memory and IO traffic, flush all caches and queues, and idle all silicon until all temperatures had reached ambient and voltages, fans, and frequency scaling had settled. Then you could start your next program. Even then there's probably things you've forgotten -- programs can access DRAM in certain ways to cause bitflips in adjacent cells, for example, so you've probably got a bunch of persistent and unfixable problems there too if you're paranoid. That's not even starting on any of the OS, hypervisor or firmware state that the attacking program might have influenced. So the real answer is that you simply can't share any silicon, software, or wires whatsoever between different trust domains if you are totally paranoid.
All of these things are well known about, but at some point you make your best estimation of whether something could realistically be exploited and that's very hard to actually prove one way or another. Multiply by all possible channels and techniques.
That's probably why you see a side channel vulnerability discovered every month by outsiders, but very few architectural defects (Pentium FDIV type bugs).
That said, this issue looks like a clear miss by their security process. Supplying data to an untrusted context, even if it can only be used speculatively, is clearly outside what is acceptable, and it is one of the things that could be discovered by an analysis of the pipeline.
Contrast with the recent AMD branch prediction vulnerability, which could plausibly fall under the category of a known risk that was not thought to be realistically exploitable.
As others have said though, everyone makes mistakes, every CPU and program and every engineering project has bugs and mistakes. I don't know if you can deduce much about a CPU design company's internal process from looking at things like this.
General caveats: are there many clouds that still run workloads from different users on the same physical core? I thought most had changed their schedulers years ago so you can't get cross-domain leaks between hyperthreads anymore. Claiming that it affects all users on the internet seems like a massive over-exaggeration, as he hasn't demonstrated any kind of browser based exploit and even if such a thing did exist, it'd affect only a tiny minority of targeted users, as AFAIK many years after the introduction of Spectre nobody has ever found a specex attack in the wild (or have they?)
I think the more interesting thing here is that it continues the long run of speculation bugs that always seem to be patchable in microcode. When this stuff first appeared there was the looming fear that we'd have to be regularly junking and replacing the physical chips en masse, but has that ever been necessary? AFAIK all of the bugs could be addressed via a mix of software and microcode changes, sometimes at the cost of some performance in some cases. But there's never been a bug that needed new physical silicon (except for the early versions of AMD SEV, which were genuinely jailbroken in unpatchable ways).
>are there many clouds that still run workloads from different users on the same physical core?
There are a vast number of VPS providers out there that aren’t AWS/GCP/Azure/etc where the answer is yes. Even the ones that sell ‘dedicated’ cores, which really just means unmetered cpu
What about burstable instances on AWS, and whatever is the equivalent in other clouds? Hard to imagine those having a dedicated core, would probably defeat the purpose.
My guess is there's likely less value in trying to target those kinds of environments... Just poking random data out of lambda or low end vps neighbors is a needle in a haystack the size of the moon in terms of finding anything useful.
It's more likely useful as part of a group of exploits to hit an individual, targeted system.
Per the paper, this looks like an attack against speculated instructions that modify the store forward buffer. The details aren't super clear, but that seems extremely unlikely to survive a context switch. In practice this is probably only an attack against hyperthread code running simultaneously on the same CPU, which I'd expect cloud hosts to have eliminated long ago.
Yeah, the way AWS has stock language like “AWS has designed and implemented its infrastructure with protections against this class of issues” supports the idea that you’re probably not getting anywhere with exploits of this class on a major cloud host any more.
What's interesting is that the FDIV bug from 1994 could also be worked around, but Intel recalled and wrote off those processors[1]. For their latest several problems, their response was more of a "sucks to be you". While they provided microcode updates and worked with OS vendors, there were performance impacts that materially affected the value of the chips.
The software workaround for the FDIV bug required the actual userspace software to be modified and recompiled. There was a decade of pre-existing software out there that would be hard to fix, especially in the days before most people had the internet.
There was nothing the OS could do to work around the bug, short of disabling the entire FPU and falling back to expensive software emulation of all floating point math.
The workarounds for all these speculation bugs can mostly be applied at the operating system and/or microcode level, and are comparatively cheap.
Yes, I think that's what I said? Every attack no matter how deep it seemed to be has been patchable in microcode, sometimes at a cost in performance. But so far nobody had to toss the physical silicon, at least not with Intel. The malleability of these chips is quite fascinating.
On some Skylake CPUs to get full mitigations you are taking a 30% performance penalty _and_ you need to disable SMT which is often another double digit penalty. It's not a literal "toss the physical silicon", but it's getting there.
In fact my thesis is that it's never a literal "toss the physical silicon" because the kernel is able to take control when switching between tasks so (with help from the microcode) it's able to wipe all potential speculation vectors (at an arbitrarily expensive cost) before switching to the next task. This also explains why SMT is unfixably broken on some processors, since by design the kernel does not intervene in the task switching on the virtual cores.
I think it was both. There were initial software only patches, but those were quickly superseded by microcode+kernel patches, where microcode added features the kernel enabled. IBRS or something like that.
> General caveats: are there many clouds that still run workloads from different users on the same physical core? I thought most had changed their schedulers years ago so you can't get cross-domain leaks between hyperthreads anymore.
Isn't this the whole point of AWS' t instances? It's my understanding that they are "shared" at the core level, or else there wouldn't be a reason for the CPU credit balance thing.
They are definitely time-sliced among tenants and very possibly two tenants may run at the same time on two hardware threads on the same core: but you could have a viable burstable instance with time-slicing alone.
Nevermind, AWS explicitly documents that all instance types, including burstable, never co-locate different tenants on the same physical core at the same time:
It only seems to document that there's group scheduling for SMT cores. But that doesn't prevent issues due to switching between customers on the same physical core, no?
"It is possible, however, for two burstable performance EC2 instances to run sequentially (not simultaneously) on the same core. It is also possible for physical memory pages to be reused, remapped, and swapped in and out as virtual memory pages. However, even burstable instances never share the same core at the same time, and virtual memory pages are never shared across instances. "
I only started reading the paper - so I very well might be wrong here - but it doesn't look to me like you need victim/attacker to be scheduled simultaneously on two SMT threads; rather, a single core sequentially executing victim/attacker code would be vulnerable. It's possible that the cross-customer "context switch" is larger than the vulnerable window, but I'd not want to bet on it.
> But that doesn't prevent issues due to switching between customers on the same physical core, no?
Yes they are explicit that customers may be time-shared on a physical core ("burstable" instances don't really make sense without that). Most of these attacks aren't known to be possible in that scenario and in any case the mitigations are much easier since flushing sensitive state at group scheduling boundaries is much less costly than permanent dynamic changes to how concurrent SMT threads interact.
Yes, it's certainly easier to mitigate at a boundary that's already as costly as switching between VMs.
The paper documents that disabling SMT does not entirely mitigate the problem (In 9.1). They briefly mention trying instructions to avoid the microarchitectural leaks, but don't go into more detail than mentioning verw isn't sufficient.
They state that a switch to/from SGX, with SMT disabled, doesn't prevent the attacks. See 8.1. That's not the same as a cross-vm switch, but it's certainly interesting that the attempts at flushing microarchitectural state when exiting SGX don't provide protection.
Think of all the cloud resellers out there who really aren't segregating their tenants, or the web shop with Proxmox that recombined its own customers onto a core even though the cloud provider specifically segregated it.
I think most if not all cloud VMs dedicate a core to you.
Well, there are some that share like the T series on AWS and I think other clouds have similar, but my bet is they can put in an extra "flush" between users to prevent cross tenant leakage.
Of course cross process leakage for a single tenant is an issue, in cloud or on prem, and folks will have to decide how much they trust the processes on their machine to not become evil...
This concern I also share, and it's probably worth converting into layman's terms so that all computer users understand what it is. Basically, the job scheduler behavior in the OS needs to surface to the user in understandable language they can read, so they can make the trade-off decision.
> Claiming that it affects all users on the internet seems like a massive over-exaggeration, as he hasn't demonstrated any kind of browser based exploit and even if such a thing did exist
He's saying it likely affects "everyone on the Internet" because most servers are vulnerable.
> These are not the same thing. Afaik, most “vCPU” are hyperthreads, not physical cores.
OP didn't say otherwise. They are saying that public clouds do not let work from different tenants run on the same physical core (on different hyperthreads) at the same time.
This doesn't prevent you from selling 1 hyperthread as 1 vCPU, it just means there are some scheduling restrictions and your smallest instance type will probably have 2 vCPUs if you have SMT-2 hardware (and that's exactly what you see on AWS outside of the burstable instance types).
A lot of people on YC are enterprisey-brained and only think there are 3 possible clouds, and then there is the rest of the planet who can't afford to park their cash at AWS and set it on fire.
Once again it seems clear that running code from two security domains on the same physical processor cores is just not possible to get right, and we should probably just stop doing it.
There are really only two common cases for this anyway. VMs and JavaScript.
For VMs we just need to give up on it. Dedicate specific cores to specific VMs or at least customers.
For JavaScript it’s a bit harder.
Either way, we need to not be giving up performance for the normal case.
Agreed. Browsers are now nothing but an application platform of APIs (https://developer.mozilla.org/en-US/docs/Web/API). For some reason they still retain the vestigial HTML, CSS and JS, but really all you need is bytecode that calls an ABI, and a widget toolkit that talks to a rendering API. Then we can finally ship apps to users without the shackles of how a browser wants to interpret and render some markup.
The idea of security "sandboxes" is quaint, but they have been defeated pretty much since their inception. And the only reason we have "frontend developers" rather than just "developers" is that HTML/CSS/JS/DOM/etc. is a byzantine relic we refuse to let go of. Just let us deliver regular old apps to users and control how they're displayed, in regular programming languages. Let users find any app in any online marketplace based on open standards.
HTML/CSS is one of the easiest way to develop GUIs, one of the most visually flexible, and one of very few things this side of ncurses that runs everywhere. Actually, without hacks, there are probably more end user devices with a browser than a command line.
It's also highly standardized. Regular programming languages have dozens of GUI toolkits, or at least one per platform.
I'd rather we go the other way, and build the browser into the OS, so that desktop apps just serve a perfectly standard web server, with an API to launch a special OS client with slightly more native integration(Toolbar icons, closing the process when the browser closes, etc), to make native app dev easier and hopefully more popular.
Security sandboxes mostly work. People aren't constantly getting viruses from clicking the wrong link, at least not quite as much. They're not perfect, but they're better than having to completely trust 100 different sites unsandboxed, and they can be improved. I'd rather have crappy security than no security at all.
If by "highly standardized" you mean "you don't get a choice in what you can do or how it works", I agree.
Native mobile apps thrive despite this magical web browser working everywhere, because the web browser simply doesn't do what native apps do. You may enjoy that, but a million businesses and billions of users out there don't agree, because they use native apps. There were 255 billion native mobile app downloads in 2022, generating billions in revenue. That's not a mistake or accident; that's a market filling a need.
If we really want an application platform that works everywhere, then let's stop dicking around with these stupid document hypertext viewers and build a real app platform that works everywhere.
There are probably more website visits than that. People download games and bank apps and things like that: stuff they expect to use frequently, want fast access to or offline ability for, or where they need hardware access.
There are still tons of things that don't need an app. Things that are inherently online and not accessed frequently work fine as sites.
The web doesn't limit what you can do that much, it limits how you can do it, which I think is a good thing. Less to break (Android API levels have the same effect) with fewer original lines of code. Less focus on clever and interesting code and more focus on UI and features (Although they try their best to reinvent the same js framework 1000 times).
A lot of the limitations are probably just Mozilla hating anything that could be used for tracking, and not trusting users to manage permissions. The trend has been pretty strongly toward making the web very close to native apps, with all kinds of APIs.
Native apps fill a use case very well, that the web does not. That doesn't make the web obsolete.
> A lot of the limitations are probably just Mozilla hating anything that could be used for tracking, and not trusting users to manage permissions.
Don't denigrate Mozilla for protecting users. There are a lot of privacy issues that can't be addressed by permissions models, Android serves as an example of that. Enumeration is not effective.
If Mozilla are standing in the way of invasive webpages, more power to them.
> Native mobile apps thrive despite this magical web browser working everywhere
There’s also user behavior. Many users are conditioned to get software through the App Store. I’ve seen this be a driving factor for quite a few web native applications spinning up native dev teams and shipping native clients.
Many folks are surprised to see just how far you can push a browser app and how small the gap between web and native has become for well-built applications, including native-like things like Bluetooth, NFC, USB, etc. (see: https://youmightnotneedelectron.com/)
Have hacked with quite a few devs/companies that had a fully functioning offline capable web native application. The most requested feature they’d get? “I want to download it from the App Store.” (Usually in the form of “I can’t find your app in the App Store”)
I suspect this is why the PWA experience on mobile devices hasn’t been well paved and why some App Store policies call out “don’t just wrap a browser view in a native app” - they want to keep users coming in the front door of a marketplace where they collect a cut off the top of all transactions.
We really need to standardize "just wrapping a browser view". Why are we shipping a whole browser when we could be shipping a zip file of HTML with some metadata, and maybe a few tiny native helper utilities?
This is how PWAs work on smartphones. I recall coming across some electron alternatives that use the system webview and those are able to generate binaries hundreds of kilobytes in size. But you lose out on access to many of the modern APIs due to Safari there.
This is standardized for many years and called PWA. There are ways to interact with native helper utilities as well (simplest is just run http server on localhost), but not for mobile apps.
For mobile apps you can use webview component which will use system browser. You don't need to ship the entire browser (actually you can't even do that on iOS).
You are delusional and terribly out of your depth if you think even a sizable minority has any interest in getting rid of hypertext and the web. Networked native apps are useless without the web architecture as the glue to integrate between them. It is highly likely that URIs, HTTP and hyperlinks will still be foundational elements of our technology world in a hundred years.
That sounds awful. People would be jamming entire 10MB toolkits into WASM to do things that HTML could do, and performance would probably suffer.
HTML is declarative. The machine understands it. I like things machines can understand and optimize. Things that you can write automated tools to work with, because it's not a full Turing machine. If you want to make a screen reader, you can. If you want to reflow for mobile in a better way, you can, because you know what's text and what's an image.
Just giving people a programming environment and the ability to draw some pixels, the browser has no idea what the intent is. There's nothing to optimize unless the individual sites do, and I doubt they have Google and Mozilla's budget.
We already have too many unnecessary powerful imperative systems out there.
If GTK/Qt etc. can render to canvas using WebGPU while compiled to WebAssembly, I think the game is almost over then too, if there's a way to lazy-load application modules.
Think Autodesk products. Certain parts (wasm modules) only load when you hover over a menu, while the overall app loads within milliseconds because it just has the main window and such.
This has been possible for years. I actually got in the habit of porting my qt projects to the web target because they ran better than native compiled on some of my older machines.
But “regular old apps” don’t do all the same things.
The whole point of properly developing web apps is that html/css indicates a certain level of semantic understanding which allows for different interpretation in different contexts.
If you build a web app well, you build it to be a good experience on a computer for a power user with a huge screen and a keyboard for shortcuts, good for a user on a tiny touch screen, good for a blind person who doesn’t use a screen, and good for robots to parse and index. It should even be good for someone on an iPad with a mouse or a pencil which interprets the whole concept of the mouse differently from the desktop user’s mouse.
The agreed-upon semantics of HTML and the separation of visual styling into CSS is what allows you to take a step beyond just building an app and add a layer of “say what you mean” such that human and non-human users can re-interpret it into their own devices, use cases, and specific needs.
Web frontends are at their best when you don’t expect to perfectly control the end user experience, but try to convey the semantic meaning as perfectly as you can so it can be interpreted into good experiences even in situations where the display is wildly different than originally intended.
> really all you need is bytecode that calls an ABI, and a widget toolkit that talks to a rendering API
Browsers have a 2D renderer (canvas) and you can write your GUI code in C++, Rust, or whatever and compile it to WASM. Some widget toolkits for this even exist already. If this were a superior model it would have taken over by now? I guess you are after deeper integration with the desktop environment?
The challenge with toolkits confined to <canvas> rendering is their inability to effectively utilize the browser's integration with the OS. Key components like font rendering, input methods (including standard key shortcuts and behavior within input fields), selection handling, image decoding, the network stack, native-like scrolling, and accessibility features need to be recreated from scratch. A web toolkit must use the DOM for optimal performance (and therefore ultimately lower its widgets to HTML).
> In computer software, an application binary interface (ABI) is an interface between two binary program modules. Often, one of these modules is a library or operating system facility, and the other is a program that is being run by a user.
There’s a marketing opportunity here to put multi-core back in the spotlight. Most workloads have reached the point of diminishing returns for adding more cores to a CPU, but if it turns out we need more cores just so we can run more concurrent processes (or browser tabs) securely, then here come the 128-core laptop chips…
From a user’s perspective I often think that applications which run multiple processes, demand multiple threads and large chunks of memory are too entitled.
I know it’s a (not even) half baked thought. But there’s something to that. We never really think of “how many resources is this application allowed to demand?”
Software would be orders of magnitudes faster if there was some standard, sensible way of giving applications fixed resource buckets.
I've wondered for a while whether it would make sense to split the CPU into a "IOPU" and a "SPU"
- The IOPU would be responsible for directing other hardware on the system. It doesn't need to be very performant.
- The SPU would be optimized for scalar and branch-heavy code that needs to run fast.
The SPU could have minimal security, just enough so it can't read arbitrary memory when fetching from RAM. It would only run one program at a time, so speculation shouldn't be an issue.
At least on my system few programs need a lot of processing power (and even then only intermittently), so little task switching should occur on an SPU.
> Once again it seems clear that running code from two security domains on the same physical processor cores is just not possible to get right, and we should probably just stop doing it.
Yes. This has had its heyday: the era of the time-shared systems, from the 1960's, right into the 1990's (Unix systems with multiple users having "shell accounts", at universities and ISP's and such). These CPU attacks show us that secure time-shared systems where users run arbitrary machine code is no longer feasible.
There will still be time sharing where users trust each other, like workers on the same projects accessing a build machine and whatnot.
It is sort of like the second law of thermodynamics (before statistical mechanics came around and cleared things up): sure, maybe not well founded in some analytical or philosophical sense, but experimentally bulletproof, to the point where anyone who tries to sell you otherwise, i.e. the idea that any two programs running on the same computer can be prevented from snooping on each other, would be regarded very suspiciously.
Sacrifice IPC/core to reduce gates per core, stuff more cores into a square centimeter, pin processes to cores or groups of cores, keep thread preemption cheap, but let thread migration take as long as it takes.
Arm already has asymmetric multiprocessing, which I feel like is halfway there. Or maybe a third. Fifteen years ago asynchrony primitives weren’t what they are today. I think there’s more flexibility to design a core around current or emerging primitives instead of the old ways. And then there are kernel primitives like io_uring meant to reduce system call overhead, amortizing over multiple calls. If you split the difference, you could afford to allow individual calls to get several times more expensive, while juggling five or ten at once for the cost of two.
I've wondered if we can't give a dedicated core to the browser. Of course, then web pages can steal from other web pages. Maybe task switching needs to erect much higher barriers between security contexts, a complete flush or so?
I wish chips would come with a core devoted to running this kind of untrusted code; maybe they could take something like an old bonnell atom core, strip out the hyper threading, and run JavaScript on that.
If a script can’t run happily on a core like that, it should really be a program anyway, and I don’t want to run it.
I’m not saying you’re wrong, but I have a hard time believing web developers would be capable of writing code efficient enough to share a single core. LinkedIn was slamming my CPU so much that I isolated it in a separate browser
I think you are right, my wish involves living in a slightly different universe where the web has taken a slightly less silly direction of development.
It probably would be possible to add a new instruction that causes the processor to flush all state in exchange for sacrificing task switching speed. Of course it might still have bugs, but you could imagine that it would be easier to get right.
Of course, it’s not doing much for the billions of devices that exist.
I would hope that we could find a software solution that web browsers can implement so that devices can be patched.
Either way, I would want such a solution to not compromise performance in the case where code is running in the same security context.
This is what I don’t like about existing mitigations. It’s making computers slower in many contexts where they don’t need to be.
Software can choose to invalidate all cache, fence accesses, and the like today. It may not be a single instruction, but it's not far off. Usually something like "just don't JIT 3rd-party JS to native code" is "secure enough", though most don't want to go down that route. For (reputable) cloud providers: just don't allow more than one VM to be assigned to a core at the same time, and flush everything between them if they are time-shared. The mitigations are the way to keep the most overall performance for everyone outside those most concerned with maximum security, so they are the most popular.
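To make the "flush and fence" point concrete, here is a rough, illustrative sketch (x86-only; the function name and the 64-byte line-size assumption are mine, and this is not by itself a sufficient defence against speculative attacks) of the kind of primitives software can already use:

```c
/* Sketch: explicitly flush a sensitive buffer out of the cache hierarchy and
 * fence before handing the core to less-trusted code. Illustrates the
 * primitives only; it is not a complete mitigation. */
#include <immintrin.h>
#include <stddef.h>

void flush_secret(const void *buf, size_t len) {
    const char *p = buf;
    for (size_t off = 0; off < len; off += 64)   /* assume 64-byte cache lines */
        _mm_clflush(p + off);                    /* evict each line from all cache levels */
    _mm_mfence();                                /* order the flushes before later accesses */
}
```

Real kernel mitigations also add branch-predictor barriers (e.g. IBPB) and buffer-clearing sequences at security-domain switches, which is where most of the cost comes from.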
Normally only one tab is in view at a time. Probably a bunch of use cases would hate this, but what if all background tabs just got very sparse time-share of a core and the foreground tab a dedicated core?
Do you really think that giving up on getting things done right is the way to progress computing? While AMD has its own spectrum of problems and not-quite-there security features, most of their vulnerabilities have been fixed in microcode shortly after disclosure.
We as an industry should stop excusing chipmakers from doing their jobs and reject broken products. It's brand loyalty all over again, like when Apple does something foolish like dropping the headphone jack and the whole industry follows, breaking years of interoperability.
When the products/services we buy break, we should demand better, not lower our expectations.
A vector is found that gets untrusted code C to run in the user area on Z via an exploit in X that Y has not acknowledged, so researchers publish a CVE with an example.
C starts trying to read memory from threads shared on the same vCPU, revealing the db connection string used by X, and the nonce and salt used for hashing.
> Once again it seems clear that running code from two security domains on the same physical processor cores is just not possible to get right, and we should probably just stop doing it.
I believe this is why OpenBSD disabled SMT by default in June of 2018. [0]
It can still be enabled with a simple 'sysctl', though.
Once you "dedicate cores to specific VMs" you will find that chip designers can also screw that up, just like they can screw up protection within a core. So you might as well proclaim that "impossible to get right" preemptively.
They said physical processor. I took this to mean chip. Of course you can't trust chips on the same mother board (there could be a bug where they can read the same memory), so you need a network boundary. So different rack on the server, for each [boundary, however you define that...]
A number of vexing CPU security vulnerabilities have resulted from a general class of optimizations that share cache resources (memory, TLBs, branch prediction tables, etc.) across different domains, which enables cross-domain attacks.
There is a potential for vulnerabilities even when the domains are separated by a network boundary ... if those domains view the world through some shared cache at the network layer.
> This is an unreasonable position. Vulnerabilities can be fixed
That's a highly optimistic position, to the point of being almost wishful thinking.
The vulnerability being talked about today has been around since 2014 according to the report, possibly being exploited for an unknown number of years since. Sure, maybe we can work around this one, now.
Other similar ones, to be published years into the future, are also there today, likely being exploited as we speak.
Running untrusted code on the same silicon as sensitive code (data), is unlikely to ever actually be truly safe.
The real problem here is the x86 architecture. We should stop using it. It's too complex, it's full of stuff made for backward compatibility purposes, it has too many instructions that may have unpredictable results, and for that exact reason it's extremely difficult to get things right. Somewhere in the thousands of instructions that it has you will surely find a bug.
We should move forward. Apple did a great job with the M1/M2 processors. Not only are they fantastic in terms of performance and energy usage, but (to this point) they also don't seem to suffer from these issues. The reason is that the CPU is simpler, and a simpler thing is easier to design right in the first place.
Performance sorta hinges on it. It could be that the cheaper way for the chipmakers to deal with it is to phase out the 4-core set and push core counts per die higher, and, to do that, incorporate an older core design as dedicated cores for untrusted code.
This would also require changes at the OS-makers to tag thread forks for trusted and untrusted behavior.
Essentially: instead of shutting down SMT for the entire machine, make the customer "prove" the code is safe for elevation or else it gets scheduled as non-SMT.
Even if we could say for sure that x86 has been disproportionately affected by speculative execution bugs (which already seems dubious), that could easily be due to a kind of selection bias. Presumably security researchers as a group more or less focus on the most popular and relevant ISAs/microarchitectures.
Vulnerabilities are like unreliable cars. It doesn't matter so much if they can be fixed, it's the very, very high opportunity cost of needing to be fixed, when you were busy doing something else of high value.
Responsible people tend to pick the small consequence now over the unknowable distribution of consequences later.
The mitigation here can incur a whopping 50% performance penalty. At what point can customers return these CPUs for either being defective or sue for false advertising? If they can't safely meet the target performance they shouldn't be doing these tricks at all.
Did processor companies ever advertise that processors guaranteed certain security properties of the software they execute?
Aren't system designers at fault for coming up with the idea of a context switch and assuming that we can trust a processor not to leak details across artificial software constructed boundaries?
The problem is that cycle-speed boosts left the industry circa 2012 or so. All the perf boost we get these days comes from optimizing instruction sequencing and multiprocessing. This is why languages like Go popped up (making the advanced programming topic of multiprogramming an entry-level accomplishment) and why you now see the 'async' decorator plastered everywhere in C#, and so on.
Keeping the security context intact and separated is a gargantuan task.
To me it makes more sense to add "lousy cores" to the die and force the operator to declare the launching threads are safe for SMT, else the job gets scheduled to the less-performing core where the pipelining is safer. It delegates responsibility to the chip-gobbler, forces them to understand the tradeoff for performance, until some elegant solution is found for side channel attacks like this.
> But wait, how is it even considered a processor bug ?
These side channel attacks allow an attacker to read registers from other processes before any context switching has occurred, or independently of any context switching (even if the OS has been written to correctly clear state).
> Did processor companies ever advertise that processors guaranteed certain security properties
It's not the guaranteed security as much as the advertised speed that's the issue. If they put out a chip which could only be used at half the advertised speed or else it would catch on fire, nobody would argue that they should be let off the hook because they didn't guarantee that the chips were fireproof in their ads.
If the chips can't perform at advertised speeds safely during typical use they're not delivering what was advertised.
AMD has the same problem (Inception). Predictive instruction pipelining makes timing and context separation harder. Even if you are on s390x or M1 it's not like you are safe either. This is a whole field of study.
In my mind the better mitigation is to put control over trusted code back to the user and to do that you have to add less-performant cores onto the die and force the operator to elevate (or not) to SMT.
Right now it's an all-or-nothing proposition for the whole board. I would like to think that you can take your untrusted code and stick it on the less-performy cores with the safer instruction pipelining scheme so an actual physical barrier exists.
If that was in the chip architecture, then it's up to OS vendors to surface it in a way that developers understand, and then down to the operator to decide upon configuration.
You are never going to get a perfect-solve from the chipmakers on this where the consumer has to do nothing.
Only up to 11th gen... it didn't seem like this could have been disclosed to Intel soon enough for them to have fixed it for 12th gen, so had they just happened to fix it while fixing something else, or what?
I decided to look in the paper: "Intel states that newer CPUs such as Alder Lake, Raptor Lake, and Sapphire Rapids are unaffected", although apparently not as a deliberate security fix; it seems to be just a side effect of a significantly modified architecture. So basically they just happened to fix it, or at least made this particular exploit unworkable.
Microarchitectural behaviour changes from generation to generation, and thus so do side effects. Fixing things by accident (and also introducing new problems by accident) are relatively frequent occurrences
All a publication indicates is that a white/grey hat researcher has discovered the vulnerability. There is no way to know if or how many times the same flaw has been exploited by less scrupulous parties in the interim.
And information leak exploits are less likely to be detected than arbitrary code execution. If somebody is exploiting a buffer overflow, they need to get it exactly right, or they'll probably crash the process, which can be logged and noticed. The only sign of somebody attempting Downfall or similar attacks is increased CPU use, which has many benign causes.
Since it is in a class of other well known vulnerabilities, I'm going to assume that there has been quite a bit of active research by state-operated and state-sponsored labs. I think it's more likely than not that this has been exploited.
On Linux, any CPUs that don't have updated microcode will have AVX completely disabled as a mitigation for this issue. That's rather harsh if you ask me, and would be very noticeable. Now I'm interested in finding out whether I can get updated microcode..
> Specifying "gather_data_sampling=force" will use the microcode mitigation when
> available or disable AVX on affected systems where the microcode hasn't been
> updated to include the mitigation.
Disclaimer: I work on Linux at Intel. I probably wrote or tweaked the documentation and changelogs that are confusing folks.
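If you want to see what your particular kernel/microcode combination ended up doing, recent kernels expose the decision in sysfs. A minimal sketch, assuming your kernel is new enough to carry the GDS mitigation and therefore the corresponding vulnerabilities entry:

```c
/* Print the kernel's reported Downfall/GDS mitigation state. On kernels with
 * the mitigation, this file reports strings such as "Not affected",
 * "Mitigation: Microcode" or a "Vulnerable: ..." variant. */
#include <stdio.h>

int main(void) {
    const char *path =
        "/sys/devices/system/cpu/vulnerabilities/gather_data_sampling";
    FILE *f = fopen(path, "r");
    if (!f) {                       /* entry absent: kernel predates the mitigation */
        perror(path);
        return 1;
    }
    char line[256];
    if (fgets(line, sizeof line, f))
        fputs(line, stdout);
    fclose(f);
    return 0;
}
```

The reported string should also make it obvious whether you landed on the "AVX disabled" fallback path mentioned above rather than the microcode mitigation.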
AWS customers’ data and instances are not affected by this issue, and no customer action is required. AWS has designed and implemented its infrastructure with protections against this class of issues. Amazon EC2 instances, including Lambda, Fargate, and other AWS-managed compute and container services protect customer data against GDS through microcode and software based mitigations.
"Red Hat’s internal performance testing of a worst-case microbenchmark showed a significant slowdown. However, more realistic applications that utilize vector gathering showed only low single-digit percentage slowdowns."
The performance impact of the microcode mitigation is limited to applications that use the gather instructions provided by Intel Advanced Vector Extensions (AVX2 and AVX-512) and the CLWB instruction. Actual performance impact will depend on how heavily an application uses those instructions. Red Hat’s internal performance testing of a worst-case microbenchmark showed a significant slowdown. However, more realistic applications that utilize vector gathering showed only low single-digit percentage slowdowns.
If the user decides to disable the mitigation after doing a thorough risk analysis (for example the system isn’t multi-tenant and doesn’t execute untrusted code), the user can disable the mitigation. After applying the microcode and kernel updates, the user can disable the mitigation by adding gather_data_sampling=off to the kernel command line. Alternatively, to disable all CPU speculative execution mitigations, including GDS, use mitigations=off.
Is that 50% overhead only for "gather" instructions? If so, and if 10% of the instructions in your workload are gathers, that would be about 5% overall.
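A quick sanity check of that arithmetic, treating it Amdahl-style: if a fraction f of execution time (note: time, not instruction count; gathers are relatively expensive instructions, so the two differ) is spent in gathers and that fraction slows down by a factor s, the mitigated runtime is

```latex
T_{\text{mitigated}} = \bigl((1-f) + s\,f\bigr)\,T
                     = \bigl(0.9 + 1.5 \times 0.1\bigr)\,T
                     = 1.05\,T
```

So with f = 0.1 and s = 1.5 you do indeed land at roughly 5% overall, consistent with the low-single-digit figures Red Hat reports above for realistic workloads.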
My guess is that HPC systems which run applications whose performance strongly depends on efficient data scatter gather will immediately disable the mitigation.
It is kind of like thinking that it is a huge sale at a store when they say "Up to 70% off". And there are like two things in the whole store which are 70% off.
I always translate such weasel words to their more straightforward equivalent.
"Up to 70% off" means for me "Nothing in our store is discounted by more than 70%" :-)
The NES incorporated a chipset that buried an entire 6502 inside. You can get a Rockchip ARM chip for the price of a pizza that incorporates mixed cores on the die. Maybe the chipmakers don't NEED to solve every edge case until the end of time, but delegate side channel attack mitigations like this back to the chipgobblers (us).
Hear me out: instead of making SMT an all-or-nothing proposition, we have "lousy cores" where untrusted code goes, and make the customer "prove" it should get elevation to the cores with SMT?
I don't want to clobber my chip that's running the mega-important payroll jobs where no one can load anything else onto the board under pain of death. However, I would like to be forced into tagging what is safe to run with SMT, else I get stuck on the safer cores.
I might be an intern who has no idea what any of this stuff means and goes to Google it, then learn what this attack vector truly is. Then makes a plan on how to defend against it.
You can have performance cores x efficiency cores.
You do not want to be the one who proposes trusted cores x "lousy" (untrusted) cores. The benefits, however many, would be lost in the discourse of "promoting insecurity". Such is life.
Speculative execution seems like a never-ending rabbit hole of vulnerabilities. Though I feel like most of them end up being in Intel chips for some reason
> [Q] Should other processor vendors and designers be concerned?
> [A] Other processors have shared SRAM memory inside the core, such as hardware register files and fill buffers. Manufacturers must design shared memory units with extra care to prevent data from leaking across different security domains and invest more in security validation and testing.
Not sure what to make of this wording. Thinly veiled threat? Hint that other embargoes are in place?
AMD had a similar vulnerability recently (Zenbleed) that used speculative-execution misprediction to create an effective "use-after-free" bug that would reveal the contents of the internal SIMD register file (https://lock.cmpxchg8b.com/zenbleed.html); that might be what is being referenced here.
I do not doubt the severity of the flaw, but most practical attacks end up being far more mundane. Consider SolarWinds, for example. No dazzling tricks needed, whatever gets the job done.
Are state actors subject to different economics? Their budgets are finite, too. Why opt for a complex less reliable exploit if a simple more reliable one is available?
It feels like chipmakers never learned to "Make it work, make it right, make it fast", in that order. But then, hindsight is 20/20.
How much slower would processors be if they got rid of all complex / risky optimizations? How much performance could we gain back with more expensive components, more integration (e.g. SoCs), and other approaches that are unlikely to lead to security problems?
- we disable speculative execution, so every conditional branch goes from 1 cycle to the full pipeline depth (5, 10, 20 cycles) to evaluate. Probably a 2x to 3x hit on performance.
- we disable prefetching, so every cache miss now has the full memory latency delay to fill a cache line. 1.2 to 1.3x hit???
- We disable SIMD as per this issue. 4x-8x hit on those parts of the code that use it.
- If we need to go as far as disabling caching you can now only run instructions at the speed of main memory. 100x hit.
CPUs would only be faster than their 1970s ancestors because the clock speed is now 5 GHz and not 1 MHz.
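Taking those ballpark figures at face value and (crudely) treating them as independent and multiplicative, the conclusion roughly checks out:

```latex
\underbrace{2.5}_{\text{no speculation}} \times \underbrace{1.25}_{\text{no prefetch}} \approx 3\times \text{ slower},
\qquad 3 \times \underbrace{100}_{\text{no caches}} \approx 300\times \text{ slower}
```

Set against a clock-speed gain on the order of 5 GHz / 1 MHz = 5000x, that would leave such a machine only ~15-20x ahead of its 1970s ancestor (before even counting the SIMD loss), so "only faster because of the clock" is about the right framing, at least under these rough estimates.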
It seems to me like most don't have a fundamental gut feel for what speculative execution actually is and the implications of "not getting it". At some level we need to fight for performance. My operating systems are only getting slower and shittier. I cannot fathom my CPUs going backwards too. Security must take a back seat at some point. You can't put bubble wrap and warning labels over everything or it becomes useless. The most dangerous tools are typically the most effective.
The CPUs are vulnerable because of the exact way in which they are being applied to a problem. Speculative execution is not inherently unsafe. Whatever future predicted memory prefetching shenanigans are going on in my CPU over here have absolutely ZERO impact on your CPU over there. Certainly someone could figure out a protocol/system/architecture that capitalizes on this notion that "2 different CPUs are indeed different CPUs".
One can see how any perspective here still causes trouble for Amazon, Microsoft, et al., but that was a business risk they signed up for the moment they intended to squeeze every last drop of subscriber revenue out of the hardware. Why should everyone else on earth have to suffer crappier performance by default because of the business/software practices of a select few?
These vulnerabilities have a lot less to do with cloud providers, and a lot to do with networked computers in general. It's not unreasonable to expect this exploit to be done via web browser, as was demonstrated with prior speculative execution exploits.
Fundamentally, the only reason we need speculative execution is that we haven't updated our software to be more concurrent (reflecting how chips have kept pace with Moore's law for 15+ years), we still program as if we're in the 1970s.
This may turn into a great opportunity to force a rebuild a lot of ancient code.
> Why should everyone else on earth have to suffer crappier performance by default because of the business/software practices of a select few?
The author states in the article that they believe this may be exploitable from javascript in a browser. Just to hammer the point home, any web page could steal anything in memory on your computer. Spectre was also browser-exploitable, and was mitigated there partly by making access to high precision timers privileged. This is very much not a problem that only impacts cloud providers.
The only 100% reliable way is to turn off branch prediction completely, and yes, this would make the processors at least 2 times slower, perhaps more.
Too bad that apparently nothing came out of the Mill architecture. My limited understanding is that this architecture would not have such vulnerabilities.
Of course it's possible it would have others :-) but being much simpler, at least conceptually, perhaps it would have less and easier to mitigate. Oh well.
Given the classic Stack Overflow branch-predictor question, you are underselling things quite a lot: a factor of 6 on those older processors in question, and who knows on modern processors.
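For anyone who hasn't seen that question, here is a minimal sketch of the experiment (my own reconstruction, not the original poster's code): the same loop over the same values, with only the order of the data changed.

```c
/* Classic branch-predictor demo: summing only the "large" elements is far
 * slower on unsorted data, because the `>= 128` branch becomes unpredictable
 * and each misprediction flushes the speculative pipeline. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 20)

static long long sum_big(const int *a, int n) {
    long long s = 0;
    for (int r = 0; r < 100; r++)             /* repeat to make timing visible */
        for (int i = 0; i < n; i++)
            if (a[i] >= 128)                  /* data-dependent branch */
                s += a[i];
    return s;
}

static int cmp_int(const void *x, const void *y) {
    return *(const int *)x - *(const int *)y;
}

int main(void) {
    int *a = malloc(N * sizeof *a);
    for (int i = 0; i < N; i++) a[i] = rand() % 256;

    clock_t t0 = clock();
    long long s1 = sum_big(a, N);             /* unsorted: ~50% mispredict rate */
    clock_t t1 = clock();

    qsort(a, N, sizeof *a, cmp_int);
    clock_t t2 = clock();
    long long s2 = sum_big(a, N);             /* sorted: branch is predictable */
    clock_t t3 = clock();

    printf("unsorted %.2fs, sorted %.2fs (sums %lld %lld)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t3 - t2) / CLOCKS_PER_SEC, s1, s2);
    free(a);
    return 0;
}
```

Compile with a low optimization level (e.g. -O1); at higher levels the compiler may replace the branch with a conditional move or vectorize the loop, which hides the effect. The exact ratio depends on the CPU, but the branchy loop over unsorted data is typically several times slower.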
Agreed, and I wonder if we'll see the adoption of "security hardened" CPUs which sacrifice performance features for non-exploitability. You can have one or the other, but not both.
The Linux mitigation can be disabled with gather_data_sampling=off in the kernel boot parameters.
Be warned, apparently Grub had some kind of problem back in August 2022 and this pre-existing bug broke my boot completely when I updated grub for the above mitigation. I had to boot into a live ISO and reinstall grub to fix it.
This isn't the first time, and won't be the last time, something like that happens with grub, and it's entirely an issue with the user not doing what they should when they run archlinux or some other minimalist distribution that doesn't automate the process of updating grub.
Grub uses a configuration file generator that reads from the human readable/editable /etc/default/grub and creates a /boot/grub/grub.cfg, and the result of that generator doesn't have a stable "interface". At any time, an update to grub might read grub.cfg incorrectly, if you do not generate it again.
99% of the linux distributions out there had no breakage with grub because they run the grub-mkconfig command every time grub is updated through grub-install. It is automated within the scripts included in the packaging of the distribution and the user will never see any of this in Debian, Ubuntu, Fedora etc. If a newer grub package comes out on those distro, the scripts will run both grub-install and grub-mkconfig. There will never be a mismatch between the grub binary versions and the config file state.
Arch Linux users who knew how grub works also had no issue. If you update grub, you must regenerate the config file (it's a "if you update" because arch linux, by default, will NOT automatically update grub at all, updating the package is not the same as updating grub). If you edit the config file and regenerate it when you have a newer grub package installed, you must also update the grub binaries. So any time you do something with grub, you MUST run grub-install, then grub-mkconfig. Doing only one of the two is akin to doing a partial upgrade, which is a no-no.
If, as I suspect, you're an arch user, I would recommend switching to systemd-boot, which doesn't even need a configuration file at all if you set up your system to follow its conventions and use unified kernel images. EFI binaries are automatically scanned and shown in the menu, and it uses EFI variables to remember the few settings you can interactively edit, like the preferred kernel to boot. It's robust and only has what is needed in a small, KISS boot manager. It can't even be truly called a boot loader, because that part is managed by efistub, which is part of the unified kernel image. Very unlike grub, which has the whole kitchen sink, including its own implementation of a ton of filesystems.
Good to know and, yes, that was the issue (grub-mkconfig without grub-install). Somehow I've never had this issue in Arch before now and wasn't aware of this idiosyncrasy, but now I know.
So this is literally the same thing as AMD's Zenbleed vulnerability? Ridiculous how these companies make so much money and are completely incompetent at handling security.
Theoretically, this can be mitigated permanently by disabling hyper-threading?
It claims to also allow you to read any data loaded into a vector register, any data loaded from a {rep mov} (i.e. memcpy), any data used by crypto acceleration, and a bunch more. Basically, the only data it does not let you read is regular loads into the regular GPRs (i.e. what would be a register load in a RISC architecture) though if you save/restore during context switches using the special instructions you will leak the values at the time of the context switch.
This is about as close to a total cross-process data leak as can be imagined.
In general, not really, but the most common string comparison instruction in x86_64 leaves the last character of one of the strings being compared with the other one just being a pointer into the C-style string.
I think you're asking this question because you're wondering whether a container that uses environment variables for its configs would show up in this. I think the answer would be no, because it's an operating system service that supplies the values. But every developer on Earth copies those values into variables, and at some point a pointer to them lands in a register, which would then make them vulnerable.
I've only done a quick read through the link, but I think the model they imply is that a malicious user could rent a Cloud VM in AWS/Azure/GCP/etc and then sniff the contents of SIMD registers, similar to the Zenbleed attack which was also disclosed recently[1]. This is a big deal because optimized implementations of strcpy, strlen, and memcpy in glibc all use SIMD registers, and glibc is everywhere.
AFAIK none of the cloud vendors run multiple customer's VMs at the same time on the same core; even the "shared-core" virtual machines don't share timeslices (AWS goes into detail about this here[1]).
How do they know what data is in the registers? In the linked article, the person running the attack code knows what is running on the target. The target is also conveniently waiting for the attack code to run without doing anything other than referencing the target data.
I'm gonna get put on a list for typing this out but I'll clarify:
1. Bad guy creates cloud account and spawns 10 of the cheapest VMs across different data centers, let's say this costs a total of, what... $50 a month?
2. Bad guy reads this paper, and makes a program that frequently samples SIMD registers. Contents get dumped to stdout and then streamed over an encrypted line to a RAID array hosted in $COUNTRY_WITHOUT_US_EXTRADITION.
3. Bad guy writes program to sift through data dumps on RAID array for passwords, encryption keys, etc.
If you create a cloud instance right now that has an SSH login on port 22, you stream the SSH login logs and see a steady stream of attempted logins to your device. While the marginal cost of brute forcing SSH logins is free (no cloud VM needed) and my proposed scenario isn't, I think this is a very real scenario that needs monitoring.
A VM cloud provider can't block you from running at least one thread, which is all the malicious threads required for this attack.
However none of the big cloud providers share CPU cores between users to combat exactly this kind of thing. I really wish the people that did these disclosures were more up-front about this, instead of saying vague things like "frequently happens on modern-day computers". Though I guess you can assume that if an attack would work on AWS the researcher would definitely mention it, so the lack of such an explicit claim almost ensures the attack is not viable on major clouds.
This is one of many similar previous attacks, and more of these attacks will continue to come out and be increasingly weaponized. From now on the assumption must be that same-core computing is not secure.
I am a little unclear on the attack. What data in the temporal buffer is being forwarded to the attacking vpgather?
Is the content of the temporal buffer just being blindly forwarded during speculative execution even if the indexed address of the attacking vpgather does not match?
Otherwise how is the speculative vpgather allowed to load the values of the temporal buffer?
If it is not blind is it a virtual address match? I guess it could also be a not-Present mapping physical match as well? I can not think of any other possibility off the top of my head.
If it is a blind forward that is pretty amazingly bad.
In the victim process, the following instructions leak information towards the attacker (because they share internal hidden buffers with the gather instructions executed by the attacker) (the following are quoted from the paper):
• SIMD read. All SIMD operations that read wide data (128/256/512 bits) from memory are affected regardless of their function: e.g., vmov* only read, vpxor* read and compute the xor. These general-purpose instructions are used everywhere, e.g., compilers spread wide data reads to optimize memory access routines.
• SIMD write. The only SIMD write operations that are affected are the compress ((v)(vp)compress*) instructions.
• Cryptographic extensions. Cryptographic extensions, including AES-NI and SHA-NI (SHA1 and SHA256), when accepting a memory operand, are affected. Data leaks from these instructions expose plaintext data and the secret key, e.g., AES or HMAC-SHA.
• Fast memory copy. Fast memory copies of various data types: byte, word, dword, qword using rep movs* instructions are affected. These are widely used to speed up common memory operations such as memcpy and memmove.
• Register context restore. Special instructions to more efficiently store/restore the register context (e.g., xsave/xrstor) are affected. GDS leaks the register context of both standard registers due to xsave/xrstor and wide registers due to fxsave/fxrstor.
• Direct store. The direct store is affected. Intel has recently added support for a direct store instruction that can copy a 64-byte cache line from a source to a destination address.
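For anyone wondering what the gather instructions at the center of the question above actually do, here is a minimal, illustrative AVX2 example (my own, not from the paper; compile with -mavx2): a single vpgatherdd performs eight independent indexed loads into one vector register, and it is the internal staging of that gathered data that GDS reportedly exposes.

```c
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    int table[16] = {0, 10, 20, 30, 40, 50, 60, 70,
                     80, 90, 100, 110, 120, 130, 140, 150};
    __m256i idx = _mm256_setr_epi32(1, 3, 5, 7, 9, 11, 13, 15);

    /* vpgatherdd: load table[idx[i]] for i = 0..7 with one instruction */
    __m256i v = _mm256_i32gather_epi32(table, idx, 4 /* scale = sizeof(int) */);

    int out[8];
    _mm256_storeu_si256((__m256i *)out, v);
    for (int i = 0; i < 8; i++)
        printf("%d ", out[i]);        /* prints 10 30 50 70 90 110 130 150 */
    printf("\n");
    return 0;
}
```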
Are there any known attacks which exploited Spectre or Meltdown vulnerabilities? And is it likely that this vulnerability will be successfully used to perform attacks?
In modern CPUs registers are a high-level abstract concept (see register renaming), so writing to a register doesn't have any specific location to overwrite.
The recent Zenbleed vulnerability was an example of that: "clearing" a register just set a temporary boolean saying it's zeroed instead of writing actual zeros.
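Purely as a conceptual sketch of that point (the names and structures here are invented; real rename hardware is far more involved): "clearing" a register can be just a bookkeeping update in the rename table, so the old value may linger in the physical register file.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Invented, simplified model of register renaming with a "known zero" flag. */
typedef struct {
    int  phys_index;   /* which physical register file slot backs this arch register */
    bool known_zero;   /* "z-bit": architecturally zero without any real write */
} rename_entry;

static uint64_t     phys_regs[64];   /* physical register file; stale data can linger */
static rename_entry rat[16];         /* register alias table for 16 architectural regs */

/* "Clearing" a register: set the flag, never touch the physical storage. */
static void clear_arch_reg(int arch) {
    rat[arch].known_zero = true;
}

int main(void) {
    rat[0] = (rename_entry){ .phys_index = 7, .known_zero = false };
    phys_regs[7] = 0xdeadbeefcafef00dULL;   /* pretend this held a secret */

    clear_arch_reg(0);                      /* architecturally zero now... */

    /* ...but the stale value still sits in the physical register file: */
    printf("known_zero=%d, physical slot still holds %#llx\n",
           (int)rat[0].known_zero, (unsigned long long)phys_regs[7]);
    return 0;
}
```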
I don't think so. This vulnerability is leaking data from a load-buffer, not directly from registers. It affects data that is loaded in bulk or to vector registers.
However, general purpose registers are also loaded passing through this buffer during context switches.
Does anyone know what types of workloads this affects the performance of the most? Is it specialty workloads, or are general webserver/database/coding/compiling/gaming/desktop usages affected?
Can someone explain why, in the ssh video at 2:25 where the 256-bit comparison is pasted in, it doesn't match? The first two colon-separated number sets do match, but not the following two?
At the end of this video? https://downfall.page/media/gds_aes.mp4 They appear to match to me, albeit with the last few characters hidden by the timer. They are visible when the key is copied though.
The OS job scheduler informs the CPU when it's ideal to swap jobs. But the OS is not doing the work of moving the register and stack pointers, the microcode is. These timing attacks take advantage of shared information in the cache (where it's likely the context of the thread you are not supposed to be able to read is).
SMT (hyperthreading) introduces some ambiguity when the context change is going to happen, and it appears there's instructions that are callable where registers/mem can be read that were outside the calling context.
This isn’t true - note how similar vulnerabilities have occurred on ARM, POWER, etc. – and even if you weren’t wrong, it would be off-topic for this thread since you’re not giving anyone useful information.
I see you’re basically trying to “No True Scotsman” a definition where anything which isn’t RISC-V is too complicated. That’s a shame as the time you’re spending on counterproductive RISC-V advocacy could have been spent learning about this class of attack and how it has nothing to do with the ISA.
When a CPU implements speculative execution, this class of attack becomes a concern. If it doesn’t, it’ll be too slow for most applications. Fortunately, the people who are - unlike you - actually helping RISC-V are working on efficient countermeasures:
If it's not clear enough, I am not talking whatever Arduino strawman you came up with.
I am talking about RISC architectures such as RISC-V (which I mentioned by name), the sort that can scale all the way up to supercomputers and down to microcontrollers.
There is no justification for putting up with x86's complexity. This complexity has negative consequences such as vastly increased likeliness of micro-architectural bugs.
It is utter madness for many of its current popular uses, such as in servers handling personal data.
I'm getting annoyed with all of these yawning security holes in Intel's CPUs. I'm tempted to replace my Intel MacBook Pro with an Apple Silicon model sooner than I normally would.
I'm not sure what evidence there is to think that Apple's chips are any better. And that's not really a dig at Apple; these are just very complicated devices and especially with the optimizations that CPUs need to make to run today's software with acceptable performance, it can become very hard to foresee all possible attacks and vulnerabilities.
The x86 architecture is now proven to be a minefield. It has too many instructions, being a CISC instruction set (something that was already obsolete in the 90s but was kept for backward compatibility with older software). This means that to make an efficient CPU the manufacturer has to do a ton of optimizations, and these optimizations have resulted in most of the bugs we have seen.
By contrast, ARM and other RISC instruction sets are simpler. Being simpler, they rely on compilers being smarter to generate optimized programs (something we do have these days). Because of this the processors are simpler and more straightforward; they don't need all the complexity of an x86 CPU. For the same reason they are also more efficient, since they don't waste resources doing useless operations.
Of course vulnerabilities can be everywhere (there were a few on the M1 too, though not as impressive as this one; they were mostly stuff nearly impossible to exploit). But recent times have proved that on x86 chips in particular this is a real issue.
All high-performance processors have very complex optimizations. This has nothing to do with x86. The vulnerabilities found in other processors are similar in design and about as difficult to exploit.
I'm not trying to suggest that Apple's ARM chips are magically better because they're designed by Apple. Rather that the x86 architecture is obviously very long in the tooth and now would be a great time to switch to a far more modern architecture especially in light of this steady stream of vulnerabilities and defects that keep being found in x86.
I'm sure ARM64 isn't perfect but I've yet to learn of something as severe as this, or Spectre, or Meltdown.
Almost all of these exploits are due to the out-of-order/speculative execution.. which is incredibly complicated. There is no reason to believe that an out of order architecture that has not been hardened is any better at defense here than x86 - Just that as a minority architecture, it's still less profitable to target for exploitation. I have very little faith that the Apple ARM chips do better here without extensive exploit attempts made...
Speculative vulnerabilities have virtually nothing to do with the instruction set. Changing the language of the processor has little to do with how the processor works under the hood, especially for fundamental technologies like speculative execution.
Also, as a corollary, speaking the same language doesn’t mean every vulnerability is shared. AMD does not seem affected by Downfall, for instance.
That doesn't make any sense. You want to prep a fix for distribution ASAP, and release it so people can be protected from it as soon as possible.
Also many attacks are not really feasible to detect if it's happened - that's like trying to know if the mailman read your postcard.
You can develop executable scanning techniques, but people are a lot more concerned about preventing it in the first place than catching it in the act. Why would you leave people vulnerable so they can watch data get stolen - instead of just fixing the issue?
They probably concluded it wasn't possible in this case. From their FAQ:
[Q] Is there a way to detect Downfall attacks?
[A] It is not easy. Downfall execution looks mostly like benign applications. Theoretically, one could develop a detection system that uses hardware performance counters to detect abnormal behaviors like excessive cache misses. However, off-the-shelf antivirus software cannot detect this attack.
Not all systems are equally exposed, and while there is no universal low-level fix, many systems can be redesigned at a higher level to reduce exposure.