What I find odd is that after the initial Spectre attacks, there has been a long string of these attacks discovered by outside researchers and then patched by the chipmakers.
In principle it seems like the chipmakers should hold all the cards when it comes to discovery: they are experts in speculative execution, know exactly how their chips work and have massive existing validation suites, simulators and internal machine-readable specifications for the low-level operations of these chips.
Outside researchers need to reverse-engineer all this by probing a black box (plus a few much-worse-than-insider sources like patents).
Yet years after the initial disclosures it's still random individuals or groups who are discovering these? Perhaps pre-Spectre this attack vector wasn't even considered, but once the general mechanism was obvious, did the chip-makers not simply sit their biggest brains down and say "go through this with a fine-toothed comb looking for other Spectre attacks"?
Maybe they did and are well aware of all these attacks, but to save face and avoid performance hits they simply hold on to them, hoping nobody makes them public?
I considered this, but we have pretty good evidence that the chipmakers have not been busily secretly patching Spectre attacks:
1) Microcode updates are visible and Spectre fixes are hard to hide: most have performance impacts and most require coordination from the kernel to enable or make effective (which is visible on Linux; see the sketch after this list). There have been a large number of microcode changes tied to published attacks and corresponding fixes, but to my knowledge no corresponding "mystery" updates or hidden kernel fixes with a similar shape to Spectre fixes.
It's possible they could wait to try to bundle these fixes into a microcode update that arrives for another reason, but the performance impacts and kernel-side changes are harder to hide.
2) If this were the case, we'd expect independent researchers to be at least in part re-discovering these attacks, rather than finding completely new ones. That would lead to cases where an attack was already resolved in a released microcode version. To my knowledge this hasn't really happened.
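For concreteness, here is a minimal sketch (mine, not from the thread) of how visible the kernel-side state is on Linux: the kernel exports a per-vulnerability mitigation status under /sys/devices/system/cpu/vulnerabilities/, so a "mystery" mitigation arriving alongside a microcode update would be easy to spot by diffing this output across kernel and microcode versions.

```c
/* Sketch: dump the kernel-reported mitigation status for known CPU issues. */
#include <dirent.h>
#include <stdio.h>

int main(void)
{
    const char *base = "/sys/devices/system/cpu/vulnerabilities";
    DIR *dir = opendir(base);
    if (!dir) {
        perror("opendir");   /* older kernels may not expose this directory */
        return 1;
    }
    struct dirent *e;
    while ((e = readdir(dir)) != NULL) {
        if (e->d_name[0] == '.')
            continue;
        char path[512], line[256];
        snprintf(path, sizeof path, "%s/%s", base, e->d_name);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;
        if (fgets(line, sizeof line, f))
            printf("%-24s %s", e->d_name, line);   /* status line ends in '\n' */
        fclose(f);
    }
    closedir(dir);
    return 0;
}
```

On a typical patched system this prints one status line per known issue; the point is simply that this surface is public and easy to compare over time.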
That's true, but it leads to the odd assumption that the vendor managed to fix N side-channel attacks before release but 0 thereafter, while random individuals found M thereafter over a period of years, with N >> M.
This seems much less likely than the conclusion that vendors are not in fact fixing many prior to release and then stopping "cold turkey" after that. Especially since these attacks tend to cross chip versions, in many cases 6+ generations of chips: if vendors had substantial and increasing efforts on new chip versions, they'd also be catching issues that apply to older, already-released chips. We don't see that happening.
This. Downfall affects CPUs released almost a decade ago. The argument of "Intel caught a ton of these and Downfall is the one that got away" doesn't add up. Surely Intel would find at least one issue during development of a new chip that has the property that it _also_ affects older chips. They can hardware-fix it before ever releasing their new chip, but unless they just decided to ignore the vulnerability in their older chips, they'd have started work on a patch or at least on contacting OS vendors.
None of which is easy to do without the public at large figuring out what happened.
Conclusion: that is highly unlikely to have ever happened. Therefore, this isn't survivorship bias, and it really is bizarre that chip vendors (or, at least, Intel) don't look for this stuff, or at least didn't find this.
FWIW. Spectre/meltdown really caught the hardware development world off guard. Speculative execution was well-trodden ground and we thought it was fine. After these attacks, we had legacy designs that we needed to patch in a hurry, and the performance costs of not speculating were unfathomable. A lot of work went into mitigations like new kinds of barriers. But hardware designs are enormous and there is state lurking absolutely everywhere. It’s just a really hard problem.
Dan Bernstein had pointed out the risk that speculative execution would lead to side-channel attacks years before, but there's a common pattern where security people point out a risk and then vendors dismiss it as impractical until it's demonstrated in practice.
In computer engineering grad school, I mentioned to my PI that there was an upcoming hardware security flaw that I had heard about from … sources… and he quickly guessed it was speculative execution related without knowing anything more. I think people knew it was possibly dangerous, but the performance gains from speculative execution were huge enough nobody in Intel/AMD red-teamed the design basically.
If a bug is not known, most of the incentives to the vendor are to not bother investigating, I suspect.
You could spend arbitrary amounts of time looking for these bugs and find nothing. Simpler and easier to offer a bounty or something and fix it then. If no one publicly finds the bug it doesn't matter to the mfg (and it wouldn't surprise me if there's truth to your supposition that they know about the bug but wait to fix until someone reports it - no public backlash so long as the bug is unknown)
Fixing bugs prior to release seems easy and free, though, especially since many more eyes would also be on the "new" in progress architecture, and proper hardware mitigations that don't cost a lot of performance can be made.
What about the incentive to release "the most secure chips on the market", are you discounting that a bit too much?
Granted that human nature tends to mean these factors don't have a high enough weight, e.g. it's not the safest airplanes that sell the most, it's the cheapest ones that meet the regulations, and the regulations drive safety improvements, for the most part
I guess there's probably some margin in it - if both parties seem about equally vulnerable, there's not much lost. You could expend a lot of effort into security, but the nature of these bugs is still that they are fairly rare, often require pretty significant hurdles, etc. The mfg. could probably spend a lot more money and find a few extra bugs, but who knows if they would have turned into "real" exploits?
Remember that this particular bug isn't actually present on the newest chips either - and 12th/13th gen were shipping before Intel was informed of this bug - so it was fixed eventually, probably incidentally as a result of design changes.
The unknown factor is how much additional money you'd have to invest to gain additional security, given how esoteric many of these bugs are.
Or the problem could be with methodology, and the wrong people are in charge or the right people left, and so the mindset for testing is just wrong.
Also you’re dealing with a company that has been running to stand still for a long time. There’s been a lot of pressure to meet numbers that they simply cannot keep up with. At some point people cheat, unconsciously or consciously, to achieve the impossible.
There are probably thousands if not more. The way I always imagined this working was: agency and company work together to leave gaps in our hardware and software under the ruse of "it must be secure enough for our use." Agencies get a preemptive six months or so to find enough zero-days to do what they want. The engineers at the company screaming about the issues are ignored.
Eventually a foreign adversary or domestic hacker finds one that can cause a lot of harm. As soon as they find one a DOD funded student simultaneously discovers it. Alternatively, if documents leak showing how these exploits could happen, same scenario.
Not to say all bugs are known, but I'd imagine a fair deal of them certainly are.
The logic in this comment rubs me the wrong way. You could use the same train of thought to postulate programmers that have made 2 memory safety errors are nefarious instead of simply human.
When billions use something I expect them to find more problems, flaws, and exploits in it than the creator/manufacturer did. The presence of this does nothing to indicate (or refute) any further conclusion about why.
I think the comparison between CPU and software exploits holds at a very high level, but in the case of software the gap between internal and external researchers seems smaller. Much software is open source, in which case the playing field is almost level, and even closed-source software is available as assembly, which exposes the entire attack surface in a reasonably consumable form.
Software reverse-engineering is a hugely popular, fairly accessible field with good tools. Hardware not so much.
> When billions use something I expect them to find more problems, flaws, and exploits in it than the creator/manufacturer did. The presence of this does nothing to indicate (or refute) any further conclusion about why.
To be very clear, none of these errors have been found by billions of random users but by a few interested third parties, many of them working as students with microscopic funding levels and no apparent inside information.
I'm not actually suggesting that the nefarious explanation holds: I'm genuinely curious.
I'm not sure I agree that significant gaps in the playing field are really there. By significant I mean something that would explain it being, say, 10x harder, to the point that the ratio is supposed to be suspicious or indicative of something off. Sure, you don't get to see how they laid out the transistors of the CPU, but that's not how these attacks work: it's by some oversight in memory handling, not that different from software. They analyze how individual assembly instructions behave in certain scenarios versus how they are designed to behave. Compare this to the process of attacking closed-source software like Windows (sift through a bunch of assembly in a debugger and see what gets left behind or compared incorrectly) and it's not glaringly different just because hardware is involved. Difficult, sure, but far from anything to suggest it should be uncommon. More importantly, by definition you don't really get to see all of the things they do catch. Maybe they are getting 90% of other related vulnerabilities with the patches, but you only hear about the 10% that weren't covered by them, because that's the only thing someone is going to publish, get a CVE for, or make the front page of HN with.
The point isn't that billions of users all actively try to exploit software; it's that if you have billions of users, then even if 0.01% try to, that's still a hell of a lot more external bug finders than internal bug finders.
Yeah, nothing against you or genuine curiosity; it's just that when a comment sets up a chain of logic and concludes with a leading question, the conversation is damned to largely revolve around the leading question instead of genuine answers.
I dabble in this space (hardware reverse-engineering) and write software for a living and in my opinion the gaps are huge.
I should disclose that I have been paid by a chip-maker for a blog post that I wrote which "disclosed" an optimization which could be used for a side-channel attack (though I did not even suggest that aspect) and which was subsequently patched away via a microcode update. The whole process was very surprising to me in that there must have been several people inside the chip-maker who knew about the optimization I described in much deeper detail ... after all, they conceived and implemented it.
So by what path does a blog post mentioning it get treated as the disclosure that results in it being removed, when they knew about it all along?
> that's not how these attacks work: it's by some oversight in memory handling, not that different from software.
I think it is very different. Assembly is merely a somewhat less convenient form of the original program that still embeds all the semantics relevant to the attack surface, even though the original source has been "erased". Many analysis tools such as fuzzers operate directly on assembly with little loss in functionality.
These attacks are against completely unspecified aspects of the instruction execution and lean heavily on the actual hardware implementation (almost at the level of "how the transistors are laid out") such as what hidden buffers are used, when they are filled, how they are shared with sibling threads, etc.
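To make the "probing a black box" part concrete, here is a minimal sketch of the timing primitive most of these attacks bottom out in: flush a cache line, then time a load to see whether something else (speculation, a hidden buffer, a sibling thread) touched it. This is a generic flush+reload-style measurement, not any specific exploit; it assumes an x86 machine and a compiler that provides x86intrin.h.

```c
/* Minimal flush+reload timing probe (sketch, x86 only, GCC/Clang). */
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_mfence, _mm_lfence, __rdtscp */

static uint8_t probe[4096];

/* Time a single load from addr, in TSC cycles. */
static uint64_t time_access(volatile uint8_t *addr)
{
    unsigned int aux;
    _mm_mfence();
    uint64_t start = __rdtscp(&aux);
    (void)*addr;                      /* the load being timed */
    uint64_t end = __rdtscp(&aux);
    _mm_lfence();
    return end - start;
}

int main(void)
{
    volatile uint8_t *p = probe;

    (void)*p;                         /* warm the cache line */
    uint64_t hit = time_access(p);

    _mm_clflush(probe);               /* evict the line */
    _mm_mfence();
    uint64_t miss = time_access(p);

    printf("cached: ~%llu cycles, flushed: ~%llu cycles\n",
           (unsigned long long)hit, (unsigned long long)miss);
    return 0;
}
```

The attacker-relevant part is not this snippet but knowing which undocumented structure to aim it at, which is exactly the reverse-engineering gap being argued about here.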
In my experience there are very few people interested in these details outside of the vendors themselves and these folks and the ones creating the exploits would fit in a modestly sized lecture hall. The scope has increased a bit lately (see Tavis's fuzzer work) but it was originally a small group with little or no funding.
Do you have a link for the blog post? I'd love to read more about that.
Since we disagree on how big the gap is, and neither of us is going to get a satisfactory answer out of a chip maker any time soon, perhaps a different argument: there are plenty of microcode updates all the time, doing more than just fix security bugs. There are also security bugs like M1racles which have nothing to do with performance incentives. If these can all be explained by a lecture hall's worth of people finding things most wouldn't, post-release of the chip, then why does the same situation for security issues require a unique explanation?
This reminds me of lock manufacturers vs The Lockpicking Lawyer. He is able to pick nearly every damn lock out there, yet they have all the design resources and money to hire people like him to make better locks.
I had a failed startup where we warned our customers they were using suppliers without sufficient qualifications, insurance, etc. Nobody wanted the product, even for free. At a system level, they chose to use cowboy tradespeople and cover the risk with "plausible deniability", because the market wouldn't pay them to only use quality.
It's like a Gresham's law - eventually the lowest quality dominates the market, because that's what maximises profit.
One possibility nobody mentioned yet: the chip vendors don't invest a ton of time looking for them because they don't actually matter that much.
Bear in mind, security researchers are incentivized to find things to build their reputation. It's very often the case that they claim something is a world-shaking security vulnerability when in reality it doesn't matter much for real world attackers. Has anyone ever found a speculation attack in the wild? I think the answer might be no. In which case, why would chip vendors invest tons of money into this? Real customers aren't being hurt by it except in the sense that when an external researcher forces action, they're made to release new microcode that slows things down. Note how all their mitigations for these attacks always have off switches: not something you usually see in security fixes. It's because in many, many cases, these attacks just don't matter. All software running on the same physical core or even the same physical CPU is either running at the same trust level, or sandboxed so heavily it can't mount the attack.
> Has anyone ever found a speculation attack in the wild? I think the answer might be no.
This is known as the Y2K paradox.
The Y2K bug had the potential to be very dangerous, but due to a wide-reaching campaign and tonnes of investment in prevention, when the millennium came, it did so with very few issues (though there were still some); leading many to speculate that the issues were overblown
Yeah, but we have practically usable variants of meltdown and spectre.
The only reason it's not useful to deploy them is a massive amount of herd immunity (and the fact that you consume a lot of CPU when trying the attack, making it conspicuous: a bad combo for an attacker).
> Note how all their mitigations for these attacks always have off switches: not something you usually see in security fixes. It's because in many, many cases, these attacks just don't matter.
They have off switches because they can have severe performance costs, which most security fixes don't have.
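As a concrete illustration of how selectively these mitigations get applied because of their cost: on Linux, individual processes can tighten speculation-related mitigations for themselves via prctl. This is a hedged sketch assuming a reasonably recent kernel and libc; the fallback constants are meant to mirror linux/prctl.h.

```c
/* Sketch: query / tighten this process's speculation mitigations on Linux. */
#include <stdio.h>
#include <sys/prctl.h>

/* Fallbacks if the libc headers predate these; values mirror linux/prctl.h. */
#ifndef PR_GET_SPECULATION_CTRL
#define PR_GET_SPECULATION_CTRL 52
#define PR_SET_SPECULATION_CTRL 53
#endif
#ifndef PR_SPEC_STORE_BYPASS
#define PR_SPEC_STORE_BYPASS    0
#define PR_SPEC_DISABLE         (1UL << 2)
#endif

int main(void)
{
    /* Bitmask describing the current speculative-store-bypass state. */
    int status = prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, 0, 0, 0);
    printf("speculative store bypass status: %d\n", status);

    /* Opt this one process into the mitigation (turn the "off switch"
       back on for ourselves), accepting the performance cost locally. */
    if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS,
              PR_SPEC_DISABLE, 0, 0) != 0)
        perror("prctl(PR_SET_SPECULATION_CTRL)");
    return 0;
}
```

Whether a given process is protected by default depends on the boot-time mitigation settings, which is exactly the kind of performance/security knob being discussed.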
> All software running on the same physical core or even the same physical CPU is either running at the same trust level, or sandboxed so heavily it can't mount the attack.
The former is simply not true. For example, EC2 avoids mixing tenants on the same core, but even the AWS serverless stuff doesn't do that, for cost reasons. For most end-user computers, this is blatantly not true.
Operating system level sandboxing is largely irrelevant if the attacker can simply read secrets from other processes running on the same core at the CPU level. Most processes will have a way to smuggle the stolen goods out.
By "sandboxed so heavily" I meant browsers, which don't allow shared memory multi-threading, tight control over CPU instructions or high resolution timers, so it's very hard to mount specex attacks there. I've seen claims it can be done but very few demos, and the only demo I remember trying didn't actually work.
By "running on the same physical core" I meant simultaneously. You can usually wipe uarch state when switching between tenants pretty well. If AWS aren't doing this then that's something for them to solve but I'm pretty sure major cloud users don't have to worry about these attacks as so many only affect hyperthreads that are running concurrently and it's not cost prohibitive to avoid mixing tenants that way.
Maybe they're simply victims of Kernighan's Law of Debugging: "Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?"
There is no doubt that Intel make chips "as clever as they can". Hence, by definition, they can't fully debug them.
As a CPU engineer, I can say that spectre highlighted channels of information escape that weren't previously considered as vulnerable. That's why it kicked off a new batch of exploits. There was a new idea at the heart of it and others built on that idea.
It's also important to say that these are not bugs. The design is behaving as intended. That the performance differs based on the previous code that the CPU has executed was understood and deemed acceptable, because the cost of the alternative was considered too high (either in power, performance or area). That's what it means to be an engineer. You weigh up alternatives and make a choice.
In this case, CPUs became fast enough that the fractional part of a bit per iteration became high enough bandwidth to be exploitable, but it needed someone to demonstrate it for it to be understood in the industry. That changed the engineering decision.
The problem is that nobody wants to admit that the old, stodgy mainframe guys were right 30 years ago and that sharing anything always results in an exfiltration surface.
Nobody wants to be the first to take proper steps because they have to either:
1) Partition hardware properly so that users are genuinely isolated. This costs silicon area that nobody wants to pay for.
2) Stop all the speculative bullshit. This throws performance into rewind and will put chip performance back a decade or two that nobody wants to pay for.
Until people are willing to value security enough to put real money behind it, this will continue ad nauseam.
I'd add that the status quo has done pretty well, and many of these exploits are fixed. It's also worth noting that a lot of the exploits in question may be known, but the people working on them couldn't theorize a practical exploit. How many web browser sandbox breaches have there been over the years? Far fewer than the CPU exploits of the past several years. The latter can have a much bigger impact though.
The biggest risk target seems to be shared servers, and you often don't know who you're sharing with, so is it worth trying? It seems to be usually, no... in a specific target, maybe.
No, 1993 is about right. The mainframe guys were screaming their heads off as these dumbass, insecure, non-ECC x86 processors were eating up more and more computing--before that the mainframe guys kind of dismissed x86 as simply toys.
A fundamental problem is that the attack surface is so, so huge. Even if their security researchers are doing blue-sky research on both very small and very broad areas of processor functionality, they're going to miss a lot.
And in line with that, and
> Maybe they _did_ and are well aware of all these attacks, but to save face and avoid performance hits they simply hold on to them, hoping nobody makes them public?
... maybe they have patched a number of issues and just never announced them.
> A fundamental problem is that the attack surface is so, so huge. Even if their security researchers are doing blue-sky research on both very small and very broad areas of processor functionality, they're going to miss a lot.
Sure. If they had patched a bunch of Spectre vulnerabilities and independent researchers had discovered a few more, that would be one thing, but as far as I can tell they have patched _zero_ while independent researchers have found many, and it has been years since the initial attack. Many of these follow very similar patterns, and "in what cases is protected data exposed via speculative execution" is something that an architect or engineer could definitely assess.
Generally all these workarounds have a measurable slowdown associated with them. This mitigation apparently has an up to 50% cost. It's unlikely many of them have been silently fixed without people noticing.
It's the same as any product, the product team wants a faster, cheaper product, yesterday. Security and trust is secondary, because if you're lucky enough, that will fall on the next product team.
Beyond that, processors contain billions (or trillions?) of possible outcomes from a set of inputs. Testing all of these just to verify reliability and stamp out logic bugs is really hard due to the combinatorial explosion. Putting security testing on top just complicates matters further. The best they can probably do is map out potential ways in which their general-purpose processors could be used for specific nefarious purposes.
The chipmakers don't have an incentive to look too hard for speculation security issues beyond a bit of PR. If they succeed, they lose money and market share, while their "insecure" competitors gain and at most patch later. And in fairness, a lot of these bugs are rather theoretical. Until buyers take these bugs much more seriously, this isn't going to change.
Clouds take these vulns seriously, and have a lot to lose, and have deep wallets. I'd be surprised if this topic didn't come up when large purchases are discussed.
They have workarounds. If you prevent multiple tenants from sharing threads on the same core, that eliminates the most desirable goal for an attacker.
However, it does not eliminate the vulnerability within a single tenant's own threads.
You also have to think about all the web shops out there that are just running Proxmox, or a cloud reseller who reintroduces the vulnerability in their multicore VPS setup.
Big clouds have the exact same incentive issue - if a customer is really paranoid, the customer can pay the cloud extra and ensure their own exclusive infra. For regular users, the clouds can mitigate a bit at scale (but not care much about this in practice, it's not as important as cheaper faster processors).
>In principle it seems like the chipmakers should hold all the cards when it comes to discovery: they are experts in speculative execution, know exactly how their chips work and have massive existing validation suites, simulators and internal machine-readable specifications for the low-level operations of these chips.
I hope you're not in charge of hiring QA. Bugs are often found by people who AREN'T thinking like the developers, who start wearing blinders about how their stuff should work and stop trying stupid things.
I think this sort of excuses the initial blindness to Spectre-style attacks in the first place, but once the basic pattern was clear, it doesn't excuse the failure to discover any of the subsequent issues.
It is as if someone found a bug by examining the assembly language of your process caused by unsafe inlined `strcpy` calls (though they could not see the source, so they had no idea strcpy was the problem), and then over the subsequent 6 years other people slowly found more strcpy issues in your application using brute-force black-box reverse engineering, and meanwhile you never just grepped your (closed) source for strcpy uses and audited them, or used any of the compiler or runtime mitigations against this issue.
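To stretch the analogy: the kind of audit being described is roughly "grep the closed source for the risky pattern and fix each hit", something only the vendor can do. A purely hypothetical before/after, just to illustrate the shape of the fix:

```c
/* Hypothetical example of the strcpy analogy; not from any real codebase. */
#include <stdio.h>
#include <string.h>

/* The kind of call an internal `grep -rn strcpy` audit would flag. */
void greet_unsafe(const char *name)
{
    char buf[16];
    strcpy(buf, name);          /* no bounds check: overflows buf if name is 16+ bytes */
    printf("hello, %s\n", buf);
}

/* The audited replacement: bounded copy, truncates instead of overflowing. */
void greet_fixed(const char *name)
{
    char buf[16];
    snprintf(buf, sizeof buf, "%s", name);
    printf("hello, %s\n", buf);
}

int main(void)
{
    greet_fixed("a name that is much too long for the buffer");
    return 0;
}
```

An outsider working from the compiled binary has to rediscover each such call site the hard way; the vendor could enumerate them in an afternoon, which is the asymmetry the comment is pointing at.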
That still seems like a job for QA and you do have a good point. If there's an attack using one technique, there should be an audit to make sure that technique can't be modified to be used in other nasty ways, then have those tests part of the test suite.
I'm in the camp that developers shouldn't be responsible for having a transcendental ability to predict future security holes (like this example). If one appears, it's really QA's job to document it and experiment with other similar vectors of attack. The developers are, you know: developing. Bring them back in when the security issue needs to be fixed, once it's found just how big the problem is.
Shouldn't the functions of "development" and "QA" both reside under the umbrella of the chip-maker though?
In fact, chip-makers famously invest an insane amount of money into "QA" (aka "validation") and many features or lack thereof are often put down to the cost of QA rather than the cost of development.
>Shouldn't the functions of "development" and "QA" both reside under the umbrella of the chip-maker though?
Sure, umbrella, but different teams. What I'm trying to get across is that it's a good thing that the developers are not wearing the QA hat, so that the QA people are thinking in ways that aren't exactly parallel to the developers'. You get close to your work, and you sometimes forget that you're not looking at the larger picture.
Think of it like if I was building a big secure wall in front of my house, but I neglect that it's easy to just walk to the opposite side to get in. Oh, whoops, I was too focused on the solution in front of me.
Possibly. When I worked at a place that designed a "simple" chip that was just a more energy efficient and very parallel version of its FPGA, the fab that was contracted to make it insisted on validating it themselves. It consisted primarily of a lot of copy-paste of the primary logic to build as many paths as would fit on the die. They described it as "unusually dense" and, I heard they later said that they'd never seen a design that dense. The validation process was partly described as people manually driving a car through a 3D model, making sure there were no unexpected junctions or other divergences. This process took months longer than their initial estimate, allegedly due to the density.
I guess you could say this was "under the umbrella of the chip-maker", though we had little say in it aside from pressing them for progress as our final product's shipdates came and left. When we finally got the first samples, power consumption was, I think, an order of magnitude higher than expected. Our lead engineer struggled to get it down without going to a smaller process that we could barely afford (and given the delays already, could probably not have afforded to wait for). We thus thought we had working logic, but our case designs were scuttled. After enlarging the cases to accommodate extra cooling, our base unit was more than ten times taller, and our next size up, while the same height, was three times longer. Highest units had a water cooling system[1].
Our QA was able to find other sorts of flaws, like misprinted unpopulated circuit boards, software faults on the host, or when we received shoddy interlink cables that either melted under test[2] or other cables that scrambled communications[3].
All of which is to say that many other pressing issues can interfere with doing what you feel you ought to be doing. At least at our scale. I can't speak for the likes of big guys like Intel or AMD, but it's possible that unfound faults or known unpatched flaws can ship because resources were committed elsewhere or fabrication lead times preclude waiting. This is not to say that shipping a security flaw is okay, but rather that sometimes you think you've done your due diligence, or sometimes your choices seem to be "Ship, ship late, or never ship." The answer you pick can be existential, so you hope you've picked the least bad option.
[1] Misbegotten, because "beauty of the promo images".
[2] Conductors much thinner than spec, not initially observed because both ends were fitted with moulded plugs.
[3] Longer than spec, initially received with enthusiasm by assembly staff, before an engineer investigating a difficulty saw them and exclaimed, "No-no-no! That's longer than I am tall! Stray capacitance alone will kill the communication." (He uncharacteristically exaggerated here. While they were three times longer than expected, this was at most 0.40 times his height.)
Generalizing: QA/Test and dev people just think differently.
I served as QA manager for a while. I was fortunate to have worked with some really, really good QA/Test people. They're more rare than good devs.
Some individuals can do well in both worlds.
I've had great devs who were pretty good at test. It seems to me like these devs came from outside of software and CS. Like from aerospace or ballet or history.
I've mentored QA/Test people so they could better automate and manage their work. Then they could pick up tasks like CI/CD, testing scripts, scrub data, etc.
But, in my experience, devs are bad at testing, and just terrible at QA.
Getting enough QA/Test support on a team in the 90s was a tough sell, even though everyone gave lip service to quality.
It's been a long time since I've worked with an actual Test person, dedicated to that role. And that was just 1 person vs 8 devs. So ridiculous.
These days, any kind of "test" is done by "business analysts", whatever that means. And I can't recall the last QA person I've worked with (since my stint as QA Manager in the '90s).
FWIW, I wholly agree with Dan Luu's observations about today's QA/Test standard practice in software vs hardware.
And maybe, just maybe, when the Snowden revelations started to come out, some people woke up and realised that the companies who design the processors used in the vast majority of computers are from the US.
Since Applied Cryptography, and every day since publication, it's been well known and well understood that the NSA's never-ending efforts to weaken systems to make their own work easier wreak havoc and cost everyone else billions. Luckily their attempt to get everyone to adopt a PRNG that was broken on purpose was thwarted.
But who's to say that a chipmaker being bribed to backdoor their design to allow reading any page of RAM from any protection level doesn't happen? It would make their job super easy.
Some people might say “don’t attribute to malice what can be attributed to incompetence” but introducing bugs via bribing an insider or even getting one of your people a job at Intel or AMD would be a very clever way to give yourself the keys to (nearly) all the castles.
Just don’t forget to patch the microcode on your own systems.
The intersection of probabilistic optimization and timing based side channels is a gift that will never ever fully go away.
Everything _really fast_, which is approximately all very mature systems gear, has probabilistic optimization in it now, and that's where a great deal of modern performance comes from.
Even thermals and power draw produce side channels. Eradicating every side channel is untenable; research is required to understand which side channels are tenable, and then we have to patch them up as best we can.
I'm waiting for the one where we find that binning is involved in an integrated way, and that some arbitrary sub-population of popular chips turns out to be much more exploitable than others due to particular paths being disabled and leaking extra timing info. That'll be a really fun and awful day.
You are describing a creator bias (there is probably a better name), where you think the ones who created something know it best. For example, you could create a programming language, a game, or anything, and think that you know it better than someone who uses it for several hours every day. The larger the user base, the less likely you are to know it better than all users.
I think your last statement is half right. They probably don't bother looking for them that hard, because if they look for them they might find them, and then have to make their CPUs slower before launch.
> Maybe they did and are well aware of all these attacks, but to save face and avoid performance hits they simply hold on to them, hoping nobody makes them public?
Or they chose not to look for these types of bugs in the first place, for those reasons.
There's more money to be made in performance and efficiency than security. If chipmakers could design and build the perfect processor, they would. But like anything else complex, there are compromises everywhere.
Granted, but the vendors accept that these are serious problems, given that they are immediately patched and the mitigations are all enabled by default, even at significant performance cost (most chip generations are down double-digit percentages versus "zero mitigations" at this point).
So they don't need to build the "perfect processor" but why aren't they discovering any of these issues themselves?
It's probably that they are aware of several vulnerabilities; such is life. But they are unable to prioritize them and assess which to solve first, so they need external auditors using black-box techniques to help them identify which ones are exploitable by external attackers, so they can fix those and not the other non-issues.
They know at release about many of these bugs, but between incentivizing upgrades and selling the resulting bugs as backdoors to "three letter agencies", there's simply too much money to be made by not disclosing/patching the problems before third parties publish their discovery.
My guess: There are a lot more researchers on the outside than inside. And also incentives - you would need a red team at Intel who tries to attack their own chips, and those people would be even fewer. And the best people may want to stay independent.
Having worked in the industry, my gut feeling is that chipmakers don't invest all that much in looking for and preventing these sorts of attacks.
When working on a new feature, you are desperately trying to deliver on time something that adds value in the sorts of scenarios that it was designed for. And that is already hard enough.
An easier explanation is that modern chipsets are incredibly complex, side-channel attacks like these are hard to reason about, and traditionally processors themselves have not been the target of these kinds of attacks, so I don't think the engineers working on them are accustomed to thinking about them as attack surfaces in this way.
You didn't have to think about this when you ran your code on your own chips; so it's your fault for backdooring the front end into the datacenter.
But now we have the 21st-century mainframe we like to call the cloud, where everything is shared. So I upload a container image with the vampire vuln, intending to read all the activity on the host: other customers' jobs, the OS itself, even steal internal keys used at Amazon.
The motivation to do this kind of attack now is incalculable.
I kid. The entire Zen class of AMD processors are vuln to Inception so you're not safe anywhere.
There is the "just use OS/2" strategy of the 1990s (referring to the period from 1995-2002 where you could sit happy in your niche OS that researchers and BHs were not targeting): in this case s390x probably has the same problem but no one is paying attention to it (other than a 3 letter agency, of course), so the chances of a payload reaching that arch are very close to zero.
There are security teams and/or methodologies inside all major CPU designers today that look at speculation and other side channels, although these might still not be up to quite the kind of rigor that you see in traditional verification. That is to say, their unit tests and fuzzers and formal analysis and proving methodologies are all well set up to verify that architectural results are correct; my guess is that they don't all verify that intermediate results or side effects can't be observed by different privilege contexts.
In many ways it is a much more difficult problem too. Going back to first principles, what execution in one context could have an effect on a subsequent context? The answer is just about everything. If you really wanted to be sure you couldn't leak information between contexts, you could only ever run one privilege level on the machine at any time. Not just core, entire machine. When switching contexts, you would have to stop everything, stop all memory and IO traffic, flush all caches and queues, and idle all silicon until all temperatures had reached ambient and voltages, fans, and frequency scaling had settled. Then you could start your next program. Even then there's probably things you've forgotten -- programs can access DRAM in certain ways to cause bitflips in adjacent cells, for example, so you've probably got a bunch of persistent and unfixable problems there too if you're paranoid. That's not even starting on any of the OS, hypervisor or firmware state that the attacking program might have influenced. So the real answer is that you simply can't share any silicon, software, or wires whatsoever between different trust domains if you are totally paranoid.
All of these things are well known about, but at some point you make your best estimation of whether something could realistically be exploited and that's very hard to actually prove one way or another. Multiply by all possible channels and techniques.
That's probably why you see a side channel vulnerability discovered every month by outsiders, but very few architectural defects (Pentium FDIV type bugs).
That said, this issue looks like a clear miss by their security process. Supplying data to an untrusted context, even if it can only be used speculatively, is clearly outside what is acceptable, and it is one of the things that could be discovered by an analysis of the pipeline.
Contrast with the recent AMD branch prediction vulnerability, which could plausibly fall under the category of a known risk that was not thought to be realistically exploitable.
As others have said though, everyone makes mistakes, every CPU and program and every engineering project has bugs and mistakes. I don't know if you can deduce much about a CPU design company's internal process from looking at things like this.
General caveats: are there many clouds that still run workloads from different users on the same physical core? I thought most had changed their schedulers years ago so you can't get cross-domain leaks between hyperthreads anymore. Claiming that it affects all users on the internet seems like a massive over-exaggeration, as he hasn't demonstrated any kind of browser based exploit and even if such a thing did exist, it'd affect only a tiny minority of targeted users, as AFAIK many years after the introduction of Spectre nobody has ever found a specex attack in the wild (or have they?)
I think the more interesting thing here is that it continues the long run of speculation bugs that always seem to be patchable in microcode. When this stuff first appeared there was the looming fear that we'd have to be regularly junking and replacing the physical chips en masse, but has that ever been necessary? AFAIK all of the bugs could be addressed via a mix of software and microcode changes, sometimes at the cost of some performance in some cases. But there's never been a bug that needed new physical silicon (except for the early versions of AMD SEV, which were genuinely jailbroken in unpatchable ways).
>are there many clouds that still run workloads from different users on the same physical core?
There are a vast number of VPS providers out there that aren’t AWS/GCP/Azure/etc where the answer is yes. Even the ones that sell ‘dedicated’ cores, which really just means unmetered cpu
What about burstable instances on AWS, and whatever is the equivalent in other clouds? Hard to imagine those having a dedicated core, would probably defeat the purpose.
My guess is there's likely less value in trying to target those kinds of environments... Just poking random data out of lambda or low end vps neighbors is a needle in a haystack the size of the moon in terms of finding anything useful.
It's more likely useful as part of a group of exploits to hit an individual, targeted system.
Per the paper, this looks like an attack against speculated instructions that modify the store forward buffer. The details aren't super clear, but that seems extremely unlikely to survive a context switch. In practice this is probably only an attack against hyperthread code running simultaneously on the same CPU, which I'd expect cloud hosts to have eliminated long ago.
Yeah, the way AWS has stock language like “AWS has designed and implemented its infrastructure with protections against this class of issues” supports the idea that you’re probably not getting anywhere with exploits of this class on a major cloud host any more.
What's interesting is that the FDIV bug from 1994 could also be worked around, but Intel recalled and wrote off those processors[1]. For their latest several problems, their response was more of a "sucks to be you". While they provided microcode updates and worked with OS vendors, there were performance impacts that materially affected the value of the chips.
The software workaround for the FDIV bug required the actual userspace software to be modified and recompiled. There was a decade of pre-existing software out there that would be hard to fix, especially in the days before most people had the internet.
There was nothing the OS could do to work around the bug, short of disabling the entire FPU and falling back to expensive software emulation of all floating point math.
The workarounds for all these speculation bugs can mostly be applied at the operating system and/or microcode level, and are comparatively cheap.
Yes, I think that's what I said? Every attack no matter how deep it seemed to be has been patchable in microcode, sometimes at a cost in performance. But so far nobody had to toss the physical silicon, at least not with Intel. The malleability of these chips is quite fascinating.
On some Skylake CPUs to get full mitigations you are taking a 30% performance penalty _and_ you need to disable SMT which is often another double digit penalty. It's not a literal "toss the physical silicon", but it's getting there.
In fact my thesis is that it's never a literal "toss the physical silicon" because the kernel is able to take control when switching between tasks so (with help from the microcode) it's able to wipe all potential speculation vectors (at an arbitrarily expensive cost) before switching to the next task. This also explains why SMT is unfixably broken on some processors, since by design the kernel does not intervene in the task switching on the virtual cores.
I think it was both. There were initial software only patches, but those were quickly superseded by microcode+kernel patches, where microcode added features the kernel enabled. IBRS or something like that.
> General caveats: are there many clouds that still run workloads from different users on the same physical core? I thought most had changed their schedulers years ago so you can't get cross-domain leaks between hyperthreads anymore.
Isn't this the whole point of AWS' t instances? It's my understanding that they are "shared" at the core level, or else there wouldn't be a reason for the CPU credit balance thing.
They are definitely time-sliced among tenants and very possibly two tenants may run at the same time on two hardware threads on the same core: but you could have a viable burstable instance with time-slicing alone.
Nevermind, AWS explicitly documents that all instance types, including burstable, never co-locate different tenants on the same physical core at the same time:
It only seems to document that there's group scheduling for SMT cores. But that doesn't prevent issues due to switching between customers on the same physical core, no?
"It is possible, however, for two burstable performance EC2 instances to run sequentially (not simultaneously) on the same core. It is also possible for physical memory pages to be reused, remapped, and swapped in and out as virtual memory pages. However, even burstable instances never share the same core at the same time, and virtual memory pages are never shared across instances. "
I only started reading the paper - so I very well might be wrong here - but it doesn't look to me like you need victim/attacker to be scheduled simultaneously on two SMT threads; rather, a single core sequentially executing victim/attacker code would be vulnerable. It's possible that the cross-customer "context switch" is larger than the vulnerable window, but I'd not want to bet on it.
> But that doesn't prevent issues due to switching between customers on the same physical core, no?
Yes they are explicit that customers may be time-shared on a physical core ("burstable" instances don't really make sense without that). Most of these attacks aren't known to be possible in that scenario and in any case the mitigations are much easier since flushing sensitive state at group scheduling boundaries is much less costly than permanent dynamic changes to how concurrent SMT threads interact.
Yes, it's certainly easier to mitigate at a boundary that's already as costly as switching between VMs.
The paper documents that disabling SMT does not entirely mitigate the problem (In 9.1). They briefly mention trying instructions to avoid the microarchitectural leaks, but don't go into more detail than mentioning verw isn't sufficient.
They state that a switch to/from SGX, with SMT disabled, doesn't prevent the attacks. See 8.1. That's not the same as a cross-vm switch, but it's certainly interesting that the attempts at flushing microarchitectural state when exiting SGX don't provide protection.
Think of all the cloud resellers out there who really aren't segregating their tenants, or the web shop with Proxmox that recombined its own customers onto a core even though the cloud provider specifically segregated it.
I think most if not all cloud VMs dedicate a core to you.
Well, there are some that share like the T series on AWS and I think other clouds have similar, but my bet is they can put in an extra "flush" between users to prevent cross tenant leakage.
Of course cross process leakage for a single tenant is an issue, in cloud or on prem, and folks will have to decide how much they trust the processes on their machine to not become evil...
This concern I also share, and it's probably worth converting into layman's terms so that all computer users understand what it is. Basically, the job scheduler behavior in the OS needs to surface to the user in understandable language they can read, so they can make the trade-off decision.
> Claiming that it affects all users on the internet seems like a massive over-exaggeration, as he hasn't demonstrated any kind of browser based exploit and even if such a thing did exist
He's saying it likely affects "everyone on the Internet" because most servers are vulnerable.
> These are not the same thing. Afaik, most “vCPU” are hyperthreads, not physical cores.
OP didn't say otherwise. They are saying that public clouds do not let work from different tenants run on the same physical core (on different hyperthreads) at the same time.
This doesn't prevent you from selling 1 hyperthread as 1 vCPU, it just means there are some scheduling restrictions and your smallest instance type will probably have 2 vCPUs if you have SMT-2 hardware (and that's exactly what you see on AWS outside of the burstable instance types).
A lot of people on YC are enterprisey-brained and only think there are 3 possible clouds, and then there is the rest of the planet who can't afford to park their cash at AWS and set it on fire.
Once again it seems clear that running code from two security domains on the same physical processor cores is just not possible to get right, and we should probably just stop doing it.
There are really only two common cases for this anyway. VMs and JavaScript.
For VMs we just need to give up on it. Dedicate specific cores to specific VMs or at least customers.
For JavaScript it’s a bit harder.
Either way, we need to not be giving up performance for the normal case.
Agreed. Browsers are now nothing but an application platform of APIs (https://developer.mozilla.org/en-US/docs/Web/API). For some reason they still retain the vestigial HTML, CSS and JS, but really all you need is bytecode that calls an ABI, and a widget toolkit that talks to a rendering API. Then we can finally ship apps to users without the shackles of how a browser wants to interpret and render some markup.
The idea of security "sandboxes" is quaint, but they have been defeated pretty much since their inception. And the only reason we have "frontend developers" rather than just "developers" is that HTML/CSS/JS/DOM/etc. is a byzantine relic we refuse to let go of. Just let us deliver regular old apps to users and control how they're displayed, in regular programming languages. Let users find any app in any online marketplace based on open standards.
HTML/CSS is one of the easiest way to develop GUIs, one of the most visually flexible, and one of very few things this side of ncurses that runs everywhere. Actually, without hacks, there are probably more end user devices with a browser than a command line.
It's also highly standardized. Regular programming languages have dozens of GUI toolkits, or at least one per platform.
I'd rather we go the other way, and build the browser into the OS, so that desktop apps just serve a perfectly standard web server, with an API to launch a special OS client with slightly more native integration(Toolbar icons, closing the process when the browser closes, etc), to make native app dev easier and hopefully more popular.
Security sandboxes mostly work. People aren't constantly getting viruses from clicking the wrong link, at least not quite as much. They're not perfect, but they're better than having to completely trust 100 different sites unsandboxed, and they can be improved. I'd rather have crappy security than no security at all.
If by "highly standardized" you mean "you don't get a choice in what you can do or how it works", I agree.
Native mobile apps thrive despite this magical web browser working everywhere, because the web browser simply doesn't do what native apps do. You may enjoy that, but a million businesses and billions of users out there don't agree, because they use native apps. There were 255 billion native mobile app downloads in 2022, generating billions in revenue. That's not a mistake or accident; that's a market filling a need.
If we really want an application platform that works everywhere, then let's stop dicking around with these stupid document hypertext viewers and build a real app platform that works everywhere.
There are probably more website visits than that. People download games and bank apps and things like that: stuff they expect to use frequently, want fast access to or offline ability for, or where they need hardware access.
There are still tons of things that don't need an app. Things that are inherently online and not accessed frequently work fine as sites.
The web doesn't limit what you can do that much, it limits how you can do it, which I think is a good thing. Less to break (Android API levels have the same effect) with fewer original lines of code. Less focus on clever and interesting code and more focus on UI and features (Although they try their best to reinvent the same js framework 1000 times).
A lot of the limitations are probably just Mozilla hating anything that could be used for tracking, and not trusting users to manage permissions. The trend has been pretty strongly toward making the web very close to native apps, with all kinds of APIs.
Native apps fill a use case very well, that the web does not. That doesn't make the web obsolete.
> A lot of the limitations are probably just Mozilla hating anything that could be used for tracking, and not trusting users to manage permissions.
Don't denigrate Mozilla for protecting users. There are a lot of privacy issues that can't be addressed by permissions models, Android serves as an example of that. Enumeration is not effective.
If Mozilla are standing in the way of invasive webpages, more power to them.
> Native mobile apps thrive despite this magical web browser working everywhere
There’s also user behavior. Many users are conditioned to get software through the App Store. I’ve seen this be a driving factor for quite a few web native applications spinning up native dev teams and shipping native clients.
Many folks are surprised to see just how far you can push a browser app and how small the gap between web and native has become for well-built applications, including native-like things like Bluetooth, NFC, USB, etc. (see: https://youmightnotneedelectron.com/)
Have hacked with quite a few devs/companies that had a fully functioning offline capable web native application. The most requested feature they’d get? “I want to download it from the App Store.” (Usually in the form of “I can’t find your app in the App Store”)
I suspect this is why the PWA experience on mobile devices hasn’t been well paved and why some App Store policies call out “don’t just wrap a browser view in a native app” - they want to keep users coming in the front door of a marketplace where they collect a cut off the top of all transactions.
We really need to standardize "just wrapping a browser view". Why are we shipping a whole browser when we could be shipping a zip file of HTML with some metadata, and maybe a few tiny native helper utilities?
This is how PWAs work on smartphones. I recall coming across some electron alternatives that use the system webview and those are able to generate binaries hundreds of kilobytes in size. But you lose out on access to many of the modern APIs due to Safari there.
This is standardized for many years and called PWA. There are ways to interact with native helper utilities as well (simplest is just run http server on localhost), but not for mobile apps.
For mobile apps you can use webview component which will use system browser. You don't need to ship the entire browser (actually you can't even do that on iOS).
You are delusional and terribly out of your depth if you think even a sizable minority has any interest in getting rid of hypertext and the web. Networked native apps are useless without the web architecture as the glue to integrate between them. It is highly likely that URIs, HTTP and hyperlinks will still be foundational elements of our technology world in a hundred years.
That sounds awful. People would be jamming entire 10MB toolkits into WASM to do things that HTML could do, and performance would probably suffer.
HTML is declarative. The machine understands it. I like things machines can understand and optimize. Things that you can write automated tools to work with, because it's not a full Turing machine. If you want to make a screen reader, you can. If you want to reflow for mobile in a better way, you can, because you know what's text and what's an image.
Just giving people a programming environment and the ability to draw some pixels, the browser has no idea what the intent is. There's nothing to optimize unless the individual sites do, and I doubt they have Google and Mozilla's budget.
We already have too many unnecessary powerful imperative systems out there.
If GTK/Qt etc. can render to canvas using WebGPU while compiled to WebAssembly, I think the game is almost over then too, if there's a way to lazy-load application modules.
Think Autodesk products. Certain parts (wasm modules) only load when you hover over a menu, while the overall app loads within milliseconds because it just has the main window and such.
This has been possible for years. I actually got in the habit of porting my qt projects to the web target because they ran better than native compiled on some of my older machines.
But “regular old apps” don’t do all the same things.
The whole point of properly developing web apps is that html/css indicates a certain level of semantic understanding which allows for different interpretation in different contexts.
If you build a web app well, you build it to be a good experience on a computer for a power user with a huge screen and a keyboard for shortcuts, good for a user on a tiny touch screen, good for a blind person who doesn’t use a screen, and good for robots to parse and index. It should even be good for someone on an iPad with a mouse or a pencil which interprets the whole concept of the mouse differently from the desktop user’s mouse.
The agreed-upon semantics of HTML and the separation of visual styling into CSS is what allows you to take a step beyond just building an app and add a layer of “say what you mean” such that human and non-human users can re-interpret it into their own devices, use cases, and specific needs.
Web frontends are at their best when you don’t expect to perfectly control the end user experience, but try to convey the semantic meaning as perfectly as you can so it can be interpreted into good experiences even in situations where the display is wildly different than originally intended.
> really all you need is bytecode that calls an ABI, and a widget toolkit that talks to a rendering API
Browsers have a 2D renderer (canvas) and you can write your GUI code in C++, Rust, or whatever and compile it to WASM. Some widget toolkits for this even exist already. If this were a superior model it would have taken over by now? I guess you are after deeper integration with the desktop environment?
The challenge with toolkits confined to <canvas> rendering is their inability to effectively utilize the browser's integration with the OS. Key components like font rendering, input methods (including standard key shortcuts and behavior within input fields), selection handling, image decoding, the network stack, native-like scrolling, and accessibility features need to be recreated from scratch. A web toolkit must use the DOM for optimal performance (and therefore ultimately lower its widgets to HTML).
> In computer software, an application binary interface (ABI) is an interface between two binary program modules. Often, one of these modules is a library or operating system facility, and the other is a program that is being run by a user.
There’s a marketing opportunity here to put multi-core back in the spotlight. Most workloads have reached the point of diminishing returns for adding more cores to a CPU, but if it turns out we need more cores just so we can run more concurrent processes (or browser tabs) securely, then here come the 128-core laptop chips…
From a user’s perspective I often think that applications which run multiple processes, demand multiple threads and large chunks of memory are too entitled.
I know it’s a (not even) half baked thought. But there’s something to that. We never really think of “how many resources is this application allowed to demand?”
Software would be orders of magnitudes faster if there was some standard, sensible way of giving applications fixed resource buckets.
I've wondered for a while whether it would make sense to split the CPU into a "IOPU" and a "SPU"
- The IOPU would be responsible for directing other hardware on the system. It doesn't need to be very performant.
- The SPU would be optimized for scalar and branch-heavy code that needs to run fast.
The SPU could have minimal security, just enough so it can't read arbitrary memory when fetching from RAM. It would only run one program at a time, so speculation shouldn't be an issue.
At least on my system few programs need a lot of processing power (and even then only intermittently), so little task switching should occur on an SPU.
> Once again it seems clear that running code from two security domains on the same physical processor cores is just not possible to get right, and we should probably just stop doing it.
Yes. This has had its heyday: the era of the time-shared systems, from the 1960's, right into the 1990's (Unix systems with multiple users having "shell accounts", at universities and ISP's and such). These CPU attacks show us that secure time-shared systems where users run arbitrary machine code is no longer feasible.
There will still be time sharing where users trust each other, like workers on the same projects accessing a build machine and whatnot.
It is sort of like the second law of thermodynamics (before statistical mechanics came around and cleared things up): sure, maybe not well founded in some analytical or philosophical sense, but experimentally bulletproof, to the point where anyone who tries to sell you otherwise, i.e. the idea that any two programs running on the same computer can be prevented from snooping on each other, would be regarded very suspiciously.
Sacrifice IPC/core to reduce gates per core, stuff more cores into a square centimeter, pin processes to cores or groups of cores, keep thread preemption cheap, but let thread migration take as long as it takes.
Arm already has asymmetric multiprocessing, which I feel like is halfway there. Or maybe a third. Fifteen years ago asynchrony primitives weren’t what they are today. I think there’s more flexibility to design a core around current or emerging primitives instead of the old ways. And then there are kernel primitives like io_uring meant to reduce system call overhead, amortizing over multiple calls. If you split the difference, you could afford to allow individual calls to get several times more expensive, while juggling five or ten at once for the cost of two.
I've wondered if we can't give a dedicated core to the browser. Of course, then web pages can steal from other web pages. Maybe task switching needs to erect much higher barriers between security contexts, a complete flush or so?
I wish chips would come with a core devoted to running this kind of untrusted code; maybe they could take something like an old bonnell atom core, strip out the hyper threading, and run JavaScript on that.
If a script can’t run happily on a core like that, it should really be a program anyway, and I don’t want to run it.
I’m not saying you’re wrong, but I have a hard time believing web developers would be capable of writing code efficient enough to share a single core. LinkedIn was slamming my CPU so much that I isolated it in a separate browser
I think you are right, my wish involves living in a slightly different universe where the web has taken a slightly less silly direction of development.
It probably would be possible to add a new instruction that causes the processor to flush all state in exchange for sacrificing task switching speed. Of course it might still have bugs, but you could imagine that it would be easier to get right.
Of course, it’s not doing much for the billions of devices that exist.
I would hope that we could find a software solution that web browsers can implement so that devices can be patched.
Either way, I would want such a solution to not compromise performance in the case where code is running in the same security context.
This is what I don’t like about existing mitigations. It’s making computers slower in many contexts where they don’t need to be.
Software can choose to invalidate all cache, fence accesses, and the like today. It may not be a single instruction, but it's not far off. Usually something like "just don't JIT 3rd-party JS to native code" is "secure enough", though most don't want to go down that route. For (reputable) cloud providers: just don't allow more than one VM to be assigned to a core at the same time, and flush everything between them if they are time-shared. The mitigations are the way to keep the most overall performance for everyone outside those most concerned with maximum security, so they are the most popular.
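To make the "flush and fence" point concrete, here is a rough, illustrative sketch (x86-only; the function name and the 64-byte line-size assumption are mine, and this is not by itself a sufficient defence against speculative attacks) of the kind of primitives software can already use:

```c
/* Sketch: explicitly flush a sensitive buffer out of the cache hierarchy and
 * fence before handing the core to less-trusted code. Illustrates the
 * primitives only; it is not a complete mitigation. */
#include <immintrin.h>
#include <stddef.h>

void flush_secret(const void *buf, size_t len) {
    const char *p = buf;
    for (size_t off = 0; off < len; off += 64)   /* assume 64-byte cache lines */
        _mm_clflush(p + off);                    /* evict each line from all cache levels */
    _mm_mfence();                                /* order the flushes before later accesses */
}
```

Real kernel mitigations also add branch-predictor barriers (e.g. IBPB) and buffer-clearing sequences at security-domain switches, which is where most of the cost comes from.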
Normally only one tab is in view at a time. Probably a bunch of use cases would hate this, but what if all background tabs just got very sparse time-share of a core and the foreground tab a dedicated core?
Do you really think that giving up on getting things done right is the way to progress computing? While AMD has its own spectrum of problems and not-quite-there security features, most of their vulnerabilities have been fixed in microcode shortly after disclosure.
We as an industry should stop excusing chipmakers from doing their jobs and reject broken products. It's brand loyalty all over again, like when Apple does something foolish like dropping the headphone jack and the whole industry follows, breaking years of interoperability.
When the products/services we buy break, we should demand better, not lower our expectations.
A vector is found that gets untrusted code C to run in the user area on Z via an exploit in X that Y has not acknowledged, so researchers publish a CVE with an example.
C starts trying to read memory from threads shared on the same vCPU, revealing the db connection string used by X, and the nonce and salt used for hashing.
> Once again it seems clear that running code from two security domains on the same physical processor cores is just not possible to get right, and we should probably just stop doing it.
I believe this is why OpenBSD disabled SMT by default in June of 2018. [0]
It can still be enabled with a simple 'sysctl', though.
Once you "dedicate cores to specific VMs" you will find that chip designers can also screw that up, just like they can screw up protection within a core. So you might as well proclaim that "impossible to get right" preemptively.
They said physical processor. I took this to mean chip. Of course you can't trust chips on the same mother board (there could be a bug where they can read the same memory), so you need a network boundary. So different rack on the server, for each [boundary, however you define that...]
A number of vexing CPU security vulnerabilities have resulted from a general class of optimizations that share cache resources (memory, TLBs, branch prediction tables, etc.) across different domains, which enables cross-domain attacks.
There is a potential for vulnerabilities even when the domains are separated by a network boundary ... if those domains view the world through some shared cache at the network layer.
> This is an unreasonable position. Vulnerabilities can be fixed
That's a highly optimistic position, to the point of being almost wishful thinking.
The vulnerability being talked about today has been around since 2014 according to the report, possibly being exploited for an unknown number of years since. Sure, maybe we can work around this one, now.
Other similar ones, to be published years into the future, are also there today, likely being exploited as we speak.
Running untrusted code on the same silicon as sensitive code (data), is unlikely to ever actually be truly safe.
The real problem here is the x86 architecture. We should stop using it. It's too complex, it's full of stuff made for backward compatibility purposes, it has too many instructions that may have unpredictable results, and for that exact reason it's extremely difficult to get things right. Somewhere in the thousands of instructions that it has you will surely find a bug.
We should move forward. Apple did a great job with the M1/M2 processors. Not only are they fantastic in terms of performance and energy usage, but (to this point) they also don't seem to suffer from these issues. The reason is that the CPU is simpler, and a simpler thing is easier to design right in the first place.
Performance sorta hinges on it. It could be that the cheaper way for the chipmakers to deal with it is to phase out the 4-core set and push core counts per die higher, and, to do that, incorporate an older core design as dedicated cores for untrusted code.
This would also require changes at the OS-makers to tag thread forks for trusted and untrusted behavior.
Essentially: instead of shutting down SMT for the entire machine, make the customer "prove" the code is safe for elevation or else it gets scheduled as non-SMT.
Even if we could say for sure that x86 has been disproportionately affected by speculative execution bugs (which already seems dubious), that could easily be due to a kind of selection bias. Presumably security researchers as a group more or less focus on the most popular and relevant ISAs/microarchitectures.
Vulnerabilities are like unreliable cars. It doesn't matter so much if they can be fixed, it's the very, very high opportunity cost of needing to be fixed, when you were busy doing something else of high value.
Responsible people tend to pick the small consequence now over the unknowable distribution of consequences later.
The mitigation here can incur a whopping 50% performance penalty. At what point can customers return these CPUs for either being defective or sue for false advertising? If they can't safely meet the target performance they shouldn't be doing these tricks at all.
Did processor companies ever advertise that processors guaranteed certain security properties of the software they execute?
Aren't system designers at fault for coming up with the idea of a context switch and assuming that we can trust a processor not to leak details across artificial software constructed boundaries?
The problem is that cycle-speed boosts left the industry circa 2012 or so. All the perf boost we get these days comes from optimizing instruction sequencing and multiprocessing. This is why languages like Go popped up (making the advanced programming topic of multiprogramming an entry-level accomplishment) and why you now see the 'async' decorator plastered everywhere in C#, and so on.
Keeping the security context intact and separated is a gargantuan task.
To me it makes more sense to add "lousy cores" to the die and force the operator to declare the launching threads are safe for SMT, else the job gets scheduled to the less-performing core where the pipelining is safer. It delegates responsibility to the chip-gobbler, forces them to understand the tradeoff for performance, until some elegant solution is found for side channel attacks like this.
> But wait, how is it even considered a processor bug ?
These side channel attacks allow an attacker to read registers from other processes before any context switching has occurred, or independently of any context switching (even if the OS has been written to correctly clear state).
> Did processor companies ever advertise that processors guaranteed certain security properties
It's not the guaranteed security as much as the advertised speed that's the issue. If they put out a chip which could only be used at half the advertised speed or else it would catch on fire, nobody would argue that they should be let off the hook because they didn't guarantee that the chips were fireproof in their ads.
If the chips can't perform at advertised speeds safely during typical use they're not delivering what was advertised.
AMD has the same problem (Inception). Predictive instruction pipelining makes timing and context separation harder. Even if you are on s390x or M1 it's not like you are safe either. This is a whole field of study.
In my mind the better mitigation is to put control over trusted code back to the user and to do that you have to add less-performant cores onto the die and force the operator to elevate (or not) to SMT.
Right now it's an all-or-nothing proposition for the whole board. I would like to think that you can take your untrusted code and stick it on the less-performy cores with the safer instruction pipelining scheme so an actual physical barrier exists.
If that was in the chip architecture, then it's up to OS vendors to surface it in a way that developers understand, and then down to the operator to decide upon configuration.
You are never going to get a perfect-solve from the chipmakers on this where the consumer has to do nothing.
Only up to 11th gen... it didn't seem like this could have been disclosed to Intel soon enough for them to have fixed it for 12th gen, so had they just happened to fix it while fixing something else, or what?
I decided to look in the paper: "Intel states that newer CPUs such as Alder Lake, Raptor Lake, and Sapphire Rapids are unaffected", although apparently not as a deliberate security fix; it seems to be just a side effect of a significantly modified architecture. So basically they just happened to fix it, or at least made this particular exploit unworkable.
Microarchitectural behaviour changes from generation to generation, and thus so do side effects. Fixing things by accident (and also introducing new problems by accident) are relatively frequent occurrences
All a publication indicates is that a white/grey hat researcher has discovered the vulnerability. There is no way to know if or how many times the same flaw has been exploited by less scrupulous parties in the interim.
And information leak exploits are less likely to be detected than arbitrary code execution. If somebody is exploiting a buffer overflow, they need to get it exactly right, or they'll probably crash the process, which can be logged and noticed. The only sign of somebody attempting Downfall or similar attacks is increased CPU use, which has many benign causes.
Since it is in a class of other well known vulnerabilities, I'm going to assume that there has been quite a bit of active research by state-operated and state-sponsored labs. I think it's more likely than not that this has been exploited.
On Linux, any CPUs that don't have updated microcode will have AVX completely disabled as a mitigation for this issue. That's rather harsh if you ask me, and would be very noticeable. Now I'm interested in finding out whether I can get updated microcode..
> Specifying "gather_data_sampling=force" will use the microcode mitigation when
> available or disable AVX on affected systems where the microcode hasn't been
> updated to include the mitigation.
Disclaimer: I work on Linux at Intel. I probably wrote or tweaked the documentation and changelogs that are confusing folks.
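If you want to see what your particular kernel/microcode combination ended up doing, recent kernels expose the decision in sysfs. A minimal sketch, assuming your kernel is new enough to carry the GDS mitigation and therefore the corresponding vulnerabilities entry:

```c
/* Print the kernel's reported Downfall/GDS mitigation state. On kernels with
 * the mitigation, this file reports strings such as "Not affected",
 * "Mitigation: Microcode" or a "Vulnerable: ..." variant. */
#include <stdio.h>

int main(void) {
    const char *path =
        "/sys/devices/system/cpu/vulnerabilities/gather_data_sampling";
    FILE *f = fopen(path, "r");
    if (!f) {                       /* entry absent: kernel predates the mitigation */
        perror(path);
        return 1;
    }
    char line[256];
    if (fgets(line, sizeof line, f))
        fputs(line, stdout);
    fclose(f);
    return 0;
}
```

The reported string should also make it obvious whether you landed on the "AVX disabled" fallback path mentioned above rather than the microcode mitigation.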
AWS customers’ data and instances are not affected by this issue, and no customer action is required. AWS has designed and implemented its infrastructure with protections against this class of issues. Amazon EC2 instances, including Lambda, Fargate, and other AWS-managed compute and container services protect customer data against GDS through microcode and software based mitigations.
"Red Hat’s internal performance testing of a worst-case microbenchmark showed a significant slowdown. However, more realistic applications that utilize vector gathering showed only low single-digit percentage slowdowns."
The performance impact of the microcode mitigation is limited to applications that use the gather instructions provided by Intel Advanced Vector Extensions (AVX2 and AVX-512) and the CLWB instruction. Actual performance impact will depend on how heavily an application uses those instructions. Red Hat’s internal performance testing of a worst-case microbenchmark showed a significant slowdown. However, more realistic applications that utilize vector gathering showed only low single-digit percentage slowdowns.
If the user decides to disable the mitigation after doing a thorough risk analysis (for example the system isn’t multi-tenant and doesn’t execute untrusted code), the user can disable the mitigation. After applying the microcode and kernel updates, the user can disable the mitigation by adding gather_data_sampling=off to the kernel command line. Alternatively, to disable all CPU speculative execution mitigations, including GDS, use mitigations=off.
Is that 50% overhead only for "gather" instructions? If so, and if 10% of the instructions in your workload are gathers, that would be about 5% overall.
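A quick sanity check of that arithmetic, treating it Amdahl-style: if a fraction f of execution time (note: time, not instruction count; gathers are relatively expensive instructions, so the two differ) is spent in gathers and that fraction slows down by a factor s, the mitigated runtime is

```latex
T_{\text{mitigated}} = \bigl((1-f) + s\,f\bigr)\,T
                     = \bigl(0.9 + 1.5 \times 0.1\bigr)\,T
                     = 1.05\,T
```

So with f = 0.1 and s = 1.5 you do indeed land at roughly 5% overall, consistent with the low-single-digit figures Red Hat reports above for realistic workloads.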
My guess is that HPC systems which run applications whose performance strongly depends on efficient data scatter gather will immediately disable the mitigation.
It is kind of like thinking that it is a huge sale at a store when they say "Up to 70% off". And there are like two things in the whole store which are 70% off.
I always translate such weasel words to their more straightforward equivalent.
"Up to 70% off" means for me "Nothing in our store is discounted by more than 70%" :-)
The NES incorporated a chipset that buried an entire 6502 inside. You can get a Rockchip ARM chip for the price of a pizza that incorporates mixed cores on the die. Maybe the chipmakers don't NEED to solve every edge case until the end of time, but delegate side channel attack mitigations like this back to the chipgobblers (us).
Hear me out: instead of making SMT an all-or-nothing proposition, we have "lousy cores" where untrusted code goes, and make the customer "prove" it should get elevation to the cores with SMT?
I don't want to clobber my chip that's running the mega-important payroll jobs where no one can load anything else onto the board under pain of death. However, I would like to be forced into tagging what is safe to run with SMT, else I get stuck on the safer cores.
I might be an intern who has no idea what any of this stuff means and goes to Google it, then learn what this attack vector truly is. Then makes a plan on how to defend against it.
You can have performance cores x efficiency cores.
You do not want to be the one who proposes trusted cores x "lousy" (untrusted) cores. The benefits, however many, would be lost in the discourse of "promoting insecurity". Such is life.
Speculative execution seems like a never-ending rabbit hole of vulnerabilities. Though I feel like most of them end up being in Intel chips for some reason
> [Q] Should other processor vendors and designers be concerned?
> [A] Other processors have shared SRAM memory inside the core, such as hardware register files and fill buffers. Manufacturers must design shared memory units with extra care to prevent data from leaking across different security domains and invest more in security validation and testing.
Not sure what to make of this wording. Thinly veiled threat? Hint that other embargoes are in place?
AMD had a similar vulnerability recently (Zenbleed) that used speculative-execution misprediction to create an effective "use-after-free" bug that would reveal the contents of the internal SIMD register file (https://lock.cmpxchg8b.com/zenbleed.html); that might be what is being referenced here.
I do not doubt the severity of the flaw, but most practical attacks end up being far more mundane. Consider SolarWinds, for example. No dazzling tricks needed, whatever gets the job done.
Are state actors subject to different economics? Their budgets are finite, too. Why opt for a complex less reliable exploit if a simple more reliable one is available?
It feels like chipmakers never learned to "Make it work, make it right, make it fast", in that order. But then, hindsight is 20/20.
How much slower would processors be if they got rid of all complex / risky optimizations? How much performance could we gain back with more expensive components, more integration (e.g. SoCs), and other approaches that are unlikely to lead to security problems?
- we disable speculative execution, so every conditional branch goes from 1 cycle to the full pipeline depth (5, 10, 20 cycles) to evaluate. Probably a 2x to 3x hit on performance.
- we disable prefetching, so every cache miss now has the full memory latency delay to fill a cache line. 1.2 to 1.3x hit???
- We disable SIMD as per this issue. 4x-8x hit on those parts of the code that use it.
- If we need to go as far as disabling caching you can now only run instructions at the speed of main memory. 100x hit.
CPUs would only be faster than their 1970s ancestors because the clock speed is now 5 GHz and not 1 MHz.
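Taking those ballpark figures at face value and (crudely) treating them as independent and multiplicative, the conclusion roughly checks out:

```latex
\underbrace{2.5}_{\text{no speculation}} \times \underbrace{1.25}_{\text{no prefetch}} \approx 3\times \text{ slower},
\qquad 3 \times \underbrace{100}_{\text{no caches}} \approx 300\times \text{ slower}
```

Set against a clock-speed gain on the order of 5 GHz / 1 MHz = 5000x, that would leave such a machine only ~15-20x ahead of its 1970s ancestor (before even counting the SIMD loss), so "only faster because of the clock" is about the right framing, at least under these rough estimates.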
It seems to me like most don't have a fundamental gut feel for what speculative execution actually is and the implications of "not getting it". At some level we need to fight for performance. My operating systems are only getting slower and shittier. I cannot fathom my CPUs going backwards too. Security must take a back seat at some point. You can't put bubble wrap and warning labels over everything or it becomes useless. The most dangerous tools are typically the most effective.
The CPUs are vulnerable because of the exact way in which they are being applied to a problem. Speculative execution is not inherently unsafe. Whatever future predicted memory prefetching shenanigans are going on in my CPU over here have absolutely ZERO impact on your CPU over there. Certainly someone could figure out a protocol/system/architecture that capitalizes on this notion that "2 different CPUs are indeed different CPUs".
One can see how any perspective here still causes trouble for Amazon, Microsoft, et al., but that was a business risk they signed up for the moment they intended to squeeze every last drop of subscriber revenue out of the hardware. Why should everyone else on earth have to suffer crappier performance by default because of the business/software practices of a select few?
These vulnerabilities have a lot less to do with cloud providers, and a lot to do with networked computers in general. It's not unreasonable to expect this exploit to be done via web browser, as was demonstrated with prior speculative execution exploits.
Fundamentally, the only reason we need speculative execution is that we haven't updated our software to be more concurrent (reflecting how chips have kept pace with Moore's law for 15+ years), we still program as if we're in the 1970s.
This may turn into a great opportunity to force a rebuild a lot of ancient code.
> Why should everyone else on earth have to suffer crappier performance by default because of the business/software practices of a select few?
The author states in the article that they believe this may be exploitable from javascript in a browser. Just to hammer the point home, any web page could steal anything in memory on your computer. Spectre was also browser-exploitable, and was mitigated there partly by making access to high precision timers privileged. This is very much not a problem that only impacts cloud providers.
The only 100% reliable way is to turn off branch prediction completely, and yes, this would make the processors at least 2 times slower, perhaps more.
Too bad that apparently nothing came out of the Mill architecture. My limited understanding is that this architecture would not have such vulnerabilities.
Of course it's possible it would have others :-) but being much simpler, at least conceptually, perhaps it would have less and easier to mitigate. Oh well.
Given the classic Stack Overflow branch-predictor question, you are underselling things quite a lot: a factor of 6 on those older processors in question, and who knows on modern processors.
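For anyone who hasn't seen that question, here is a minimal sketch of the experiment (my own reconstruction, not the original poster's code): the same loop over the same values, with only the order of the data changed.

```c
/* Classic branch-predictor demo: summing only the "large" elements is far
 * slower on unsorted data, because the `>= 128` branch becomes unpredictable
 * and each misprediction flushes the speculative pipeline. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 20)

static long long sum_big(const int *a, int n) {
    long long s = 0;
    for (int r = 0; r < 100; r++)             /* repeat to make timing visible */
        for (int i = 0; i < n; i++)
            if (a[i] >= 128)                  /* data-dependent branch */
                s += a[i];
    return s;
}

static int cmp_int(const void *x, const void *y) {
    return *(const int *)x - *(const int *)y;
}

int main(void) {
    int *a = malloc(N * sizeof *a);
    for (int i = 0; i < N; i++) a[i] = rand() % 256;

    clock_t t0 = clock();
    long long s1 = sum_big(a, N);             /* unsorted: ~50% mispredict rate */
    clock_t t1 = clock();

    qsort(a, N, sizeof *a, cmp_int);
    clock_t t2 = clock();
    long long s2 = sum_big(a, N);             /* sorted: branch is predictable */
    clock_t t3 = clock();

    printf("unsorted %.2fs, sorted %.2fs (sums %lld %lld)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t3 - t2) / CLOCKS_PER_SEC, s1, s2);
    free(a);
    return 0;
}
```

Compile with a low optimization level (e.g. -O1); at higher levels the compiler may replace the branch with a conditional move or vectorize the loop, which hides the effect. The exact ratio depends on the CPU, but the branchy loop over unsorted data is typically several times slower.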
Agreed, and I wonder if we'll see the adoption of "security hardened" CPUs which sacrifice performance features for non-exploitability. You can have one or the other, but not both.
The Linux mitigation can be disabled with gather_data_sampling=off in the kernel boot parameters.
Be warned, apparently Grub had some kind of problem back in August 2022 and this pre-existing bug broke my boot completely when I updated grub for the above mitigation. I had to boot into a live ISO and reinstall grub to fix it.
This isn't the first time, and won't be the last time, something like that happens with grub, and it's entirely an issue with the user not doing what they should when they run archlinux or some other minimalist distribution that doesn't automate the process of updating grub.
Grub uses a configuration file generator that reads from the human readable/editable /etc/default/grub and creates a /boot/grub/grub.cfg, and the result of that generator doesn't have a stable "interface". At any time, an update to grub might read grub.cfg incorrectly, if you do not generate it again.
99% of the linux distributions out there had no breakage with grub because they run the grub-mkconfig command every time grub is updated through grub-install. It is automated within the scripts included in the packaging of the distribution and the user will never see any of this in Debian, Ubuntu, Fedora etc. If a newer grub package comes out on those distro, the scripts will run both grub-install and grub-mkconfig. There will never be a mismatch between the grub binary versions and the config file state.
Arch Linux users who knew how grub works also had no issue. If you update grub, you must regenerate the config file (it's a "if you update" because arch linux, by default, will NOT automatically update grub at all, updating the package is not the same as updating grub). If you edit the config file and regenerate it when you have a newer grub package installed, you must also update the grub binaries. So any time you do something with grub, you MUST run grub-install, then grub-mkconfig. Doing only one of the two is akin to doing a partial upgrade, which is a no-no.
If, as I suspect, you're an arch user, I would recommend switching to systemd-boot, which doesn't even need a configuration file at all if you set up your system to follow its conventions and use unified kernel images. EFI binaries are automatically scanned and shown in the menu, and it uses EFI variables to remember the few settings you can interactively edit, like the preferred kernel to boot. It's robust and only has what is needed in a small, KISS boot manager. It can't even be truly called a boot loader, because that part is managed by efistub, which is part of the unified kernel image. Very unlike grub, which has the whole kitchen sink, including its own implementation of a ton of filesystems.
Good to know and, yes, that was the issue (grub-mkconfig without grub-install). Somehow I've never had this issue in Arch before now and wasn't aware of this idiosyncrasy, but now I know.
So this is literally the same thing as AMD's Zenbleed vulnerability? Ridiculous how these companies make so much money and are completely incompetent at handling security.
Theoretically, this can be mitigated permanently by disabling hyper-threading?
It claims to also allow you to read any data loaded into a vector register, any data loaded from a {rep mov} (i.e. memcpy), any data used by crypto acceleration, and a bunch more. Basically, the only data it does not let you read is regular loads into the regular GPRs (i.e. what would be a register load in a RISC architecture) though if you save/restore during context switches using the special instructions you will leak the values at the time of the context switch.
This is about as close to a total cross-process data leak as can be imagined.
In general, not really, but the most common string comparison instruction in x86_64 leaves the last character of one of the strings being compared with the other one just being a pointer into the C-style string.
I think you're asking this question because you're wondering whether a container that uses environment variables for its configs would show up in this. I think the answer would be no, because it's an operating system service that supplies the values. But every developer on Earth copies those values into variables, and at some point a pointer to them lands in a register, which would then make them vulnerable.
I've only done a quick read through the link, but I think the model they imply is that a malicious user could rent a Cloud VM in AWS/Azure/GCP/etc and then sniff the contents of SIMD registers, similar to the Zenbleed attack which was also disclosed recently[1]. This is a big deal because optimized implementations of strcpy, strlen, and memcpy in glibc all use SIMD registers, and glibc is everywhere.
AFAIK none of the cloud vendors run multiple customer's VMs at the same time on the same core; even the "shared-core" virtual machines don't share timeslices (AWS goes into detail about this here[1]).
How do they know what data is in the registers? In the linked article, the person running the attack code knows what is running on the target. The target is also conveniently waiting for the attack code to run without doing anything other than referencing the target data.
I'm gonna get put on a list for typing this out but I'll clarify:
1. Bad guy creates cloud account and spawns 10 of the cheapest VMs across different data centers, let's say this costs a total of, what... $50 a month?
2. Bad guy reads this paper, and makes a program that frequently samples SIMD registers. Contents get dumped to stdout and then streamed over an encrypted line to a RAID array hosted in $COUNTRY_WITHOUT_US_EXTRADITION.
3. Bad guy writes program to sift through data dumps on RAID array for passwords, encryption keys, etc.
If you create a cloud instance right now that has an SSH login on port 22, you stream the SSH login logs and see a steady stream of attempted logins to your device. While the marginal cost of brute forcing SSH logins is free (no cloud VM needed) and my proposed scenario isn't, I think this is a very real scenario that needs monitoring.
A VM cloud provider can't block you from running at least one thread, which is all the malicious threads required for this attack.
However none of the big cloud providers share CPU cores between users to combat exactly this kind of thing. I really wish the people that did these disclosures were more up-front about this, instead of saying vague things like "frequently happens on modern-day computers". Though I guess you can assume that if an attack would work on AWS the researcher would definitely mention it, so the lack of such an explicit claim almost ensures the attack is not viable on major clouds.
This is one of many similar previous attacks, and more of these attacks will continue to come out and be increasingly weaponized. From now on the assumption must be that same-core computing is not secure.
I am a little unclear on the attack. What data in the temporal buffer is being forwarded to the attacking vpgather?
Is the content of the temporal buffer just being blindly forwarded during speculative execution even if the indexed address of the attacking vpgather does not match?
Otherwise how is the speculative vpgather allowed to load the values of the temporal buffer?
If it is not blind is it a virtual address match? I guess it could also be a not-Present mapping physical match as well? I can not think of any other possibility off the top of my head.
If it is a blind forward that is pretty amazingly bad.
In the victim process, the following instructions leak information towards the attacker (because they share internal hidden buffers with the gather instructions executed by the attacker) (the following are quoted from the paper):
• SIMD read. All SIMD operations that read wide data (128/256/512 bits) from memory are affected regardless of their function: e.g., vmov* only read, vpxor* read and compute the xor. These general-purpose instructions are used everywhere, e.g., compilers spread wide data reads to optimize memory access routines.
• SIMD write. The only SIMD write operations that are affected are the compress ((v)(vp)compress*) instructions.
• Cryptographic extensions. Cryptographic extensions, including AES-NI and SHA-NI (SHA1 and SHA256), when accepting a memory operand, are affected. Data leaks from these instructions expose plaintext data and the secret key, e.g., AES or HMAC-SHA.
• Fast memory copy. Fast memory copies of various data types: byte, word, dword, qword using rep movs* instructions are affected. These are widely used to speed up common memory operations such as memcpy and memmove.
• Register context restore. Special instructions to more efficiently store/restore the register context (e.g., xsave/xrstor) are affected. GDS leaks the register context of both standard registers due to xsave/xrstor and wide registers due to fxsave/fxrstor.
• Direct store. The direct store is affected. Intel has recently added support for a direct store instruction that can copy a 64-byte cache line from a source to a destination address.
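For anyone wondering what the gather instructions at the center of the question above actually do, here is a minimal, illustrative AVX2 example (my own, not from the paper; compile with -mavx2): a single vpgatherdd performs eight independent indexed loads into one vector register, and it is the internal staging of that gathered data that GDS reportedly exposes.

```c
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    int table[16] = {0, 10, 20, 30, 40, 50, 60, 70,
                     80, 90, 100, 110, 120, 130, 140, 150};
    __m256i idx = _mm256_setr_epi32(1, 3, 5, 7, 9, 11, 13, 15);

    /* vpgatherdd: load table[idx[i]] for i = 0..7 with one instruction */
    __m256i v = _mm256_i32gather_epi32(table, idx, 4 /* scale = sizeof(int) */);

    int out[8];
    _mm256_storeu_si256((__m256i *)out, v);
    for (int i = 0; i < 8; i++)
        printf("%d ", out[i]);        /* prints 10 30 50 70 90 110 130 150 */
    printf("\n");
    return 0;
}
```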
Are there any known attacks which exploited Spectre or Meltdown vulnerabilities? And is it likely that this vulnerability will be successfully used to perform attacks?
In modern CPUs registers are a high-level abstract concept (see register renaming), so writing to a register doesn't have any specific location to overwrite.
The recent Zenbleed vulnerability was an example of that: "clearing" a register just set a temporary boolean saying it's zeroed instead of writing actual zeros.
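Purely as a conceptual sketch of that point (the names and structures here are invented; real rename hardware is far more involved): "clearing" a register can be just a bookkeeping update in the rename table, so the old value may linger in the physical register file.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Invented, simplified model of register renaming with a "known zero" flag. */
typedef struct {
    int  phys_index;   /* which physical register file slot backs this arch register */
    bool known_zero;   /* "z-bit": architecturally zero without any real write */
} rename_entry;

static uint64_t     phys_regs[64];   /* physical register file; stale data can linger */
static rename_entry rat[16];         /* register alias table for 16 architectural regs */

/* "Clearing" a register: set the flag, never touch the physical storage. */
static void clear_arch_reg(int arch) {
    rat[arch].known_zero = true;
}

int main(void) {
    rat[0] = (rename_entry){ .phys_index = 7, .known_zero = false };
    phys_regs[7] = 0xdeadbeefcafef00dULL;   /* pretend this held a secret */

    clear_arch_reg(0);                      /* architecturally zero now... */

    /* ...but the stale value still sits in the physical register file: */
    printf("known_zero=%d, physical slot still holds %#llx\n",
           (int)rat[0].known_zero, (unsigned long long)phys_regs[7]);
    return 0;
}
```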
I don't think so. This vulnerability is leaking data from a load-buffer, not directly from registers. It affects data that is loaded in bulk or to vector registers.
However, general purpose registers are also loaded passing through this buffer during context switches.
Does anyone know what types of workloads this affects the performance of the most? Is it specialty workloads, or are general webserver/database/coding/compiling/gaming/desktop usages affected?
Can someone explain why, in the ssh video at 2:25 where the 256-bit comparison is pasted in, it doesn't match? The first two colon-separated number sets do match, but not the following two?
At the end of this video? https://downfall.page/media/gds_aes.mp4 They appear to match to me, albeit with the last few characters hidden by the timer. They are visible when the key is copied though.
The OS job scheduler informs the CPU when it's ideal to swap jobs. But the OS is not doing the work of moving the register and stack pointers, the microcode is. These timing attacks take advantage of shared information in the cache (where it's likely the context of the thread you are not supposed to be able to read is).
SMT (hyperthreading) introduces some ambiguity when the context change is going to happen, and it appears there's instructions that are callable where registers/mem can be read that were outside the calling context.
This isn’t true - note how similar vulnerabilities have occurred on ARM, POWER, etc. – and even if you weren’t wrong, it would be off-topic for this thread since you’re not giving anyone useful information.
I see you’re basically trying to “No True Scotsman” a definition where anything which isn’t RISC-V is too complicated. That’s a shame as the time you’re spending on counterproductive RISC-V advocacy could have been spent learning about this class of attack and how it has nothing to do with the ISA.
When a CPU implements speculative execution, this class of attack becomes a concern. If it doesn’t, it’ll be too slow for most applications. Fortunately, the people who are - unlike you - actually helping RISC-V are working on efficient countermeasures:
If it's not clear enough, I am not talking whatever Arduino strawman you came up with.
I am talking about RISC architectures such as RISC-V (which I mentioned by name), the sort that can scale all the way up to supercomputers and down to microcontrollers.
There is no justification for putting up with x86's complexity. This complexity has negative consequences such as vastly increased likeliness of micro-architectural bugs.
It is utter madness for many of its current popular uses, such as in servers handling personal data.
I'm getting annoyed with all of these yawning security holes in Intel's CPUs. I'm tempted to replace my Intel MacBook Pro with an Apple Silicon model sooner than I normally would.
I'm not sure what evidence there is to think that Apple's chips are any better. And that's not really a dig at Apple; these are just very complicated devices and especially with the optimizations that CPUs need to make to run today's software with acceptable performance, it can become very hard to foresee all possible attacks and vulnerabilities.
The x86 architecture is now proven to be a minefield. It has too many instructions, being a CISC instruction set (something that was already obsolete in the 90s but was kept for backward compatibility with older software). This means that to make an efficient CPU the manufacturer has to do a ton of optimizations, and these optimizations have resulted in most of the bugs we have seen.
By contrast, ARM and other RISC instruction sets are simpler. Being simpler, they rely on compilers being smarter to generate optimized programs (something we do have these days). Because of this the processors are simpler and more straightforward; they don't need all the complexity of an x86 CPU. For the same reason they are also more efficient, since they don't waste resources doing useless operations.
Of course vulnerabilities can be everywhere (there were a few on the M1 too, though not as impressive as this one; they were mostly stuff nearly impossible to exploit). But recent times have proved that on x86 chips in particular this is a real issue.
All high-performance processors have very complex optimizations. This has nothing to do with x86. The vulnerabilities found in other processors are similar in design and about as difficult to exploit.
I'm not trying to suggest that Apple's ARM chips are magically better because they're designed by Apple. Rather that the x86 architecture is obviously very long in the tooth and now would be a great time to switch to a far more modern architecture especially in light of this steady stream of vulnerabilities and defects that keep being found in x86.
I'm sure ARM64 isn't perfect but I've yet to learn of something as severe as this, or Spectre, or Meltdown.
Almost all of these exploits are due to the out-of-order/speculative execution.. which is incredibly complicated. There is no reason to believe that an out of order architecture that has not been hardened is any better at defense here than x86 - Just that as a minority architecture, it's still less profitable to target for exploitation. I have very little faith that the Apple ARM chips do better here without extensive exploit attempts made...
Speculative vulnerabilities have virtually nothing to do with the instruction set. Changing the language of the processor has little to do with how the processor works under the hood, especially for fundamental technologies like speculative execution.
Also, as a corollary, speaking the same language doesn’t mean every vulnerability is shared. AMD does not seem affected by Downfall, for instance.
That doesn't make any sense. You want to prep a fix for distribution ASAP, and release it so people can be protected from it as soon as possible.
Also many attacks are not really feasible to detect if it's happened - that's like trying to know if the mailman read your postcard.
You can develop executable scanning techniques, but people are a lot more concerned about preventing it in the first place than catching it in the act. Why would you leave people vulnerable so they can watch data get stolen - instead of just fixing the issue?
They probably concluded it wasn't possible in this case. From their FAQ:
[Q] Is there a way to detect Downfall attacks?
[A] It is not easy. Downfall execution looks mostly like benign applications. Theoretically, one could develop a detection system that uses hardware performance counters to detect abnormal behaviors like excessive cache misses. However, off-the-shelf antivirus software cannot detect this attack.
Not all systems are equally exposed, and while there is no universal low-level fix, many systems can be redesigned at a higher level to reduce exposure.