OpenBSD disables Intel's hyperthreading due to security concerns (mail-archive.com)
478 points by mereel on June 19, 2018 | 145 comments



> We really should not run different security domains on different processor threads of the same core. Unfortunately changing our scheduler to take this into account is far from trivial.

This suggests a long-term compromise solution where threads within a process can use hyperthreading to share a core, but threads in different processes can't. Given that hyperthreads share L1 cache, this might also be better for performance.


>This suggests a long-term compromise solution where threads within a process can use hyperthreading to share a core, but threads in different processes can't. Given that hyperthreads share L1 cache, this might also be better for performance.

Intuitively this may sound logical, but in practice it's often not the case. For many workloads, putting two threads of the same program on a core ends up being worse than co-locating them with threads from different programs. The reason is that two threads of the same program will often end up executing similar instruction streams; a really good example is when both are using vector instructions, since those registers are shared between the two hyperthreads.


In practice it sometimes is the case, though.

SMT/hyperthreading is complicated. If you have a workload dominated by non-local DRAM fetches, it's a huge win because when the CPU pipeline is stalled on one thread it can still issue instructions from the other.

If you have a workload dominated by L1 cache bandwidth, the opposite is true because the threads compete for the same resource.

On balance, on typical workloads, it's a win. But there are real-world problems for which turning it off is a legitimate performance choice.
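
To make that concrete, here's a rough C sketch (entirely hypothetical sizes and kernel names, nothing measured): one kernel that is latency-bound on DRAM and one that is L1/ALU-bound. Run two copies of either one pinned to sibling hyperthreads and then to separate physical cores, and you'd typically expect SMT to help the first and hurt the second.

  /* A minimal sketch, assuming nothing about any particular machine.
   * Two kernels with opposite SMT behaviour; to test, run two copies of
   * either one pinned to sibling hyperthreads vs. separate physical
   * cores (using your OS's CPU affinity tools) and compare wall-clock time. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <stdint.h>

  #define N (16u * 1024 * 1024)   /* 128 MB of pointers, far bigger than any cache */

  /* DRAM-latency-bound: a dependent pointer chase through a random cycle.
   * The pipeline stalls on nearly every load, so a sibling hyperthread can
   * usually fill the idle issue slots; SMT tends to help here. */
  static uint64_t chase(const uint64_t *next, size_t steps) {
      uint64_t i = 0;
      while (steps--)
          i = next[i];            /* each load depends on the previous one */
      return i;
  }

  /* L1-resident and ALU-bound: a tight loop over a tiny array. Two sibling
   * threads compete for the same L1 and execution ports; SMT tends to hurt. */
  static uint64_t crunch(const uint64_t *buf, size_t len, size_t iters) {
      uint64_t acc = 0;
      for (size_t r = 0; r < iters; r++)
          for (size_t i = 0; i < len; i++)
              acc += buf[i] * 3 + 1;
      return acc;
  }

  int main(void) {
      uint64_t *next = malloc((size_t)N * sizeof *next);
      for (uint64_t i = 0; i < N; i++) next[i] = i;
      /* Sattolo's shuffle: turns the identity into one big random cycle. */
      for (uint64_t i = N - 1; i > 0; i--) {
          uint64_t j = (uint64_t)rand() % i;
          uint64_t t = next[i]; next[i] = next[j]; next[j] = t;
      }
      uint64_t small[1024] = {0};
      printf("%llu %llu\n", (unsigned long long)chase(next, N),
             (unsigned long long)crunch(small, 1024, 100000));
      free(next);
      return 0;
  }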


> workload dominated by non-local DRAM fetches,

How often is that a polite way of saying "software that is inefficient"?


Often. But software is what software does, and a CPU that only worked well on "efficient" code will always fail when compared with one that runs the junk faster than the competition.

Also, to be fair: sometimes a DRAM fetch is just inherent in the problem. Big RAM-resident databases are all about DRAM latency because while sure, it's a lot slower than L1 cache, it's still faster than flash. I mean, memcached is a giant monument in praise of the pipeline stall, and it's hugely successful.


> But software is what software does, and a CPU that only

Indeed. It is arguably rational for Intel to take on the burden in a centralised place rather than expecting every two-bit software shop to do it.

But then the existence of this kind of security issue shows that the added complexity is not always worthwhile. We might be forced to accept that computers which actually behave well are a little bit slower than we thought. But in return they will be simpler and more amenable to software optimisation.


I'm not sure there is a correlation. I can think of many situations in which non-local DRAM fetches are more efficient and I can think of many other situations where the opposite is true.

Trees or hashmaps which use non-local DRAM fetches can be more efficient than a brute-force linear search through a contiguous array, given a sufficiently high number of elements.

At the same time, contiguous arrays can be significantly more efficient than linked lists, which use non-local DRAM fetches.
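
For a concrete (hypothetical, unmeasured) illustration of that second point, compare summing the same values stored contiguously versus in a linked list whose nodes are scattered across the heap:

  #include <stddef.h>

  struct node { long val; struct node *next; };

  /* Sequential and prefetch-friendly: the hardware can stream this from DRAM. */
  long sum_array(const long *a, size_t n) {
      long s = 0;
      for (size_t i = 0; i < n; i++)
          s += a[i];
      return s;
  }

  /* A dependent chain of non-local fetches: every step waits on the previous
   * pointer load before it even knows which address comes next. */
  long sum_list(const struct node *head) {
      long s = 0;
      for (; head; head = head->next)
          s += head->val;
      return s;
  }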


With most software, well, most software is pretty inefficient and profits from HT. There are a lot of reasons for that; one is writing in an interpreted language because it is faster to develop and the software does not need to be very efficient in the first place. (Not to say that all Python/JS/etc. is inefficient, just that software that needs to be efficient is precisely the kind where one would consider an unmanaged language.) Additionally, things like webservers or DBs, which often just don't know which piece of data they need next simply because they don't know the next query, tend to profit from HT, even though that kind of software is hardly known for being inefficient.


FWIW, you mention databases, but even some database workloads can have better performance with HT turned off. I first learned this from a DBA at a former job when I was curious as to why they turned HT off. A member of the SQL Server team back in 2005 ran some experiments and found that you can get a 10% performance improvement in some workloads with HT off [1]. I don't know how much of that is still true today, however, as nearly all of my recent experience is PaaS in the cloud.

[1]: https://blogs.msdn.microsoft.com/slavao/2005/11/12/be-aware-...


> How often is that a polite way of saying "software that is inefficient"?

One could also say "software written with strong OOP patterns" because those are almost always written to benefit the developer later, rather than the CPU and RAM at runtime.


There are plenty of problems with poor mechanical sympathy.

To take an extreme example, traversing graphs is notorious. Cray and Sun, IIRC, have some fascinating processors with many, many hyperthreads, because all the programs do is wait on DRAM, but luckily there are lots of searches that can be done in parallel.


Typical workloads? What's that? People run hugely diverse workloads on cpus, and they change over time.


Building software, serving web pages, executing database queries, running a DOM layout, managing game logic... I mean, come on. You knew what I meant. Those are all tasks with "medium" cache residency and "occasional" stalls on DRAM. Anything that does a bunch of different things with a big-ish world of data.

Conversely: finding a task that is L1-cache-bound but does not frequently have to stall for memory is much harder. The only ones off the top of my head are streaming tasks like software video decode.


Oh, you meant typical for you.

One task that is L1 cache bound and does not frequently stall for memory (if you code it up well) is matrix multiply.
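
For instance, a blocked (tiled) matrix multiply, sketched below with made-up tile sizes, keeps its working set in L1, which is exactly why a sibling hyperthread competing for the same L1 and FP ports doesn't buy you much there:

  #include <stddef.h>

  #define BLOCK 32   /* three 32x32 tiles of doubles, ~24 KB, fits a 32 KB L1D */

  /* C += A * B for n x n row-major matrices; n assumed a multiple of BLOCK. */
  void matmul_blocked(size_t n, const double *A, const double *B, double *C) {
      for (size_t ii = 0; ii < n; ii += BLOCK)
          for (size_t kk = 0; kk < n; kk += BLOCK)
              for (size_t jj = 0; jj < n; jj += BLOCK)
                  /* work tile by tile so the data stays cache-resident */
                  for (size_t i = ii; i < ii + BLOCK; i++)
                      for (size_t k = kk; k < kk + BLOCK; k++) {
                          double a = A[i * n + k];
                          for (size_t j = jj; j < jj + BLOCK; j++)
                              C[i * n + j] += a * B[k * n + j];
                      }
  }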


> Oh, you meant typical for you.

I'm pretty sure those are meant to be, and I think are, "typical" for the general purpose CPU in use, and thus the general case.

Both mobile and desktop CPUs will be doing DOM layout, DB queries (whether to SQLite or the registry or just the filesystem), and possibly computing game logic on a regular basis.


It's becoming popular to want to push machine learning tasks onto edge devices like mobile and desktop CPUs, for example apps that include some machine learning. Some of these machine learning algorithms do a lot of matrix multiplies.

"Typical" is highly varied, and it changes.

Edit: here's an example: Google brings on-device machine learning to mobile with TensorFlow Lite

https://thenextweb.com/artificial-intelligence/2017/11/15/go...


Would they be using mostly CPU for that, or would they offload it to the GPU or a dedicated chip? I would assume you would use your general purpose CPU only if all else wasn't available (and generally there's a GPU available on most end user devices these days).


If possible the GPU, but not all GPUs have either a library or enough documentation to write one. I’ve seen complaints about this issue on mobile GPUs for years, no idea how widespread it is now.

BTW, this is just one example algorithm that I picked because it does (on the cpu) what the person I replied to said was rare.


Running the model is much easier than training it. On power-constrained environments, DSPs can do it.


I don't know whether it's still true, but a couple of years ago a majority of the world's CPU cycles were spent sorting things.


That's an interesting claim. Do you remember the source?


> The reason is that two threads of the same program will often end up executing similar instruction streams

Why is that bad?


Your processor has a certain number of execution units which can actually execute individual instructions, maybe like 4 floating point units, maybe 8 arithmetic ones, and maybe 1 that can do vector processing (these numbers are not real, but are good enough for the sake of the argument).

So the idea with SMT is that most of the time, lots of the execution units are unused because the thread a) isn't using them at all (e.g. a process doing encryption won't use the floating point units) and/or b) can't use them all because of how the program's written (for example, if I say 'load a random memory address, then add it to a register, then load another random memory address, then add it, etc.', I'm going to be spending most of my time waiting for memory to be loaded).

SMT basically means that you run another program at the same time, so even if the encryption process can't use the floating point units, maybe there's another process that we can schedule that will.

However, imagine my encryption process can use 6 of the 8 arithmetic units. If I have 2 encryption processes scheduled on the same core, I have demand for 12 when there are only 8. So now I have contention for resources, and I won't see a speedup from using SMT.
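
As a toy illustration of the complementary case (made-up kernels, not anything real): one thread that only touches integer/logic units and one that only touches FP units will mostly occupy different execution ports when run as SMT siblings, whereas two copies of the same loop fight over the same ports.

  #include <stdint.h>

  /* integer-only "encryption-ish" kernel: rotates, xors and adds */
  uint64_t int_kernel(uint64_t x, long iters) {
      for (long i = 0; i < iters; i++)
          x = ((x << 7) | (x >> 57)) ^ (x + 0x9e3779b97f4a7c15ULL);
      return x;
  }

  /* FP-only kernel: a dependent chain of multiplies and adds */
  double fp_kernel(double x, long iters) {
      for (long i = 0; i < iters; i++)
          x = x * 1.0000001 + 0.5;
      return x;
  }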

Other comments mention registers and not execution units: I'm suspicious of this, since modern processors have many registers (for Skylake, 250+) which they remap between aggressively as part of pipelining. Maybe this is different for the SIMD units.

That said, I haven't looked at this stuff since university so could well be wrong on the execution unit vs register comparison.


The contention would actually not be on a per-EU level, but one level higher up. The reservation station has a bunch (~5-8) of ports and typically multiple EUs are connected to one port. Can't use one port for two different things at the same time.

Here's a simplified block diagram of a Skylake core: https://en.wikichip.org/wiki/intel/microarchitectures/skylak...


Thanks! Yeah figured I'd be wrong somewhere in there!


There's also some funkiness around the CPU cache, at least at one stage. If your two HT threads are working on the same data, there was a chance you'd get some great cache performance out of it. However, when the hyperthread is faced with a cache miss, it can cause that shared data to be evicted and replaced with the data it needs. Under those circumstances, performance takes quite a nosedive, as both threads are stomping over each other somewhat.

Hyperthreading can be a real mixed bag for performance, though it's generally good and a lot of engineering effort has gone into making it shine. As ever, it's strongly advisable that people benchmark real-world conditions on a server, and it's worth giving it a shot with hyperthreading turned both on and off.


You are pretty much right. A couple of things to add: even hand-optimized asm code won't be able to use all ports all the time with a single instruction stream; the biggest win for hyperthreading is filling the pipeline bubbles caused by memory loads that miss L1 (there is only so much that OoO scheduling can do on your average load).


So on a RISC architecture, this would happen even for non-similar programs, because the number of instructions is smaller? Or would they just duplicate the processing units?


The units aren't dedicated to esoteric instructions, you have functions like "small alu", "big alu", "fp adder", "address generator and memory load".

RISC will perform about the same, and you can hyperthread one just fine.


Different instruction streams use different registers. It’s like sharing a bathroom. I can shower while you brush your teeth. There’s more contention when we’re both trying to shower.


"both are using vector instructions (these registers are shared between the two hyperthreads)"

So I'm guessing GP meant there's going to be contention for those registers, and thus no speedup?


Taking a guess, but since they are running similar streams, they have similar loads at a specific time. Competition between main thread and hyperthread could hurt performance instead?


I'm not sure that would necessarily fix the problem definitively. Say you had a browser running web-exposed JavaScript on a thread. You could still finagle a Spectre-type information leak that way by having the JavaScript thread snoop other browser threads, assuming no other mitigations.


Don't most browsers run one process per page/tab nowadays?


Chrome does, Firefox does not (I've got 5 processes for a billion tabs.)


No, Chrome used to but now uses a heuristic to determine whether new tabs should be launched in their own process or share an existing process, as a memory usage mitigation strategy. I believe tabs from the same origin have a preference of sharing processes.


Firefox process per tab is behind a feature flag as it’s in testing still


I don't think the plan is to enable this in the coming few years. The current approach, with a few processes handling all the tabs, is much more memory efficient, which is why they've chosen it.


Indeed. I cannot open Google Drive in chrome on my OpenBSD box without crashing the tab from exhausting memory, but Firefox handles it no problem.


And it's a mistake.

Just recently I noticed that when Firefox loads multiple tabs of the same wordpress site, it starts hanging not unlike Firefox always used to hang. That's likely because it groups all of those same site pages under one process.

I've never experienced that with Chrome. This is why I hope Firefox eventually (ASAP) switches to one process per tab, too. I can handle the browser using an extra GB of RAM. I can't handle it hanging on me and frustrating me.

Instead of pushing for 30-40% lower memory than Chrome, I say they should push for 10% lower memory with the same sandboxed process-per-tab model.


Chrome does not use one process per tab. In fact, it does something very similar to what you say Firefox does.

http://dev.chromium.org/developers/design-documents/process-...

FWIW I do not have the problem you describe and I don't want Firefox wasting any more of my scarce memory, or for that matter, CPU.


> very similar

Not really. Chrome uses a lot of processes for isolation. Firefox uses about four so it can take advantage of multiple cores.


> I've never experienced that with Chrome

Right, but I doubt that it's for exactly the reason you think it is: Chrome doesn't blindly do "one process per tab" anymore, and hasn't for a bit.


You can enable it (as I have) on chrome://flags/#enable-site-per-process

Strict site isolation: "Security mode that enables site isolation for all sites. When enabled, each renderer process will contain pages from at most one site, using out-of-process iframes when needed. When enabled, this flag forces the strictest site isolation mode (SitePerProcess). When disabled, the site isolation mode will be determined by enterprise policy or field trial." (Mac, Windows, Linux, Chrome OS, Android)


So, you're saying because you think you've discovered one case where there might be a problem, Firefox should completely change their architecture? And you're saying this in a discussion which frequently mentions how extremely varied workloads are?


No, man. He said Firefox should change their architecture, and he gave some kind of example.

After the way you seized on the word "typical", I kind of expected you to take words at face value. I didn't see any text to the effect that he thinks his say so is good enough.

Also, you're the one frequently mentioning how varied workloads are, and you don't constitute the discussion.

I'm going to go drink some cocoa to wash down this hook, line, and sinker I swallowed with your flame bait.


Not intended as flame bait, sorry if that's how it comes across to you.

It just so happens that people complaining about Firefox doing it wrong is a pretty common thing in Firefox threads on HN. And they usually have an example where it's really unclear if it's a problem for more than them or not. But, usually, they have a lot of advice about what the Firefox team should do. Whereas the Firefox team has telemetry data from most of their users.


The only thing I could find is a set of 'browser.tabs.remote.*' options, that are all enabled by default in FF 60.

That seems to indicate it is enabled, since the old option was 'browser.tabs.remote'? Or has it changed to something else now?


In theory marking threads even within the same process as part of a different 'security domain' shouldn't be impossible, though obviously it'd involve proprietary interfaces to the kernel at first.


.Net supports this (at least on Windows), it is called AppDomains: https://docs.microsoft.com/en-us/dotnet/framework/app-domain...


Once operating systems offered this mitigation mechanism, I'm sure browser vendors would use them.


Perhaps it makes more sense to require that all processes on an individual core share the same UID.

Browsers are particularly problematic, and it would be nice to alert the scheduler that a particular process is untrusted and extra care should be taken to sanitize caches before and after its time slice.


> Given that hyperthreads share L1 cache, this might also be better for performance.

If a userspace thread writes something into a buffer and then does some syscall initiating asynchronous work in the kernel, wouldn't it be better for the kernel thread to be located on the same core instead of shuffling the data into another cache?


So... they "strongly suspect" (but don't know and haven't shown) there may be a Spectre-class bug enabled by current HT implementations and improving their scheduler is hard, so they'll pre-emptively disable HT outright on Intel CPUs now and others in the near future?

I'm not an OpenBSD user (and glad for it, if this is anything to go by), but I'm curious - is this really how they operate, or does this decision stand out?


> I'm not an OpenBSD user (and glad for it, if this is anything to go by), but I'm curious - is this really how they operate, or does this decision stand out?

I'm not an OpenBSD user either; I use FreeBSD whenever possible. However, from listening to OpenBSD devs via blogs, conferences, HN, etc., it seems that OpenBSD is an operating system built mainly for OpenBSD developers; their goals support this[1]. OpenBSD being useful for non-OpenBSD developers is more of a secondary goal compared to how FreeBSD or Linux or any other OS handles it. Also, OpenBSD is much more of a research operating system than other large successful OSes (Linux, Windows, macOS, FreeBSD, etc.), meaning OpenBSD cares way more about developing features and novel security mitigations than about maintaining backwards compatibility like other operating systems do.

> So... they "strongly suspect" (but don't know and haven't shown) there may be a Spectre-class bug enabled by current HT implementations and improving their scheduler is hard, so they'll pre-emptively disable HT outright on Intel CPUs now and others in the near future?

The OpenBSD devs strongly suspected another Intel hardware bug a week or two ago, implemented a mitigation and deployed it. Turns out they were right[2].

[1]: https://www.openbsd.org/goals.html

[2]: https://www.bleepingcomputer.com/news/security/new-lazy-fp-s...


> Also, OpenBSD is much more of a research operating system than other large successful OSes (Linux, Windows, macOS, FreeBSD, etc.), meaning OpenBSD cares way more about developing features and novel security mitigations than about maintaining backwards compatibility like other operating systems do.

This is not the feeling I get from OpenBSD at all. They don't act like a research project. They aren't keen on implementing new features just for the sake of it, or just to try it out. A better description would be that they put correctness, security and maintainability first, and simplicity often comes as a nice side effect. Deprecating old, unused features is just a consequence of striving to decrease complexity by trimming your code base. OpenBSD is one of the few OSes where the number of lines of code is not skyrocketing to unmanageable numbers.


> it seems that OpenBSD is an operating system built mainly for OpenBSD developers

Honestly, I would say that this is true of many open source projects. It's one of the reasons that open source development tools are so good on Linux, but end user applications fall so far behind. It's also why documentation and usability tend to be much worse. When your system is based on volunteering, the work that gets done tends to be the stuff that interests the workers.


> it seems that OpenBSD is an operating system built mainly for OpenBSD developers

I don't see it that way at all. Whenever I have to work on a project where security is top concern, I always look at OpenBSD as an option. In the Linux world, the equivalent would be the Openwall GNU/*/Linux project. Not something for an average user, but to say it's used by its devs mainly is off by an order of magnitude.


> The OpenBSD devs strongly suspected another Intel hardware bug a week or two ago, implemented a mitigation and deployed it. Turns out they were right[2].

In fairness, my impression from the video of Theo's presentation was that they were tipped off by someone under embargo.


> is this really how they operate,

OpenBSD is a research operating system and security is a core component of their research. Pro-actively mitigating security risks before exploits appear is one way to improve security that has worked in the past: vulnerabilities having been fixed before they appeared.

Because they give reasonable deadlines for companies to fix security bugs (~90 days), they are kept out of the loop by hardware vendors like Intel, who requested 1 year to fix Meltdown.

Being in the dark, if you see some suspicious behavior, either you protect yourself from it, or you might wake up the next day and Intel will have released a new "We are sorry" post and your users would be screwed.

So this is pretty much how they operate, and if an OS is as security conscious as OpenBSD, there isn't really a different way to operate.

Note that disabling hyper threading to mitigate CPU flaws isn't anything new either: this had to be done for AMD's Ryzen because of hardware bugs last year anyways - https://www.extremetech.com/computing/254750-amd-replaces-ry...


> if an OS is as security conscious as OpenBSD, there isn't really a different way to operate

That's really what I was getting at with my question. There's no such thing as absolute security, it is a set of tradeoffs between usability, performance and specific security guarantees. Is there a point where the OpenBSD developers would say "okay, this is a (potential or confirmed) security bug, but the mitigation is just too costly in this case"?

In the post-Spectre world, it's not completely inconceivable to contemplate the possibility that, in order to retain the security guarantees most people thought they had, one might have to give up a substantial subset of the benefits of speculative execution in out of order processors. For some workloads that might mean up to two orders of magnitude in performance. I know roughly where the common operating systems would draw the line and I certainly know where I would, for my own usecases. I'm just curious about how OpenBSD works in this regard.


It's a sysctl that can be toggled. If you want hyperthreading and don't care about the security concerns, you can turn it back on. The commit message specifically mentions that this was implemented because many modern machines don't allow hyperthreading to be disabled in the BIOS.

You'll find out in a couple of months why they did this. I hope you'll remember the comments you've written when that happens, because you might learn something about what OpenBSD means when they say they "strongly suspect" something.


As I already implied in the parent post, I've no doubt we'll see more Spectre-like issues and I don't find it hard to believe that there's yet another embargo going on, that something's leaked again and that the OpenBSD people have picked up on it. I'm just curious about how many tradeoffs in performance and usability OpenBSD would be willing to make (in the form of default settings, I'm aware this one can be toggled) in the pursuit of security and how they do the cost/benefit analysis, because their process seems vastly different from that of the main OS vendors.

On the latter point, I did take "strongly suspect" at face value rather than code for "we know for sure, just can't disclose it all yet" because I'm not familiar with the development culture of OpenBSD.


Though it doesn't answer your question directly, the project goals are here: https://www.openbsd.org/goals.html. Security is important, but not the only goal. Notice that the word "performance" does not appear on this page.

If you want performance above all else, OpenBSD is not for you.


Nice backpedaling.


There's no backpedaling, I still think their approach is crazy for any practical usecase I care about. Beyond that, I simply expressed curiosity to learn more about the extent of the tradeoffs they're willing to make.


They don't set defaults for the average use case when security is involved. This is the difference between:

"Secure by default", turn knobs if you need more speed

"Fast by default", turn knobs if you need more security

Not that the knobs will be always available for each design decision, but sometimes they are there and you can turn them at your own risk. It probably would be wise to understand the consequences. Some people will prefer the peace of mind of knowing that safe defaults are in place if they don't change anything. Those will probably align with OpenBSD here. Some people believe that security is something you bolt on afterwards. Those definitely won't like OpenBSD design decisions.


> They don't set defaults for the average use case when security is involved.

They certainly set the defaults for some usecase, it just happens to be more security-biased than most. They don't ship an OS for an airgapped toaster, so it can't ever literally be "secure by default", it's just a compromise on the tradeoff scale that's more security-oriented than most. It still needs to be usable (for some set of people) and it still has to achieve some baseline level of performance to be usable - I was trying to get some clarity on the latter.


You seem to imply that security will always result in less speed or less usability, and that is not always the case. The thing with OpenBSD is that security will always come first between the three values when they clash, but they don't always clash. And yes, it is the most secure OS out there if you are to judge by the statistics over its history. I'd say that only two remote holes in so many years pretty much grants them the "secure by default" label. Maybe looking from outside it seems like security is all they think about, but my impression is that it is more about correctness and simplicity, and that security comes as a consequence. As an example of simplicity, I am not personally aware of any install that is so simple as theirs. Except maybe ubuntu's, but then with ubuntu you end up with a mess of interdependent packages and it will be a hell to uninstall shit you don't need.


> You seem to imply that security will always result in less speed or less usability, and that is not always the case.

Certainly not always, but often enough and more so than usual with Spectre and Meltdown.

> As an example of simplicity, I am not personally aware of any install that is so simple as theirs. Except maybe ubuntu's, but then with ubuntu you end up with a mess of interdependent packages and it will be a hell to uninstall shit you don't need.

That's an interesting point. How does it compare in terms of simplicity to the other BSDs (FreeBSD and Dragonfly) or something like Arch Linux?


> That's an interesting point. How does it compare in terms of simplicity to the other BSDs (FreeBSD and Dragonfly) or something like Arch Linux?

I'm not familiar with FreeBSD and DragonFly, but I have used NetBSD in the past and a bit of Arch Linux. The system management is way more consistent in OpenBSD, things generally work and are more reliable. The package management system is a pleasure to work with, and when you want to remove unused packages or dependencies of previously installed packages, it's simple and consistent. It actually works. When you are configuring something, most of the time there is one single way to do it, and it's well documented. And the simplicity can't really be compared to Arch Linux. Fire up a vm and install OpenBSD to it, just for the experience. It's mostly just accepting the defaults, extremely simple.


Fair enough, thanks. I'll try it out just to see what it's like.


There is a thread at lobste.rs that demonstrates the setting, immediately idling half of the CPUs presented in top:

https://lobste.rs/s/ifr52b/openbsd_disables_intel_s_hyperthr...


> Because they give reasonable deadlines for companies to fix security bugs (~90 days), they are kept out of the loop by hardware vendors like Intel who requested 1 year to fix meltdown.

Is 90 days really a reasonable timeframe to fix something like meltdown? I agree with your whole comment in general, but hardware/microcode issues at Intel's scale are a different beast than some buffer overflow.


Not to forget, Intel have said this collection of exploits cannot be fixed in microcode, only mitigated - it's so fundamental to the CPU hardware that it requires redesigning the chip to fix properly. The mitigation has led to serious performance hits and further problems down the line. And the permanent fix requires eventually replacing the CPU.

As has been previously noted, Meltdown, Spectre and related exploits came about because nobody ever thought it would be possible to access the cache from other processes. 1 year is probably quite reasonable for Intel to redesign the architecture (in fact, is probably going to stretch to 2 years or more), but people need security fixes now. In this case, it looks like OpenBSD are taking the right approach.


OpenBSD and DragonFly had their meltdown mitigations done in a couple of weeks from when it was publicly disclosed. If it's good enough for projects developed by handfuls of volunteers, it's good enough for the multi-hundred billion dollar megacorps.


So you're saying that if a small software operation can put mitigations in place in X time, then an absolutely massive hardware operation with hundreds of product lines, consisting of some of the most advanced, ridiculously complex, slow-to-develop chips in the world, can push a fix to many billions of devices in X time as well, whilst ensuring backwards compatibility and reducing the performance impact across the mind-bogglingly large number of different workloads that their chips are used for.

Makes sense. 90 days is more than enough.


Implementing KPTI didn't involve any of what you just said.


Exactly. So how can you compare the work that's required from Intel to patch the flaw in new designs + mitigate it with microcode vs the work that's required from 'projects developed by handfuls of volunteers'?


> I'm not an OpenBSD user (and glad for it, if this is anything to go by),

.. yet, if you interact with Unix systems, you most likely rely on OpenSSH.

Why would relying on a feature from a vendor with known processor security issues (including undisclosed hidden application processors), a feature which has marginal performance improvement and in some cases degradation, be a preferable stance?

At best, ambivalence towards this decision would be the position to take, especially given the very recent 'oh hey, FPU registers are also a problem' "discovery", which they were entirely correct about.


Yeah, they are fairly risk-averse and not really performance oriented, so this decision feels pretty much in line with their practice.


> they "strongly suspect" (but don't know and haven't shown)

Some security professionals seem to insist on having a proven exploit before they act. Doesn't that seem like poor decision-making? Their job is to provide security, not just to fix proven exploits; the latter is a means to an end. If there are threats from unknown exploits, and there certainly are, then they need techniques to address unknown exploits. One of those techniques is expert analysis of potential threats.


You may well not be in the target market (if your comment is anything to go by) but yes, their entire appeal rests on being very conservative with security decisions.


Maybe they've got a heads up that there are more spectre like bugs incoming but not enough information to actually mitigate them.

CPU bugs seem to be a rich vein to mine at the moment.


> So... they "strongly suspect" (but don't know and haven't shown) there may be a Spectre-class bug enabled by current HT implementations

Spectre is about a) leaving side-effects of misspeculation in shared resources, and b) bandwidth contention (between a misspeculated instruction stream and an attacker) to shared resources.

It is trivially obvious that HT exacerbates Spectre-class bugs, as the entire raison d'être of HT is to share pipeline resources. How quickly information can be leaked may be up for debate, but it's definitely non-zero.



Thanks for the link! I should've been more precise: the question in my mind is how many kbits/second of arbitrary target memory can be leaked. What made Meltdown/Spectre so scary was that the entire kernel memory could be dumped on the order of hours.


I would expect them to choose security over performance, that is how they operate. Microsoft would sweep this under the carpet, that’s what they do.


> disable HT outright

That would describe it if they...disabled it outright.

But they made HT user configurable, just like any other performance tuning knob.


I've never trusted hyperthreading for workloads I haven't tested. Sometimes it's faster, often it's slower. Beyond that, I've been suspicious of its security implications from day one. My first trip through the BIOS on a personal machine always includes turning it off.


Can you give an example of where it was slower for you with HT enabled?


Running finite element models with MSC NASTRAN, basically heavy matrix math. Matrices were NxN, N was around 10 million. This was on a server with 36 cores and a half terabyte of RAM, purchased in 2014.

I've also seen Erlang workloads where you could get a bit of a throughput increase by having the VM scheduler schedule more threads than your physical cores (so starting to use HT), but the latency would spike and become very unpredictable, which was a bad tradeoff for the use case.


HPC workloads normally won't benefit, and will probably take a hit, from HT, at least on Xeon-ish hardware. It's normally turned off on HPC compute nodes (perhaps in software, so the resource manager can enable it per-job if necessary). There are exceptions, particularly with KNC and, perhaps, KNL. The situation is likely different for POWER, but I don't have experience of it.


You could just buy i5 based machines instead which don't have hyperthreading.


From what I have seen, I think many dual-core i5 CPUs for notebooks support hyperthreading.


The U (ultra-low power) lines are indeed two-core with Hyperthreading. All others are 4 core without.


Okay, that explains it. The only i5-based systems I have used (or still use) are notebooks.



Yes, you're right, I was thinking only about desktop CPUs, forgetting the laptop CPUs are different.



Do you get the exact same performance characteristics by ignoring the extra virtual cores as you would have gotten if you could actually disable hyperthreading in the CPU via the firmware setup? Or does it result in some CPU resources becoming unusable that would otherwise be usable if HT were truly disabled?


Operating systems can't disable HT/SMT in the same way the BIOS/firmware can, but presumably it will be fine if the kernel only schedules the idle process on the HT "cores": they will spend much or all of their time in a lower power state (MWAIT? C-states?), and presumably the CPU is smart enough to handle that.


I guess figuring out whether it's only "presumably" or actually "actually" was why I asked the question in the first place.


Hmm. This thread on the OpenBSD lists suggests it may be adequate, of course disabling in the BIOS is certainly a better option, if you can.

https://marc.info/?t=152938773300027&r=1&w=2


At least in previous generations some resources were statically shared, but most were dynamically shared.


Ouch. I will say though, Hyper-Threading is a lot less valuable these days than it was when it was first introduced (except for the few dual core CPUs still available).

When you have four, six, eight or more cores, there's less value in doubling that number. The gain is lower.


Except the performance of hyper-threading today is far better than when it was first introduced. I had a dual-socket P4 Xeon box w/ HT around 2003. Single-threaded performance with HT enabled was around 70% of what it was with HT disabled. Today, I think you'd see only about 95-98% of enabled vs disabled performance.

I don't have hard numbers to back this up, it's purely my personal experience/recollection. On my 2 socket P4 Xeon box, I disabled HT. On my current I7 6-core box, I have HT on.


On the other side, a hyperthreaded CPU used to be about a 10-30% gain, but in tests I've run on recent hardware (HP DL380 Gen 10), hyperthreading gives around 70% more performance (the test I used was running pigz [parallel gzip] on a large file).


That's a great example of how hyperthreading's performance effects are extremely workload dependent.


It's still important to hide latency and saturate the memory controllers for programs with irregular memory accesses (e.g. graph algorithms), although the difference is not 2x, but something more like 10-15% over running without hyper-threading.


Depends on load. I run parallel integration tests on hyper-threaded machines and usually see 80% gains.


The implication seems to be that other architectures are also soon to have SMT disabled by default. That would definitely hurt POWER, for example.


I think the only other OpenBSD architecture that supports any SMT chips is sparc64 (like the US T1/T2). Unless an actual vulnerability is found, I don't see other OSes following this lead


An "actual vulnerability" has been found. It's amazing that even after the lazy FPU fiasco, people think OpenBSD did this on a complete whim.


I stand corrected. From the commit message this seemed much more speculative than the FPU vulnerability (where Theo admitted to being tipped off by someone under the embargo), but clearly it's more than just speculation.

https://www.blackhat.com/us-18/briefings/schedule/#tlbleed-w...


Why not? It’s arguably a way to make it slightly safer to run on Intel.


Also 64-bit Arm...


As far as I’m aware SMT in Arm cores is pretty uncommon actually.


I was going to submit this news from the source I learned it from, which has the novel peculiarity of coming from a site whose name is similar to this one's: https://thehackernews.com/thn/2018/06/openbsd-hyper-threadin...


Does anyone know when they are going to patch this or is it a permanent fix?


I didn't see this posed in the comments, but it was certainly top of mind for me: is this the same issue for the Linux kernel?


If they are using Hyper Threading, then yes, unless they already have a different architecture:

"We really should not run different security domains on different processor threads of the same core. Unfortunately changing our scheduler to take this into account is far from trivial."


The (recent) SPARC Hypervisor does a fair job at this. Fujitsu has an interesting implementation. But it would be conceivably difficult to do this with time sharing on Intel chips without exposing side channels. That kind of control should be supervisory and in control of the chip. I haven’t yet seen that on Intel, but I’ve heard there are some hardware manufacturers that are looking to do something like that.


They should make it easier to find the diff behind all openbsd emails. I can’t find this one.



Although it's not ideal, and there is likely an easier way to do it in full CVS (but I lack those skills), you can always go to their web CVS and manually check the files listed in the commit:

https://cvsweb.openbsd.org/cgi-bin/cvsweb/

https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/arch/amd64...


[flagged]


I'd say it's probably closer to:

> OpenBSD finds yet another way to harden their OS.

This is a conscious choice to disable something that could potentially allow for an entire class of security vulnerabilities. I suppose a decent analogy could be they chose to amputate a damaged foot before it had time to turn gangrenous.


[flagged]


Enormous cost? If you want to re-enable hyperthreading, it's just a sysctl away. I mean, OpenBSD is something you are expected to tweak anyway.


By continuing a security-first mindset that they've been establishing for decades?


The most secure computer is a toaster.

The only way to minimize attacks is to have less capable language classes exposed to the outside world. The last time I checked, they still have an HTTP stack, which is usually either Turing-complete or context-sensitive.


Sounds like it's a default towards the more secure, but they're not preventing the user from enabling it. Not sure what the issue is..


Ouch. Huge hit for performance.


From the commit message, it sounds like that might not necessarily be the case: "Note that SMT doesn't necessarily have a posive effect on performance; it highly depends on the workload. In all likelyhood it will actually slow down most workloads if you have a CPU with more than two cores."


Wonder why, because of poor SMP scalability and coarse locking?

I've encountered some cases where SMT made performance worse, such as with very optimized HPC libs, but in general SMT can really help. Compiling projects got a nice boost when enabling HT on Intel's recent arch, for example (all of this on Linux though; last time I checked OpenBSD, its SMP perf was abysmal).


For I/O- and memory-bound processes, it makes things worse, further saturating buses that are already saturated. For CPU-bound ones, it may help or not, depending on many factors, like cache/memory contention, the nature of the operations...

Compilation can get boosts because while some threads are waiting for I/O, others are crunching source files. Also, the variety of computation is high enough that multiple threads don't overlap too much on functional unit usage. If you try to build from a filesystem in memory, you'll find a way less impressive speedup (if any).


Yeah, I've spent my time looking at perf counters and I'm aware of how cache access patterns affect this. But the statement was drastic enough to make me suspect there is something more OpenBSD-specific going on.


[flagged]


If you won't stop posting snarky one-liners we'll ban the account.

https://news.ycombinator.com/newsguidelines.html


[flagged]


A) Whom? B) Why?

I can't see this flagged post, but the user in question has such quality comments as "hahahaha", "ouch", and "duhhh" -- most users here actually contribute to discussion, whereas a few think this is reddit...


It's extremely punishing of negative-sounding comments. People who want to call out bullshit on the orange website end up doing it on other forums so it looks like they hate it a lot.


Nobody here wants to look at generic comments that shit all over things. If you have something of value to say, say it; otherwise go away, nobody is going to miss you.


What scares me is that they make an OS-wide change based on wording like "This can make", "And since we suspect" and "In all likelyhood" instead of doing actual tests. I know that open systems don't have the required workforce, but making changes based on subjective reasoning is a slippery slope.


They care about making OpenBSD secure, not about producing security exploits.

Many OpenBSD devs are security researchers in academia. If they hear whispers over beers that there are new Spectre attacks coming that exploit this or that, they might not be able to reproduce the exploit without putting a lot of work into it (it's research, after all), but they might be able to prevent it by making a simple change, like disabling hyperthreading.

OpenBSD cares more about security than basically any other trade-off in OS design (performance, usability, ...), so it makes sense to me that they went this way. If you want a balance of security and performance, OpenBSD is not for you anyway.


Did it scare you when your operating system started to support it, on the basis that it would "in all likelyhood" be fine?

For a system aiming at security, it's a completely valid choice to disable things that start to look questionable, even if it's not conclusively proven yet. Just like potential software vulnerabilities are patched even if nobody has demonstrated that they actually are exploitable yet.


If it's a response to LazyFP bug, then it's under embargo, you can't have a test yet.


FFS: so far I've seen shitloads of "oooo - stuff <wave hands>" here from people who are clearly not experts and don't even understand the issues properly. Neither am I.

OP (and environs) has names on it that I have seen before and respect as knowing what the hell they are on about.



