OpenBSD disables Intel's hyperthreading due to security concerns (mail-archive.com)
478 points by mereel on June 19, 2018 | 145 comments



> We really should not run different security domains on different processor threads of the same core. Unfortunately changing our scheduler to take this into account is far from trivial.

This suggests a long-term compromise solution where threads within a process can use hyperthreading to share a core, but threads in different processes can't. Given that hyperthreads share L1 cache, this might also be better for performance.


>This suggests a long-term compromise solution where threads within a process can use hyperthreading to share a core, but threads in different processes can't. Given that hyperthreads share L1 cache, this might also be better for performance.

Intuitively this may sound logical, but in practice it's often not the case. For many workloads, putting two threads of the same program on a core ends up being worse than co-locating them with threads from different programs. The reason is that two threads of the same program will often end up executing similar instruction streams; a really good example is when both are using vector instructions, since those registers are shared between the two hyperthreads.


In practice it sometimes is the case, though.

SMT/hyperthreading is complicated. If you have a workload dominated by non-local DRAM fetches, it's a huge win because when the CPU pipeline is stalled on one thread it can still issue instructions from the other.

If you have a workload dominated by L1 cache bandwidth, the opposite is true because the threads compete for the same resource.

On balance, on typical workloads, it's a win. But there are real-world problems for which turning it off is a legitimate performance choice.
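
To make that concrete, here's a rough C sketch (entirely hypothetical sizes and kernel names, nothing measured): one kernel that is latency-bound on DRAM and one that is L1/ALU-bound. Run two copies of either one pinned to sibling hyperthreads and then to separate physical cores, and you'd typically expect SMT to help the first and hurt the second.

  /* A minimal sketch, assuming nothing about any particular machine.
   * Two kernels with opposite SMT behaviour; to test, run two copies of
   * either one pinned to sibling hyperthreads vs. separate physical
   * cores (using your OS's CPU affinity tools) and compare wall-clock time. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <stdint.h>

  #define N (16u * 1024 * 1024)   /* 128 MB of pointers, far bigger than any cache */

  /* DRAM-latency-bound: a dependent pointer chase through a random cycle.
   * The pipeline stalls on nearly every load, so a sibling hyperthread can
   * usually fill the idle issue slots; SMT tends to help here. */
  static uint64_t chase(const uint64_t *next, size_t steps) {
      uint64_t i = 0;
      while (steps--)
          i = next[i];            /* each load depends on the previous one */
      return i;
  }

  /* L1-resident and ALU-bound: a tight loop over a tiny array. Two sibling
   * threads compete for the same L1 and execution ports; SMT tends to hurt. */
  static uint64_t crunch(const uint64_t *buf, size_t len, size_t iters) {
      uint64_t acc = 0;
      for (size_t r = 0; r < iters; r++)
          for (size_t i = 0; i < len; i++)
              acc += buf[i] * 3 + 1;
      return acc;
  }

  int main(void) {
      uint64_t *next = malloc((size_t)N * sizeof *next);
      for (uint64_t i = 0; i < N; i++) next[i] = i;
      /* Sattolo's shuffle: turns the identity into one big random cycle. */
      for (uint64_t i = N - 1; i > 0; i--) {
          uint64_t j = (uint64_t)rand() % i;
          uint64_t t = next[i]; next[i] = next[j]; next[j] = t;
      }
      uint64_t small[1024] = {0};
      printf("%llu %llu\n", (unsigned long long)chase(next, N),
             (unsigned long long)crunch(small, 1024, 100000));
      free(next);
      return 0;
  }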


> workload dominated by non-local DRAM fetches,

How often is that a polite way of saying "software that is inefficient"?


Often. But software is what software does, and a CPU that only worked well on "efficient" code will always fail when compared with one that runs the junk faster than the competition.

Also, to be fair: sometimes a DRAM fetch is just inherent in the problem. Big RAM-resident databases are all about DRAM latency because while sure, it's a lot slower than L1 cache, it's still faster than flash. I mean, memcached is a giant monument in praise of the pipeline stall, and it's hugely successful.


> But software is what software does, and a CPU that only

Indeed. It is arguably rational for Intel to take on the burden in a centralised place rather than expecting every two-bit software shop to do it.

But then the existence of this kind of security issue shows that the added complexity is not always worthwhile. We might be forced to accept that computers which actually behave well are a little bit slower than we thought. But in return they will be simpler and more amenable to software optimisation.


I'm not sure there is a correlation. I can think of many situations in which non-local DRAM fetches are more efficient and I can think of many other situations where the opposite is true.

Trees or hashmaps which use non-local DRAM fetches can be more efficient than a brute-force linear search through a contiguous array, given a sufficiently high number of elements.

At the same time, contiguous arrays can be significantly more efficient than linked lists, which use non-local DRAM fetches.
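
For a concrete (hypothetical, unmeasured) illustration of that second point, compare summing the same values stored contiguously versus in a linked list whose nodes are scattered across the heap:

  #include <stddef.h>

  struct node { long val; struct node *next; };

  /* Sequential and prefetch-friendly: the hardware can stream this from DRAM. */
  long sum_array(const long *a, size_t n) {
      long s = 0;
      for (size_t i = 0; i < n; i++)
          s += a[i];
      return s;
  }

  /* A dependent chain of non-local fetches: every step waits on the previous
   * pointer load before it even knows which address comes next. */
  long sum_list(const struct node *head) {
      long s = 0;
      for (; head; head = head->next)
          s += head->val;
      return s;
  }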


With most software, well, most software is pretty inefficient and profits from HT. There are a lot of reasons for that; one is writing in an interpreted language because it is faster to develop and the software does not need to be very efficient in the first place. (Not to say that all Python/JS/etc. is inefficient, just that software that needs to be efficient is precisely the kind where one would consider an unmanaged language.) Additionally, things like webservers or DBs, which often just don't know which piece of data they need next simply because they don't know the next query, tend to profit from HT, even though that kind of software is hardly known for being inefficient.


FWIW, you mention databases, but even some database workloads can have better performance with HT turned off. I first learned this from a DBA at a former job when I was curious as to why they turned HT off. A member of the SQL Server team back in 2005 ran some experiments and found that you can get a 10% performance improvement in some workloads with HT off [1]. I don't know how much of that is still true today, however, as nearly all of my recent experience is PaaS in the cloud.

[1]: https://blogs.msdn.microsoft.com/slavao/2005/11/12/be-aware-...


> How often is that a polite way of saying "software that is inefficient"?

One could also say "software written with strong OOP patterns" because those are almost always written to benefit the developer later, rather than the CPU and RAM at runtime.


There are plenty of problems with poor mechanical sympathy.

To take an extreme example, traversing graphs is notorious. Cray and Sun, IIRC, have some fascinating processors with many, many hyperthreads, because all the programs do is wait on DRAM, but luckily there are lots of searches that can be done in parallel.


Typical workloads? What's that? People run hugely diverse workloads on cpus, and they change over time.


Building software, serving web pages, executing database queries, running a DOM layout, managing game logic... I mean, come on. You knew what I meant. Those are all tasks with "medium" cache residency and "occasional" stalls on DRAM. Anything that does a bunch of different things with a big-ish world of data.

Conversely: finding a task that is L1-cache-bound but does not frequently have to stall for memory is much harder. The only ones off the top of my head are streaming tasks like software video decode.


Oh, you meant typical for you.

One task that is L1 cache bound and does not frequently stall for memory (if you code it up well) is matrix multiply.
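
For instance, a blocked (tiled) matrix multiply, sketched below with made-up tile sizes, keeps its working set in L1, which is exactly why a sibling hyperthread competing for the same L1 and FP ports doesn't buy you much there:

  #include <stddef.h>

  #define BLOCK 32   /* three 32x32 tiles of doubles, ~24 KB, fits a 32 KB L1D */

  /* C += A * B for n x n row-major matrices; n assumed a multiple of BLOCK. */
  void matmul_blocked(size_t n, const double *A, const double *B, double *C) {
      for (size_t ii = 0; ii < n; ii += BLOCK)
          for (size_t kk = 0; kk < n; kk += BLOCK)
              for (size_t jj = 0; jj < n; jj += BLOCK)
                  /* work tile by tile so the data stays cache-resident */
                  for (size_t i = ii; i < ii + BLOCK; i++)
                      for (size_t k = kk; k < kk + BLOCK; k++) {
                          double a = A[i * n + k];
                          for (size_t j = jj; j < jj + BLOCK; j++)
                              C[i * n + j] += a * B[k * n + j];
                      }
  }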


> Oh, you meant typical for you.

I'm pretty sure those are meant to be, and I think are, "typical" for the general purpose CPU in use, and thus the general case.

Both mobile and desktop CPUs will be doing DOM layout, DB queries (whether to SQLite or the registry or just the filesystem), and possibly computing game logic on a regular basis.


It's becoming popular to want to push machine learning tasks onto edge devices like mobile and desktop CPUs, for example apps that include some machine learning. Some of these machine learning algorithms do a lot of matrix multiplies.

"Typical" is highly varied, and it changes.

Edit: here's an example: Google brings on-device machine learning to mobile with TensorFlow Lite

https://thenextweb.com/artificial-intelligence/2017/11/15/go...


Would they be using mostly CPU for that, or would they offload it to the GPU or a dedicated chip? I would assume you would use your general purpose CPU only if all else wasn't available (and generally there's a GPU available on most end user devices these days).


If possible the GPU, but not all GPUs have either a library or enough documentation to write one. I’ve seen complaints about this issue on mobile GPUs for years, no idea how widespread it is now.

BTW, this is just one example algorithm that I picked because it does (on the cpu) what the person I replied to said was rare.


Running the model is much easier than training it. On power-constrained environments, DSPs can do it.


I don't know whether it's still true, but a couple of years ago a majority of the world's CPU cycles were spent sorting things.


That's an interesting claim. Do you remember the source?


> The reason is that two threads of the same program will often end up executing similar instruction streams

Why is that bad?


Your processor has a certain number of execution units which can actually execute individual instructions, maybe like 4 floating point units, maybe 8 arithmetic ones, and maybe 1 that can do vector processing (these numbers are not real, but are good enough for the sake of the argument).

So the idea with SMT is that most of the time, lots of the execution units are unused because the thread a) isn't using them at all (e.g. a process doing encryption won't use the floating point units) and/or b) can't use them all because of how the program's written (for example, if I say 'load a random memory address, then add it to a register, then load another random memory address, then add it, etc.', I'm going to be spending most of my time waiting for memory to be loaded).

SMT basically means that you run another program at the same time, so even if the encryption process can't use the floating point units, maybe there's another process that we can schedule that will.

However, imagine my encryption process can use 6 of the 8 arithmetic units. If I have 2 encryption processes scheduled on the same core, I have demand for 12 when there are only 8. So now I have contention for resources, and I won't see a speedup from using SMT.
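
As a toy illustration of the complementary case (made-up kernels, not anything real): one thread that only touches integer/logic units and one that only touches FP units will mostly occupy different execution ports when run as SMT siblings, whereas two copies of the same loop fight over the same ports.

  #include <stdint.h>

  /* integer-only "encryption-ish" kernel: rotates, xors and adds */
  uint64_t int_kernel(uint64_t x, long iters) {
      for (long i = 0; i < iters; i++)
          x = ((x << 7) | (x >> 57)) ^ (x + 0x9e3779b97f4a7c15ULL);
      return x;
  }

  /* FP-only kernel: a dependent chain of multiplies and adds */
  double fp_kernel(double x, long iters) {
      for (long i = 0; i < iters; i++)
          x = x * 1.0000001 + 0.5;
      return x;
  }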

Other comments mention registers and not execution units: I'm suspicious of this, since modern processors have many registers (for Skylake, 250+) which they remap between aggressively as part of pipelining. Maybe this is different for the SIMD units.

That said, I haven't looked at this stuff since university so could well be wrong on the execution unit vs register comparison.


The contention would actually not be on a per-EU level, but one level higher up. The reservation station has a bunch (~5-8) of ports and typically multiple EUs are connected to one port. Can't use one port for two different things at the same time.

Here's a simplified block diagram of a Skylake core: https://en.wikichip.org/wiki/intel/microarchitectures/skylak...


Thanks! Yeah figured I'd be wrong somewhere in there!


There's also some funkiness around the CPU cache, at least at one stage. If your two HT threads are working on the same data, there was a chance you'd get some great cache performance out of it. However, when the hyperthread is faced with a cache miss, it can cause that shared data to be evicted and replaced with the data it needs. Under those circumstances, performance takes quite a nosedive, as both threads are stomping over each other somewhat.

Hyperthreading can be a real mixed bag for performance, though it's generally good and a lot of engineering effort has gone into making it shine. As ever, it's strongly advisable that people benchmark real-world conditions on a server, and it's worth giving it a shot with hyperthreading turned both on and off.


You are pretty much right. A couple of things to add: even hand-optimized asm code won't be able to use all ports all the time with a single instruction stream; the biggest win for hyperthreading is filling the pipeline bubbles caused by memory loads that miss L1 (there is only so much that OoO scheduling can do on your average load).


So on a RISC architecture, this would happen even for non-similar programs, because the number of instructions is smaller? Or would they just duplicate the processing units?


The units aren't dedicated to esoteric instructions, you have functions like "small alu", "big alu", "fp adder", "address generator and memory load".

RISC will perform about the same, and you can hyperthread one just fine.


Different instruction streams use different registers. It’s like sharing a bathroom. I can shower while you brush your teeth. There’s more contention when we’re both trying to shower.


"both are using vector instructions (these registers are shared between the two hyperthreads)"

So I'm guessing GP meant there's going to be contention for those registers, and thus no speedup?


Taking a guess, but since they are running similar streams, they have similar loads at a specific time. Competition between main thread and hyperthread could hurt performance instead?


I'm not sure that would necessarily fix the problem definitively. Say you had a browser running web-exposed JavaScript on a thread. You could still finagle a Spectre-type information leak that way by having the JavaScript thread snoop other browser threads, assuming no other mitigations.


Don't most browsers run one process per page/tab nowadays?


Chrome does, Firefox does not (I've got 5 processes for a billion tabs.)


No, Chrome used to but now uses a heuristic to determine whether new tabs should be launched in their own process or share an existing process, as a memory usage mitigation strategy. I believe tabs from the same origin have a preference of sharing processes.


Firefox process per tab is behind a feature flag as it’s in testing still


I don't think the plan is to enable this in the coming few years. The current approach, with a few processes handling all the tabs, is much more memory efficient, which is why they've chosen it.


Indeed. I cannot open Google Drive in chrome on my OpenBSD box without crashing the tab from exhausting memory, but Firefox handles it no problem.


And it's a mistake.

Just recently I noticed that when Firefox loads multiple tabs of the same wordpress site, it starts hanging not unlike Firefox always used to hang. That's likely because it groups all of those same site pages under one process.

I've never experienced that with Chrome. This is why I hope Firefox eventually (ASAP) switches to one process per tab, too. I can handle the browser using an extra GB of RAM. I can't handle it hanging on me and frustrating me.

Instead of pushing for 30-40% lower memory than Chrome, I say they should push for 10% lower memory with the same sandboxed process-per-tab model.


Chrome does not use one process per tab. In fact, it does something very similar to what you say Firefox does.

http://dev.chromium.org/developers/design-documents/process-...

FWIW I do not have the problem you describe and I don't want Firefox wasting any more of my scarce memory, or for that matter, CPU.


> very similar

Not really. Chrome uses a lot of processes for isolation. Firefox uses about four so it can take advantage of multiple cores.


> I've never experienced that with Chrome

Right, but I doubt that it's for exactly the reason you think it is: Chrome doesn't blindly do "one process per tab" anymore, and hasn't for a bit.


You can enable it (as I have) on chrome://flags/#enable-site-per-process

Strict site isolation: "Security mode that enables site isolation for all sites. When enabled, each renderer process will contain pages from at most one site, using out-of-process iframes when needed. When enabled, this flag forces the strictest site isolation mode (SitePerProcess). When disabled, the site isolation mode will be determined by enterprise policy or field trial." (Mac, Windows, Linux, Chrome OS, Android)


So, you're saying because you think you've discovered one case where there might be a problem, Firefox should completely change their architecture? And you're saying this in a discussion which frequently mentions how extremely varied workloads are?


No, man. He said Firefox should change their architecture, and he gave some kind of example.

After the way you seized on the word "typical", I kind of expected you to take words at face value. I didn't see any text to the effect that he thinks his say so is good enough.

Also, you're the one frequently mentioning how varied workloads are, and you don't constitute the discussion.

I'm going to go drink some cocoa to wash down this hook, line, and sinker I swallowed with your flame bait.


Not intended as flame bait, sorry if that's how it comes across to you.

It just so happens that people complaining about Firefox doing it wrong is a pretty common thing in Firefox threads on HN. And they usually have an example where it's really unclear if it's a problem for more than them or not. But, usually, they have a lot of advice about what the Firefox team should do. Whereas the Firefox team has telemetry data from most of their users.


The only thing I could find is a set of 'browser.tabs.remote.*' options, that are all enabled by default in FF 60.

That seems to indicate it is enabled, since the old option was 'browser.tabs.remote'? Or has it changed to something else now?


In theory marking threads even within the same process as part of a different 'security domain' shouldn't be impossible, though obviously it'd involve proprietary interfaces to the kernel at first.


.Net supports this (at least on Windows), it is called AppDomains: https://docs.microsoft.com/en-us/dotnet/framework/app-domain...


Once operating systems offered this mitigation mechanism, I'm sure browser vendors would use them.


Perhaps it makes more sense to require that all processes on an individual core share the same UID.

Browsers are particularly problematic, and it would be nice to alert the scheduler that a particular process is untrusted and extra care should be taken to sanitize caches before and after its time slice.


> Given that hyperthreads share L1 cache, this might also be better for performance.

If a userspace thread writes something into a buffer and then does some syscall initiating asynchronous work in the kernel, wouldn't it be better for the kernel thread to be located on the same core instead of shuffling the data into another cache?


So... they "strongly suspect" (but don't know and haven't shown) there may be a Spectre-class bug enabled by current HT implementations and improving their scheduler is hard, so they'll pre-emptively disable HT outright on Intel CPUs now and others in the near future?

I'm not an OpenBSD user (and glad for it, if this is anything to go by), but I'm curious - is this really how they operate, or does this decision stand out?


> I'm not an OpenBSD user (and glad for it, if this is anything to go by), but I'm curious - is this really how they operate, or does this decision stand out?

I'm not an OpenBSD user either; I use FreeBSD whenever possible. However, from listening to OpenBSD devs via blogs, conferences, HN, etc., it seems that OpenBSD is an operating system built mainly for OpenBSD developers; their goals support this[1]. OpenBSD being useful for non-OpenBSD developers is more of a secondary goal compared to how FreeBSD or Linux or any other OS handles it. Also, OpenBSD is much more of a research operating system than other large successful OSes (Linux, Windows, macOS, FreeBSD, etc.), meaning OpenBSD cares way more about developing features and novel security mitigations than about maintaining backwards compatibility like other operating systems do.

> So... they "strongly suspect" (but don't know and haven't shown) there may be a Spectre-class bug enabled by current HT implementations and improving their scheduler is hard, so they'll pre-emptively disable HT outright on Intel CPUs now and others in the near future?

The OpenBSD devs strongly suspected another Intel hardware bug a week or two ago, implemented a mitigation and deployed it. Turns out they were right[2].

[1]: https://www.openbsd.org/goals.html

[2]: https://www.bleepingcomputer.com/news/security/new-lazy-fp-s...


> Also, OpenBSD is much more of a research operating system than other large successful OSes (Linux, Windows, macOS, FreeBSD, etc.), meaning OpenBSD cares way more about developing features and novel security mitigations than about maintaining backwards compatibility like other operating systems do.

This is not the feeling I get from OpenBSD at all. They don't act like a research project. They aren't keen on implementing new features just for the sake of it, or just to try it out. A better description would be that they put correctness, security and maintainability first, and simplicity often comes as a nice side effect. Deprecating old, unused features is just a consequence of striving to decrease complexity by trimming your code base. OpenBSD is one of the few OSes where the number of lines of code is not skyrocketing to unmanageable numbers.


> it seems that OpenBSD is an operating system built mainly for OpenBSD developers

Honestly, I would say that this is true of many open source projects. It's one of the reasons that open source development tools are so good on Linux, but end user applications fall so far behind. It's also why documentation and usability tend to be much worse. When your system is based on volunteering, the work that gets done tends to be the stuff that interests the workers.


> it seems that OpenBSD is an operating system built mainly for OpenBSD developers

I don't see it that way at all. Whenever I have to work on a project where security is top concern, I always look at OpenBSD as an option. In the Linux world, the equivalent would be the Openwall GNU/*/Linux project. Not something for an average user, but to say it's used by its devs mainly is off by an order of magnitude.


> The OpenBSD devs strongly suspected another Intel hardware bug a week or two ago, implemented a mitigation and deployed it. Turns out they were right[2].

In fairness, my impression from the video of Theo's presentation was that they were tipped off by someone under embargo.


> is this really how they operate,

OpenBSD is a research operating system and security is a core component of their research. Pro-actively mitigating security risks before exploits appear is one way to improve security that has worked in the past: vulnerabilities having been fixed before they appeared.

Because they give reasonable deadlines for companies to fix security bugs (~90 days), they are kept out of the loop by hardware vendors like Intel, who requested 1 year to fix Meltdown.

Being in the dark, if you see some suspicious behavior, either you protect yourself from it, or you might wake up the next day and Intel will have released a new "We are sorry" post and your users would be screwed.

So this is pretty much how they operate, and if an OS is as security conscious as OpenBSD, there isn't really a different way to operate.

Note that disabling hyper threading to mitigate CPU flaws isn't anything new either: this had to be done for AMD's Ryzen because of hardware bugs last year anyways - https://www.extremetech.com/computing/254750-amd-replaces-ry...


> if an OS is as security conscious as OpenBSD, there isn't really a different way to operate

That's really what I was getting at with my question. There's no such thing as absolute security, it is a set of tradeoffs between usability, performance and specific security guarantees. Is there a point where the OpenBSD developers would say "okay, this is a (potential or confirmed) security bug, but the mitigation is just too costly in this case"?

In the post-Spectre world, it's not completely inconceivable to contemplate the possibility that, in order to retain the security guarantees most people thought they had, one might have to give up a substantial subset of the benefits of speculative execution in out of order processors. For some workloads that might mean up to two orders of magnitude in performance. I know roughly where the common operating systems would draw the line and I certainly know where I would, for my own usecases. I'm just curious about how OpenBSD works in this regard.


It's a sysctl that can be toggled. If you want hyperthreading and don't care about the security concerns, you can turn it back on. The commit message specifically mentions that this was implemented because many modern machines don't allow hyperthreading to be disabled in the BIOS.

You'll find out in a couple of months why they did this. I hope you'll remember the comments you've written when that happens, because you might learn something about what OpenBSD means when they say they "strongly suspect" something.


As I already implied in the parent post, I've no doubt we'll see more Spectre-like issues and I don't find it hard to believe that there's yet another embargo going on, that something's leaked again and that the OpenBSD people have picked up on it. I'm just curious about how many tradeoffs in performance and usability OpenBSD would be willing to make (in the form of default settings, I'm aware this one can be toggled) in the pursuit of security and how they do the cost/benefit analysis, because their process seems vastly different from that of the main OS vendors.

On the latter point, I did take "strongly suspect" at face value rather than code for "we know for sure, just can't disclose it all yet" because I'm not familiar with the development culture of OpenBSD.


Though it doesn't answer your question directly, the project goals are here: https://www.openbsd.org/goals.html. Security is important, but not the only goal. Notice that the word "performance" does not appear on this page.

If you want performance above all else, OpenBSD is not for you.


Nice backpedaling.


There's no backpedaling, I still think their approach is crazy for any practical usecase I care about. Beyond that, I simply expressed curiosity to learn more about the extent of the tradeoffs they're willing to make.


They don't set defaults for the average use case when security is involved. This is the difference between:

"Secure by default", turn knobs if you need more speed

"Fast by default", turn knobs if you need more security

Not that the knobs will be always available for each design decision, but sometimes they are there and you can turn them at your own risk. It probably would be wise to understand the consequences. Some people will prefer the peace of mind of knowing that safe defaults are in place if they don't change anything. Those will probably align with OpenBSD here. Some people believe that security is something you bolt on afterwards. Those definitely won't like OpenBSD design decisions.


> They don't set defaults for the average use case when security is involved.

They certainly set the defaults for some usecase, it just happens to be more security-biased than most. They don't ship an OS for an airgapped toaster, so it can't ever literally be "secure by default", it's just a compromise on the tradeoff scale that's more security-oriented than most. It still needs to be usable (for some set of people) and it still has to achieve some baseline level of performance to be usable - I was trying to get some clarity on the latter.


You seem to imply that security will always result in less speed or less usability, and that is not always the case. The thing with OpenBSD is that security will always come first between the three values when they clash, but they don't always clash. And yes, it is the most secure OS out there if you are to judge by the statistics over its history. I'd say that only two remote holes in so many years pretty much grants them the "secure by default" label. Maybe looking from outside it seems like security is all they think about, but my impression is that it is more about correctness and simplicity, and that security comes as a consequence. As an example of simplicity, I am not personally aware of any install that is so simple as theirs. Except maybe ubuntu's, but then with ubuntu you end up with a mess of interdependent packages and it will be a hell to uninstall shit you don't need.


> You seem to imply that security will always result in less speed or less usability, and that is not always the case.

Certainly not always, but often enough and more so than usual with Spectre and Meltdown.

> As an example of simplicity, I am not personally aware of any install that is so simple as theirs. Except maybe ubuntu's, but then with ubuntu you end up with a mess of interdependent packages and it will be a hell to uninstall shit you don't need.

That's an interesting point. How does it compare in terms of simplicity to the other BSDs (FreeBSD and Dragonfly) or something like Arch Linux?


> That's an interesting point. How does it compare in terms of simplicity to the other BSDs (FreeBSD and Dragonfly) or something like Arch Linux?

I'm not familiar with FreeBSD and DragonFly, but I have used NetBSD in the past and a bit of Arch Linux. The system management is way more consistent in OpenBSD, things generally work and are more reliable. The package management system is a pleasure to work with, and when you want to remove unused packages or dependencies of previously installed packages, it's simple and consistent. It actually works. When you are configuring something, most of the time there is one single way to do it, and it's well documented. And the simplicity can't really be compared to Arch Linux. Fire up a vm and install OpenBSD to it, just for the experience. It's mostly just accepting the defaults, extremely simple.


Fair enough, thanks. I'll try it out just to see what it's like.


There is a thread at lobste.rs that demonstrates the setting, immediately idling half of the CPUs presented in top:

https://lobste.rs/s/ifr52b/openbsd_disables_intel_s_hyperthr...


> Because they give reasonable deadlines for companies to fix security bugs (~90 days), they are kept out of the loop by hardware vendors like Intel who requested 1 year to fix meltdown.

Is 90 days really a reasonable timeframe to fix something like meltdown? I agree with your whole comment in general, but hardware/microcode issues at Intel's scale are a different beast than some buffer overflow.


Not to forget, Intel have said this collection of exploits cannot be fixed in microcode, only mitigated - it's so fundamental to the CPU hardware that it requires redesigning the chip to fix properly. The mitigation has led to serious performance hits and further problems down the line. And the permanent fix requires eventually replacing the CPU.

As has been previously noted, Meltdown, Spectre and related exploits came about because nobody ever thought it would be possible to access the cache from other processes. 1 year is probably quite reasonable for Intel to redesign the architecture (in fact, is probably going to stretch to 2 years or more), but people need security fixes now. In this case, it looks like OpenBSD are taking the right approach.


OpenBSD and DragonFly had their meltdown mitigations done in a couple of weeks from when it was publicly disclosed. If it's good enough for projects developed by handfuls of volunteers, it's good enough for the multi-hundred billion dollar megacorps.


So you're saying that if a small software operation can put mitigations in place in X time, then an absolutely massive hardware operation with hundreds of product lines, consisting of some of the most advanced, ridiculously complex, slow-to-develop chips in the world, can push a fix to many billions of devices in X time as well, whilst ensuring backwards compatibility and reducing the performance impact across the mind-bogglingly large number of different workloads that their chips are used for.

Makes sense. 90 days is more than enough.


Implementing KPTI didn't involve any of what you just said.


Exactly. So how can you compare the work that's required from Intel to patch the flaw in new designs + mitigate it with microcode vs the work that's required from 'projects developed by handfuls of volunteers'?


> I'm not an OpenBSD user (and glad for it, if this is anything to go by),

.. yet, if you interact with Unix systems, you most likely rely on OpenSSH.

Why would relying on a feature from a vendor with known processor security issues (including undisclosed hidden application processors), a feature which has marginal performance improvement and in some cases degradation, be a preferable stance?

At best, ambivalence towards this decision would be the position to take, especially given the very recent 'oh hey, FPU registers are also a problem' "discovery", which they were entirely correct about.


Yeah, they are fairly risk-averse and not really performance oriented, so this decision feels pretty much in line with their practice.


> they "strongly suspect" (but don't know and haven't shown)

Some security professionals seem to insist on having a proven exploit before they act. Doesn't that seem like poor decision-making? Their job is to provide security, not just to fix proven exploits; the latter is a means to an end. If there are threats from unknown exploits, and there certainly are, then they need techniques to address unknown exploits. One of those techniques is expert analysis of potential threats.


You may well not be in the target market (if your comment is anything to go by) but yes, their entire appeal rests on being very conservative with security decisions.


Maybe they've got a heads up that there are more spectre like bugs incoming but not enough information to actually mitigate them.

CPU bugs seem to be a rich vein to mine at the moment.


> So... they "strongly suspect" (but don't know and haven't shown) there may be a Spectre-class bug enabled by current HT implementations

Spectre is about a) leaving side-effects of misspeculation in shared resources, and b) bandwidth contention (between a misspeculated instruction stream and an attacker) to shared resources.

It is trivially obvious that HT exacerbates Spectre-class bugs, as the entire raison d'être of HT is to share pipeline resources. How quickly information can be leaked may be up for debate, but it's definitely non-zero.



Thanks for the link! I should've been more precise: the question in my mind is how many kbits/second of arbitrary target memory can be leaked. What made Meltdown/Spectre so scary was that the entire kernel memory could be dumped on the order of hours.


I would expect them to choose security over performance, that is how they operate. Microsoft would sweep this under the carpet, that’s what they do.


> disable HT outright

That would describe it if they...disabled it outright.

But they made HT user configurable, just like any other performance tuning knob.


I've never trusted hyperthreading for workloads I haven't tested. Sometimes it's faster, often it's slower. Beyond that, I've been suspicious of its security implications from day one. My first trip through the BIOS on a personal machine always includes turning it off.


Can you give an example of where it was slower for you with HT enabled?


Running finite element models with MSC NASTRAN, basically heavy matrix math. Matrices were NxN, N was around 10 million. This was on a server with 36 cores and a half terabyte of RAM, purchased in 2014.

I've also seen Erlang workloads where you could get a bit of a throughput increase by having the VM scheduler schedule more threads than your physical cores (so starting to use HT), but the latency would spike and become very unpredictable, which was a bad tradeoff for the use case.


HPC workloads normally won't benefit, and will probably take a hit, from HT, at least on Xeon-ish hardware. It's normally turned off on HPC compute nodes (perhaps in software, so the resource manager can enable it per-job if necessary). There are exceptions, particularly with KNC and, perhaps, KNL. The situation is likely different for POWER, but I don't have experience of it.


You could just buy i5 based machines instead which don't have hyperthreading.


From what I have seen, I think many dual-core i5 CPUs for notebooks support hyperthreading.


The U (ultra-low power) lines are indeed two-core with Hyperthreading. All others are 4 core without.


Okay, that explains it. The only i5-based systems I have used (or still use) are notebooks.



Yes, you're right, I was thinking only about desktop CPUs, forgetting the laptop CPUs are different.



Do you get the exact same performance characteristics by ignoring the extra virtual cores as you would have gotten if you could actually disable hyperthreading in the CPU via the firmware setup? Or does it result in some CPU resources becoming unusable that would otherwise be usable if HT were truly disabled?


Operating systems can't disable HT/SMT in the same way the BIOS/firmware can, but presumably it will be fine if the kernel only schedules the idle process on the HT "cores": they will spend much or all of their time in a lower power state (MWAIT? C-states?), and presumably the CPU is smart enough to handle that.


I guess figuring out whether it's only "presumably" or actually "actually" was why I asked the question in the first place.


Hmm. This thread on the OpenBSD lists suggests it may be adequate, of course disabling in the BIOS is certainly a better option, if you can.

https://marc.info/?t=152938773300027&r=1&w=2


At least in previous generations some resources were statically shared, but most were dynamically shared.


Ouch. I will say though, Hyper-Threading is a lot less valuable these days than it was when it was first introduced (except for the few dual core CPUs still available).

When you have four, six, eight or more cores, there's less value in doubling that number. The gain is lower.


Except the performance of hyper-threading today is far better than when it was first introduced. I had a dual-socket P4 Xeon box w/ HT around 2003. Single-threaded performance with HT enabled was around 70% of what it was with HT disabled. Today, I think you'd see only about 95-98% of enabled vs disabled performance.

I don't have hard numbers to back this up, it's purely my personal experience/recollection. On my 2 socket P4 Xeon box, I disabled HT. On my current I7 6-core box, I have HT on.


On the other side, a hyperthreaded CPU used to be about a 10-30% gain, but in tests I've run on recent hardware (HP DL380 Gen 10), hyperthreading gives around 70% more performance (the test I used was running pigz [parallel gzip] on a large file).


That's a great example of how hyperthreading's performance effects are extremely workload dependent.


It's still important to hide latency and saturate the memory controllers for programs with irregular memory accesses (e.g. graph algorithms), although the difference is not 2x, but something more like 10-15% over running without hyper-threading.


Depends on load. I run parallel integration tests on hyper-threaded machines and usually see 80% gains.


The implication seems to be that other architectures are also soon to have SMT disabled by default. That would definitely hurt POWER, for example.


I think the only other OpenBSD architecture that supports any SMT chips is sparc64 (like the US T1/T2). Unless an actual vulnerability is found, I don't see other OSes following this lead


An "actual vulnerability" has been found. It's amazing that even after the lazy FPU fiasco, people think OpenBSD did this on a complete whim.


I stand corrected. From the commit message this seemed much more speculative than the FPU vulnerability (where Theo admitted to being tipped off by someone under the embargo), but clearly it's more than just speculation.

https://www.blackhat.com/us-18/briefings/schedule/#tlbleed-w...


Why not? It’s arguably a way to make it slightly safer to run on Intel.


Also 64-bit Arm...


As far as I’m aware SMT in Arm cores is pretty uncommon actually.


I was going to submit this news from the source I learned it from, which has the novel peculiarity of coming from a site whose name is similar to this one's: https://thehackernews.com/thn/2018/06/openbsd-hyper-threadin...


Does anyone know when they are going to patch this or is it a permanent fix?


I didn't see this posed in the comments, but it was certainly top of mind for me: is this the same issue for the Linux kernel?


If they are using Hyper Threading, then yes, unless they already have a different architecture:

"We really should not run different security domains on different processor threads of the same core. Unfortunately changing our scheduler to take this into account is far from trivial."


The (recent) SPARC Hypervisor does a fair job at this. Fujitsu has an interesting implementation. But it would be conceivably difficult to do this with time sharing on Intel chips without exposing side channels. That kind of control should be supervisory and in control of the chip. I haven’t yet seen that on Intel, but I’ve heard there are some hardware manufacturers that are looking to do something like that.


They should make it easier to find the diff behind all openbsd emails. I can’t find this one.



Although it's not ideal, and there is likely an easier way to do it in full CVS (but I lack those skills), you can always go to their web CVS and manually check the files listed in the commit:

https://cvsweb.openbsd.org/cgi-bin/cvsweb/

https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/arch/amd64...


[flagged]


I'd say it's probably closer to:

> OpenBSD finds yet another way to harden their OS.

This is a conscious choice to disable something that could potentially allow for an entire class of security vulnerabilities. I suppose a decent analogy could be they chose to amputate a damaged foot before it had time to turn gangrenous.


[flagged]


Enormous cost? If you want to re-enable hyperthreading, it's just a sysctl away. I mean, OpenBSD is something you are expected to tweak anyway.


By continuing a security-first mindset that they've been establishing for decades?


The most secure computer is a toaster.

The only way to minimize attacks is to have less capable language classes exposed to the outside world. The last time I checked, they still have an HTTP stack, which is usually either Turing-complete or context-sensitive.


Sounds like it's a default towards the more secure, but they're not preventing the user from enabling it. Not sure what the issue is..


Ouch. Huge hit for performance.


From the commit message, it sounds like that might not necessarily be the case: "Note that SMT doesn't necessarily have a posive effect on performance; it highly depends on the workload. In all likelyhood it will actually slow down most workloads if you have a CPU with more than two cores."


Wonder why, because of poor SMP scalability and coarse locking?

I've encountered some cases where SMT made performance worse, such as with very optimized HPC libs, but in general SMT can really help. Compiling projects got a nice boost when enabling HT on Intel's recent arch, for example (all of this on Linux though; last time I checked OpenBSD, its SMP perf was abysmal).


For I/O- and memory-bound processes, it makes things worse, further saturating buses that are already saturated. For CPU-bound ones, it may help or not, depending on many factors, like cache/memory contention, the nature of the operations...

Compilation can get boosts because while some threads are waiting for I/O, others are crunching source files. Also, the variety of computation is high enough that multiple threads don't overlap too much on functional unit usage. If you try to build from a filesystem in memory, you'll find a way less impressive speedup (if any).


Yeah, I've spent my time looking at perf counters and I'm aware of how cache access patterns affect this. But the statement was drastic enough to make me suspect there is something more OpenBSD-specific going on.


[flagged]


If you won't stop posting snarky one-liners we'll ban the account.

https://news.ycombinator.com/newsguidelines.html


[flagged]


A) Whom? B) Why?

I can't see this flagged post, but the user in question has such quality comments as "hahahaha", "ouch", and "duhhh" -- most users here actually contribute to discussion, whereas a few think this is reddit...


It's extremely punishing of negative-sounding comments. People who want to call out bullshit on the orange website end up doing it on other forums so it looks like they hate it a lot.


Nobody here wants to look at generic comments that shit all over things. If you have something of value to say, say it; otherwise go away, nobody is going to miss you.


What scares me is that they make an OS-wide change based on wording like "This can make", "And since we suspect" and "In all likelyhood" instead of doing actual tests. I know that open systems don't have the required workforce, but making changes based on subjective reasoning is a slippery slope.


They care about making OpenBSD secure, not about producing security exploits.

Many OpenBSD devs are security researchers in academia. If they hear whispers over beers that there are new Spectre attacks coming that exploit this or that, they might not be able to reproduce the exploit without putting a lot of work into it (it's research, after all), but they might be able to prevent it by making a simple change, like disabling hyperthreading.

OpenBSD cares more about security than basically any other trade-off in OS design (performance, usability, ...), so it makes sense to me that they went this way. If you want a balance of security and performance, OpenBSD is not for you anyway.


Did it scare you when your operating system started to support it, on the basis that it would "in all likelyhood" be fine?

For a system aiming at security, it's a completely valid choice to disable things that start to look questionable, even if it's not conclusively proven yet. Just like potential software vulnerabilities are patched even if nobody has demonstrated that they actually are exploitable yet.


If it's a response to LazyFP bug, then it's under embargo, you can't have a test yet.


FFS: so far I've seen shitloads of "oooo - stuff <wave hands>" here from people who are clearly not experts and don't even understand the issues properly. Neither am I.

OP (and environs) has names on it that I have seen before and respect as knowing what the hell they are on about.



