Core scheduling lands in Linux 5.14 (lwn.net)
375 points by phab on Nov 2, 2021 | hide | past | favorite | 72 comments



This is such a great post, I'm surprised it doesn't have more traction but maybe people aren't into reading longform plain text.

"spamming Internet users at scale requires a lot of parallel activity, after all. If those processes can be segregated so that all siblings of any given core run processes from the same customer, we can be spared the gruesome prospect of one spammer stealing another's target list — or somebody else's private keys."


Maybe I'm just being too humorless, but I found the quoted passage a bit too dismissive of all the good things that are being done in cloud environments. Some of us are developing products that actually make people's lives better, and we're selling those products as SaaS, but we're not yet big enough to operate dedicated hardware with the same level of reliability and security that, say, AWS can achieve, so we run in a shared cloud environment. Anyway, the constant cynicism in online communities like this one, and reflected in that part of the OP, gets to me.


We also do useful things in cloud/HPC which touches people's lives, but a little joke here and there doesn't hurt IMHO.

Maybe I'm just too used to stabbing at myself and laughing about it.


Yes you're being too humourless :)


Such dry humor is one of the best things about Jonathan Corbet's writing style.


The Linux kernel community has always had a bit of a suspicious attitude towards commercial use. This attitude long predates the cloud / SaaS and reflects Linux’s origins (and Linus Torvalds’ world view) inside Scandinavian academia.


I agree, especially since this huge change is a prime example of how incredible open source is! This work was made possible because engineers working for competitors such as DigitalOcean, Google, Oracle and Microsoft worked together with the kernel community to find an acceptable software solution to a hardware problem that impacts a lot of people.


Technical writing with a sense of humor is great. Highly recommended: the user manual of the SkyRC MC3000 Li-Ion charger.


Link to PDFs [1] (scroll down, EN and DE it seems). Seems like an amazing device, too. Too bad it could only charge one at a time, so I guess you'd buy multiple of these in a tech dept.

[1] https://www.skyrc.com/MC3000_Charger


> Too bad it could only charge one at a time, so I guess you'd buy multiple of these on a tech dept.

The phone app screenshots show three charge and one discharge cycle running at the same time. It's a 4-channel charger which can work even with a single battery.

Currently I use IKEA's battery chargers*, but if I ever need something more advanced, I'm going to look at this one.

*: IKEA's battery chargers are no slouches either. They're very intelligent, with battery fault detection and very good charging characteristics.


Highly recommended. I have a whole slew of consumer and prosumer Li-Ion chargers and this one is by far the best; it's not the cheapest, but it works very well.


What do you mean by one at a time? I've been using one of these chargers for a couple of years now and can charge multiple batteries at the same time. Even different types are not a problem (NiMH / Li-Ion).


What I mean is that the very same type is going to work only once, since you cannot convert one unit into two of the type you need. So in a work environment where you use these, you might end up with, say, two of them to charge two such rechargeable batteries at the same time. Whereas 'normal' chargers for things like AA and AAA have multiple slots for the same battery. Like the other commenter, I also currently use IKEA's charger (my wife bought it), and it can charge a lot of batteries at the same time.


It charges up to 4 batteries at the same time (or discharges, or prepares them for long term storage etc).


I'm sorry I don't really get what you're saying. You can charge any (supported) battery type in any of those slots regardless of how many you want to charge at the same time. I can charge 4 eneloops simultaneously or 18650 or whatever...


Thanks, also jacquesm, I get/see it now. It's springs which adapt to the battery size. Neat!


Yes, most of these have that feature. What's unique about the MC3000 is that it allows you to create your own charge profiles ('programs') which you can then use to perform various operations on single batteries or batches of them. For instance, you can move batteries to a target voltage, which is wise to do before welding a pack with parallel cells (unless you like sparks, and, if the difference is large enough, burning up interconnects).


I still remember the fun I had 30 years ago reading the t.c. electronics 2290 digital delay user manual.


Haha yeah, the good old MC3000. I love this charger. It makes a simple task a little more exciting and sometimes (a bit) unnecessarily complicated, but at least you've got full control over charging those damn batteries!


That's a fine charger too. Good hardware, good features, humorous user manual. A splendid combination, indeed.


A majority of long-form writing is extremely low in information density and, even worse, designed to retain attention as long as possible. I now dislike the majority of news articles which start with a pseudo-literary description of an anecdote, where the main detail is revealed 3/4 of the way in, in a sentence or two. It's purely filler and, like sugar, it has a purpose. I want all long-form writing to have a subtitle that tells the core of the story in one line, followed by an abstract, or let the Twitter guy summarize it for me.


This puzzled me until I realized you're talking about mainstream news-type sites. I agree on that, but these barely qualify as writing in the first place, since none of the traditional goals apply. It's like if the only kind of video you've seen are ads and you then proceed to complain about the info density of video as a whole. Though most video on YouTube also sucks, some things are very well expressed in video, like the documentaries An Inconvenient Truth or Blue Planet II.

For me "long-form writing" brings to mind textbooks, LWN, some bug report emails, some HN comments. These are a different category of writing. I don't think you should sit around to wait for someone else to summarize these for you -- that attitude must be terrible for you in the long run.


A lot of books are just as bad, especially mainstream non-fiction. Maybe it's because they're typically written by journalists, but I often see the same awful New Yorker style of spending three pages on a personal story of some scientist in an article about quantum physics.


That's Sturgeon's Law[0]. Most writing isn't very good, and most writers irritate readers with paragraphs of navel-gazing that tries and fails at being profound.

[0] https://en.wikipedia.org/wiki/Sturgeon%27s_law


Yes, and I have the same issue with videos: 30-minute videos for something that fits in one written sentence. Ugh. People are so afraid of reading (or are just very slow readers?); many even here won't read beyond the first line of this comment, and if they comment, they often comment only on that line.

But... this does not apply to this post. It is not that long, not low density and does not have fillers or sugar.


I have exactly this issue with video; I moaned about it here a few weeks ago -- and another HN reader explained, very pithily, why the trend has occurred: money. It's much easier to monetise a YouTube video than an essay or a blog post. Video impressions get more currency per eyeball even outside of YouTube. Our resulting inability to skim-read or rapidly search for information doesn't enter the calculus.


These days I'm supposed to watch a 30 min video in order to check if a Wikipedia source really contains the quoted statement.


I guess that's what made Twitter rise in popularity as a source of news: difficult to stuff much filler in 140 characters.


> longform plain text

Lol it's just a thousand words!


Anything over 280 characters is "longform" in 2021.


OG 140 ;-)


That was very fast, adding a feature to the kernel to address the loss of performance due to Spectre.

Assigning OS threads of the same process to the hyperthreads in the same core is a good thing anyway. The threads probably share a lot of data within the process and can benefit from the shared cache in the core.


Everything is cool, except for the naming. At first I thought this was about an implementation of Intel's new architecture.


Is there not a way to pin processes to cores in Linux already, and why can't that be used to achieve a similar thing (pin user1 to cores 0,1 and user2 to cores 2,3)?


Core pinning exists, but it requires the administrator to manually assign processes to specific cores. Core scheduling lets you group trusted processes together, and then the OS can figure out dynamically which cores can run which processes. Also, core scheduling does not permit userspace attackers to "game the system" and target a specific core they want to attack.


Well, can't software do that automatically? I would love a piece of software that could reserve a couple of cores for games, another couple of cores for Firefox, etc., dynamically, based upon some settings.


On a computer that you control completely, you can use taskset in Linux to launch programs on whichever cores you want and to migrate already-running programs to whichever cores you like, and there is no obstacle preventing you from using this in scripts to automatically implement any policies you desire.

I frequently use taskset, because otherwise Linux continuously migrates processes between cores, which can degrade the performance of programs that do long-running computations, unlike programs that mostly wait for events to happen.

This new feature has a different purpose: it is intended for multiuser servers, to enable the secure partitioning of the runnable threads into groups that can be scheduled on different cores, so that they will not be able to interfere with each other, even if they intended to.


Security problems probably. I could only imagine the new spectre vulns if you got to pick your own core.


Sure, you could use cgroups for that.


A process running httpd is pinned to CPU X, and it immediately forks 10 worker sub-processes. Are all sub-processes supposed to run on CPU X?


In Linux, the CPU affinity (i.e. the list of cores on which a thread may run) is inherited by any child process or thread from its parent.

Nevertheless, the CPU affinity of any process or thread can be changed at any time with sched_setaffinity(), so the httpd process itself could run its children on different cores if it wanted to, but it is unlikely that any httpd program does this.

The new feature that is discussed in this thread will limit the cores on which the threads of a user can be scheduled to those having the same cookie, so a process will no longer be able to reschedule its children to run on the same core as the threads of another user.


Yes? If CPU X is the only trusted CPU, or you are only trusted to run your software on that CPU, then everything should run on that core. If you rent 12 of 128 cores, then it may run on any of those 12 cores.


The first link in the linked article mentions Linus saying (in 2019) that performance needs to be better than simply running with SMT disabled.

There is no mention of performance numbers, though. How is it? Presumably it is better! And if so, this sounds like a concept the OpenBSD community would be interested in, since they prefer SMT disabled for security reasons.


Yea, I'm curious what they even mean by 'performance' in this context.

It would obviously impact whole-system performance, since this setting would intentionally limit what can run on each core leading to some idle cycles. That's expected though, and is merely a consequence of using this feature for security.

I'm curious if it measurably slows down the scheduler and impacts performance from there.


      int prctl(int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5);
Is there a reason why a more idiomatic void * to a struct wasn't used for the args?


It's a leaky abstraction over the underlying system call mechanism. Because the user stack lives in user space and accessing it from the kernel requires the same dance as any other access to user memory, system call arguments are instead passed in registers where they are immediately available to the kernel with no possibility of faults or TOCTTOU holes.

Here 'unsigned long' is just a convenient stand-in for "generic register-sized argument".

A "void * pointing to struct" requires a copy from user space. If a particular prctl() does need a struct, it can certainly stuff a pointer into one of those 'unsigned long' parameters (in the kernel environment, a pointer can be converted to and from an unsigned long without concern).


> Here 'unsigned long' is just a convenient stand-in for "generic register-sized argument".

That wouldn't be register-sized on x86 would it?


On all Linux architectures, "unsigned long" and "long" are always register-sized. For 32-bit architectures, "long" is 32 bits (the same as "int" which is always 32 bits), and for 64-bit architectures, "long" is 64 bits (the same as "long long" which is always 64 bits). This is different from Windows, in which "long" is always 32 bits, even on 64-bit architectures (and you have to use "long long" if you want 64 bits).

Therefore, Linux-only code (including the Linux kernel itself) commonly uses "long" or "unsigned long" to mean "register-sized" and "large enough to fit a pointer"; portable code normally uses "size_t" or "uintptr_t" for that.


It's not even just Linux - way back in the mid 90s there was an agreement among UNIX vendors to use the "LP64" model for 64 bit architectures, and ILP32 was already the de facto standard for 32 bit UNIX-likes, so "long can hold a pointer" is somewhat of a tradition in the POSIXy world.


Thanks, yeah. Any idea why they still do this instead of switching to `uintptr_t`?


Afaik the kernel is built as C89 and uintptr_t was introduced in C99.


Damn. I'd have thought they could just typedef it manually, but interesting, thanks.


It's not even about how the kernel is built. The userspace-side system call declarations need to be #include-able in C89 user code, so they can't use C99 types.


The minimum size for unsigned long is 4 bytes


Oh, I didn't realize its size varies between x86 and x64... I thought it was always 64-bit under compilers that target Linux! Today I learned...


There is a difference between 'long' and 'long long'. I believe the latter is always 64 bits on Linux.


I'm indeed aware of that one :-)


I think the size can vary by compiler rather than just by architecture.


Yeah it can but in practice systems have a canonical model in their official headers.


And yet msvc and gcc disagree on sizeof(long) on AMD64.

Edit: but of course you meant OS with system, not hardware architecture :)


MSVC on... Linux AMD64?


:) to be fair I did note my misunderstanding in my edit.


Because it is much cleaner to name the arguments rather than to have a void* to what may or may not be the same struct floating somewhere in space.


Because this is the most general signature possible (in general, syscalls can have at most 6 arguments, because the kernel supports x86-32, which has only 8 general-purpose registers: one stores the syscall number, one the stack pointer, which cannot be reused due to signals in the absence of sigaltstack, and the remaining 6 are available).

Depending on the option, the args can be pointers to structures.


prctl is responsible for setting certain privileged CPU registers on certain platforms. I can imagine some benefit of having the values already loaded into registers for such a brief call instead of having to go out to user-controlled memory (which needs to be accessed carefully within the kernel).

It’s also worth noting that most older Linux system calls do not follow that pattern in general, so I don’t know that I would consider void* idiomatic for system calls until recently.


> When one sibling is executing, the other must wait. SMT is useful because CPUs often go idle while waiting for events — usually the arrival of data from memory. While one CPU waits, the other can be executing

Is this accurate? I was under the impression that SMT gains come not from running other threads when one is blocked (preemption is an old feature) but from the processor having a multi-stage pipeline, so that the net number of instructions executed per cycle is more than 1 (closer to 2 in the above example).


No, it's not. SMT is specifically when you're running two threads simultaneously on a superscalar processor that can execute multiple instructions in a single cycle. During SMT one thread might be using execution resources the other thread doesn't need, further increasing performance beyond just having something to do during a cache miss.

What the article describes is temporal multithreading where only one thread executes at a time.

https://en.wikipedia.org/wiki/Multithreading_(computer_archi...


As someone else said, this is superscalar, though these days companies refer to the number of concurrent out-of-order pipelines as Execution Units (EUs).

SMT is a second set of registers that uses the same execution hardware. If a thread stalls waiting for a main-memory read, then the other thread can jump in after the pipeline is cleared out (or maybe before, if you're clever and careful, but I don't know if this is done in practice). You can also impose some governance over time-sharing so threads with low data-memory footprints don't lock out other threads.


What you're describing sounds like a superscalar processor which utilizes instruction level parallelization: https://en.wikipedia.org/wiki/Superscalar_processor


Random question: does anyone use recent mainline kernels with Ubuntu 20.04? How? How's the experience?

I'm currently on Ubuntu HWE line, but that only goes to 5.11.0. Ubuntu kernel devs have debs for [mainline][1], but I'm not finding any good feedback/experience stories about these.

[1]: https://kernel.ubuntu.com/~kernel-ppa/mainline/


> While one might argue that cloud-computing providers are usually grumpy anyway, there is still value in anything that might improve their mood.

I'm dead haha. This is great.


I'm surprised this has made it into the mainline kernel; it feels like a niche use case. Perhaps cloud-provider pressure?

