C++ in the Linux Kernel (threatstack.com)
112 points by davikrr on Dec 7, 2021 | 97 comments



> You see the problem. My C++ code expected the calling convention that pushed arguments on the stack,

that would be very weird on Linux. The x86_64 Linux ABI mandates that the first arguments go in registers, afaik (I'm assuming x86_64 here since the post mentions Linux distros, which are overwhelmingly x64). What compiler would default to a pure stack-based calling convention? Certainly not GCC or Clang, no?
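A minimal sketch of what the SysV x86_64 convention actually does (the asm is roughly what g++ -O2 produces; the first six integer/pointer arguments travel in rdi, rsi, rdx, rcx, r8, r9):

    // SysV x86_64: a, b, c arrive in rdi, rsi, rdx -- no stack traffic at all.
    long add3(long a, long b, long c) {
        return a + b + c;
        // g++ -O2 emits roughly:
        //   lea rax, [rdi + rsi]
        //   add rax, rdx
        //   ret
    }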

> and the kernel expected my code to pass arguments in the registers.

so, how is that a problem with C++ and not the compiler defaults?

> I found the real gold mine of C++ kernel module development knowledge in OSDev.org. They have an entire article on C++ issues. That includes avoiding templates

bullshit it is then. https://www.youtube.com/watch?v=A_saS93Clgk

if templates are good (sometimes even better) on AVR microcontrollers with memory measured in kilobytes, there's no reason not to use them in a kernel meant to run on large embedded systems.

Also, what's that rant about strings for? In the end there is zero substance to this article, only very strange rants.


> that would be very weird on Linux. The x86_64 linux ABI mandates that the first arguments go on registers afaik (I'm assuming x86_64 here since the post mentions linux distros which are overwhelmingly x64). What compiler would default to a pure stack-based calling convention ? Certainly not GCC or clang, no ?

The System V i386 ABI passes parameters through the stack. Perhaps that is what the author is referring to, although I wouldn't be surprised if he mixed it up with the x64 ABI.


In case it is about the x64 calling convention: there is actually an odd case where you would get something that looks like an argument being pushed on the stack:

When a non-trivially-copyable object is passed by value to a function, you need to ensure that, through the lifetime of the copy, its address will never change, because the constructor may have stored the address of some field (for instance, a pointer to a member). The way this is handled, at least in the SysV x86_64 ABI, is that the object is created on the caller's stack and a pointer to it is stored in a register, just as if it were passed as a pointer.

I have seen several cases where a header would have an "#ifdef __cplusplus" block with copy constructors and destructors in it ("It does not add fields so it should be OK, right?"), which makes the object non-trivially-copyable, leading to clashing calling conventions between C and C++ code. I am curious whether this may be the issue he encountered.
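A minimal sketch of that kind of header (the type and field names are made up). In C the struct is trivially copyable; in C++ the user-provided copy constructor makes it non-trivially copyable, so the SysV x86_64 ABI passes it by invisible reference (a pointer to a caller-stack copy) rather than by value:

    struct event {
        int id;
    #ifdef __cplusplus
        event() = default;
        event(const event& other) : id(other.id) {} // "adds no fields, should be OK, right?"
    #endif
    };

    extern void consume(struct event e); // C and C++ callers now disagree on how `e` arrives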


Too late to edit so I'll just write a message here: the hypothesis written above would not explain the author's complaints, as he mentions he was able to fix it with a compilation flag, whereas I doubt the compiler would change non-trivially-copyable objects being handled as a reference to a copy, since that would break correctness.


The 32-bit Linux kernel uses a register-based ABI internally, rather than System V.


> The System V i386 ABI passes parameters through the stack.

No. The C/C++ ABI is quite uniform across architectures. The first 1..N (N is ISA-dependent) parameters that can fit into a CPU register are passed via registers. The first input parameter that _can't_ fit into a register (e.g. a structure passed by value) is pushed onto the stack, with every other following parameter being pushed onto the stack as well. N+1… parameters are always passed through the stack.


> No. The C/C++ ABI is quite uniform across architectures. The first 1..N (N is ISA-dependent) parameters that can fit into a CPU register are passed via registers.

Here is the "System V i386 ABI" mentioned above: https://refspecs.linuxfoundation.org/elf/abi386-4.pdf (from https://refspecs.linuxfoundation.org/). It clearly passes all arguments on the stack, and none in registers ("Function Calling Sequence", starting on page 35). That is the ABI used on 32-bit x86 Linux if you don't specify -mregparm (which the kernel uses); since the author was calling the compiler directly (necessary because the kernel makefiles only have rules for building C files, not C++ files), there was a mismatch between the -mregparm used by the kernel and the default ABI used by the C++ compiler, which was fixed by also passing -mregparm to the C++ compiler.
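A sketch of the mismatch (i386 only; regparm is a GCC/Clang extension, and the function name here is made up). Building with -mregparm=3 is equivalent to annotating every function like this:

    extern "C" int my_kernel_hook(int a, int b, int c) __attribute__((regparm(3)));
    // -> a, b, c arrive in %eax, %edx, %ecx

    // A C++ module compiled without -mregparm=3 would instead emit a call that
    // pushes all three arguments on the stack, per the plain i386 SysV ABI --
    // hence the fix of passing -mregparm=3 to the C++ compiler as well.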


You are not incorrect, and I shall ruminate on why I had thought that the SysV ABI on i386 used %eax…%ecx as input function parameters without having to resort to using -mregparm. Thanks for the correction.


> The C/C++ ABI is quite uniform across architectures.

How can it be? What about an architecture without conventional registers? And for example I work on an implementation of C/C++ that logically uses the heap for its ABI.


Quite uniform != completely uniform.

Especially when it comes to C (less so C++), it is a remarkably adaptable language that has been able to adapt to a variety of vastly incompatible hardware architectures, including stack-based ones, heap-based ones, and some esoteric ones as well. Yet, in the case of conventional, register-based ISAs, the ABI has been remarkably similar: notwithstanding actual ISA-specific register names, registers 0…N (apart from RISC ISAs where storing into/loading from register 0 is a no-op / zero constant) are used as input parameters, and register 0 (where available) is used as the function return value (provided it can fit); otherwise the result is returned via the stack.


> Quite uniform != completely uniform

Don't know if you're a non-native speaker, but no, 'quite' usually does mean 'completely'!

https://dictionary.cambridge.org/dictionary/english/quite


Only when used with non-gradable adjectives/adverbs (from the same source: https://dictionary.cambridge.org/grammar/british-grammar/qui... ) (and yes, uniform is quite non-gradable)

(non-native speaker here, quite frustrated about the quite different meanings of 'quite')


'Quite dead' for example means completely dead.


Only because dead is so binary. Any moderately intense modifier on "dead" signifies complete death, even though it won't mean 'completely' with most terms.


> so, how is that a problem with C++ and not [...]

I don't think you should see this article as a criticism of C++, just a rant about how hard it is to use in the Linux kernel, which is openly against it.


Perhaps the article is old enough to have been written in the 32-bit era?


The Page Info I see says this:

> article:published-time 2016-10-28T11:40:06+00:00

which is well into the era of 64-bit code.


People are still complaining about macOS dropping 32-bit support despite the last 32-bit hardware having been dropped a decade ago.

Some people (especially game devs) are bizarrely obsessed with 32-bit :-/


The LDD3 mentioned is from the 32-bit era and the 2.6.x kernels, which had a CONFIG_REGPARM option to allow passing parameters in registers (because the default was not to do that).


AVRs don't have enough storage for templated code to explode into an unmanageable problem.


Somehow the C64 can deal with them.

"CppCon 2016: Jason Turner “Rich Code for Tiny Computers: A Simple Commodore 64 Game in C++17”"

https://www.youtube.com/watch?v=zBkNBP00wJE

"C++20 For The Commodore 64"

https://www.youtube.com/watch?v=EIKAqcLxtT0


I love these videos, Jason is doing fantastic work with his YouTube channel and C++ Weekly series.


Check his CppCon 2021 presentation, done on a C64 emulator thanks constexpr. :)


Thanks, I will :)


But a C64 is still only talking about 64 kB of RAM. The post you're replying to claims that template complexity gets unreasonable on a large machine, such as a Linux system. Not sure I agree, but "it's fine on a C64" isn't evidence in your favour, unless you've forgotten Linux doesn't even run on a Commodore 64.


So it is fine on a 64 KB system, but unmanageable on a platform that gets all the different kinds of boilerplate to run cloud workloads. Got it.


C++ or even C isn't exactly "fine" on any 8-bit system though. It's nice for a little demo, it can even be tolerable for some real-world projects when mixed with large amounts of inline assembly, but those 8-bit ISAs have been designed mainly for manual assembly coding, not high level compiled languages like C.


Honestly, unless you're on something like an ATtiny with < 1K of RAM or doing cycle-counted stuff, a properly adapted high-level language is fine. I mean, Forth (doesn't have to, but usually does) uses an interpreted virtual machine, and people have used and liked it on 6502s since those were the new hotness.

As far as I’ve seen, two things make C and C++ specifically problematic on 8-bitters: automatic promotion to int for all expressions, with int required to be at least 16 bits (a language problem); and subpar codegen on accumulator architectures and other things that are not like desktops (a compiler problem).


Forth is much better for creating tiny executables than C on 8-bit ISAs though, performance takes a hit because of all the calls/jmps, but it's still surprisingly good. C compilers on the other hand often create "obviously" dumb and inefficient code, at least in my experience (6502 may be better than Z80 in that regard though).


C translates directly to ASM in many cases. It just makes managing offsets and other stuff easier.

C++ adds type-safety on top of that for no cost. It's great when your compiler tells you that there is no operator|= taking (PORTD, PINA). Did you mean PORTD |= PIND, or PORTA |= PINA?
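A sketch of that idea (the names are illustrative, not the real AVR headers): distinct wrapper types make port/pin mix-ups a compile error.

    struct OutputPort { volatile unsigned char* reg; };
    struct InputPins  { volatile unsigned char* reg; };

    inline OutputPort& operator|=(OutputPort& port, unsigned char bits) {
        *port.reg |= bits;          // only ports are writable this way
        return port;
    }
    inline unsigned char read(InputPins pins) { return *pins.reg; }

    // Stand-ins for memory-mapped registers (C++17 inline variables):
    inline volatile unsigned char port_d_byte = 0, pin_d_byte = 0;
    inline OutputPort PORTD{&port_d_byte};
    inline InputPins  PIND{&pin_d_byte};

    // PORTD |= read(PIND);  // fine
    // PORTD |= PIND;        // compile error: no operator|=(OutputPort&, InputPins)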


> C translates directly to ASM in many cases.

But usually much worse ASM than what a human would write on such CPUs, because the C compiler is still restricted by artificial high-level concepts like calling conventions, and it needs to wrestle with instruction sets that are not very compiler-friendly and tiny non-orthogonal register sets. C++ just adds a whole level of code obfuscation on top, so it's harder to tweak what code the compiler actually generates.


If you really want that in C, you can either use functions and wrap everything in (incompatible but internally identical) structs, or use Sparse and annotate those integer types to be incompatible. Not that you must prefer that to C++ (even if I do), just to note that you can make do with C if you want to.


I mean, apparently this is confusing, but yes, obviously.

If your Commodore 64 template is dealing with say, foozles that might be 8-byte, 12-byte or 16-byte, the complexity incurred is pretty small, bugs with foozle<16> are likely to be something mere mortals can understand and fix.

On a more complicated system like a cloud Linux setup the template may be for foozles that can be in any colorspace and on a remote machine or locally, and now sometimes the bug with foozle<HSV,remove> involves a fifty line error message because the compiler doesn't realise all that happened is you meant to write foozle<HSV,remote> ...

It's not even as if the C++ committee isn't aware that templates are a problem. Remember template meta-programming wasn't really intended from the outset, and a big part of the point of Concepts was to at last let you write code that compilers can provide readable errors for when it's wrong.



I understand that for C++ programmers "That's possible" versus "That's a good idea" is a distinction without a difference, however for the rest of us the fact you can use templates as much as you like in, say, Windows drivers, does not magically mean it's a good idea to write complex templated code in Windows drivers.

The constraints in /kernel like forbidding exceptions are because otherwise they (Microsoft) need to do a bunch of extra work to support your bad idea. But your use of templates has no impact on their work, so knock yourself out adding as much complexity as you like this way.


Here is another example: running C++ straight on car firmware, free of Linux politics, via the AUTOSAR certification standard.

https://www.parasoft.com/blog/breaking-down-the-autosar-c14-...

But what do they state specifically? Ah, right.

> "The document allows in particular the usage of dynamic memory, exceptions, templates, inheritance and virtual functions."

https://www.autosar.org/fileadmin/user_upload/standards/adap...


It definitely feels like we're talking past each other. I keep telling you why people think it's a bad idea, and you keep showing that you're allowed to do it anyway. We know. That's the difference between impossible and a bad idea.


You keep saying it is a bad idea like it is a given. He keeps saying 5 bazillion programmers are quite successful with your "bad idea" so maybe it isn't so bad. Is there any evidence either way on the effects of templates on code quality?


Nope, it is the difference regarding politics of what goes into the Linux kernel and the rest of the world, and it is quite clear to which side each of us belong.


Why exactly? They are a compile-time concept which only generates code for the types where it is needed. That C devs instead copy-paste the same code tens of times, use some shitty slow linked list, or, the worst, use textual macros, doesn't make any of those a better tradeoff imo.


To be fair, DriverKit runs in userspace. IOKit, its spiritual predecessor, only allowed for the use of Embedded C++.


True, but that isn't the case for the other more modern examples.

Also IO Kit is no more, unless one is running an outdated macOS installation.

And talking about the past, maybe discussing the dynamic dispatch of Objective-C messages in NeXTSTEP drivers with the previous Driver Kit would also be quite interesting regarding "bloat".


IOKit remains the only way to write kernel extensions, which are still supported but discouraged if DriverKit does the job. NeXTSTEP using Objective-C for drivers was certainly very cute, but I guess they just didn't want to make driver makers learn the language :(


> Devices supported on macOS 11 and later require DriverKit. Use IOKit in your apps and services to discover and use devices.

https://developer.apple.com/documentation/iokit

Going forward, not for long.


C++ templates are unwound at compile time, before the «expanded» template code passes into the optimiser, where most of the unused code is elided.

Unless the templates have been externalised (i.e. declared as «extern template …»), of course. Even then, a modern compiler+linker combo will optimise most of the unused code away at link time, thus reducing the final binary size. I do understand that LTO might not be available for every embedded platform, though.

P.S. That is exactly the point of C++ template metaprogramming – the heavy lifting is delegated to the compiler, which leads to increased compile times but also to more efficient and very compact runtime code.
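For reference, the externalisation mentioned above looks roughly like this (names made up):

    // ring.h -- seen by every translation unit:
    template <typename T>
    struct ring {
        T slots[64];
        int head = 0;
        void push(T value) { slots[head++ % 64] = value; }
    };
    extern template struct ring<int>; // "don't instantiate ring<int> in this TU"

    // ring.cpp -- exactly one TU emits the code:
    // template struct ring<int>;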


What do templates have to do with storage, though? My primary attraction to C++ templates is that they let me write very expressive code that will compile down to a handful of instructions. Now, actually compiling complex C++ templates on a storage-constrained system can be a problem, since templates are compile-time beasts, not runtime. Once compiled, though, they have a Cheshire-cat existence.

Edit: Unless you're doing something rather silly with the templates, but again, that's not a template problem.


The general complaint with templates is they are instantiated and if you’re not careful can bloat the binary with multiple versions of a piece of code. But this is usually something pretty easy to solve: just don’t do something that would cause that to happen :P
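The usual fix, sketched: hoist the type-independent logic into one non-template function so each instantiation stays a thin wrapper (this sketch is only valid for trivially copyable element types, since it realloc's the storage):

    #include <cstddef>
    #include <cstdlib>

    // One copy of the type-independent logic in the whole binary:
    inline void* grow_raw(void* data, std::size_t elem_size, std::size_t& cap) {
        cap = cap ? cap * 2 : 8;
        return std::realloc(data, cap * elem_size);
    }

    // Each instantiation compiles down to a tiny per-T stamp:
    template <typename T>
    struct dyn_array {
        T* data = nullptr;
        std::size_t cap = 0;
        void grow() { data = static_cast<T*>(grow_raw(data, sizeof(T), cap)); }
    };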


I have _never_ had any issue with C++ templates on _modern_ µControllers such as the ESP32. Unless you have an incredibly minuscule flash, modern GCC or LLVM are very good at deleting unused code when you compile everything with -Os. Even -Og isn't that critical either.


> I found the real gold mine of C++ kernel module development knowledge in OSDev.org. They have an entire article on C++ issues. That includes avoiding templates

No, it doesn't?!? The linked article mentions templates twice (+ 2 mentions of the standard template library), once saying that templates can be used without further setup, and the other time recommending that some template-based data structures be implemented. That's pretty far from "avoiding templates".


Lol, part of me likes the effort taken just because, but the kernel devs _really_ do not want C++. One hint: "struct class"

https://elixir.bootlin.com/linux/latest/source/include/linux...


What bothers me about that is that, because C doesn't have namespaces, it's already a terrible name for a struct. What if you want another "class" of thing?

Call it device_class ffs


You're dismissing the fact that the keyword collision might very well be intentional. The worst of it is that the `/sys/class` siblings `bus` and `driver`, whose internal Linux representations live in the siblings of `class.h`, are called `struct bus_type` and `struct device_driver`.


I'd say it was extremely intentional given this:

https://lwn.net/ml/linux-api/20180905165436.GA25206@kroah.co...

And that's for a userspace header.


   #define class Class
   #include <some_linux_header.h>
   #undef class
/s


Wasn't there a period recently — around 1995 or so — when the Linux kernel had to be compiled with a C++ compiler?


Quoting from http://vger.kernel.org/lkml/#s15-3 :

"In the dark old days, in the time that most of you hadn't even heard of the word "Linux", the kernel was once modified to be compiled under g++. That lasted for a few revisions. People complained about the performance drop. It turned out that compiling a piece of C code with g++ would give you worse code. It shouldn't have made a difference, but it did. Been there, done that."


I wonder if it is still true. C++ compilers have come a long way (as have C compilers). C++ is 99% a superset of C; I'm not sure how much of that last 1% is used in the kernel, so it might be too much effort, but C++ is in a few cases stricter than C in ways that compilers can use to optimize. Many C programs run faster when compiled as C++ these days.

If there is a difference (either way) I'd expect it to be something you can measure, but not something you would notice in the real world on one computer. (though at google scale it probably shows up)


I don't think that's accurate. Maybe you're thinking of how it needs gcc extensions?


Why was that the case?


as long as nothing you're including includes that in C++, it shouldn't be an issue at the linker level, and thus not be an issue at all.

at least in general - if it's something that can't be handled by a C shim then you might have an issue.
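A sketch of such a C shim (names hypothetical): shim.c privately includes the offending kernel header, while shim.h leaks nothing and is safe to include from C++.

    /* shim.h */
    #ifdef __cplusplus
    extern "C" {
    #endif

    struct mydrv;                        /* opaque: `struct class` etc. never escape shim.c */
    struct mydrv* mydrv_create(void);
    void mydrv_destroy(struct mydrv* drv);

    #ifdef __cplusplus
    } /* extern "C" */
    #endif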


It's one of the most fundamental structures in the kernel. Pretty much all driver headers include it, if indirectly.


In the past, I wrote a Unix-like kernel from scratch in C++. I have summarized what I had to do to get C++ code to run on bare metal in this article: https://www.avabodh.com/cxxin/nostdlib.html
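For the curious, the usual pieces boil down to roughly this (a sketch; kalloc/kfree stand in for whatever allocator your kernel provides, and it assumes building with -ffreestanding -fno-exceptions -fno-rtti and a linker script that exports the .init_array bounds):

    extern "C" void* kalloc(__SIZE_TYPE__ size);   // hypothetical kernel allocator
    extern "C" void  kfree(void* p);

    extern "C" void __cxa_pure_virtual() { for (;;) {} } // pure-virtual call trap

    void* operator new(__SIZE_TYPE__ size) { return kalloc(size); }
    void  operator delete(void* p) noexcept { kfree(p); }
    void  operator delete(void* p, __SIZE_TYPE__) noexcept { kfree(p); }

    // Global constructors don't run by themselves on bare metal; the startup
    // code has to walk .init_array:
    typedef void (*ctor_fn)();
    extern ctor_fn __init_array_start[], __init_array_end[];

    void run_global_ctors() {
        for (ctor_fn* f = __init_array_start; f != __init_array_end; ++f)
            (*f)();
    }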


I've always been interested in writing my own Unix-like kernel! Could you share what resources you used to write it? How long did the whole thing take?

Just to understand the scope of the work, did you implement any of the following: memory isolation, networking, concurrency via interleaving on a single thread, parallelism where n threads can run n processes simultaneously? How long did each take to get done?


I did this while I was doing my bachelor's degree. It was a four-year course, and I started doing this sometime in the 2nd year and continued till the 4th year. I was not always writing code, as I had to study other subjects as well. I was also just learning coding and other computer science concepts at the time, so it was learning and writing code together. But writing the kernel forced me to learn many computer science concepts very deeply.

At the end, what I had was a kernel which could boot on bare metal (or a VM) and provided a command-line interface. It had a virtual file system layer and an ext2 file system, process management (fork, exec syscalls), memory management (paging and process isolation), and device drivers for the keyboard and hard disk. The kernel was able to fork and exec static ELF binaries.

I did not get to networking and threading, but that was the next step which could have made it a complete Unix kernel.

I implemented it in bits of assembly (NASM) and C++, so I had to learn the runtime and code-generation aspects of C++. Based on that learning I wrote these articles on C++ object models and other internals: https://www.avabodh.com/cxxin/cxx.html


These are all standard C and C++ interop problems also found in userland and typically go away if the C project at hand is cooperative.


There's one problem with rants such as these: it's too easy for someone to be exposed as clueless and to broadcast his lack of knowledge and assumption-heavy development process to the world. How is that as an advertisement for one's employer?


Valueless article. Please stop posting this sort of article, which has no information content.

The article is merely a rant because the author doesn't have much of an idea of how C++ actually works. Merely knowing the syntax doesn't make one a "C++ programmer" and this is even more true when you are messing around in the Kernel. The article contains no specifics only general statements making me think this was put up to just be a "hit piece".


With all due respect, your comment is an anti-specialization rant.

> Merely knowing the syntax doesn't make one a "C++ programmer"

Does knowing all the possible abstraction layers (uh, it's an ocean) make one a C++ programmer then?

> this is even more true when you are messing around in the Kernel

It's his right to mess around in the kernel and learn things.


My comment has nothing to do with "anti-specialization" or "right to mess around" anything.

The article has zero substance, the generic rant being "I tried to use C++ to write a kernel module and ran into problems". There are no specifics w.r.t. C++ nor the kernel, and yet the author blames the C++ language! Whatever is written up also betrays a certain ignorance of basic C/C++ ABI conventions, leading one to surmise that the author is clueless (w.r.t. these two domains). As you can see from other comments in this thread, many others are of the same opinion, while others are guessing all over the map as to what the actual problem might be.


> A first-year computer science student can tell you that the arguments get pushed onto the stack. In other words, a call to this 3GL function results in the following assembly pseudo code

Are people this ignorant when it comes to C/C++ or any systems language? ABI & calling conventions were introduced early in my C & C++ textbooks (age 13 btw, not even close to college years).


Well, if you believed as the author did that arguments are always pushed onto the stack, you are pretty ignorant--most major architectures these days don't use the stack for arguments, at least not for the first several arguments.

(Semi-random tangent: the hardest bug I ever had the pleasure of debugging was when I discovered that the PLT glue code to load an entry into the PLT was unexpectedly clobbering a register that the calling convention said needed to be preserved. Be very, very careful using non-default calling conventions across shared object boundaries!)


I think it would depend on which system you were introduced to. Also, I'm 99% sure that in my classes in the mid-90s they taught stack push, which made sense, as registers were pretty valuable. It was not until RISC came along, and register renaming, that you could consider 'wasting' them on passing args in the general case. In the 'DOS'/'Win16' world, calling conventions were all over the place. You could get into trouble real quick if you did not pay attention to those calling-convention modifiers, especially if you were using libs from different compilers. In the Linux world, where you control the whole stack, it is easier to say 'this way, and if you stray from it, good luck'.

Small sample of the remnants of that in the DOS world. https://docs.microsoft.com/en-us/cpp/cpp/argument-passing-an...


Ab initio first-years probably just about know what registers are, so I can believe that.

Decoupling the compiler's optimizations and the ABI (particularly what constitutes a "move" of a struct) has derailed a few conversations I've been involved in - even with very smart devs (although mainly over interpretation rather than basic misunderstandings, like thinking that what is actually due to the ABI is an optimization).


The article has maybe three paragraphs of actual information.


> Kernel developers obsess about speed and performance. The Linux kernel is built using -mregparm=3, which is sometimes called fastcall.

I've never messed with calling convention for the sake of performance before so I found this bit interesting. I found more info about it at: https://en.wikipedia.org/wiki/X86_calling_conventions#Borlan...

Does anyone have benchmarks? Assuming I don't care about ABI stability, what's the fastest calling convention?


> Assuming I don't care about ABI stability, what's the fastest calling convention?

I'd assume a modern optimizing compiler will, in situations where it's permitted, create completely novel calling conventions depending on the situation. Whole program optimization is one area you might see this.


The compiler tends to be limited in how it can change calling conventions by external visibility of the functions. Generally if you compile a function down to an object file the compiler will want to make that object file linkable with any other object files importing that symbol properly.

Whole program optimization gives the compiler some ways around that. I am not sure how much freedom it gives the compiler.


With LTO the compiler should be free to fudge the calling convention for most calls even between translation units.

Here is GCC doing that optimization for a static noinline function: https://godbolt.org/z/cn6Wz9Kvn

Similarly, compilers can also clone functions if it makes sense to propagate constants from call sites into the function. Example: https://godbolt.org/z/59z6xT75n

I'm sure there is more room for improvement. A perfect compiler would always optimize programs as a whole and only regard function boundaries as hints at best. In practice, you have to keep complexity in check somehow.
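Roughly what the first Godbolt link shows (a sketch, not the exact code from the link): with internal linkage the compiler owns both the callee and every call site, so it may ignore the documented ABI for this call entirely.

    static __attribute__((noinline)) int scale(int x) {
        return x * 3;
    }

    int entry(int x) {
        return scale(x) + 1; // the argument may travel in whatever register GCC prefers
    }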


On Linux with GCC and Clang you can use -fvisibility=internal to tell the compiler to not care at all about this and go wild with ABI. Of course it needs to be done carefully...


Compilers can clone functions (so there are two variants with different calling conventions) or even create alternate entry points.


There should be Ada in the Linux kernel


and Haskell


C'mon people. Python all the way


The one true language: Threaded INTERCAL.


  PLEASE COME FROM HELL


PLEASE DO GIVE UP



and blockchain


a complete waste of time


If my boss told me to go write a Linux device driver in C++, I'd quietly go away and deliver a working device driver that happens to stick to the C subset of C++. Trying to fiddle about getting header files to include cleanly is a complete waste of time. (Maybe you're referring to something else, like reading the article.) The benefit of C++ over C that is consistent and well-written in a disciplined manner is really not as great as many managers have been led to believe. And seeking forgiveness from an idiot manager is always easier than seeking permission to do things sanely.


I guess Rust is a far better choice for that.


There is no reason Rust wouldn't share the same infrastructure problems. The main difference is that the kernel maintainers want Rust in the kernel, while if you want to maintain a module written in C++ then you are on your own.


Community-wise, yes. It seems to have gained some momentum.

From a technical point of view, I'm not so sure. C++ still interoperates more easily with C than Rust does, if only because you can normally just include the headers and be done with it. (Although, as the article says, there is some cleanup to do.)


From an everything point of view, there's no point in adding complexity for no tangible benefit. Rust has tangible benefits (memory safety). Very far from a silver bullet, but demonstrably better than C.

Rust isn't being experimented with in the kernel because someone decided we should really add a second language. C++ interop with C doesn't matter when there's no reason to use C++ in the kernel anyway.


The "interoperate easier" idea is a trap for both C and C++ and worth avoiding because in fact they aren't quite compatible, so you're making both languages worse to achieve this. I don't much like C++, but if you must use C++, actually use C++ and forget that it's sorta kinda "compatible" with C.


I know they are not 100% compatible, but they are 99% compatible, and that's much better than Rust.

It's easy to stay compatible by not using fields called "class", or by using #ifndef __cplusplus; most C library headers are actually like that. But not the Linux kernel, because they refuse to.

That's why I'm saying that the choice is not a technical one.


You can create headers that are usable from C and C++, but you have to actively maintain it that way. As C headers tend to not have function definitions in them, it's fairly easy to avoid the C-only features.

I doubt that Linux headers give a damn about usability from C++ though.
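The usual shape of a header that stays usable from both languages, for reference (a generic sketch, not an actual Linux header):

    #ifndef MYDRV_H
    #define MYDRV_H

    #ifdef __cplusplus
    extern "C" {
    #endif

    struct mydrv_config { int flags; };             /* no C++ keywords as identifiers */
    int mydrv_init(const struct mydrv_config* cfg); /* C linkage from either language */

    #ifdef __cplusplus
    } /* extern "C" */
    #endif

    #endif /* MYDRV_H */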





