
Insisting that the environment matches the tooling seems backwards. The tooling needs to match the needs of the environment.

That GCC may break your program in the future is one of the ways that people get locked into specific versions.




You're making it sound like the C language has changed - it hasn't. I think this operation has been undefined in C as long as Linux has existed. They should not have written that code in the first place - it was never correct.


There seem to be two possibilities here:

1) You have rightly pointed out that all the authors of the Linux kernel are total idiots who don't understand C.

2) It turns out writing a kernel isn't possible in pure C, and the kernel requires some carefully designed and considered methods of accessing raw memory in various unusual ways.


> It turns out writing a kernel isn't possible in pure C

This one is the case. Writing a kernel is not possible in pure C. I'm asking - given this, why are they trying to do that? Write the parts which aren't expressible in C using assembly, where they can get the semantics they want.


The reason is that by restricting yourself to gcc and clang, and agreeing with those compiler authors on how to define some undefined parts of C, the amount of assembly you need drops massively, which is particularly useful when you are trying to support as many architectures as Linux does.

While technically this means your kernel is "gcc C" rather than "true C", this doesn't cause a practical problem. I know gcc, clang and Visual Studio's C++ standard libraries require a flat address space and 8-bit bytes, for example, but I've never seen anyone bothered by this.


> the amount of assembly you need drops massively

How about this as a compromise: use GCC, with all its flags, to generate the assembly, and just use that? Skip the messy transition step.


And what would this change, if authors are working off of the GCC-C files? If you're after a system by which you don't need GCC to run Linux, you may be interested in a concept known as "binaries", which sounds remarkably close to what you're trying to invent here.


They're not working off the C files. They're using the assembly files when performing the actual compilation going forward; using GCC is just a bootstrap to get the necessary assembly churned out quickly.


But... the C files have changed hundreds of times, for tens of architectures, over tens of years. Who is going to want to edit 20 (or more) assembly files when a change is made? How do you add a new architecture? Do you have to keep the C files up to date and somehow check that the different assembly files don't drift out of sync?

Instead we could just mark some files as "These C files have to be run with GCC versions X to Y", problem solved. Then why not just compile everything with those versions of GCC?


> Instead we could just mark some files as "These C files have to be run with GCC versions X to Y", problem solved.

Is it solved? You'd still have to check the generated assembly of the entire file after every change you make, to make sure it didn't have some knock-on effect that optimised something differently, even if you kept the version of the compiler the same.

And is GCC deterministic? I don't know - do you? Did you know it has a flag `-frandom-seed`?


The idea there is that gcc provides various intrinsic pseudo-functions and features, so that you can write the kernel in a "whatever gcc accepts as GNU C" dialect. Last time I looked, the only Linux platform that had significant assembler objects in the kernel was IA-64. And this approach is very common in the Unix world, with memory-mapped HW registers represented by C structs and so on (and in fact the ability to do this is often cited as the reason for various C++ mis-features).

[Edit: I am almost sure that the only part of Linux on i386 and amd64 that is written in assembly is the bootloader stub at the beginning of "Linux kernel image" file format. And it is somewhat funny and relevant that this part depends on abusing gas to generate 16bit code]


> this approach is very common in Unix world, with memory-mapped HW registers represented by C structs and so on

The "correct" way to do this is to write the bare minimum assembly needed to present this view to a programmer in C (so there's no need to worry about undefined behavior), then write standards compliant code from there. Writing all this code in C is just papering over the fact that what you're doing is architecture-specific and should be treated as such.


That is exactly my approach to this issue (and at least gcc seems to be clever enough to mostly optimize the abstraction layer away). On the other hand you will see the direct approach with structs directly mapping to hardware, questionable uses of volatile to make the thing actually work and so on very often in various production code.


It's not that they don't understand C, it's that they willfully ignore the C standard and have enough clout that compilers capitulate with flags to support their slightly modified but not quite C. And so yes, you cannot write a kernel in pure C.


Do you really not grok the difference between writing platform-independent code and writing the platform itself? You don't write for an abstract machine as specified by the standard, you're writing for a very concrete real-world machine in order to implement a part of that abstract machine. Indeed, that's exactly what C was created for in the first place, for writing an OS in something else than assembly!


At some level you have to eventually decide, "if it works it works."

Obviously at the application level it's frowned upon, but in the kernel? You do what you have to with the tools you have.

Think about Java. Yeah, sure, keep telling us sun.misc.Unsafe isn't legit. It's going to get used until that capability exists elsewhere.


> Obviously at the application level it's frowned upon, but in the kernel?

I'd rather my kernel not have a security hole in it because a compiler decided to optimize out undefined code: https://lwn.net/Articles/342330/


And as the ongoing Kernel Self Protection Project proves, those tools are not enough.

I appreciate using software that is actually written with security first, performance second in mind.


They maybe should not have written code in C at all, since a language you need to hire a language lawyer for is not a viable option. But everyone hoped that, with a given team, they could create something that would both meet the requirements and be future-proof. While what they did was not really legal, it is not practical to make it stop working now. If they had done it the "right" way, Linux would not exist, see? Language lawyers don't build anything.


Maybe they shouldn't have, but there was no realistic alternative. At the time, C compilers weren't as "smart" and the sort of trickery they're doing was a perfectly reasonable thing to try in C.

Today, if you were starting a kernel project from scratch, I'd think Rust would deserve a good, hard look.



