Some Were Meant For C (2017) [pdf] (kent.ac.uk)
105 points by fractalb on March 1, 2021 | 186 comments



C and C++ code CAN be secure, but most of it is not. It is too easy to write or update C / C++ code so that it is no longer secure or has unexpected and unsafe results.

https://blog.regehr.org/archives/213 gives great insights into how undefined behavior in C and C++ can be difficult to reason about and cause problems.

The lack of bulletproof memory safety and the easy-to-stray-into undefined behavior of C and C++ make it easy to create code whose behavior is difficult to fully grasp, especially when optimizing compilers are used. The C / C++ code runs really fast, but there are hidden dangers lurking.

I don't doubt that C and C++ will be with us for a long time to come, but the growing use of Rust, Zig, Ada, and others shows that better alternatives exist and that they will replace C and C++ in many domains and use cases.

Edit: Downvotes? Did you read my whole comment? I am saying that C / C++ are not secure for real-world use cases.


This is one of those rare cases where I can say "It's 2021, and we know that's not true now." C and C++ cannot be secure at scale without unreasonable amounts of effort. They can't be secure at any non-trivial scale through sheer discipline alone.

40 years ago, the case could be made. But C and C++ are not new languages, and the fact that just barely shy of no one can demonstrate the existence of secure C or C++ code bases without staggering levels of effort put into that process is data now, not just anecdote.

(And let me emphasize the effort as my yardstick. Writing truly secure code is arguably something nobody has ever done at scale in any language... but C and C++ are certainly unique in the sheer level of effort that must be poured into them to even match what a number of other languages come with out of the box, let alone exceed it. If you aren't using some very high quality and fairly expensive tools like Coverity on a routine basis, you aren't even close.)


Not even 40 years ago, because exactly in 1981, C. A. R. Hoare stated in his Turing Award speech:

"Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to--they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law."


I am astonished that HARDWARE does not help us much. When 'everybody' decided going like a bat-out-of-hell was more important than security and safety and easy-to-not-screw-up languages w/garbage collection and capabilities and hw-enforced typing and ... we ended up where we are: a mess that runs fast. And, specifically, I think that RISC-V rather missed the boat here, by not taking up the security/safety banner and leading the way to a new way of building computer systems. Instead, we got a different (open) ARMish thing. Progress...


People - or rather C compiler writers - don't even use the minimal safety capabilities that today's hardware has. Trapping NaNs? C is going to turn them off. Trap on null pointer dereference? The C compiler is going to optimize that away. Etc.
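
A minimal sketch of the null-check case (hypothetical code; with optimizations on, GCC and Clang will typically delete the branch, because the earlier dereference lets them assume the pointer is non-null):

    #include <cstdio>

    int first_byte(const char *p) {
        int c = *p;          // dereferencing a null p here would be UB...
        if (p == nullptr) {  // ...so the compiler may assume p != nullptr
            std::puts("null pointer!");  // ...and delete this whole branch
            return -1;
        }
        return c;
    }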


There are now efforts to turn computers into C Machines with hardware memory tagging.

While Intel's MPX extensions were a failure, Solaris makes use of ADI on SPARC, iOS uses PAC, Google is researching adopting ARM MTE into Android, ARM sponsors CHERI with the Morello project, Azure Sphere has Pluton, and so forth.

However, these are all isolated efforts, and it will take years or decades for their large-scale effects to make a difference.


Can you actually have a faster system with hardware checks compared to what we can do on x86 with all security checks in software? If not, I don't see the point. No one is going to buy slower hardware that runs the same safe language (which we already have access to).


Yes, Solaris SPARC with ADI for example.


>> C and C++ cannot be secure at scale without unreasonable amounts of effort. They can't be secure at any non-trivial scale through sheer discipline alone.

Agreed. That is why C and C++ should be replaced with more reliable alternatives.


Security is not a feature of the language or tool. Neither Rust nor C++ is fully secure, even though the former can find more memory safety problems at compile time (but not all of them).

Security is the process. It encompasses continuous risk assessment, penetration testing, fuzzing and the use of various other tools throughout product development to eliminate attack vectors. Only then can you build a secure product. Just rewriting everything in Rust won't cut it.


Very concisely and correctly put, agreed.

I sometimes work on the infosec side of the house. It's easy to point at vulnerabilities due to endless memory access problems in C code and fixate on that. And it's true, so it feels satisfying.

Just rewrite everything in not-C and this is fixed! And that's true as well. But remember, here the word "this" refers to the memory access bugs. It doesn't refer to vulnerabilities in that sentence.

There are plenty of systems without a single line of C/C++ code, and we have no shortage of ways to break in anyway. So in one sense, everything changed. But wearing the black hat, nothing really changed, since I compromised the system anyway. A new language, without all the engineering process the parent post describes, won't magically get there.

For an existing system, "just rewrite everything" will guarantee way more bugs for years to come, simply because the old system has been battle-tested and reinforced for years. In the long haul the rewrite will converge to a better state, but that haul is long (and assumes the budget will remain in place long enough, which might not happen, in which case you end up with a half-baked rewrite).

For new systems starting from scratch that may in any way ever be run in a security relevant context, sure, starting with not-C is a good idea today.

Unfortunately Rust seems to be the only alternative for use cases where C was actually needed (if it could be written in Java or Go or Python or ... then it didn't really need to be in C in the first place). And sadly Rust is a fairly user-hostile language, so my guess is plenty of new projects will start with C for a long time due to the lack of friendly alternatives. And tooling.


> And sadly Rust is a fairly user-hostile language

Could you elaborate on that? I started using Rust at the end of last year, and have found it to be one of the most user-friendly languages I've learned. There is clear documentation that is easy to generate with the standard toolchain, along with a very good language guide.

The compiler error messages are an absolute joy as somebody coming from C++. Pointing out exactly where the borrow checker found errors, or exactly what the type signature should be for a trait, is really useful.


Well, yes, coming from C++, Rust must seem like a breath of fresh air. Coming from almost any other language however, I can understand that it feels a little bit intimidating (and getting more so with every exotic feature added - "const generics" anyone?), no matter how friendly they try to make the documentation and the error messages...


I can see that a bit. I suppose for me, the biggest questions are (1) is there a way to unambiguously express my intent and (2) is there exactly one way to do so? I'm fine with more features so long as they match some specific problem that people have that isn't already solved in another way.

For example, in the case of const generics, I've used the equivalent in C++ to make a class for a geometric vector in N dimensions. The size varies depending on the number of elements, but is known and can be checked at compile time. The existence of const generics is very closely matched to this particular use case.
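
A minimal sketch of that C++ equivalent (hypothetical names; the non-type template parameter N plays the role of a const generic):

    #include <array>
    #include <cstddef>

    template <std::size_t N>
    struct Vec {
        std::array<double, N> elems{};

        // Dimensions are part of the type, so mismatches are compile errors.
        Vec operator+(const Vec &other) const {
            Vec result;
            for (std::size_t i = 0; i < N; ++i)
                result.elems[i] = elems[i] + other.elems[i];
            return result;
        }
    };

    int main() {
        Vec<3> a{}, b{};
        Vec<3> c = a + b;    // fine: both are Vec<3>
        // Vec<2> d = a + b; // compile error: Vec<3> is a different type
        (void)c;
    }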


I don't think "there is exactly one way to do it" is actually a good guiding principle.

Forcing it down to an overly simplistic setting, if I need to add 5 to a number I can increment 5 times, or add 2 and then 3, or just add 5. Which of those options should we remove?

Perhaps we mean, instead, that there should be one best way, but (sliding into metaphor, I hope) what about when we have a language without 5 and we are considering adding it? Is the use case already handled by 2+3? How are we to decide?


This is just false; Rust is much easier to use than Python (the package management situation alone is so much better).


Really? Then there is no point in starting new projects in Python. You can write everything in Rust. You’ll get performance as a bonus.


No, Python has way more packages, like the OP said. You can't start projects in Rust where you need a significant number of dependencies, because Rust is not there yet.

Language-wise, Rust is better than Python in every way except compile times.


I imagine that in addition to having more packages, it is easier to hire developers who know Python than developers who know Rust. The current TIOBE index puts Python at #3 (rating = 10.86%) and Rust at #30 (rating = 0.49%). If you already have a team experienced in Rust, then it may be a good choice, but if you are expecting to grow your team, you may want to pick a language based on popularity.

Language-wise, it feels like they are intended to solve different problems, so Python vs Rust is a weird comparison. Granted, that may be because I started learning Rust as a replacement for C++, so that has been my point of comparison.


Generally I wouldn't advise hiring based on the language the candidate knows anyway. Almost all mainstream languages are so similar in concepts that it doesn't really matter.


> And sadly Rust is a fairly user-hostile language

Rust seems to me to be user-friendly, though for the cases where its safety features are critical and at the fore, it front-loads pain that would otherwise be dealt with eventually, which can make it feel intimidating. But it's forcing you to confront things that are likely to cause subtle bugs if dealt with sloppily (or overlooked), not actually adding unnecessary complexity.


User hostile? I've found it to be the exact opposite, if you consider that the user is me, the developer. If anything, the language and tooling have gone to great lengths to be user-friendly.


Surely part of the security process should be to identify parts of any system that are routinely causing security problems and replace those parts by better ones (as they become available) in any context where security is important (which seems to be rapidly converging on "almost all contexts that normal people using computers encounter in day-to-day life").

Rewriting C or C++ applications in Rust won't fix all security problems. But it would be forward progress. It's just like wearing seat belts won't stop people from being injured in car crashes. You could say that "safety isn't a product but a process," which is fine, but if your safety testing process finds that wearing seatbelts reduces injuries substantially it seems pretty obvious that using a seatbelt product is the way forward. At least until someone invents a better replacement for seat belts, or supplements them with other features like airbags, automatic braking systems, self-driving, etc...


“Security is not a feature of the language or tool […] Security is the process”

True, but if the process discovers that a lot of lives are lost because of the use of unsafe tools, it changes the tools to add safety features (https://en.wikipedia.org/wiki/Chainsaw_safety_features), or ditches them altogether (https://en.wikipedia.org/wiki/Hazard_substitution#Processes_...)

So, yes “rewrite in rust” isn’t the full answer, but that does not imply “don’t write it in C” isn’t part of the answer.


This is true, and I suspect that while Rust has this aura of compiler-enforced memory safety, it's quite possible that the piles of C/C++ tools which _can_ be enabled and run against C code bases make it generally just as secure in practice, when those tools are actually enabled.

The simple classes of bugs prevented by Rust are also caught fairly quickly with any kind of memory sanitizer (Valgrind?), combined with static analysis tools (Coverity?) and automated code quality standards (MISRA). Run a CI/code coverage monitor while looking for these kinds of errors, and I would bet the results are actually better than plain Rust due to the maturity of some of these tools.


That's plain wrong. If static analysis were as reliable as Rust, why would all these C codebases still be full of buffer overflows and memory errors? The borrow checker is, in effect, a static analysis tool... one that requires a lot of annotations from the programmer and blocks compilation otherwise. The equivalent for C would be to annotate all your functions in some formal language and then run, say, Frama-C.

About Valgrind and the sanitizers: they're runtime tools, so just like tests they can only show the presence of errors, not their absence. Like dynamic type checking.


The borrow checker is a static analysis tool that is 100% sound, unlike the tools for C/C++


And it only catches a trivial minority of actual security problems, which can occur in a lot more ways than use-after-free and the like.

IMHO, Rust simply isn't good enough at catching all types of bugs to justify rewrites at this point, and it's likely, when you look at some of the work being done at the processor manufacturers, that they don't believe it either.

Consider: https://en.wikichip.org/wiki/arm/mte, https://en.wikipedia.org/wiki/Intel_MPX, and https://lwn.net/Articles/718888/

There are quite a number of these in the pipeline, which make some of what Rust does redundant.


Uh, not at all. Rust’s compiler probably catches like half of all bugs that currently lead to security issues in memory-unsafe languages. And the things you mentioned are similar band-aids, not fixes.


The borrow checker is 150% sound, it complains about errors but also complains about a lot of things that could have been perfectly fine in reality. There are plenty of times when multiple mutable references would be perfectly safe for instance.

Valgrind might not catch 100% of errors, but at least what it catches are actual errors I care about.


That's an interesting (and correct!) objection — in stats terms, it's the choice between having false positives (your type checker rejects some valid programs) vs. false negatives (Valgrind didn't catch these cases).

If you're writing safety-critical software though, being forced to restructure your code to satisfy the type checker (which, in this case, is kind of a simple proof assistant) seems like a sane tradeoff.


You're talking about completeness: the borrow checker rejects some valid programs (any type system will do that). Soundness is talking about catching incorrect programs; the borrowck doesn't allow a single invalid program through.


Valgrind is not a static analyzer. You should use some C static analyzers to compare the false positive rate. You can use various dynamic-instrumentation-based tools to detect other classes of bugs that are not caught by the borrow checker in Rust.


Valgrind only catches the errors that your test suite triggers.


>> Security is the process.

Yes, but do the programming languages / tools we choose make the security process easier?

There is no perfect solution, but some programming languages / tools are better than others for preventing unexpected behavior that can lead to insecure programs.


This sounds wise, but I would bet that for the same amount of effort, "just rewriting everything in Rust" will give a better return than the kind of process you're advocating.


We can pretty much also conclude (given data on CVEs, etc.) that -- in practice -- programmers either aren't following that process or it doesn't have enough of an impact to actually get rid of memory errors of the kind Rust protects completely against.

(Caveat: Obviously 'unsafe' is relevant to this discussion, but at least you only have very small areas of code you need to check extra carefully.)


If you want to do something about it look into UBSan. Turn vague concerns into bugs, and then into commits :).

https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html

UndefinedBehaviorSanitizer (UBSan) is a fast undefined behavior detector. UBSan modifies the program at compile-time to catch various kinds of undefined behavior during program execution, for example:

- Using misaligned or null pointer

- Signed integer overflow

- Conversion to, from, or between floating-point types which would overflow the destination

GCC has similar features.
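
For example, a toy program that overflows a signed int (a hypothetical session; the exact diagnostic text varies by compiler version):

    // overflow.cpp
    #include <climits>
    #include <cstdio>

    int main() {
        int x = INT_MAX;
        int y = x + 1;  // signed integer overflow: undefined behaviour
        std::printf("%d\n", y);
    }

Built with clang++ -fsanitize=undefined -g overflow.cpp, running the result prints a runtime error naming the file, line, and the overflowing operation, instead of silently wrapping.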


Unfortunately, ubsan is not implemented on all targets, as I just found out a few seconds ago trying to enable it on my arm-none-eabi project. (I suppose technically I just need to implement all of the stubs to hard fault or something... why doesn't gcc do that for me?).


> UBSan modifies the program at compile-time to catch various kinds of undefined behavior during program execution

Which makes it unusable with binary libraries.


It’s not intended to be something you ship.


Plenty of us link to binary libraries.


UBSan is awesome, but you need an awesome test suite to really unlock its power.


If you are too lazy to write tests, either formally verify it (CBMC or SATABS) or fuzz it. libFuzzer works great at finding UBSan and ASan bugs fast.


C and C++ are not the same case. Modern C++ supports fairly strict and exhaustive type safety that can robustly check many safety properties at compile time, if you choose to use it to its full capability. C doesn't even have these facilities.

Anecdotally, I haven't seen an open source C++ code base that uses type safety to that extent, but it is certainly possible.
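
A minimal sketch of that style (hypothetical names): distinct wrapper types keep semantically different integers from being mixed up at compile time.

    #include <cstdint>

    enum class UserId : std::uint64_t {};
    enum class OrderId : std::uint64_t {};

    void cancel_order(OrderId) { /* ... */ }

    void example(UserId user, OrderId order) {
        cancel_order(order);    // ok
        // cancel_order(user);  // compile error: a UserId is not an OrderId
        (void)user;
    }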


C++'s problem is that it still silently supports non-Modern style code, so the safety is opt-in. And you have to check everywhere for it. It's not as bad as C where the safety just doesn't exist, but it's not much better. As you noticed yourself, it's not actually used safely!


Rust, Zig and Ada code CAN be secure, but most of it is not. It is too easy to write or update Rust, Zig or Ada code so that it is no longer secure or has unexpected and unsafe results.

The lack of formal verification features and the easy-to-stray-into logic errors of Rust, Zig and Ada make it easy to create code whose behavior is difficult to fully grasp, especially where large projects are concerned. The Rust, Zig and Ada code runs really fast and is usually memory safe, but there are hidden dangers lurking.

I don't doubt that Rust, Zig and Ada will become more popular in time to come, but formally verifiable languages such as Verifiable C or SPARK are actually "safe" in some meaningful sense of the term, instead of giving everyone a false sense of "safety" in the form of memory safety features, as if memory safety errors were somehow the only class of security-critical error.


> C and C++ code CAN be secure

Anything can be secure (and conversely anything can be insecure). The theoretical potential doesn't matter because real life is never the theoretical best case. What matters is the overall risk (is likelihood * how bad < benefit?).


>> real life is never the theoretical best case

Exactly; real life C and C++ code that does real work tends to be insecure.


I think they can easily be made secure. You just have to be willing to strongly type everything. It doesn't mean it would be conventional or easy, but it would eliminate most type safety issues that crop up during refactoring.


> I think they can easily be made secure [...] It doesn't mean it would be conventional or easy

So you can easily make it secure by doing something not easy?

Something that requires discipline, willingness and unconventional practice to be secure is by definition not secure. A secure language is the opposite: the default behavior and conventions are mostly secure and you need to go out of your way to make it insecure.

Otherwise, with enough time and enough people you are guaranteed to be doomed.


I have a suspicion that “safe” and “secure” are two different things.


> C / C++ are not secure for real-world use cases.

Is the Linux kernel not a real-world use case?


Oracle, SQLite, and PG are real. As are the various embedded systems in many forms of machinery. I wish the security conversations would become more detailed, as opposed to labeling things safe/unsafe and leaving it at that without considering circumstances and environment.


As are the issues reported against them as CVEs.



Sure, there have been plenty of CVEs for the Linux kernel, and there will continue to be.

Now show us the list of replacement kernels written in Rust, Ada, Lisp, or whatever that are:

1) free software (or open source as a second best)

2) as widely distributed and as portable as Linux

3) supports as much hardware as Linux

4) supports the wide range of applications that Linux does

5) free of memory-safety-related CVEs as well

There is always a series of tradeoffs - Linux has adopted a classic but unsafe programming language as its basis which has in turn allowed it to cover an enormous amount of ground in a (relatively) short time frame, support massive amounts of new employment and accelerate the development of the web.

It is not clear that an alternative design or implementation in another (safer) language would have moved the world as far forward as Linux has in the same time frame, or whether its even possible, as nobody has demonstrated this.


That Linux has won is probably more due to lucky historical circumstances than to technical superiority.


It's not apparent that Linux has won anything, the race is still open. Where are the real contenders is my question?

Apparently they are still stuck at the starting line, discussing endlessly how perfect their system will be when they finally choose the right language to implement it in.


Making an OS takes a lot of manpower. Linux absorbs most of the programmer hours, the BSDs take almost all of the rest. What is left over for projects like Redox is a tiny team of developers. You can't expect a ton of hardware support to come out of those efforts.


Linux and the BSDs grew to support x86/SPARC/POWER/Alpha/ARM/68K hardware in a time when hardware drivers mostly had to be reverse-engineered from scratch or donated by companies that wanted support for their hardware mainlined into the kernel.

As a result, anyone working on Redox, Fuchsia, or whatever popular contender now has a massive library of driver source code to refer to when working in their language of choice. This is not a resource that the Linux and 386BSD communities had when starting out.

Rust isn't as standardized or at critical mass yet, as C was when Linux was launched. The language still lacks comparative mindshare, no matter how popular it is here on HN. That's why there aren't enough developers.


Nobody said that Linux was bad, just that it's not secure.


I didn't say that Linux wasn't bad, or that it was secure, I asked you to point to a current replacement that is in a better position, which you've declined to do.


I'm trying very hard to apply the principle of charity here, but it seems to me that you are persisting in being disingenuous as all of your replies are acting as if your original comment was actually a question rather than a rhetorical device.

Do you honestly believe that there is a widely-deployed battle-tested free software operating system implemented in a memory safe language?

I assumed the answer was "no" from your original comment, but all of your replies can only be interpreted as if the answer is "yes"


If Linux is bad and insecure, what was the point of your comment?


> I asked you to point to a current replacement that is in a better position, which you've declined to do.

I never said "Linux is bad and insecure" either, and when making blanket assertions like that, it's helpful to have a point of reference.

So I ask again, which free software, widely deployed, battle tested and highly secure operating system are you comparing Linux to? That was the very obvious point of my comment, which you have apparently missed (ignored?) twice now.


I mean you could have asked "what operating system that starts with an L out there are you comparing Linux to," so you picked those specific features for a reason. I would like to know why.


> you picked those specific features for a reason. I would like to know why.

I think this is a good question, and I was wondering if someone would ask that. To be more specific, I think points (1), (3) and (4) are of primary importance, so I would address those specifically:

1) commercial, proprietary and expensive systems were very fragmented and incompatible with each other, especially UNIX, VMS, Windows, etc. Nobody seems to want to go back to that. I think this is the biggest hurdle for any new commercial system to overcome, otherwise Linux (or other, better free systems) will continue to dominate until a better replacement comes along. Look what happened to Solaris. Better than Linux in some respects, worse in some, but it failed to retain its market share. Windows is not hugely used as a server platform either, like it was in the early 2000s, outside the infrastructure required to support Active Directory, Exchange and SharePoint.

3) obviously popular server hardware needs to be supported by the system, e.g. HP, Cisco UCS, Dell, etc. This should be evident from the amount of effort that vendors like Red Hat put into patching their kernel to ensure compatibility with hardware from major server vendors. If the proposed alternatives don't support these hardware platforms then they are going to find it hard to gain traction.

4) you aren't going to migrate people away en-masse from a C-based system like Linux without being able to support the major languages and ecosystems that rely on C and/or C++ for their runtimes. People will still need to run their Tomcat applications, Ruby on Rails, Python, NodeJS etc. all of which currently depend on C or C++ to some extent. I think at least Redox is addressing this somewhat with their "relibc" implementation, which would allow "legacy" C codebases to be hosted there in the future, but let's be pragmatic and admit that it's probably a long way off.

As somebody who builds/manages an emergency services communications platform based on Linux that supports over 70,000 front line emergency services workers, who in turn provide critical services to over 6 million people, those are the reasons why I see no currently viable platform worth migrating to in the short to medium term _for our particular application_. In addition to self-hosted environments like ours, Linux currently also has a very particular grip within cloud hosted environments as well (which aren't particularly relevant to me at this point in time).

In order for non-C based platforms to take off, you either need to support the C runtimes, or be prepared to wait a long time as other languages that don't depend on a C runtime become popular and the ecosystem grows enough.


I suppose what I've been trying to say is that Linux can be both the best choice and have security issues due to the choice of C as an implementation language.

There were no free software memory-safe languages suitable for implementing an operating system in during the 90s when a lot of the initial Linux development happened.

> In order for non-C based platforms to take off, you either need to support the C runtimes, or be prepared to wait a long time as other languages that don't depend on a C runtime become popular and the ecosystem grows enough.

C runtimes are everywhere at least partly because C runtimes are easy to implement. Most of the Lisp machines in the 80s had full support for C. And those were non-memory-protected systems! Running a C program in a separate process would let you use any existing C runtime; the C runtime doesn't care what language the system calls are implemented in.


Free beer has nothing to do with moving the world, and rest assured that Linux will fade away when everyone involved with its creation is gone.

It is already starting with MIT/BSD/Apache OSes for IoT.


May or may not eventually be true, but your response as per usual wholly misses the point and is irrelevant.


And as usual some people let it get on their nerves and feel like answering irrelevant comments instead of moving on.


I'm not the slightest bit unnerved by your comment. But if you feel like answering the original question I posed then I'm all ears.

Even if I indulge your fantasy for a moment - Linux (amongst other systems) has provided myself and many others with a solid career for the better part of the last 25 years, daily security patches from vendors and all, and I don't see any solid evidence of that changing before I retire.

Further I suggest most Linux people would be able to adapt if the world were to move on to an improved competitor with enough momentum to be adopted en masse - which hasn't happened.


That has nothing to do with its technical capabilities, rather a free beer UNIX clone, that is all.


Of course they are secure. What language do you think was used for developing software for airplanes, cars, medical equipment and so on? C can be formally verified, while Rust currently can't be.


Undefined behavior is not the problem that people make it out to be. First of all, undefined behavior is well understood by compilers; in fact, it is exploited by compilers precisely to make code run faster. The only thing you need to do to solve UB is to ask compilers to stop exploiting it (which normally can be done by reducing optimization). And of course you can rewrite your code to stop relying on UB. Despite all the complaints, I have never seen code suffering from UB that couldn't be fixed.


>> undefined behavior is well understood by compilers

The security implications of undefined behavior are poorly understood by software developers: https://arc.aiaa.org/doi/pdf/10.2514/1.I010699


> Undefined behavior is not the problem that people make it to be

No, undefined behaviour is a serious problem, especially regarding security.

There's a long history of serious security problems in C and C++ codebases due to unintended invocation of undefined behaviour. These issues continue to arise even in well-resourced C/C++ projects with highly skilled developers, such as the Linux kernel and Chromium.


It’s pretty sweet for optimization though. Doing Nothing is almost always faster than the alternatives.


The reasons for undefined behaviour being in the C/C++ language standards are performance and the ability to naturally support all sorts of different platforms, but I'm not sure the performance argument really holds up with modern optimising compilers.

I admit I don't have hard numbers to hand, and sadly it's rather rare to see decent rigorous comparisons, but I don't think C and C++ have that much of a performance advantage over, say, Ada or Safe Rust.

My inner pedant feels it's necessary to note that Ada is still an unsafe language, but still, it's much less unsafe than C. It also has excellent support for optionally disabling all sorts of runtime safety checks, whereas the design of C and C++ make it extremely difficult to implement such checks as 'opt-in' features in a compiler.


It actually is important, but it's mostly important when your program gets deployed across architectures. If you're only writing for one architecture (say x86) then you can write the C to just work well there.

One example is that undefined signed overflow is important when the compiler might need to rearrange a loop index - PPC prefers to count down, x86 doesn't care.

Anyway, I think undefined behavior + ubsan is better than defining all behavior. If something's undefined you know it's a bug every time you see it. If it's defined, how do you know it's wrong?
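
A minimal illustration of the loop-index point (hypothetical code):

    // With a signed index, overflow is undefined, so the compiler may
    // assume the loop terminates and rewrite it freely (count down,
    // widen i to 64 bits, vectorize, ...).
    void halve(double *a, int n) {
        for (int i = 0; i < n; i += 2)
            a[i] *= 0.5;
    }

    // With an unsigned index, wraparound is defined, so the compiler
    // must preserve the case where i wraps (e.g. n == UINT_MAX) and
    // the loop never exits; some rewrites become invalid.
    void halve_u(double *a, unsigned n) {
        for (unsigned i = 0; i < n; i += 2)
            a[i] *= 0.5;
    }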


> If you're only writing for one architecture (say x86) then you can write the C to just work well there.

> One example is that undefined signed overflow is important when the compiler might need to rearrange a loop index - PPC prefers to count down, x86 doesn't care.

Signed overflow is undefined behaviour regardless of the target hardware architecture. The compiler is permitted to assume the absence of signed overflow and to optimise accordingly. Unless the compiler documentation specifically says its ok, you still have an undefined behaviour problem. It might happen to work fine, sure, but if you're serious about writing correct programs you should be aiming to deliver a program which is correct-by-definition rather than correct-by-coincidence.

> I think undefined behavior + ubsan is better than defining all behavior

It isn't. That's why Rust makes such a big deal of its Safe Rust subset. It's also a major selling point of SPARK Ada for safety-critical software. It's tremendously valuable to be able to categorically close the door on a whole family of potentially serious and difficult to detect bugs.

> If something's undefined you know it's a bug every time you see it.

No, you absolutely don't.

I already mentioned that high-profile projects like Chromium and the Linux kernel continue to face security vulnerabilities arising from unintended invocation of undefined behaviour. Section 7 of the paper discusses undefined behaviour but doesn't really explore its full consequences, so instead I suggest reading [0] and [1].

Undefined behaviour means exactly that: if undefined behaviour has been invoked at runtime, the behaviour of the program is not constrained by the C/C++ standard. The program is not required to explode loudly, it can do anything. It isn't required to behave the same way each time. Hopefully it will explode loudly, but it's possible everything will seem to be fine. In the worst case the undefined behaviour leads to a serious safety issue or security vulnerability.

Undefined behaviour is even permitted to 'time travel'. [0][2]

> If it's defined, how do you know it's wrong?

You use exceptions or some other well-defined means of detecting and handling runtime errors. For example, Java's NullPointerException and Ada's Constraint_Error.

[0] https://blog.regehr.org/archives/213

[1] https://blog.llvm.org/2011/05/what-every-c-programmer-should...

[2] https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...


> Signed overflow is undefined behaviour regardless of the target hardware architecture.

That's what I said. If your program is correct (it may-overflow like everything does, but doesn't dynamically overflow), then by undefining overflow, you can tell the compiler that it doesn't happen. That lets it reorder operations in a way it couldn't if every + potentially wrapped around.

> No, you absolutely don't.

You seem to have said "no" and then agreed with me. I was proposing trapping on all undefined behavior in debug mode!

> You use exceptions or some other well-defined means of detecting and handling runtime errors. For example, Java's NullPointerException and Ada's Constraint_Error.

Trapping is plausible, but that's not always how people want to fix undefined behavior. For instance some people want undefined memory reads to return 0, or want overflow to wrap. In that case it's hard to distinguish errors from intentional behavior.

I don't like exceptions very much either because control flow gets more complicated. Trapping like Swift does is fine, though.


> If your program is correct (it may-overflow like everything does, but doesn't dynamically overflow)

I don't follow the distinction here. A correct program should never invoke signed overflow, regardless of input.

> by undefining overflow, you can tell the compiler that it doesn't happen

Right, that's essentially the effect of the standard saying it's undefined behaviour: it should never happen when the code runs.

> That lets it reorder operations in a way it couldn't if every + potentially wrapped around.

Right, or more generally, it enables various compiler optimisations.

> I was proposing trapping on all undefined behavior in debug mode!

Ok, I thought that by "If something's undefined you know it's a bug every time you see it" you were saying that UB always results in a loud explosion.

Unfortunately it's not easy to build a C compiler that traps whenever UB is encountered at runtime. An example: the compiler can't know the size of an array passed to your library. C uses 'thin pointers', unlike most languages where, whenever you pass an array, the callee can inspect the array's length.

> Trapping is plausible, but that's not always how people want to fix undefined behavior.

Ada's solution, of raising exceptions (roughly like Java), seems sensible. Of course, part of C's appeal is that it's very compact and lacks things like exceptions.

> some people want undefined memory reads to return 0

That doesn't sound reasonable. To implement that could be pretty burdensome.

> or want overflow to wrap.

This is something some compilers support as a non-standard feature. GCC supports it with the -fwrapv flag. I suppose it would be friendlier if there were a standard and portable #pragma to tell the compiler what you want, but I'm not sure it's a big enough problem to make it into the standard.

You can 'fake it' pretty well by converting to an unsigned integer type, doing the arithmetic, and then converting back to the original signed integer type. You could write a function to do this. You could use the preprocessor to defer to a compiler-specific intrinsic where one is available. I think GCC's __builtin_add_overflow would do the job but its definition isn't terribly explicit regarding wrapping behaviour.

I think this code would do the job portably, and I don't think it relies on anything platform specific. (I'm relying on the signed/unsigned conversions using two's-complement, I believe this is guaranteed by the C/C++ language specs. I've also used fixed-length integer types for good measure.) Godbolt tells me GCC can optimise it down to a single LEA instruction on AMD64.

    #include <cstdint>
    using std::int32_t;
    using std::uint32_t;
    
    /*inline*/ int32_t wrapping_add_int32t(int32_t num1, int32_t num2)
    {
        return (int32_t)((uint32_t)num1 + (uint32_t)num2); // Compiles down to LEA instruction
        // Alternatively (also compiles down to an LEA instruction)
        // int32_t ret;
        // __builtin_add_overflow(num1, num2, &ret);
        // return ret;
    }

See also [0].

> I don't like exceptions very much either because control flow gets more complicated. Trapping like Swift does is fine, though.

I agree it introduces action at a distance flow-control. I'm afraid I don't know Swift.

[0] https://stackoverflow.com/q/59307930/

Vaguely related fun: https://github.com/MaxBarraclough/IntegerAbsoluteDifferenceC...


Ah, apologies, my comment is a bit of a troll. Once the compiler encounters undefined behavior, it's free to do whatever it wants. like, produce an executable that does exit(0); when it hits that condition. Compiler writers are generally ethical people that won't do that, but there are plenty of cases of aggressive optimizers eliminating whole code paths - if(undefined) - well, let's always take the true path. Sure hope that undefined wasn't some sort of security check. :shrug:


I would prefer the program always exited on undefined behavior. That would be a hell of a lot more secure than it continuing going potentially blowing the stack or letting the attacker call arbitrary code. It’s the same with memory allocators: if malloc() fails to allocate new memory, I don’t want it to return a NULL. I want it to use a static buffer to log the problem and then exit the process. There is almost never anything you can do when you run out of heap and if you can you aren’t naively using malloc() from your system library and checking for NULL return values.

The whole problem with undefined behavior is that it is faster not to check for undefined behavior than to check and call exit(1) (exit(0) would be a successful exit). Think about it in slightly higher-level terms: you implement a linked list that can search for an item and return a pointer to it once it finds it. Your implementation explicitly says that if you search for an item that isn't in the list you will hit an infinite loop. I disregard the warning and let it search for an item not in the list. I hit an infinite loop. Could you have added a check for "if (current == head)" and bailed, returning NULL? Sure you could, but that introduces a branch and slows things down. Better to label what can happen as undefined behavior, because maybe on some future processor you'll have that check because it's cheap, but on x86 it isn't, so you don't. This is essentially the same thing.
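
A sketch of that list example (hypothetical code, assuming a non-empty circular list as described above):

    struct Node {
        int value;
        Node *next;  // circular: the last node points back to the head
    };

    // "Undefined" variant: loops forever if value is absent.
    Node *find(Node *head, int value) {
        Node *current = head;
        while (current->value != value)
            current = current->next;
        return current;
    }

    // Checked variant: one extra branch per step buys a defined failure mode.
    Node *find_checked(Node *head, int value) {
        Node *current = head;
        do {
            if (current->value == value)
                return current;
            current = current->next;
        } while (current != head);  // wrapped around: item not present
        return nullptr;
    }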


> if malloc() fails to allocate new memory, I don’t want it to return a NULL. I want it to use a static buffer to log the problem and then exit the process. There is almost never anything you can do when you run out of heap and if you can you aren’t naively using malloc() from your system library and checking for NULL return values.

In C's defence, you can easily get this behaviour by wrapping malloc in a safe_malloc function. Given that C lacks exceptions, it makes good sense to handle unable-to-allocate by returning NULL as this leaves the door open to all possible strategies.
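
A minimal sketch of such a wrapper (the name safe_malloc comes from the comment above; the body is an assumption):

    #include <cstdio>
    #include <cstdlib>

    // Log and exit instead of returning NULL. The message is a string
    // literal, so no further allocation is needed to report the failure.
    void *safe_malloc(std::size_t size) {
        void *p = std::malloc(size);
        if (p == nullptr) {
            std::fputs("out of memory\n", stderr);
            std::exit(EXIT_FAILURE);
        }
        return p;
    }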


> Once the compiler encounters undefined behavior, it's free to do whatever it wants. like, produce an executable that does exit(0); when it hits that condition. Compiler writers are generally ethical people that won't do that

This is wrong on two points.

Firstly, real world compilers very often do handle various kinds of undefined behaviour with immediate termination. On many platforms, dereferencing a null pointer will result in a segfault. Sometimes compilers generate code to trap if undefined behaviour would result. In the C++ standard, some errors are defined to result in a call to std::terminate, rather than undefined behaviour. [0]

Secondly, as IgorPartola indicates, doing this isn't irresponsible, it's the least bad way to handle undefined behaviour. If your loop has overrun the end of your array, you generally don't want the execution to silently proceed with invalid data, you want execution to end immediately.

[0] https://stackoverflow.com/a/43675980/


Undefined behavior is a problem for projects that are not testing certain conditions under certain compiler configurations. Every time you see these problems happening, the reason is that developers not only wrote incorrect (undefined) code, but also didn't test the code path that was affected. While I agree that a compiler should do a better job, ultimately the responsibility is on programmers (especially with the C language, where this has always been the unwritten contract).


> Undefined behavior is a problem for projects that are not testing certain conditions under certain compiler configurations.

Like I said, undefined behaviour is a problem even for Chromium and the Linux kernel. We've seen that "just be diligent" isn't a solution. We're way past that now.

> Every time you see these problems happening, the reason is that developers not only wrote incorrect (undefined) code, but also didn't test the code path that was affected.

Technically true, but not insightful. Testing can never be exhaustive, so this observation doesn't light the way to a solution.

> While I agree that a compiler should do a better job, ultimately the responsibility is on programmers

This statement is true of the C language, but it isn't a solution, it's the problem-statement.

It's not true of a language like Safe Rust, where the language itself closes the door on undefined behaviour, and where it is not the programmer's responsibility to avoid invoking undefined behaviour.


It is well known that undefined behaviors are not the only bugs happening in codebases like Chromium and Linux. So pointing to these projects doesn't magically make the bug different from other categories of bugs that plague all kinds of projects, even in other languages.

The major compilers already have ways to test for undefined behavior such as -fsanitize=undefined. Projects need to use these flags and test more.


> pointing to them doesn't magically make the bug different from other categories of bugs that plague all kinds of projects, even in other languages

Undefined behaviour is responsible for a non-trivial fraction of the security vulnerabilities of C and C++ codebases. Undefined behaviour in an application can be entirely eliminated by writing the application in a safe language. That's the point.

> The major compilers already have ways to test for undefined behavior such as -fsanitize=undefined. Projects need to use these flags and test more.

Do you really think the Chromium team isn't aware of that flag in GCC and Clang? If there were an easy fix to the problem of accidental invocation of undefined behaviour, the problem would have gone away years ago.

It's useful for a C/C++ compiler to offer to add runtime checks for a subset of the possible causes of undefined behaviour. We agree more projects should use such tools. As we're seeing, though, this isn't a silver bullet. Even extremely well-resourced and security-sensitive codebases end up with UB issues.


The fact that X doesn't solve all security problems doesn't mean that X doesn't eliminate a whole class of problems.

Memory safety problems are still a sizable proportion of the CVEs associated with C and C++ programs.

> The major compilers already have ways to test for undefined behavior such as -fsanitize=undefined. Projects need to use these flags and test more.

This is NOT a solution -- you cannot test every case. In fact, not even close: the possible state space for signed addition of two 64-bit ints is 2^128. That is infeasible. A compiler for a 'safe' language CAN prove the absence of certain behaviors (UB being one of them).


1. Fixing/avoiding UB requires discipline (and time) that not all programmers have.

2. Many programmers, when pointed to UB in their code, would argue that it is not a problem at all - especially if the code works for them and users don't complain about bugs caused by UB.

It would be interesting to run the following experiment: find and patch UBs in multiple open-source projects which are relatively popular (used by more than just the author), then send pull requests with the UB fixes and see what fraction are closed as "won't fix".

It requires a lot of time, but may show that many developers don't care about UB.


> 1. Fixing/avoiding UB requires discipline (and time) that not all programmers have. 2. Many programmers, when pointed to UB in their code, would argue that it is not a problem at all

These points aren't wrong, but they understate how pernicious undefined behaviour can be. As I stated earlier, even well-resourced high-profile codebases like Chromium and the Linux kernel play cat-and-mouse with security issues originating in undefined behaviour. It's not just a matter of hiring diligent coders and using the usual quality-control processes (testing, code reviews, static analysis).

It's possible for C/C++ code to be entirely free of undefined behaviour, but in practice this only seems to happen when formal methods are used, or when people use code-generators/transpilers to avoid writing C/C++ by hand.


I've entered a mid-life zen w.r.t. languages. Most important is the people and irreplaceable knowledge they have in their heads, about the tricks, methods, and environments they've worked in. Languages correlate with that, and so are a semi-useful indicator of past skills. You want a person to write the bootloader for your Mars helicopter? You hire a C programmer most likely, not a C# programmer, but who knows? You want a person to bootstrap your image processing pipelines for your scientists? Maybe Python? Maybe? This area is much more loose.

If a valuable tool or method is expressed in (or even only expressible in) a particular language, then so be it. Often, there are many more choices than people believe, and what is right for the person, so long as it serves the organization or need appropriately, is fine by me.

A language is a tool. Most languages do pretty much the same thing. Most languages' ecosystems, application adoption, and developers are much more important than the languages' seatbelts and headlights.


What is also important is the ecosystem that comes along with the language. Modern languages have a wide-ranging list of libraries and plugins which make certain programming tasks easier. But that also means you have an ever-shifting stack that is required to support them. This became apparent to me when trying to get some infrastructure software to play nice with some of the older systems we need to keep around. There were some nice Python-based solutions that I had to reject because the dependencies didn't exist for some older RHEL installations we have (yes, we still need to keep a handful of RHEL 4.x systems around because management doesn't want to tell customers "no, we won't support you unless you upgrade to our latest product that works on newer OS releases"). So for backup and management solutions, I have to stick with tools that can be easily compiled on the older environments.

Another example, one of our architects at work is a strong supporter of Apple, and wanted me to look into Swift. Well at the time you could get Swift for Ubuntu, but couldn't for any version of RHEL (that has finally changed now though). So again, writing my code in plain C was more of a win.


I disagree that languages do "pretty much the same thing". However, I do agree that language choice is overrated. Other factors, e.g. which one you know better, weigh heavier.


OK, over-simplification.

Perhaps: Language in-class variance is small (C# ~= Java; C++, C, Rust are close-ish). Cross-class variance is big (JS vs Rust). Therefore, recruiting a programmer from the same problem/language class is more important than the particular language.


Sounds like a case of Blub Paradox: http://www.paulgraham.com/avg.html


> Most obviously in C, we note that a malloc() implementation is usually written in C—or rather, in a subset of C that lacks malloc() since malloc() is mandated by the C standard.

Not always. Google’s tcmalloc is actually written in C++.

I actually would have agreed with a lot of this in 2017. However, in the past several years, I think Rust has been a game changer. It can interface with the C ABI. It can access the same low-level abstract machine that C can. It doesn't have garbage collection or virtual machines. And it provides memory safety out of the box. In addition, features such as strong types and pattern matching help reduce logic bugs as well (like the compiler checking that you did not forget one arm of an enum).

This is borne out now that a lot of security-facing software is starting to do at least part of its internals in Rust, where it had been in C before.

I think Rust is and will be even more in the future a game changer in how we write the foundation programs and libraries that the computing world is built on.


What they're saying is that malloc() can't be implemented in C, because C defines malloc() as having behavior that's not possible in C's memory model.

I think it is more possible in C++ due to placement new, though tcmalloc is probably still relying on language extensions.


I don't know why you think magic is needed to implement malloc(). It can be readily implemented in C. It controls a range of memory and doles out chunks upon request. Doing that in a system with an MMU requires hardware interaction to handle virtual memory but that isn't outside of the wheelhouse for C.


malloc() in C returns a new memory object each time you call it (I forget the exact wording here). There is no operation in the C standard that creates new memory objects; the only way to do it is by calling malloc(). And obviously, this doesn't let you implement it.


You don't have to "create a new memory object". C doesn't say how malloc() is implemented. It is perfectly valid to have a statically defined array and have malloc() pull from that. The majority of the computers in the world do this. You're conflating limitations imposed by layers of OS infrastructure with limitations of C.
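
A minimal sketch of that scheme (hypothetical code; whether it is strictly conforming is exactly what is being disputed in this thread):

    #include <cstddef>

    // A bump allocator handing out chunks of a statically defined array,
    // as on many embedded systems.
    static unsigned char pool[64 * 1024];
    static std::size_t next_free = 0;

    void *my_malloc(std::size_t size) {
        // Round the size up so every chunk stays maximally aligned.
        const std::size_t a = alignof(std::max_align_t);
        std::size_t rounded = (size + a - 1) & ~(a - 1);
        if (rounded > sizeof pool - next_free)
            return nullptr;  // pool exhausted
        void *p = &pool[next_free];
        next_free += rounded;
        return p;
    }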


I'm not talking about any OS concept, but about the concept of how memory works in the virtual machine defined by the C standard. That implementation isn't compliant just because it works; if an undefined behavior checker could see into the malloc implementation, it would see that it's returning pointers inside the same object, which conflicts with the (impossible to implement) rule that malloc returns pointers to different objects.

The practical consequence of this is malloc is kept in a different library and not inlined into callers, and asan doesn't work with custom allocators unless you tell it they behave like malloc.


It is outside of what ISO C defines and requires either language extensions or external Assembly.


> What they're saying is that malloc() can't be implemented in C, because C defines malloc() as having behavior that's not possible in C's memory model.

Is that due to aliasing rules?


That and free() making pointers invalid. Since if you're mallocing out of a pool, freed pointers tend to still work if you write to them after free. It just crashes later…


free() does not need to make pointers invalid. Use after free is undefined behavior.


What? Linux is written in C and there is no malloc in the kernel. Just don't include the standard library.


Linux isn't written in C either, it's written in C with a lot of language extensions. The ability to implement malloc() is just another language extension where the compiler promises to ignore the type aliasing issues.

Implementing memory barriers (pthread_mutex_lock()) used to be strictly speaking impossible in C too, but it's possible now that it has a defined memory model.


Ignoring that the kernel has lots of raw machine code in it, "not possible" and undefined or implementation-defined behaviour are different things, both practically and in terms of programming language semantics.


I remember when this paper came out, I tweeted at the author about D's "Better C" mode since he used that very term. It really is a better C, because C is almost a subset of D. You get nice features like array bounds checking, but no runtime, no garbage collector, etc. It's a good choice for those that prefer to stick with C but wish there was a 2021 upgrade.

https://dlang.org/spec/betterc.html


What's the developer experience like in D? I keep looking at it from time to time but haven't taken the time to learn it (even though I've spent time learning a lot of new languages). I was never sure if there was a big enough community around D to make it worthwhile, but it seems to have a lot of the features I want.


Just for clarity, to keep on the topic of the article: If you compile a D program with the -betterC flag, you give up many features so that your program runs with only the C runtime. It's great for a C programmer not wanting to learn a new language or for being able to make incremental changes to a C codebase while adding things like metaprogramming. If you're satisfied with the experience writing C, you'll probably also be satisfied writing D and compiling with the -betterC flag. As an example, here's a -betterC hello world:

https://run.dlang.io/is/TKOBgA

Once you move on to other goals (using the whole language, as is usually the case) there are numerous complaints. Some don't think the VS Code plugin is good enough, and that sort of thing. Some argue that Dub, the package manager, is not good enough for their needs. I suppose, like every language, it has people that try it and don't like it.

It doesn't take much to try it. You can use the online D editor and read the official tutorial. If you like it, you can dig in further to see if it has the ecosystem you need.

https://run.dlang.io/

http://ddili.org/ders/d.en/index.html


Pretty good - everything you actually need is there and working. There are some things we don't have - say, the automated refactoring story is poor - but that's because the language is designed in such a way that such tools aren't required.

D has evolved (until the last few months basically) without any direct corporate backing and yet absolutely stomps quite a few very very well-funded languages from a design perspective (e.g. A template constraint in D is so simple it's one keyword, and you can even do your own error messages as a library and compose things, whereas C++ took almost my entire lifetime to standardise concepts).

D exists as a (large) group of very simple language design decisions - for example, we have

    unittest { /* code here */ }
blocks that make writing unit tests much lower friction.

We just had our monthly "Beerconf" online conference, and all I can say is that the knowledge per head in the D community is very high.


How would you compare it to Zig? I recently read through Zig’s docs and thought it was neat, though was somewhat dismayed at the fact that it isn’t actually memory safe.


I don't know much if anything about Zig I'm afraid.


Who are the designers/institutions behind D, and why have they not promoted it as much as newer alternatives?


https://en.wikipedia.org/wiki/Walter_Bright

https://en.wikipedia.org/wiki/Andrei_Alexandrescu

They've promoted it (holding annual conferences and such) but they don't have Mozilla or Google behind them, so resources are limited.

Edit: And here are some blog posts by Walter Bright about D as a Better C.

https://dlang.org/blog/2017/08/23/d-as-a-better-c/

https://dlang.org/blog/2018/02/07/vanquish-forever-these-bug...

https://dlang.org/blog/2018/06/11/dasbetterc-converting-make...


Well, Rust doesn't have Mozilla behind it anymore either (ok, they now have a foundation with not only Mozilla but also Microsoft, Amazon, Google and - whoa! - Huawei as sponsors, so you could argue that's an improvement). And I'm not sure how much Google is really invested in backing Go. But yeah, I guess D looks small in comparison to both...


This goes both ways, but one thing that we need to be louder about as a (D) community is that there are features these languages (e.g. Rust) make a lot of noise about that D has had for years now - e.g. D's interop with C++ is extremely good, to the point where D should be able to catch an exception from C++ and unwind correctly. Another example is the Integers-in-Generics post that hit 300 points; D has had that for years.

There are some Rust features that I want in D, but we'll see about that.


Walter Bright (original creator of D and the D compiler) posts here pretty frequently.


Walter might be asleep at the moment; I work for the D foundation if anyone has any probing questions


It's only relatively recently that open-source D compilers became available. And by that point it felt like too much of an incremental improvement over C++ to seriously gain traction, imho. The design also feels dated in places compared to Rust (not talking about the borrow checker; more about the type declaration syntax, not being expression-based, no ADTs or pattern matching, etc.)


If curious, past threads:

Some Were Meant for C (2017) [pdf] - https://news.ycombinator.com/item?id=19736214 - April 2019 (176 comments)

Some Were Meant for C: The Endurance of an Unmanageable Language [pdf] - https://news.ycombinator.com/item?id=15179188 - Sept 2017 (240 comments)


NASA just sent a rover to Mars using software written in C. Meanwhile some Rust fanatics are busy telling everyone how it doesn't work.


Wat

The types of analysis and programming practices used to send stuff to Mars are beyond what Rust, or D, or any other safer-systems-language tries to do. It's not that simple.

These types of projects effectively need to prove the absence of bugs using formal verification and very extensive testing. Surprise surprise, C makes that extremely expensive and theoretically difficult too.

For example: NASA wrote this project https://github.com/NASA-SW-VnV/ikos which uses abstract interpretation and would catch bugs in practically any language.
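As a toy illustration of the kind of property abstract interpretation can settle (hypothetical snippet, not from IKOS's test suite): an interval analysis tracks the range of `i` and either proves every access in bounds or flags the one that isn't.

    static int buf[10];

    void fill(void) {
        for (int i = 0; i < 10; i++)
            buf[i] = 0;     /* provable: i stays in [0, 9] */
        /* buf[10] = 0;        would be flagged as a definite overflow */
    }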


The Rover is not connected to the internet.

Do you know what subset of C NASA limits itself to? Or hw architecture? The rigour of their testing? Should all C developers follow the same restrictions as NASA?


The rover was built by NASA JPL, and they have their own coding standard based on MISRA 2004.

https://andrewbanks.com/wp-content/uploads/2019/07/JPL_Codin...

https://nodis3.gsfc.nasa.gov/displayAll.cfm?Internal_ID=N_PR...
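To give a flavor of what standards in this family ask for (paraphrasing common rules such as fixed loop bounds, no heap allocation after initialization, and checked return values; a hedged sketch, not quoted from the JPL document):

    #include <stdint.h>

    #define MAX_SENSORS 16u                  /* statically known upper bound */

    static uint16_t readings[MAX_SENSORS];   /* static storage, no malloc */

    int32_t sample_all(int32_t (*read_one)(uint32_t idx, uint16_t *out)) {
        for (uint32_t i = 0u; i < MAX_SENSORS; i++) {   /* fixed loop bound */
            if (read_one(i, &readings[i]) != 0) {       /* check every return */
                return -1;                              /* fail safe */
            }
        }
        return 0;
    }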

Hardware (according to Wikipedia) is a BAE Systems RAD750 radiation-hardened single board computer based on a ruggedized PowerPC G3 microprocessor (PowerPC 750). The computer contains 128 megabytes of volatile DRAM, and runs at 133 MHz.

https://en.wikipedia.org/wiki/Perseverance_(rover)

Testing sounds pretty rigorous, at least for large projects.

https://www.quora.com/What-does-a-software-engineer-do-at-th...

Personally I firmly believe that "all C developers" do not need to follow these regulations. It might even be counter-productive to slow down the development process for some clients. For safety-critical systems, these rules make sense. For little startups, they don't.

Developers are smart enough to learn these rules, so HR shouldn't ask for "5 years MISRA experience". It's really a choice of business model, time to market, and risk management. If you're a big company looking to cut costs, be careful about outsourcing firmware development to a little startup who might not follow these rules so strictly. I won't follow these rules for the stuff I throw together in my free time and put on Github, but I will be careful before committing code to master for medical device firmware.


C is already a subset. It's basically one step above assembly.


His point is moot anyway. They could have written the entire code base in Rust and compiled it down to C for all we know.


Rovers usually have a timeline of 7-8 years for designing and building. Rust had not hit 1.0 at the time NASA probably started designing the rover.

So any indication of what NASA would use on its rovers has to be taken from projects started after Rust released 1.0.


According to my information, the rover software is in C++. The OS is in C. The C++ classes are mostly autogenerated by Python.


The author's conclusion:

I have argued that C’s enduring popularity is wrongly ascribed to performance concerns; in reality one large component of it (the “application” component) owes to decades-old gaps in migration and integration support among proposed alternatives; another large component of it (the “systems” component) owes to a fundamental and distinctive property of the language which I have called its communicativity, and for which neither migration nor integration can be sufficient. I have also argued that the problems symptomatic of C code today are wrongly ascribed to the C language; in reality they relate to its implementations, and where for each problem the research literature presents compelling alternative implementation approaches. From this, many of the orthodox attitudes around C are ill-founded. There is no particular need to rewrite existing C code, provided the same benefit can be obtained more cheaply by alternative implementations of C. Nor is there a need to abandon C as a legitimate choice of language for new code, since C’s distinctive features offer unique value in some cases. The equivocation of “managed” with “safe” implementations, and indeed the confusion of languages with their implementations, have obscured these points. Rather than abandoning C and simply embracing new languages implemented along established, contemporary lines, I believe a more feasible path to our desired ends lies in both better and materially different implementations of both C and non-C languages alike. These implementations must subscribe to different principles, emphasising heterarchy, plurality and co-existence, placing higher premium on the concerns of (in application code) migration and interoperation, and (in the case of systems code) communicativity. My concrete suggestions—in particular, to implement a “safe C”, and to focus attention on communicativity issues in this and any proposed “better C”—remain unproven, and perhaps serve better as the beginning of a thought process than as a certain destination. C is far from sacred, and I look forward to its replacements—but they must not forget the importance of communicating with aliens.


Relevant to recent discussions, even if it was published in 2017.

It is rather more elaborate than other publications I’ve seen mentioned in those discussions.

I’ll quote section 6.2, “What is Safety Anyway?”:

> I have learned to enjoy provoking indignant incredulity by claiming that C can be implemented safely. It usually transpires that the audience have so strongly associated “safe” with “not like C” that certain knots need careful unpicking.

> In fact, the very “unsafety” of C is based on an unfortunate conflation of the language itself with how it is implemented.

> Working from first principles, it is not hard to imagine a safe C. As Krishnamurthi and Felleisen [1999] elaborated, safety is about catching errors immediately and cleanly rather than gradually and corruptingly. Ungar et al. [2005] echoed this by defining a safety property as “the behavior of any program, correct or not, can be easily understood in terms of the source-level language semantics”—that is, with a clean error report, not the arbitrary continuation of execution after the point of the error.
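By that definition, a "safe C" would turn, say, an out-of-bounds access into an immediate, clean, source-level report rather than arbitrary continued execution. In spirit (a hypothetical sketch, not the paper's proposal):

    #include <stdio.h>
    #include <stdlib.h>

    /* What an implementation-inserted bounds check might amount to. */
    static int checked_get(const int *a, size_t len, size_t i) {
        if (i >= len) {
            fprintf(stderr, "bounds error: index %zu, length %zu\n", i, len);
            abort();                 /* fail immediately and cleanly */
        }
        return a[i];
    }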


The paper starts with a series of snippets designed to show off the unique abilities of C that are chock full of undefined behavior, and then argues that C can be used safely if only programmers and compiler writers were to agree to stay away from undefined behavior. I'm getting mixed messages here…

And the title, to me, evokes William Blake: "Some are Born to Endless Night".


The author raises an important issue here: many people are led to believe that using C is inherently unsafe. That's not true, and many of the most secure systems in the world were written in C. The other direction also doesn't work: software written in languages like Java can be effectively unsafe.


One of the most secure OSes is ClearPath MCP, with zero lines of C in its kernel; it is written in NEWP instead.

Azure Sphere, Solaris and the latest versions of iOS all rely on some variation of hardware memory tagging to tame C exploits.


iOS uses ARM "pointer authentication codes". There's actually a newer extension called ARM "memory tagging extension", and iOS doesn't use that one as far as I know.


Correct, the point being it uses something.

It also uses a safe C dialect for iBoot.

https://support.apple.com/guide/security/memory-safe-iboot-i...


Once again.

We know from 40 years of discovering memory-related vulnerabilities in even the most carefully written, rigorously tested C programs that writing safe C is intractable for real, human software engineers. So yes, C IS INHERENTLY UNSAFE. If you claim otherwise you clearly haven't been paying attention to what's going on.


> A final interesting property of this code is that its behaviour is undefined according to the C language standard. The reason is that it calls memcpy() across a range of memory comprising multiple distinct C objects, copying them all into memory-mapped storage in a single operation.

What's wrong with memcpy here? As long as dst and src are both non-null and the ranges of memory are not overlapping, the behavior of memcpy is well defined.
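For reference, the pattern the paper describes looks roughly like this (a hypothetical reconstruction, not the paper's code): the copied range starts in one object and runs into the next, and reading past the end of the first object is what the standard leaves undefined, even when the objects happen to be contiguous in memory.

    #include <string.h>

    /* Two distinct C objects that may happen to sit next to each other. */
    static int a = 1;
    static int b = 2;

    static unsigned char shadow[2 * sizeof(int)];

    void snapshot(void) {
        /* Reads beyond the end of `a`: undefined behaviour per the
           standard, regardless of the actual memory layout. */
        memcpy(shadow, &a, 2 * sizeof(int));
    }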


Pointers into a copied region are invalidated. Only GC languages do such memory copying safely.

It's also insanely slow with gcc, compared to clang. Like a factor of 1000 with some compile-time constants.


Structure alignments…


As long as the size argument is initialized as sizeof(struct ...), it is totally fine, isn't it?


All I really want added to C is Zig's comptime.


Is the second example (the auxv stuff in section 5.1) invoking undefined behavior here in "at_null->a_type == AT_NULL"? As far as I understand, in C you generally can not pull a valid pointer out of thin air like the author is doing there. Isn't that the whole idea behind "pointer provenance"?


It's not pulled out of thin air, it's constructed from `environ`.


Can someone explain to me why we use C for writing code for the electronic control unit of a vehicle motor if it is so unsafe? It is true that ECUs are programmed with code generated through model-based design, but some parts can be manually programmed. Maybe this is why they use only a subset of C (MISRA C).


I've written code for ECUs. MISRA C forces a straitjacket on C to keep away from dark corners, and rather enforces a 'bland' C style that is easy for some other engineer to understand. At first, one complains about the details; then, they become built-into-your-brain 'macros' so they are no longer (much of) an impediment.

There is one other reason: until recently, auto-qual MPUs with fancy floating-point units (or any floating-point units at all) were very rare; hacks such as Qm.n fixed-point notation were required to do anything semi-fancy with trig etc. (a sketch below). This was true even 5 years ago, although I would hope by now 'decent' auto-qual parts that are cheap enough exist.
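For the unfamiliar: Qm.n stores a fractional value scaled by 2^n in a plain integer. A minimal Q16.16 sketch (hypothetical helper names):

    #include <stdint.h>

    typedef int32_t q16_16;              /* 16 integer bits, 16 fraction bits */
    #define Q_ONE ((q16_16)1 << 16)

    static inline q16_16 q_from_int(int x) {
        return (q16_16)x * Q_ONE;        /* x must fit in 16 bits */
    }

    static inline q16_16 q_mul(q16_16 a, q16_16 b) {
        /* widen to 64 bits so the intermediate product can't overflow */
        return (q16_16)(((int64_t)a * b) >> 16);
    }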

Lastly, there are a boatload of requirements for auto software; you have to fail safe as your power is going away (e.g. a crash is happening). You need to tolerate getting reset at any point, and your CPU needs to detect when something is wrong and reset itself. There are different failure tests for different subsystems; 'body' electronics isn't quite as stringent as propulsion.

There is also a MISRA C++ spec; I was unsuccessful in getting even a pilot project going with C++. It's also substantially simplified in the 'legal' subset, and is rather nicer than C in many ways. But... C is going to be with us for decades more, I think.


Probably because of

1. Inertia. It's already being used, so why not keep using it? I can't prove this, but it feels true.

2. Lots of microcontroller vendors provide tooling for it. You are basically guaranteed to be able to run C on whatever micro controller you want. Even if you don't get a standard library, you can implement your own if you need to. Languages like C++ have large runtimes and are hard to port to lots of platforms

3. It's an "easy" language to use for people who work on these devices, and the problems of an ECU are not the problems of a database, for example. I've worked on large enterprise databases and now I work on cameras for a car-ish company. The scale is just different, and you simply don't need as many features as in C++, Haskell, Rust, or Python. The hard part on a small embedded system is not the coding per se, it's the algorithms. A lot of these devices work at fixed rates on a timer (or respond to a very periodic interrupt), have a well-understood amount of work to do in a fixed window of time, and then go back to sleep until the next job is queued. Memory safety won't help you when critical failure is defined as being late to respond. Would I love to have total memory safety? Totally! But it's the least of my concerns at the moment, and if my toolchain doesn't support it, then oh well; we just have to code to the standards of MISRA and follow standards like ISO 26262.

I should say, writing C on a large project sucks. I've seen it before and I personally hate it (large portions of the database were in C). It's just too complicated, and while C++ isn't perfect, having things like destructors and templates is really nice and just makes the problems tractable.


Usually in embedded development you don't ever want to use dynamic allocation. Very few languages allow you to avoid dynamic allocation.


Yeah, I know that, and I always wondered how they implement basic data structures like linked lists or BSTs with static allocation. I guess that this will never be something really needed in an embedded system, and they can live with arrays and variables.
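For what it's worth, the usual trick is a fixed-capacity node pool with a free list, so a linked list needs no dynamic allocation at all; a minimal sketch:

    #include <stddef.h>

    #define MAX_NODES 32

    struct node { int value; struct node *next; };

    static struct node pool[MAX_NODES];  /* all storage is static */
    static struct node *free_list;

    void pool_init(void) {
        for (size_t i = 0; i + 1 < MAX_NODES; i++)
            pool[i].next = &pool[i + 1];
        pool[MAX_NODES - 1].next = NULL;
        free_list = pool;
    }

    struct node *node_alloc(void) {      /* "allocation" pops the free list */
        struct node *n = free_list;
        if (n != NULL)
            free_list = n->next;
        return n;                        /* NULL when the pool is exhausted */
    }

    void node_free(struct node *n) {     /* freeing pushes it back */
        n->next = free_list;
        free_list = n;
    }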


I would like to mention that C is certified for writing safety-critical code while Rust isn't. Therefore critical systems in aerospace, the auto industry, and other specialized industries are mostly written in C.


I believe the main issue at play here is a paradox in software engineering. The paradox is this: safe languages are more useful on complex projects than on simple projects, but complex projects suffer more from performance degradation and system integration issues when these languages are used. Put the other way around, C is perfectly fine for writing short pieces of code, but despite its problems it may be the only reasonable language for writing large pieces of system software (I'm including C++ here as a "kind of" C, just like Objective-C).


> safe languages are more useful on complex projects than on simple projects, but complex projects suffer more from performance degradation

I don't think that's true. C is often slower than C++ because of how inlining works, plus in some domains (e.g. compilers) it's best to just use a GC language from the get-go.


C is often slower than C++ because of how inlining works

Hang on, you’re cutting a few corners there! It’s easy to use inline functions in C too.

Are you thinking of C++ templates, and the fact that e.g. std::sort is faster than qsort because it directly calls an inlined comparison function rather than a function pointer?

That’s true, although it’s possible to achieve similar performance levels in C via hand-rolled data structures or macro hackery. I’ll grant you that the efficient C++ code is more idiomatic and likely safer. On the other hand, idiomatic C code is likely smaller when compiled, which can be important for performance too.

I don’t believe “C is often slower than C++” is true in general.
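To make the qsort point above concrete: the C comparator is a function pointer that the optimizer usually can't inline through a separately compiled qsort, whereas std::sort's comparator is baked in at the call site by the template. A sketch of the C side:

    #include <stdlib.h>

    static int cmp_int(const void *pa, const void *pb) {
        int a = *(const int *)pa, b = *(const int *)pb;
        return (a > b) - (a < b);    /* avoids the overflow of a - b */
    }

    void sort_ints(int *v, size_t n) {
        qsort(v, n, sizeof *v, cmp_int);   /* one indirect call per comparison */
    }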


std::sort is the poster child that is supposed to demonstrate the power of C++ templates. When in fact it is awkward to use (as an infrequent user of C++, why can I never seem to remember how to wrap / make the comparison object?) and, more importantly, sort performance is most often completely irrelevant to the performance of a program.

And when it's not irrelevant, it's almost 100% certain that std::sort is not the right thing to use. Where it matters, it's probably possible to examine the context a little more closely and come up with a custom sort that runs in O(n) or at least faster than std::sort.


Adding custom comparators is trivial with lambdas:

    std::sort( std::begin( c ), std::end( c ), []( const auto& lhs, const auto& rhs ) { return lhs < rhs; /* some comparison for the value_type */ } );
Same for everything that takes a comparator in <algorithm>.


Yeah, the real poster-child for C++ performance should be something like Eigen


Or, you could just call std::sort and be done with it. (And, oh, by the way, it is faster than qsort - templated code often is.)


You haven't read my comment at all.


But are there guides or books that teach people to write safe C code? Is writing safe C code possible?


I'm an embedded software engineer for a medical device manufacturer, and we've had training on MISRA C. It applies to the whole software lifecycle, and tries hard to make a subset of C that is appropriate for safety-critical systems.

https://en.wikipedia.org/wiki/MISRA_C

Personally I don't agree with all the guidelines. This morning a colleague said "I'm so glad someone had the foresight to include [unused API function]". Strict MISRA compliance would require "no unused code". I also prefer having more comments. Generally though, I think the advice is helpful, particularly about specific types (uint32_t not int), avoiding malloc, and complexity limits.

This is a totally different style of coding to what I've been used to at startups in the past. It's "software engineering" instead of "move fast and break things". During my 20s, I'm glad I had broader experience of creative solutions, new languages, janky code, late nights, pretty demos, shipping /something/ then iterating on that. Now that I'm 31 and settling down, this slower-paced but much more careful approach seems more suitable.



>"...I use because I’m stuck with it; I use it for positive reasons. "

I absolutely agree with that statement. When I do firmware for small MCUs I feel big fat zero need for any other language.


Isn't OpenBSD written in C? w.r.t. "unsafe", just sayin'.


The article mentions C++ three times, all in contexts equating it with C. But C++, as it is coded today, is a very different language from C, and does not suffer from the problems that make C an extremely poor choice for starting any new project that might matter.

All the article's arguments for C apply substantially more so to C++. Thus, the article leaves us with no objectively plausible reason ever to code in C, except where artificial constraints mandate it, or where merit doesn't matter. (I leave it to the reader to decide where the Linux and BSD kernels fit in that.)

There is especially no excuse for systemd to be coded in C.


C++ still has a lot of rope to hang yourself with. For example, there is the stuff Rust's borrow checker prevents. Or that members are initialized in declaration order, and a wrong initializer list can lead to undefined behavior. Or all those implicit conversion rules (ok, the annoying tendency to convert to int is inherited from C).

The ugliest aspect is that C++ has a lock-in effect like WhatsApp. As long as your codebase is in C, you can easily interface with nearly all other languages. Once important parts are in C++, though, only C++ can reasonably interface with it.


If "lock-in" is something wrong with C++, it is equally so with Rust. But in fact anything you want callable from C, in either C++ or Rust, is easy to keep that way.

To the other point, you can write bad code in any language, Rust included. But you don't have to. A language can help by making good code easier to write than bad code. C fails so frequently by making good code much harder to write, instead.


C++ is about 17 languages


That’s C++17 you are talking about.


Nope. Ancient C++, C++98, 03, 11, 14, 17, 20.

I count 7, plus MSVC, which adds another one: 8.


I disagree. First of all, most C code can be run as C++, so everything bad you can say about C applies to C++ as well. Second, C++ introduces its own problems that also make programming unsafe compared to other managed languages and even compared to C.


You would constrain yourself to a subset of C++ which is not C. For example, forbid all raw pointers and only use smart pointers. Now your memory is freed automatically via RAII, and use-after-free errors are much harder to create.


As you said, this is a subset of C++. But everyone debates what the right subset should be. In reality C++ is incapable of solving the problem; it can only propose good practices.


Though if one only uses smart pointers they will have to accept the performance hit that that brings -- in high performance situations, this may be unacceptable.


Circumstances where using smart pointers affects performance are vanishingly rare. Where performance matters, on any critical path no allocation or deallocation is done, so a smart pointer can have no effect. On other paths, any difference is swamped by overhead in malloc and free.

So, in actual practice, no.


There's some argument to be made about bookkeeping and (maybe? I'm not sure) an extra level of indirection with shared_ptr, but the cost of unique_ptr is literally zero.


Passing unique_ptr by move works out to be surprisingly expensive, so one doesn't, where that matters. (It is usually a mistake to have unique_ptr visible in an interface at all, but passing a pimpl type, implemented with unique_ptr, by move costs exactly the same.)

And of course no one who cares for their reputation is seen using shared_ptr.


What's the expense of moving unique_ptr -- is it resetting the source to nullptr?


My guess is that because in most common x86_64 calling conventions the caller is responsible for destructing the parameters, it has to run the unique_ptr parameter destructor which needs to check if it was moved from in the callee or not. Additionally, because this destructor is not a trivial one (i.e. it does something), unique_ptr cannot be passed directly through a register but must be spilled on the stack.


This is correct. Chandler Carruth gave a talk about unfortunate pessimizations imposed by the ABI. Where the call is inlined, that stuff is optimized out, but functions that take a moved unique_ptr or pimpl object -- taking ownership of a heap object -- tend not to be inlined unless they mostly just pass it along to one that isn't inlined.

So, the cost of moving a smart pointer onto the stack and, presumably, off again to someplace less ephemeral might matter, on a critical path. But it would be a mistake to exaggerate this: we are talking about a couple of unfortunate ordinary memory operations, not traps, function calls, or atomic synchronizations, when you might have hoped to be using just registers. If you are already touching memory -- something pretty common -- it is just more of that.


Wow, did not know this, I'm going to go and have a deeper look at it now. Thanks to all in this comment thread!



