C and C++ prioritize performance over correctness (swtch.com)
265 points by bumbledraven on Aug 18, 2023 | 543 comments



> There are undeniably power users for whom every last bit of performance translates to very large sums of money, and I don’t claim to know how to satisfy them otherwise.

That is the key, right there.

In the 1970s, C may have been considered a general-purpose programming language. Today, given the landscape of languages currently available, C and C++ have a much more niche role. They are appropriate for the "power users" described above, who need every last bit of performance, at the cost of more development effort.

When I'm working in C, I'm frequently watching the assembly language output closely, making sure that I'm getting the optimizations I expect. I frequently find missed optimization bugs in compilers. In these scenarios, undefined behavior is a tool that can actually help achieve my goal. The question I'm always asking myself is: what do I have to write in C to get the assembly language output I expect? Here is an example of such a journey: https://blog.reverberate.org/2021/04/21/musttail-efficient-i...

I created the https://github.com/protocolbuffers/upb project a long time ago. It's written in C, and over the years it has gotten to a state where the speed and code size are pretty compelling. Both speed and code size are very important to the use cases where it is being used. It's a relatively small code base also. I think focused, performance-oriented kernels are the area where C makes the most sense.


> In the 1970s, C may have been considered a general-purpose programming language. Today, given the landscape of languages currently available, C and C++ have a much more niche role. They are appropriate for the "power users" described above, who need every last bit of performance, at the cost of more development effort.

I really don't think this is true. I've worked in CAD and video games and embedded software, and in all those you're likely using C++ (not to mention a lot of desktop software that can't afford to embed a chromium instance for electron.) For some reason people here just assume that anything that isn't web development or backend server CRUD stuff is a niche.

As much attention as something like Rust or whatnot gets on hacker news, the reality is that if you can't afford garbage collector pauses, and you need performance, you're using C/C++ most of the time.


Rust is not a GC language. It can and does achieve the same perf as modern C++ written with the latest flavor of the C++ safety guardrails.

The set of reasons to start a new project in C/C++ is small, and the list is shrinking by the day.


I love Rust, but as a daily C++ user (embedded microcontrollers, not supported by Rust yet, and then native Linux tooling to interface with them) what I find most frustrating about Rust is the number of crates that need to be knit together to make working software. Often these crates are from a single dev who last committed in 2020 or something. I really wish there was something like go’s standard library in Rust. That has been my barrier to adoption, and believe me I WANT to use it at work and I don’t want to juggle chainsaws anymore.


I'm sympathetic. I'm partway into an embedded project on an stm32 / m0 for which there _is_ good rust support through Embassy and it's utterly magical. And at the same time, trying to do something for which there isn't a good no-std crate is, er, anti-magical to the point of forcing me to change the design. The ecosystem isn't as comprehensive yet.

But when it works, wow. This has been the most fun embedded project I've ever done with the least debugging.


I have replaced my use of C (and C++) in embedded with Rust for the last couple of years. Sure, some parts are still missing or immature, but overall also very promising. And I do enjoy it so much more than the MCU manufacturer's please-insert-your-c-code-between-these-comments-and-pray kind of development.


Are you doing this on personal projects or commercial projects? I'm pretty happy with c++17 on all my embedded commercial projects


Strictly commercial. But if one is happy with another language, I guess there is little reason to switch.


Er how is that different from C++? It doesn't have a go-like std library either, and you can totally use Rust without crates.io in the same manner.


I suppose that wasn't clear. My mistake. I use C/C++ for microcontroller code every day, because Rust doesn't support my microcontroller, doesn't have a real RTOS, and I'm making real hardware. Something that needs to run in the field for 10+ years, where I can't get out there and ensure it's updated, and my company doesn't want to invest millions of dollars in beta testing embedded Rust. So embedded is out for now, but I'm looking forward to when it's supported fully and out of beta. Most embedded controllers come with some RTOS and libraries they support, written in C.

For tooling, I can context switch out of C++ which does have boost, into [rust, go, python… etc] and deal with that switch, or just be lazy and write it in C++. I've tried to write three tools in Rust so far, and the pain of not having a good stdlib, of essentially searching the internet for a blog post that solves my issue, then finding the blog post was written by one dev 4 years ago to pump up their library, which was never supported after that… it's a bit exhausting.

Again, before y'all attack: this is from the perspective of a willing, excited customer. I want to use Rust at work and advocate for it. Just saying, it's not easy in the state it's in.


Which microcontroller are you using? Rust support for embedded targets is slowly improving, so there might be a beta build for your chip.


ESP-32, the Rust support is not mature enough to ship commercially.

C-410 from Qualcomm, which is a whole different world, I can’t even look at the source code.


In C++, you'll often see projects just use boost, which is a big monolith of useful libraries.


Embedded microcontrollers are not the place where boost is used.


Counterpoint: I use Boost on embedded microcontrollers. (Albeit only header-only components of Boost.)


The people behind the ESP32 are writing a Rust compiler... I have played with it only a little, but it worked

Edit:spelling


Not a Rust compiler; they provide crates to interface with their C code in ESP-IDF, which is essentially a FreeRTOS port that supports the dual-core Espressif chips, which vanilla FreeRTOS does not. Also some of their own libraries for things like MQTT, which I found unfortunately subpar in comparison to the vanilla FreeRTOS code.

It’s all beta software, but here is what they list in the docs:

Services like Wi-Fi, HTTP client/server, MQTT, OTA updates, logging etc. are exposed via Espressif's open source IoT Development Framework, ESP-IDF. It is mostly written in C and as such is exposed to Rust in the canonical split crate style:

- a sys crate to provide the actual unsafe bindings (esp-idf-sys)
- a higher level crate offering safe and comfortable Rust abstractions (esp-idf-svc)

The final piece of the puzzle is low-level hardware access, which is again provided in a split fashion:

- esp-idf-hal implements the hardware-independent embedded-hal traits like analog/digital conversion, digital I/O pins, or SPI communication - as the name suggests, it also uses ESP-IDF as a foundation
- if direct register manipulation is required, esp32c3 provides the peripheral access crate generated by svd2rust

More information is available in the ecosystem chapter of The Rust on ESP Book.


A full Rust compiler instead of using the "de facto" one? On a language as complicated as Rust, that sounds scary unless they have a lot of financial backing and a significant number of compiler devs.


Why would they not just adapt the code generator for the reference compiler? Rust is a moving target, committing to a new compiler at this point is a massive investment.


However, Rust doesn't have a "I know what I'm doing, let me be" switch. No, "unsafe" is not that.

I have a couple of languages under my belt; C, C++, Go, Python, and Java are the primary ones. C and C++ are reserved for "power use". By power use, I mean scientific/HPC stuff or any code where I need the full capacity of the processor.

To get that, sometimes you need -ffast-math, sometimes said undefined behavior, or code patterns which translate to better code for the processor you target.

Rust doesn't allow any of that, and moreover limits the way I can write my programs, so no thanks.

For security-sensitive compiled code, I can use Rust, but I can't use Rust anywhere and everywhere. It also has its niche, and it can't and won't replace C or C++ at the end of the day. It'll get its slice (which might be sizeable), a couple of compiler implementations, and will evolve to become another mainstream, boring language, which is good.

Also, if you give half an effort, C++ is pretty secure and robust to begin with, esp. in the pointers department. Just be mindful, and run a couple of valgrind tests on your code, and you're set. Been there, done that.


> To get that, sometimes you need -ffast-math, sometimes said undefined behavior, or code patterns which translate to better code for the processor you target.

There isn't really any fundamental limitation of Rust for those situations. A better reason why C++ is better than Rust for compute is the support for OpenACC, OpenCL, OpenMP, CUDA, ROCm/HIP, etc.

> Also, if you give half an effort, C++ is pretty secure and robust to begin with, esp. in the pointers department.

The problem is that even if a "Safe" C++ could be defined, it would need to interoperate with the rest of the "Unsafe" C++ world. Perhaps a programmer can write absolutely safe C++ themselves but all bets are off once you're working with other programmers.

> Just be mindful, and run a couple of valgrind tests on your code, and you're set. Been there, done that.

Seriously? A "couple of valgrind tests" would be insufficient for catching all but the most trivial and/or blatant memory safety issues in C++, especially when dealing with malicious users. Although I agree that dynamic analysis like valgrind and sanitizers should be the default for C++ development, industrial grade static analysis tools are at least as important. And needless to say, if you find Rust complains about code too much then just wait for what static analysis tooling will complain about.


That kind of integration is another reason, yes, but this is a passable moat, so I'm not talking about it much (mind you, I live in HPC world).

If programming is a team sport, every team player should be up to the level your project needs. If they are not, they should raise their bars. Seriously.

Yes, seriously, be mindful. Implement allocation and destruction first, fill your code between these two steps.

A couple of Valgrind tests is actually a whole procedure. It was an unfortunate downplay by me [0], probably because I'm so used to Valgrind that it's second nature.

[0]: https://news.ycombinator.com/item?id=31218757


So what about multithreaded code with potential race conditions that sometimes result in memory errors? Valgrind is already only able to test the code paths that actually ran, let alone find race-condition-y bugs.

Also, this elitist notion of raising the bar is an instant "turnoff" for me; with all due respect, I can only assume from that that the given person may not be as much of an expert as they think. Not saying that it applies to you as well, but you don't leave me much other choice.


Memory safe - I will give you that. All the guarantees you listed are fine for synchronous Rust. But race conditions and deadlocks are not unknown in async Rust. Lots of race condition issues in tokio - including issues where futures leak.


That’s true, and usually I am the one pointing this out, that Rust is only data race free :D thanks for the pointer!


> Rust doesn't have a "I know what I'm doing, let me be" switch.

Every piece of code you write is security-sensitive. IMO It is our professional responsibility to our users that we treat it that way.

For a vast majority of situations, even if you know what you are doing, you are human and therefore will make mistakes.

If the (probably) most heavily invested C++ codebase, Chrome, has memory use-after-free/overflow bugs, there is no way that truly safe C++ can be written at any product-scale.


> Every piece of code you write is security-sensitive.

No, not every application is CRUD, not every application is interactive, and no, not every application serves users and communicates with the outside world in a way that allows it to be abused during its lifetime.

This doesn't mean it gives me the freedom to write reckless code all over, but the security model of an authentication module and a mathematical simulation is vastly different.

> there is no way that truly safe C++ can be written at any product-scale.

I agree to disagree here, because of my own experience.


Every other job in the world has a concept of liability; that will eventually arrive in the computing world at scale.


The assumptions you make are almost palpable at this point.

Simulation precision and accuracy already carry tons of liability.

Talking like I or we don’t care about consequences is disturbing, esp. with the experience you have.


Well, your responses look like you don't care, as if you had any clue about my experience as well, unless the sightseeing from that high-altitude cloud is giving you mirages.

Any cybersecurity law that requires citizens' votes for adoption will certainly get my signature.


If running static analyzers, testing for memory leaks, running end to end tests which takes days, adding strong bounds checking, and failing fast and with great noise as soon as something does not look right is not caring, you’re right. I don’t care.

All this is done for code which has no network access, does no kernel-level work, and doesn't even accept any input after parsing the input model and making sure that it's well formed with no loose ends.

Imagine what I’d do if it does any kind of authentication or user affecting work.


Call me surprised. OK, fair enough.


During my PhD I wrote a lot of simulation code in Matlab. This code was only ever run by me on my personal machine for the purpose of generating plots for my thesis. How is that security-sensitive?

We all agree that security is important and a lot of code running in security-sensitive contexts is bad. That's very different from saying that all code is security-sensitive.


Matlab is memory-safe, though, which is the kind of safety issue we're talking about.


> Matlab is memory-safe, though

Who sold you that notion?

Matlab is proprietary glue holding together modules written in Fortran, C, C++, and Java that allows the ad hoc inclusion of DLLs put together by graduate students on the fly.

In what universe is any of that "memory safe"?


AFAIK it used to be written/interpreted in Java (though it uses a JIT compiler nowadays, doesn’t it?), and in that sense the language itself is memory safe. Of course if you link to a dependency written in an unsafe language you can have memory issues, but that is not what people commonly mean by “a memory safe language”.


> Of course if you link to a dependency written in an unsafe language

Like the matrix handling, plotting, and other modules dragged in from the 1990s standard FORTRAN library era.

Don't get me wrong, I have placed a lot of trust in those kinds of numeric libraries since the 1980s - NIST and other national labs code and review to a high standard - but they weren't then and aren't now "memory safe" just very probably and mostly bug free.

Unless they were, say, rewritten in Java .. and that's a whole other big ball of issues .. translated code.

I'm happy to play along and agree that the JIT compiled 'Matlab language' parts of Matlab are memory safe - but to the best of my experience with past versions that's just the glue portions - the reason so many people use Matlab is for the chunks that get glued together.


Aren’t many of these Fortran libraries heavily used all throughout almost any kind of application, somewhere deep down their dependency tree?


Absolutely .. and generally written to a very high standard with a lot of checking, rechecking and running forwards and backwards across multiple architectures with extensive technical mailing lists.

They can still have bugs, memory issues, be called with parameters that cause resource blow outs and aren't "memory safe" in the sense that modern managed languages are.

Matlab is exactly as reliable as well written Fortran and C code can be.


> To get that, sometimes you need -ffast-math, sometimes said undefined behavior, or code patterns which translate to better code for the processor you target.

> Rust doesn't allow any of that, moreover limits me the way I can write my programs, no thanks.

I'm willing to prove you wrong here. Can you give some concrete examples?

For -ffast-math you can most certainly enable it, but AFAIK for now only in a localized fashion through intrinsics, e.g.:

https://doc.rust-lang.org/std/intrinsics/fn.fadd_fast.html

https://doc.rust-lang.org/std/intrinsics/fn.fmul_fast.html

So instead of doing this the dumb way and enabling -ffast-math for the whole program (which most certainly won't need it) you can profile where you're spending the majority of your time, and only do this where it'll matter, and without the possibility of randomly breaking the rest of your numeric code.

Personally I find this to be a vastly better approach.
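
For illustration, a minimal sketch of that localized opt-in (the dot-product function is made up for the example; this needs a nightly toolchain, since the intrinsics linked above are unstable):

    // Nightly-only sketch: fast-math semantics confined to one hot loop,
    // while the rest of the program keeps strict IEEE behavior.
    #![feature(core_intrinsics)]
    use std::intrinsics::{fadd_fast, fmul_fast};

    /// Safety: the caller must guarantee the inputs contain no NaNs or
    /// infinities, which is why these intrinsics are `unsafe`.
    unsafe fn dot_fast(a: &[f64], b: &[f64]) -> f64 {
        let mut acc = 0.0;
        for (&x, &y) in a.iter().zip(b) {
            acc = fadd_fast(acc, fmul_fast(x, y));
        }
        acc
    }

    fn main() {
        let a = [1.0, 2.0, 3.0];
        let b = [4.0, 5.0, 6.0];
        println!("{}", unsafe { dot_fast(&a, &b) });
    }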


Thanks for the information, I'll look into that. There's another comment talking about an algorithm I have implemented, it should be around in this thread.

> So instead of doing this the dumb way and enabling -ffast-math for the whole program (which most certainly won't need it)...

When you write HPC enabled simulation code, you certainly need it, and this is where I use it.


Is HPC code magically immune to values containing infs or NaNs?


In what sense does rust not expose the full power of the processor to the user? Can you give a concrete example?

> Just be mindful, and run a couple of valgrind tests on your code, and you're set.

Thousands of severe CVEs every year attest to the effectiveness of that mindset. "Just git gud" is not a meaningful thing to say, as even experienced devs routinely make exploitable (and other) mistakes.


Rust probably won’t allow me to implement a couple of lockless parallel algorithms I have implemented in my Ph.D., which are working on small matrices (3K by 3K). These algorithms allow any number of CPUs do the required processing without waiting for each other, and with no possibility of stepping on each other, by design.

In the tests, it became evident that the speed of the code is limited by the memory bandwidth of the system, yet for the processors I work with that limit is also very near the practical performance limit of the FPUs. I could have squeezed a bit more performance by reordering the matrices to abuse the prefetcher better, but it was good enough (~30x faster than the naive implementation, with better accuracy) and I had no time.

Well, the method by which I verify the said codebase is here [0]. Also, if the BSD guys can write secure code in C, everybody can. I think their buffer overflow error count is still <10 after all these years.

[0]: https://news.ycombinator.com/item?id=31218757


> Rust probably won’t allow me to implement a couple of lockless parallel algorithms I have implemented in my Ph.D., which are working on small matrices (3K by 3K)

That’s a bold claim to make without evidence. Things like that totally seem doable in Rust without a lot of ceremony. You might have to launder some lifetimes through unsafe if you can’t convince the compiler statically, but that’s not that hard.

> Also, if BSD guys can write secure code with C, everybody can. I think their buffer overflow error count is still <10 after all these years.

As was the problem with Linux enthusiasts claiming Linux was safer because of fewer CVEs, BSD similarly benefits from having almost no attention paid to it plus very few core developers, so the feature growth rate is minimal. I don't buy the claim that BSD's qualities extrapolate anywhere rather than a more mundane explanation, or that BSD is somehow the exception that proves the rule, given that no other successful project seems to be able to do the same.


I would happily give it a try, when I have some time on the side. But that day is not today, unfortunately.

> I don't buy the claim that BSD's qualities extrapolate anywhere rather than a more mundane explanation, or that BSD is somehow the exception that proves the rule, given that no other successful project seems to be able to do the same.

It's mostly about the "it's done when it's done" mentality and always putting discipline before writing code fast. It's a vastly different mentality when compared to modern, conventional software development. This mentality is also what makes Debian Debian.

On the other end of the spectrum, we have the stressed solo developer syndrome, which brings us OpenSSL with all its problems and grievances.


> Rust probably won’t allow me to implement a couple of lockless parallel algorithms I have implemented in my Ph.D.

Why? Can you be more specific? Rust has built-in support for atomics and essentially the same memory model as C++, e.g.:

https://doc.rust-lang.org/std/sync/atomic/struct.AtomicU32.h...

And you can sidestep essentially any requirements of normal Rust using an `UnsafeCell` (e.g. that's how mutexes are implemented, with an `UnsafeCell` holding the data and an atomic).

I've written my share of lockless algorithms and I haven't really noticed any deficiency compared to C++, but then, I'm not an expert in this area, so if there is a deficiency here I'd love to hear about it.
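
To make that concrete, here's a toy example (not the parent's algorithm, just the pattern described above): a spinlock built from an AtomicBool plus an UnsafeCell, which is roughly how real mutexes are put together.

    use std::cell::UnsafeCell;
    use std::hint::spin_loop;
    use std::sync::atomic::{AtomicBool, Ordering};

    // An atomic flag guarding data kept in an UnsafeCell: the building blocks
    // mentioned above, assembled into the simplest possible lock.
    pub struct SpinLock<T> {
        locked: AtomicBool,
        data: UnsafeCell<T>,
    }

    // Safety: access to `data` is serialized by the `locked` flag below.
    unsafe impl<T: Send> Sync for SpinLock<T> {}

    impl<T> SpinLock<T> {
        pub const fn new(value: T) -> Self {
            Self { locked: AtomicBool::new(false), data: UnsafeCell::new(value) }
        }

        pub fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
            // Acquire: spin until we flip the flag from false to true.
            while self
                .locked
                .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
                .is_err()
            {
                spin_loop();
            }
            // Safety: we hold the lock, so no other thread can touch `data`.
            let result = f(unsafe { &mut *self.data.get() });
            // Release: publish our writes before letting the next thread in.
            self.locked.store(false, Ordering::Release);
            result
        }
    }

    fn main() {
        let counter = std::sync::Arc::new(SpinLock::new(0u64));
        let handles: Vec<_> = (0..4)
            .map(|_| {
                let c = counter.clone();
                std::thread::spawn(move || {
                    for _ in 0..1_000 {
                        c.with(|n| *n += 1);
                    }
                })
            })
            .collect();
        for h in handles {
            h.join().unwrap();
        }
        assert_eq!(counter.with(|n| *n), 4_000);
    }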


A lot of people who criticize C++ seem to still think it's 2008, before the crazy amount of compiler checks/linters/standard practices that have appeared since C++11 was released.

I like rust and I dick around with it a bit on some small REST projects I have for home automation but for day-to-day work I stick with c++ until rust support picks up in the industry a lot more than it currently is.


Because many of us only know C++ best practices from conference slides, and our own side projects, while we cry in despair when looking at typical corporate code bases.

Why do you think Bjarne Stroustrup's main subject at conferences, for several years now, has been how to write C++ properly without C-style coding?


What is it that Rust doesn't allow? There really are only very few hard limits I know of, e.g. tail call elimination.

If anything, it has quite good inline asm support.


I phrased it poorly, I didn't mean to imply Rust is GC'd. I mean that it's still niche. C++ very much is not niche, which is my point. The other non-niche languages generally are GC'd (java, C#, javascript, etc.)

> The set of reasons to start a new project in C/C++ are few and the list is shrinking by the day.

That'd be true if every project started from scratch with only the language standard library. And yet..... almost any project you start is going to be dependent on a large chunk of code you didn't write, even on greenfield projects.


> That'd be true if every project started from scratch with only the language standard library. And yet..... almost any project you start is going to be dependent on a large chunk of code you didn't write, even on greenfield projects.

I think this is true, and I'd refine my original statement accordingly. My original comment was thinking more from first principles, not as much about pragmatic considerations of ecosystem support.

If we were to disregard the momentum of existing ecosystems, I think C/C++ would be niche choices today: very important for certain, focused use cases, but not meant for the masses. Taking into account the momentum of existing ecosystems however, they still play a large role in many domains.


New large scale C++ projects are started every single day. There are more than 5 million C++ programmers out there. Compared with that Rust is but a drop in the ocean.


> For some reason people here just assume that anything that isn't web development or backend server CRUD stuff is a niche.

I think that's because anything more "low level" (using the phrase freely) quickly becomes highly specialized, whereas web / CRUD development is a dime a dozen.

Source: Am web / CRUD developer. It's a weird one, on the one side I've been building yet another numeric form field with a range validation in the past week, but on the other I can claim I've worked in diverse industries; public transit, energy, consumer investment banking, etc. But in the end it just boils down to querying, showing and updating data, what data? Doesn't matter, it's all just strings and ints with some validation rules in the end.

But there's my problem, I don't know enough about specialized software like CAD or video games or embedded software to even have an interest in heading in that direction, let alone actually being able to find a job opening for it, let alone being able to get the job.


For me the key is that I can lay things out in memory exactly the way I want, if necessary to the point where I can fit things entirely in cache when I need the performance, and only break out of the cache when I'm done with the data. This is obviously not always possible, but the longer you can keep it up the faster your code runs, and the gains you can get like that are often quite unexpected. I spend a lot more time with gprof than with a debugger.
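
The idea is language-agnostic; here is a toy sketch (in Rust for brevity, with made-up types) of the classic array-of-structs vs struct-of-arrays split it describes: keep the hot field in its own dense array so a scan only pulls the bytes it actually needs through the cache.

    #![allow(dead_code)] // the cold fields exist only to illustrate layout

    // Array-of-structs: scanning `x` drags every 32-byte Particle through the cache.
    #[derive(Clone)]
    struct Particle {
        x: f64,
        y: f64,
        z: f64,
        mass: f64,
    }

    // Struct-of-arrays: the hot field is contiguous, 8 bytes per scanned element.
    struct Particles {
        x: Vec<f64>,
        y: Vec<f64>,
        z: Vec<f64>,
        mass: Vec<f64>,
    }

    fn sum_x_aos(ps: &[Particle]) -> f64 {
        ps.iter().map(|p| p.x).sum()
    }

    fn sum_x_soa(ps: &Particles) -> f64 {
        ps.x.iter().sum()
    }

    fn main() {
        let aos = vec![Particle { x: 1.0, y: 2.0, z: 3.0, mass: 4.0 }; 8];
        let soa = Particles {
            x: vec![1.0; 8],
            y: vec![2.0; 8],
            z: vec![3.0; 8],
            mass: vec![4.0; 8],
        };
        assert_eq!(sum_x_aos(&aos), sum_x_soa(&soa));
    }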


Perf is a niche, here's another: address space lets you talk to hardware.

VM and OS abstractions have been so successful that you can go a whole career without talking directly to hardware, but remember, at the bottom of the abstraction pile something has to resolve down to voltages on wires. Function calls, method names, and JSON blobs don't do that. So what does? What abstraction bridges the gap between ASCII spells and physical voltages?

Address space. I/O ports exist, but mostly for legacy/standard reasons. Address space is the versatile and important bridge out of a VM and into the wider world. It's no mistake that C and C++ let you touch it, and it's no mistake that other languages lock it away. Those are the correct choices for their respective abstraction levels.


Idk what you're on about, I mmaped a `/dev/uio` from Python this morning. Yeah, I had to add it in a .dts file and rebuild my image, but even slow as shit high level languages like Python let you bang on registers if you really want to.


That worked because you were on an OS that supported it, using a device with simple enough behavior that things like timing or extra/elided writes didn't matter. It's great when that works, but there are very many environments and devices for which that option won't exist.


> That worked because you were on an OS that supported it

I mean... yeah? I'd be surprised if most operating systems didn't have a facility for directly accessing memory mapped devices in some capacity. Even Windows does I think. Have any examples of an OS that doesn't support banging directly on memory?

> using a device with simple enough behavior that things like timing or extra/elided writes didn't matter

Yeah, but that's less of a function of the language like GP was referring to and more a property of using a non-RTOS that doesn't play nice with timing sensitive things from user-space. The language itself is not really the issue there, which was my point.


>I mmaped a `/dev/uio` from Python this morning.

You might want to go and have a look at the implementation of 'mmap' in CPython. Here's a clue: it's in the name 'CPython'.


There's nothing special about C that makes it possible to use mmap. Just because CPython happens to be implemented in C doesn't mean anything in this context. I could do the same thing in Go or Hare, neither of which have any C in them and be able to do the same thing.
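
For instance, a rough Rust sketch of the same trick (the device path, map length, and register offset are made up for illustration, and it assumes the libc crate): open a UIO device, mmap it, and poke a register through a volatile pointer.

    use std::fs::OpenOptions;
    use std::os::unix::io::AsRawFd;

    fn main() -> std::io::Result<()> {
        // Hypothetical UIO device exposed by the kernel; same idea as the
        // Python example upthread.
        let f = OpenOptions::new().read(true).write(true).open("/dev/uio0")?;
        let len = 4096;
        let base = unsafe {
            libc::mmap(
                std::ptr::null_mut(),
                len,
                libc::PROT_READ | libc::PROT_WRITE,
                libc::MAP_SHARED,
                f.as_raw_fd(),
                0,
            )
        };
        assert_ne!(base, libc::MAP_FAILED, "mmap failed");

        unsafe {
            // Volatile access so the compiler can't elide or reorder the
            // register write. Offset 0x10 is invented for the example.
            let reg = (base as *mut u8).add(0x10) as *mut u32;
            reg.write_volatile(0x1);
            println!("reg readback: {:#x}", reg.read_volatile());
            libc::munmap(base, len);
        }
        Ok(())
    }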


Since it's an OS written in C, sure, it does have some helper C wrappers.

It is still “syscall”s at the end of the day, and many programming languages can output assembly just fine.


Perhaps, but C and C++ assume flat address spaces and modern hardware includes many programmable devices with their own device memory, e.g. GPUs. Naturally this discontinuity causes a great deal of pain and many schemes have been developed to bridge this gap such as USM (Unified Shared Memory).

Personally I would like to see a native language which attempts to acknowledge and work with disjoint and/or "far away" address spaces. However the daunting complexity of such a feature would likely exclude it from any portable programming language.


Those disjoint memory addresses can be an absolute pain to deal with; ask anyone who had to spend time dragging performance out of the PS3 despite it being faster on paper. UMA/USM can also bring its own set of issues when you have pathological access patterns that collide with normal system memory utilization.

For what it's worth, UMA/USM wasn't built to bridge a gap but rather to offer greater flexibility in resource utilization for embedded platforms, and that's been moving upstream (along with tiling GPUs) over the years. With UMA you can page in other data on a use-case basis, which is why they were relatively popular in phones: if you don't have a bunch of textures loaded on the GPU you can give that space back to other programs. Although come to think of it, we used to stream audio data from GPU memory on certain consoles that didn't have large discrete system memory (the bus connecting System <-> GPU memory had some pretty harsh restrictions, so you had to limit it to non-bursty, low-throughput data, which audio/music fit well into).


There is a technical report for C (from the C standards committee) called 'Embedded C' which extends C with 'named address space' storage qualifiers. So you can do 'float _Gpu myarray[1<<14];' As far as I know, nobody uses it.


Rust is making advances here (look at “embedded Rust” efforts). I am curious since I haven’t written kernel code since C went sideways: how easy is it to write a driver that has to manipulate registers with arbitrary behavior at arbitrary addresses with modern C compilers and avoid all undefined behavior? I seem to recall Linus has a rant on this.


Thanks for giving an actual example of such optimizations. In my personal experience my C++ (and Rust) code was often outperformed by the JVM's optimizations so I've found it hard to relate to the tradeoffs C++ developers assume are obvious to the rest of us.


I struggle to resonate with what you are saying, as my experience is the opposite. I'm curious where this discrepancy is rooted. Reckless hypothesis: are you working on majority latency or majority throughput sensitive systems?

I have seen so, so, so many examples of systems where latencies, including and especially tail latencies, end up mattering substantially and where java becomes a major liability.

In my experience, carefully controlling things like p99 latency is actually the most important reason C++ is preferred, rather than the more vaguely specified "performance".


The specific example that comes to mind was translating a Java application doing similarity search on a large dataset into fairly naive Rust doing the same. Throughput I guess. It may be possible to optimize Rust to get there but it’s also possible to (and in this case did) end up with less understandable code that runs at 30% the speed.

Edit: And probably for that specific example it’d be best to go all the way to some optimized library like FAISS, so maybe C++ still wins?


I've seen C++ systems that are considerably slower than equivalent Java systems, despite Java's lack of stack allocation and its boxing. It's mostly throughput: malloc is slow, the C++ smart pointers cause memory barriers, and the heap fragments. Memory management for complex applications is hard and the GC often gives better results.


I've seen so many flat profiles due to shared_ptr. Rust has done a lot of things right, but one thing it really did well was putting a decent amount of friction into std::sync::Arc<T> (and offering std::rc::Rc<T> when you don't want atomics!) vs &mut T or even Box<T>. Everyone reaches for shared_ptr when 99% of the time unique_ptr is the correct option.
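
Roughly, the ladder looks like this (illustrative only, with a hypothetical Config type): each step up costs something visible at the use site, so reaching for shared ownership is a deliberate act rather than the default.

    use std::rc::Rc;
    use std::sync::Arc;

    struct Config {
        retries: u32,
    }

    fn main() {
        // Unique ownership: no reference counting at all (the unique_ptr analogue).
        let owned: Box<Config> = Box::new(Config { retries: 3 });

        // Shared within one thread: non-atomic refcount.
        let shared: Rc<Config> = Rc::new(Config { retries: 3 });
        let _another = Rc::clone(&shared);

        // Shared across threads: atomic refcount (the shared_ptr analogue).
        let sync_shared: Arc<Config> = Arc::new(Config { retries: 3 });
        let handle = {
            let c = Arc::clone(&sync_shared);
            std::thread::spawn(move || c.retries)
        };
        assert_eq!(handle.join().unwrap(), 3);
        let _ = owned.retries + shared.retries;
    }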


I see comments like yours more than I see anyone suggesting actually using shared_ptr widely. In my experience, most people (that use smart pointers - many do not) prefer to use unique_ptr where they can.


I don't think anyone is suggesting shared_ptr explicitly, more that if you're coming from Python/Java/etc it's similar to a GC memory model and a bit more familiar if that's where your experience is. I've observed in a number of codebases that unless you're setting a standard of unique_ptr by default, it's fairly easy for shared_ptr to become widely used.

FWIW I consider shared_ptr a bit of a code smell, a sign that you don't have your ownership model well understood; there are cases for it, but 90% of the time you can eliminate it with a more explicit design (and avoid the chance of leaks plus the penalties from atomics).


Bear in mind that my ultimate perspective is that you shouldn't use smart pointers (or C++) at all.

But even if you think they have some value, it isn't a flaw in a language if it's not immediately obvious how to write code in it for people that are new to it. If you're coming from Python or Java, then learn how to write C++. There are probably as many books on C++ as on any other language out there.


Badly written C++ will be as slow as well written Java. Absolutely. But there is no way any Java code will perform better than well optimised C++. High performance C++ uses custom allocators to completely avoid using malloc (for example).


From my experience with embedded coding you are correct. Most stuff lives and dies enclosed in a single call chain and isn't subject to spooky action at a distance. And the stuff that is, I often firewall behind a well-tested API.


It would be fascinating if you could give any such comparison examples. The only time I've seen JVM come anywhere close to C++ in normal usage is if the C++ code was written like it was Java - that is, lots of shared_ptr heap allocations for nearly everything. Or perhaps you're one of the rare few that write Java code like it was C instead? You can definitely get something fast like that, but it seems all too rare.


There is no universe where well written and optimised C++ code will be out-performed by Java code. You might get similar performance in Java if you create zero objects (except during initialisation) and then manually manage the usage of objects at runtime. However only if the matching C++ code is written naively.


I think the reason to use C/C++ over java has less to do with the various optimizations and more to do with control over memory layout (and thus at least indirect control over cache usage and so on). Plus you remove a lot of "noise" in terms of performance (GC hiccups, JIT translation, VM load time, etc.).


+1. Part of the problem is that x86 assembly is hardly programming the metal anymore. The performance characteristics of processors have changed over the years as well, and compilers can be more up-to-date than humans.


x86 assembly doesn't represent the actual opcodes the CPU executes anymore, but it's still the low level "API" we have to the CPU. Even if assembly isn't programming to the metal, it's definitely more to the metal than C, and C is more to the metal than Java, etc. Metalness is a gradient


> Metalness is a gradient

It would be better to say that "Metalness" is a sort of "Feature Set". IMO, most programmers would tend to agree that C++ is far closer to Java than C is, yet C++ is every bit as low level as C is. Indeed, even a managed language like C# supports raw pointers and even inline assembly if one is willing to get their hands dirty.


There is expressivity and abstraction-ability which are language features, and there is “levelness” or metalness as is it being referred to.

Low and high level languages have two definitions I have seen, one is more objective, but utterly useless: only assemblies are low-level, everything else is high. The other one is more about what can be controlled in a given language’s idiomatic usage, I believe that’s what most people actually refer to intuitively - e.g. C#/Java have ways to manipulate pointers, but you would generally not say that.

In that regard, C, C++, and Rust are quite close to the metal, with the latter two actually being even closer, as they have native control over SIMD datatypes, which C doesn't have. Note, this is only a partial ordering, as e.g. C#/Java also have control over vector instructions, yet I wouldn't go around claiming them as close to the metal as C.
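
For example, the kind of explicit SIMD control being referred to, via the stable std::arch intrinsics (x86_64 is assumed here purely for illustration):

    // Add two 4-lane f32 vectors with SSE intrinsics, which are part of the
    // x86_64 baseline, so no extra target features are needed.
    #[cfg(target_arch = "x86_64")]
    fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
        use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};
        let mut out = [0.0f32; 4];
        unsafe {
            let va = _mm_loadu_ps(a.as_ptr());
            let vb = _mm_loadu_ps(b.as_ptr());
            _mm_storeu_ps(out.as_mut_ptr(), _mm_add_ps(va, vb));
        }
        out
    }

    fn main() {
        #[cfg(target_arch = "x86_64")]
        println!("{:?}", add4([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]));
    }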

Going back to expressivity, C++ is very expressive, while C is very inexpressive, but that is a completely orthogonal axis.


Lol, I like the word "metalness".


In the 1970s, the alternative to C was assembly. In fact, I can remember hearing stories of the fights needed to pry every-day, normal developers---not systems programmers, not power users---off of assembly to a "higher level language".

It wasn't until the 90s that it became clear that C was not appropriate for applications work (and this was in the era of 4-16MB machines), although that took a long time to sink in.


In the 1960s they already had languages such as Lisp, Fortran, Algol, Basic... Even Pascal is two years older than C, and ML also came out around that time.

The statement "the alternative to C was assembly" is simply incorrect.


For practical purposes - involving real world constraints in memory size, cpu power and storage - C was the tool of choice; it ran on just about everything, allowed you to get half decent performance without having to resort to assembler, and was extremely memory efficient compared to other languages. Compilers were affordable (Turbo C for instance, but also many others such as Mark Williams C) even for young people like me (young then...) and there was enough documentation to get you off the ground. Other languages may have been available (besides the ubiquitous BASIC, and some niche languages such as FORTH) but they were as a rule either academic, or highly proprietary and very, very expensive.

So that left C (before cfront was introduced). And we ran with it, everybody around was using C, there wasn't the enormous choice in languages that you have today, you either programmed in C or you were working in assembler for serious and time-critical work.


Compilers for Modula-2 and Pascal dialects were also quite affordable, many of them sold by the same companies that were selling the C compilers; there were even product packages to get all the languages in one go.

Easily proven by looking into computing magazine archives.

So let's not distort history.


I'm not 'distorting history', I'm just relating what was practical. Yes, there was Modula-2 and yes there was Pascal. But both felt extremely academic and restricted. Pascal still lives on today in the form of Delphi, which has to be a record for being in continuous use as a complete environment. I even paid for both Pascal and Modula-2 on the BBC Micro but performance was just horrible compared to the BASIC that came with that machine in spite of being compiled, compilation took a long time, even for simple programs. Wirth's books were enlightening though and I took a lot of lessons and algorithms from them and ported them to C (some of those I'm still using today!).


Yes, you definitely are, given how Modula-2 and Pascal were widely adopted across the European continent, especially in demoscene circles in the PC and Amiga ecosystems.

As if the BBC Micro was an example of the industry adoption at large during those days.

Any Byte, Computer Shopper, DDJ, The C Users Journal (later The C/C++ Users Journal), Crash, Amiga Format, Input, PC Techniques, You Computer, Your Sinclair,... from 1980 - 1995 thereabouts, shows a different picture in article contents and ad sections, in terms of language adoption and available set of compilers being sold.


One of my favorite quotes comes from DDJ in the late 80s, from a letter to the editor. Something like:

"Object oriented programming is just functions with persistent state and multiple entry points, and we all learned how bad that was in the 70s."


I'm not sure why this upsets you as much as it does. For me the 'common' languages in use were various flavors of BASIC, COBOL, various flavors of Assembly and C in that order and those were doing a lot of work compared to the rest of the large number of languages that was technically available. I personally liked and worked with many more but they weren't exactly in mainstream use even though you could get them if you wanted to.

It's a bit like me saying that today Erlang is widely adopted. Yes it is, in some circles and I absolutely love it. But even though it has some adoption the big heavy lifters today are Python, C++, C and Java. Maybe C# on account of MS and probably Javascript should rate a mention. That doesn't mean that the other 2500 or so computer programming languages do not exist or do not have relevance. But it is a realistic reflection of market share. At the computer store where I occasionally worked you could see this reflected in the requests for boxed copies of compilers and interpreters for various languages.

FoxPRO + Delphi became a very powerful combination for SME administrative systems (and before that DBIII) and once Apple launched Objective-C that got some significant market share on their platform too. But any kind of commercially developed software outside of those niches was likely to be built on the list above.

I've taken a great interest in various programming languages and learned a lot of them to the point that I could write code in them to compare them, so I'm well aware all of these (and many more!) existed. The machines I worked with were: Dragon32, BBC Micro, Acorn Atom, Apple II, C64 (though, mostly from a hardware point of view, to fix them), Atari ST (lots of work on that one), IBM PC, some mainframes (notably: IBM 4381, Sperry 1100/90) and probably others that I don't remember right now.

Some of those only had proprietary languages and compilers on them, and on some things were more free. If there was a language on any of those systems that made it large enough to rate an article in the magazines you mentioned, I probably played with it. But across all of the people that I worked with at the time, I've met exactly one person that programmed in Pascal, and they abandoned the project because it was dog slow (it got ported to assembly...), and Modula-2 or Oberon I've never come across in the wild other than in my own experiments.

BASIC, COBOL, Assembly and C were the languages that I recall being in mainstream use. That does not invalidate your experience, it may simply not match mine.


Yeah, my experience was indeed something else, regarding the software industry in Iberian Peninsula, Germany, France and Switzerland from 1970's until modern times.

What triggers me is the usual kind of message that C wiped everything away, being the first of its kind, when on other realities, it only started to matter when UNIX tooling became part of the picture.

On my bubble C became a thing, when Windows 3.0 went mainstream.

Coding in C on MS-DOS was mostly preparing exercises for Xenix programming classes; there was a single tower that students had to take turns on, so we needed to prepare everything in advance in Turbo C 2.0.

On Amiga it was mostly Assembly and AMOS, on the demoscene community I was part of.


In what year did you start programming professionally and in what languages was that done and on what kind of machines?


Ah, trying a trick question?

In 1992, using a mix of Turbo Pascal, DBase III+ and Clipper on MS-DOS computers, and Novell Netware.


> Ah, trying a trick question?

No, I was just curious. Why would you even think I was trying to trick you? And into what?

You normally make a ton of sense but in this thread I can't follow you and I sense a ton of emotion and projection that are entirely out of character for you.


In practice those were either proprietary or required beefy machines that the people writing C couldn't get.

On the limited environment where C got created, it was the only option. And everybody suddenly adopted that limited environment, because it was cheap. And then it improved until you could use any of those other languages, but at that point everybody was using C already.


Fortran predates C by over a decade, and dozens of compilers existed for it by the mid-60s. Much of that legacy code is still in use to this day in scientific computing. One example: John Pople won his Nobel prize for work he did (and implemented in Fortran) in computational chemistry - the prize was delayed until '98 but he did the work in the 60s. The software he commercialized it into, Gaussian, still gets releases to this day and remains one of the most popular computational chemistry packages.

It's really dependent on which field you're in. Not all scientific computing requires a beefy computer, but for a very long time it (and I guess LISP) dominated scientific computing. That said, I think it's a very good point to bring up the network effect of using C - if I need to hire a developer in 1985, it's probably easier to find someone with industry (not academic) experience who knows C than it is to find someone who knows Fortran.

I do kinda prefer Fortran to C though, it's so much cleaner to do math in. Maybe somewhere there's an alternate universe where IBM invented C and Bell invented Fortran to win the popularity war.


I tried to get access to FORTRAN and it was just way too expensive and required machines that I would not have been able to get close to. C ran on anything from Atari STs, IBM PCs, Amiga's and any other machine that an ordinary person could get their hands on.

The other mainstream language at the time was BASIC, comparable to the way PHP is viewed today by many.

And with the advent of the 32 bit x86 era GCC and djgpp as well as early Linux systems really unlocked a lot of power. Before then you'd have to have access to a VAX or a fancy workstation to get that kind of performance out of PC hardware. It's funny how long it took for the 386 to become fully operational, many years after it was launched you had to jump through all kinds of hoops to be able to use the system properly, whereas on a 68K based system costing a tiny fraction that was considered completely normal.


My perspective is a bit biased by scientific computing, I do more of that than enterprise stuff (and Python has been fine for personal use). It's cool to see the perspective of someone who was around for the early stages of it though.

How did people see Fortran back then - nowadays it's seen as outdated but fast, but was it seen as interesting, and what drove you to seek it out?

Other side question if it's okay, I keep seeing references to VAXen around historical documents and usenet posts from the 80s and 90s, what made them special compared to other hardware you had back then?


My one experience with FORTRAN was when working for a big dutch architect who made spaceframes, I built their cad system and a colleague built the finite element analysis module based on an existing library. We agreed on a common format and happily exchanged files with coordinates, wall thicknesses and information about the materials the structure was made out of and forces in one direction and a file with displacements in the other. It worked like a charm (combined C and FORTRAN). I thought it was quite readable, it felt a bit archaic but on the whole not more archaic than COBOL which I had also worked with.

The reason that library existed in FORTRAN was that it had a native 'vector' type and allowed for decent optimization on the proper hardware (ie: multiply and accumulate) which we did not have. But the library had been validated and was approved for civil engineering purposes, porting it over would have been a ton of work and would not have served any purpose, besides it would have required recertification.

As for VAXen: A VAX 11/780 is a 32 bit minicomputer, something the size of a large fridge (though later there were also smaller ones and today you could probably stick one on a business card). It had - for the time - a relatively large amount of memory, and was a timesharing system; in other words, multiple people used the same computer via terminals.

They weren't special per se other than that a whole raft of programmers from those days cut their teeth on them either because they came across them in universities or because they worked with them professionally. They were 'affordable' in the sense that you did not need to be a multinational or a bank in order to buy one, think a few hundred thousand $ (which was still quite a fortune back then).

I had occasional access to one through the uni account of a friend, but never did any serious work with them. The first powerful machine I got my hands on was the Atari ST, which had a 68K chip in it and allowed the connection of a hard drive. Together those two things really boosted my capabilities, suddenly I had access to a whole 512K of RAM (later 1M) and some serious compute. Compared to a time shared VAX it was much better, though the VAX had more raw power.

Concurrent to that I programmed on mainframes for a bank for about a year or so, as well as on the BBC Micro computer (6502 based) and the Dragon 32 (a UK based Color Computer clone).

Fun times. Computing back then was both more and less accessible than it is today. More because the machines were so much simpler, less because you didn't have the internet at your disposal to look stuff up.


There was a freely available FORTRAN on the Fred Fish Amiga disks in the late 80s. It came with a weird license---free for everything, but it couldn't be used for military or military-related things.

The 386 was hampered by backwards compatibility with the weird memory architecture for the 286 and 8086, IIRC. 68Ks were just soooo much easier.


Oh cool, I never knew that. 386 flat mode was easy enough, but only djgpp unlocked that properly, everything else was a hodgepodge of the weirdest memory models.

I remember making a bootloader for my own OS using TurboC that somehow had to fish four files from a minix based filesystem and dump them in memory at certain addresses and then switch to flat mode and start executing the kernel which would then initialize the system properly. That was some really weird mixed mode voodoo to switch back and forth between protected mode and real mode to be able to both use BIOS calls to fetch blocks from disk and to be able to park the data in the right spots in contiguous memory.

Very tricky stuff to get right, it took me forever before I had the first indication that something was executing after the inevitable hail Mary jump to the first instruction in the loaded kernel. But I got it to work. I actually had a guitar footpedal hooked to the reset button because it took me too many dives under the desk to recover from a crash. That was a very slow development cycle without any chance of debugging. At some point I had 8 leds hooked up to the printer port to use some 'out' opcodes to tell me where I had gotten to in the code. Poor mans emulator :)


FORTRAN didn't officially grow a program stack until FORTRAN 90 (?), although I'm told some of the earlier FORTRAN 77-era compilers supported it.

I remember meeting people who could not wrap their heads around the idea that function parameters weren't all passed by reference. The idea of recursion would have caused them to detonate.


IIRC, Mac OS Classic was written in Pascal, as were other operating systems. C just won the popularity contest.


That was about a decade after C had already won.


The Xerox Alto was early 70's and that GUI OS was written in Pascal. I always thought Pascal was better designed and less bizarre than C. Null terminated strings were particularly a bad idea.


"Why Pascal is Not My Favorite Programming Language", Brian W. Kernighan, April 2, 1981:

https://www.cs.virginia.edu/~evans/cs655/readings/bwk-on-pas...

One of the major problems, as I recall, was that Wirth's basic Pascal definition was very limited---the max 256 character strings, for example. So the makers of "production" Pascal compilers added extensions to do all of the things that you would want to do and unfortunately those extensions were all incompatible.


A famous rant ignoring Pascal dialects, while C still lacked a standard and was a bunch of dialects all over the place.


I've never seen a C dialect, other than not supporting bit fields. The language was usually the same although the libraries were all over the place.


Then you haven't used any C compilers during the CP/M, 8-bit and 16-bit days.

Here are two, RatC, Small-C.

Additionally what do you think GCC C is, in regards to K&R C and ANSI C?


Oh, by 73. Impressive. I didn't know that.

Still, that was a much more powerful machine than the ones people wrote C for. And when the cheap segment of those "fridge computers" became powerful enough to run whatever you wanted to put on it, people started using small workstations. And when those workstations became powerful enough, we got PCs.

It's only when the PCs got powerful enough that we could start to ignore the compiler demands and go with whatever language fit us better. (The later reductions all were based on cross-compiling, so they don't matter here.)


Go check the capabilities of systems written in high level languages like ESPOL and NEWP in 1961.

Let's not say that a 1961 system is more powerful than a PDP-11.


Mesa not Pascal.


Yes, apologies, I misremembered. Mesa is definitely a descendant of Pascal though, more like Modula-2. Very ahead of its time.


Sorry for being a bit pedantic, but it predates Modula-2. Niklaus Wirth got in contact with it during his first sabbatical at Xerox PARC; after returning to ETHZ, Modula was born, and shortly thereafter Modula-2 and the Lilith workstation, with the OS written in Modula-2.

Several years later, a similar encounter with Mesa's evolution, Mesa/Cedar, gave birth to Oberon.

As for being ahead of its time, very much indeed.

Many people relate to Smalltalk and Interlisp-D, and are unaware of strongly typed systems programming at Xerox, with tooling that only came to be decades later.

Mesa authors were quite adamant that in order to succeed, Mesa had to support comparable tooling to Smalltalk and Interlisp-D environments.


The fact that those things existed does not refute the GP's point. Many companies well into the 80s at least had programmers and codebases that had to be weaned away from assembly, despite the fact that higher-level alternatives had existed for a while. That's just empirical fact, regardless of the reasons. I myself started programming in 68K assembly on the original Mac because I couldn't afford a compiler and interpreted languages (there were at least Lisp and Forth) couldn't be used to write things like desk accessories. Remember, gcc wasn't always something one could just assume for any given platform. The original statement is correct; yours is not, because of limited perspective.


How many Lisp Machines did Texas Instruments sell? (I mean outside of that one lab at UT Austin.) :-)

I'm talking about applications developers who came out of the small mainframe/minicomputer world of the 70s and into the workstation/microcomputer world of the 80s. They started with assembly, and prying them off of it was as hard as convincing engineers to use FORTRAN. Convincing those application developers to use a garbage collected language, Java, was hard in the 90s.


Only inside Bell Labs, the world outside was enjoying high level systems programming languages since 1958 with the introduction of JOVIAL.


Where by "the world outside" you mean DoD contractors?


Everyone that was using JOVIAL, ESPOL, PL/I, PL/S, PL/X, PL.8, PL/M, Mesa, ALGOL 60 and 68 based languages, Modula-2, Solo Pascal, Concurrent Pascal, Lisp, BASIC compilers,....

Burroughs, IBM, DEC, Xerox, ETHZ, MIT, Intel,....


The sheer amount of software that you're likely using right now that's written in C would seem to contradict your claim.


The market share of C in desktop, server, and other "enterprise" applications has drastically dropped since the 90s. Nowadays it's quite rare to see C chosen for new projects where it isn't required. In fact, despite how pervasive it is, a huge amount of C code cannot be built on a pure C toolchain, i.e. C is essentially like Fortran.


It still hasn't entirely sunk in. :-)

Plus, legacy code.


Why is C inappropriate for applications work?


The technical arguments should be obvious, e.g. spending one's complexity budget on manual memory management and avoiding footguns. But one amusing anecdote is that the open source GNOME project founders were so traumatized by their experience building complex GUI apps in C (already a dubious idea 20 years ago) that they started building a C# compiler, and the Mono project was born.


In 2023, you'll be hard pushed to find a GNOME application actually using C# and Mono. The vast majority of GNOME components are written in C, with a large number of them written in Vala and some in Rust and JavaScript.


It was indeed a big yak to shave. Cloning a Microsoft technology probably wasn't a good idea for OSS adoption either.


I'm sure; there are still threads that come up from time to time in GNU circles about whether Mono is actually FOSS (TLDR: it is), in which those of a more paranoid personality allege an EEE attempt by Microsoft. If this were the case, Microsoft clearly did not succeed even without Java as a legitimate competitor! Interestingly, Mono has had most success from its use in the Unity game engine.


You don't spend "complexity budget" on manual memory management. Manual memory management is much simpler and easier than trying to manage complex arrangements where things are being allocated and deallocated all over the place, all the time.

Gnome programs are largely still written in C, but it isn't actually C. Not really. It's glib, which is almost a whole new language written on top of C. It's its own little world and it's not a good world. For example, it calls abort() whenever it fails to allocate memory.


> You don't spend "complexity budget" on manual memory management.

Of course you do, especially in applications where there is no benefit to be gained versus using a GC. This is why Java was such a huge success, despite offering very little else over C++ other than GC (I think OCaml is a much better example of a GC language). Consider that an entire book has been written on such details as move semantics. For most GUI apps, a GC or other automatic memory management has proved fine.

> Gnome programs are largely still written in C, but it isn't actually C.

It's still C whatever libraries are used. It's still manual memory management with footguns. This is why they created "Vala".


>Of course you do, especially in applications where there is no benefit to be gained versus using a GC.

I am afraid you misunderstood my comment. Of course C isn't good for writing code that calls 'malloc' millions of times per second. That's just bad code, which you can write in any language. You should not be dynamically allocating memory all over the place. If you have sane allocation strategies, then having to 'manually' manage them is completely fine.
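To make "sane allocation strategy" concrete, here is a minimal arena (bump) allocator sketch; the names and the fixed-capacity, no-growth policy are only illustrative:

    /* Minimal arena allocator: allocate by bumping a pointer, free everything
       at once by resetting. Capacity handling and alignment policy are
       simplified for illustration. */
    #include <stddef.h>

    typedef struct {
        char  *base;
        size_t used;
        size_t cap;
    } Arena;

    static void *arena_alloc(Arena *a, size_t n) {
        n = (n + 15) & ~(size_t)15;              /* keep 16-byte alignment */
        if (a->cap - a->used < n) return NULL;   /* caller decides how to fail */
        void *p = a->base + a->used;
        a->used += n;
        return p;
    }

    static void arena_reset(Arena *a) { a->used = 0; }  /* "free" everything */

With a per-frame or per-request arena like this, "manual" memory management mostly collapses into a handful of reset calls.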

>This is why Java was such a huge success, despite offering very little else over C++ other than GC (I think OCaml is a much better example of a GC language).

Java is the quintessential example of a language that was made popular through marketing and hype.

Getting reasonable performance out of the JVM requires far greater expertise than doing the same in C. The JVM's GC is insanely complicated and has a million different knobs which can drastically affect performance. Or they can apparently do nothing. Until you turn other knobs...

>Consider that an entire book has been written on such details as move semantics.

There are no 'move semantics' in C. Yet another way in which it is superior to C++. C++, which is a horrible language designed by nerds who enjoy standardese more than they love their own children, has 'move semantics' bolted on with an arcane 'rvalue references' mechanism that gets more complicated in every version of the standard.

>For most GUI apps, a GC or other automatic memory management has proved fine.

Retained-mode UI is something that is only reasonable with automatic memory management. The way that lifetimes of objects work in those APIs really does require a garbage collector because it's all so dynamic.

The immediate-mode UI approach, which has many other benefits as well, works perfectly fine with manual memory management.

>It's still C whatever libraries are used. It's still manual memory management with footguns. This is why they created "Vala".

Objective-C is more like C than "glib" C is. As the suckless guys said:

"glib - implements C++ STL on top of C (because C++ sucks so much, let's reinvent it!), adding lots of useless data types for "portability" and "readability" reasons. even worse, it is not possible to write robust applications using glib, since it aborts in out-of-memory situations. glib usage is required to write gtk+ and gnome applications, but is also used when common functionality is needed (e.g. hashlists, base64 decoder, etc). it is not suited at all for static linking due to its huge size and the authors explicitly state that "static linking is not supported"."


Immediate mode UI? Not having to allocate everywhere? Memory allocation that tells you when you run out of memory? Static linking?

What systems are you thinking about?


Try writing a generic, reusable data structure in C. It's agony.


You don’t need to write generic and reusable data structures in C. You write a data structure suited for the problem at hand, which often means that it’s going to be simpler and more performant because of the known constraints around it.


It could be more performant because of the known constraints around it, or it could be an ad-hoc, informally-specified, bug-ridden, slow implementation of half of some data structure. At least with a generic and reusable data structure you have a known reliable building block. Again, performance over safety.


> It could be more performant

No, it almost always is. The designers of a generic library can't anticipate the use case, so can't make appropriate tradeoffs.

For example, compare `std::unordered_map` to any well written C hash table. The vast majority of hash tables will never have individual items removed from them, but a significant amount of complexity and performance is lost to this feature.
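As a hedged sketch of the kind of specialization I mean (insert-only, open addressing, linear probing; the hash function and the omission of growth/rehashing are arbitrary choices here):

    /* Insert-only string-keyed table: no deletes means no tombstones and no
       erase logic. Capacity must be a power of two, and the table is assumed
       to never fill up (growth omitted for brevity). */
    #include <stdint.h>
    #include <string.h>

    typedef struct { const char *key; uint32_t value; } Slot;
    typedef struct { Slot *slots; size_t mask; /* capacity - 1 */ } Table;

    static uint64_t hash_str(const char *s) {       /* FNV-1a, nothing fancy */
        uint64_t h = 1469598103934665603u;
        while (*s) h = (h ^ (unsigned char)*s++) * 1099511628211u;
        return h;
    }

    /* Returns the matching slot, or the empty slot where the key belongs. */
    static Slot *table_find(Table *t, const char *key) {
        for (size_t i = hash_str(key) & t->mask; ; i = (i + 1) & t->mask) {
            if (t->slots[i].key == NULL || strcmp(t->slots[i].key, key) == 0)
                return &t->slots[i];
        }
    }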


> No, it almost always is.

A library author can spend ridiculous amounts of time refining and optimizing their implementations, far more than any application programmer could afford or justify.

> The designers of a generic library can't anticipate the use case, so can't make appropriate tradeoffs.

This is definitely not true. Take C++ for instance, not only is it possible to specialize generic code for particular types, but it's absolutely routine to do so. Furthermore, with all sorts of C++ template features (type traits, SFINAE, CRTP, Concepts, etc) even user-defined types can be specialized, in fact it's possible to provide users with all sorts of dials and knobs to customize the behavior of generic code for their own use case. This functionality is not just a quality-of-life improvement for library users, it has profound implications for performance portability.

> For example, compare `std::unordered_map` to any well written C hash table.

std::unordered_map is a strawman. There are a plethora of generic C++ hash tables which would match, if not soundly outperform, their C counterpart. Also, even if we blindly accepted your claim, then how do you explain qsort often being beaten by std::sort or printf and its variants being crushed by libfmt? What about the fact that Eigen is a better linear algebra library than any alternative written in C?


> A library author can spend ridiculous amounts of time

That's true. But simply having knowledge of the goal and a few simplifying assumptions can beat all the optimization in the world. In other words, a polished sub-optimal approach isn't as good as just having a better approach. `std::unordered_map` is heavily optimized, but can't make any tradeoffs because it's a general tool.

> plethora of generic C++ hash tables which would match, if not soundly outperform, their C counterpart.

Post one.

> not only is it possible to specialize generic code for particular types, but it's absolutely routine to do so.

Yep, it can do type-based specialization, but not application-based specialization. That requires a programmer.

> how do you explain qsort often being beaten by std::sort

A standard library C function often cannot be inlined to remove the comparison function pointer call, whereas std::sort trivially can.

If you wrote one yourself for a particular problem, it would not have this issue. This is actually a great example of where C excels, because the choice of sorting algorithm depends so much on the kind of data you are sorting.
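To illustrate (a rough sketch; the point is only the indirect call, not the choice of insertion sort):

    /* qsort pays a function-pointer call per comparison: */
    #include <stdlib.h>

    static int cmp_int(const void *a, const void *b) {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }
    /* usage: qsort(v, n, sizeof v[0], cmp_int); */

    /* A purpose-built sort has the comparison inlined by construction
       (insertion sort shown only because it is short): */
    static void sort_ints(int *v, size_t n) {
        for (size_t i = 1; i < n; i++) {
            int x = v[i];
            size_t j = i;
            while (j > 0 && v[j - 1] > x) { v[j] = v[j - 1]; j--; }
            v[j] = x;
        }
    }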

Let me be clear about my claim: tailor made solutions to each problem will almost always be faster than generic solutions. Do you really disagree with that? If you want to argue that maybe it's not productive to work that way, that's a different argument.


"If you wrote one yourself for a particular problem, it would not have this issue. This is actually a great example of where C excels because the choice of sorting algorithm so much depends on the kind of data you are sorting."

Did I get it right that you argue for re-implementing, in each of your apps, whichever sorting algorithm best fits your data?

Why not instead use a generic library implementing a particular sorting algorithm, parameterized by the data type and maybe by some policies specifying minor variations of the algorithm?

"Let me be clear about my claim: tailor made solutions to each problem will almost always be faster than generic solutions. Do you really disagree with that?"

I do. I don't think even you invent a special sorting algorithm for each of your applications that need sorting.


> post one

Abseil or folly both have optimized hashtables, I believe. Rust's standard HashMap follows the same design. It involves SIMD to look for a bucket whose hash matches the query's, so redoing it in C every time you need a hash table would be quite impractical.


> Let me be clear about my claim: tailor made solutions to each problem will almost always be faster than generic solutions. Do you really disagree with that?

I disagree with it in the sense that I disagree with the statement "A human will always be able to write the same or better assembly than a C compiler, because humans can learn the compiler's tricks and make optimizations which the compiler is not allowed to make." It's a true statement, but it's so detached and irrelevant that it hardly matters.

Generic code has proven itself time and time again, even Go caved in and supported it.


The assembly example isn't a good comparison. We have compilers that can generate a lot of assembly tricks most programmers wouldn't write. We don't have a compiler that can analyze the logical constraints of a programming problem and simplify the data structures in the library.

> Generic code has proven itself time and time again, even Go caved in and supported it.

I'm not saying anything against the language feature generics. There is plenty of use for them even in a self contained code base.


Generic data structures turning out to be more efficient than his specialized C data structures is exactly what happened to Bryan Cantrill when he ported a carefully optimized C program to naive Rust.

> Yes, you read that correctly: my naive Rust was ~32% faster than my carefully implemented C.[0]

> As a result, this code spends all of its time constantly updating an efficient data structure to be able to make this decision. For the C version, this is a binary search tree (an AVL tree), but Rust (interestingly) doesn’t offer a binary search tree — and it is instead implemented with a BTreeSet, which implements a B-tree. B-trees are common when dealing with on-disk state, where the cost of loading a node contained in a disk block is much, much less than the cost of searching that node for a desired datum, but they are less common as a replacement for an in-memory BST[1]

> So, where does all of this leave us? Certainly, Rust’s foundational data structures perform very well. Indeed, it might be tempting to conclude that, because a significant fraction of the delta here is the difference in data structures (i.e., BST vs. B-tree), the difference in language (i.e., C vs. Rust) doesn’t matter at all.[1]

> Implementing a B-tree this way, however, would be a mess. The value of a B-tree is in the contiguity of nodes — that is, it is the allocation that is a core part of the win of the data structure. I’m sure it isn’t impossible to implement an intrusive B-tree in C, but it would require so much more caller cooperation (and therefore a more complicated and more error-prone interface) that I do imagine that it would have you questioning life choices quite a bit along the way. (After all, a B-tree is a win — but it’s a constant-time win.)[1]

> All of this adds up to the existential win of Rust: powerful abstractions without sacrificing performance.[1]

[0]: http://dtrace.org/blogs/bmc/2018/09/18/falling-in-love-with-...

[1]: http://dtrace.org/blogs/bmc/2018/09/28/the-relative-performa...


Yes a library author can spend a lot of time refining and optimising their generic data structure, but can never escape that it is generic.

No amount of optimisation will make a hash table designed for items to be removed competitive with one where items do not need to be removed.

>Take C++ for instance, not only is it possible to specialize generic code for particular types, but it's absolutely routine to do so.

So it's not a generic data structure, then. When you specialise a template, you essentially write a concrete data structure for a particular type. Rather than writing a big generic data structure that's inefficient and then specialising it for the particular type, it is much easier just to write that specialised data structure in the first place.

>std::unordered_map is a strawman. There are a plethora of generic C++ hash tables which would match, if not soundly outperform, their C counterpart.

How is it a strawman? It's in the standard library.

>Also, even if we blindly accepted your claim, then how do you explain qsort often being beaten by std::sort or printf and its variants being crushed by libfmt?

printf is on the order of 50 years old. libfmt was written about 5 minutes ago. Do you take into account in your comparison the many more years in which printf has been useful? Do you take into account the amount of time it takes printf to compile vs a huge C++ library like libfmt?

Do you take into account all the code that has been slowed down by C++ programmers writing bad code and assuming a sufficiently smart compiler will inline everything for them? Do you take into account all the horrifically slow iostreams code out there?

qsort and std::sort do completely different things. Comparing them is absurd. qsort takes the size and comparison operator at runtime. std::sort requires them to be specified at compile time. I frequently use qsort in a way that you simply could not use std::sort, because those things are runtime-variable.

The proper comparison to std::sort is the implementation of a sorting algorithm written in C, specialised to the code it was written to work with. Then you can debate 'is it worth using this for the minor performance gain' etc. But comparing it to qsort is inane and demonstrates you don't even know what the two functions do.


"So it's not a generic data structure, then."

In generic C++ code, you can specialize a part of the generic algorithm to tune it to a particular use case. Usually it takes the form of a small class template which can be specialized for a particular type and is used by the generic algorithm operating on that type. This class template is called a trait, policy, or strategy depending on the way it is used.

"qsort and std::sort do completely different things. Comparing them is absurd. qsort takes the size and comparison operator at runtime. std::sort requires them to be specified at compile times."

Not at all, you can pass a function pointer to std::sort just as well if you need to [0]. Most of the time you don't need this indirection but in C you are stuck with it unless you copy-paste-edit qsort.

[0] https://godbolt.org/z/M1v8azojT


For one example,

https://github.com/tmmcguire/rust-toys/blob/master/alternati...

is a program that mmap's an anagram dictionary file and builds a fast-n-dirty hashmap dictionary over the file data. It took about an afternoon to write and was pretty decent.

https://maniagnosis.crsr.net/2014/08/letterpress-cheating-in...


Unordered map is a known design issue, on the same order as std::vector<bool>. C doesn't even have std::vector<int>.


That must be why every C project has its own string, dynamic array, hashtable, etc. It's definitely more performant to have several different implementations of the same thing fighting for icache


It's a nice thought, but in practice C binaries are orders of magnitude smaller than those produced by any other language. Also, compilers make that tradeoff for inlining all the time.


To be fair, most C++ projects (outside of embedded and games) use STL for string/dynamic array/hashtable, and while that standardization is certainly convenient I'm not sure STL is generally faster than most hand written C data structures, even with the duplicated code.


Entirely true. Unfortunately, that comes at a cost of programming time and effort and a decent amount of risk.


Agony might be a bit much, and I'm not trying to defend C, because this is one of the strong reasons for using C++/STL...

But generally one should just reach for a library before writing a complex data structure, regardless of language. And for example, the Linux kernel does a fine job of implementing some fairly complex data structures in a reusable way with little more than the macro processor and sometimes a support module. In a way, the ease of writing linked lists etc. is why there are so many differing implementations. So, if your application is GPL, just steal the kernel's implementations; they are reasonably well optimized, tend to be largely debugged, are standalone, etc.
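For instance, the kernel-style intrusive list boils down to something like this (a sketch, not the kernel's actual code; it assumes a circular list with a sentinel head):

    /* Intrusive doubly linked list: the node lives inside the payload struct,
       and container_of recovers the payload from a node pointer. */
    #include <stddef.h>

    struct list_node { struct list_node *next, *prev; };

    #define LIST_HEAD_INIT(name) { &(name), &(name) }   /* empty circular list */

    #define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    struct task {
        int id;
        struct list_node link;   /* this task's place in some list */
    };

    static void list_insert_after(struct list_node *pos, struct list_node *n) {
        n->next = pos->next;
        n->prev = pos;
        pos->next->prev = n;
        pos->next = n;
    }

    /* Given a node pulled out of the list, get back to the task:
       struct task *t = container_of(node, struct task, link); */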


Try doing the same in Go and it's even worse.


That's writing some other language in C syntax. You use arrays or you write a specialized version for the use case.


What does that have to do with writing an application?


The short answer is that C makes it easy to do things that you probably shouldn't ought to be doing. (The answer to the opposite question, "what work is C appropriate for?", is systems and embedded stuff, where doing-things-you-probably-shouldn't is largely the name of the game.)

Another answer is that C doesn't have much in the way of guardrails; in a world where programmer time is much more expensive than machine time, guardrails make programming much faster.


Every language makes it easy to do things that you probably shouldn't be doing. I don't know how many times I've seen naive accidentally-quadratic string handling code in high level languages, for example.

Your answer is just buzzword after buzzword. What makes C bad for writing applications?


SIGSEGV. Or those times when you really wish you had gotten a SIGSEGV.


Write correct code.


When talking about hot-path optimizations, though, assembly is still a good alternative.


It's quite clear to me that C is appropriate for applications work. Most applications I use on a daily basis are written in C (and were written after the 90s), and most of the rest are written in C++. What makes you think you can't write applications in C?


You obviously can, but doing so takes a considerable effort. Not making that effort leads to lots and lots of bugs.


> When I'm working in C, I'm frequently watching the assembly language output closely, making sure that I'm getting the optimizations I expect. I frequently find missed optimization bugs in compilers.

Do you repeat this exercise when you upgrade or change compilers?


For hot functions that were carefully optimized, I have test harnesses that run them and measure their cycle counts, which are asserted to be equal to the cycle count after hand-optimization (+/- some small margin if there is non-determinism). This is sensitive to regressions while having a very low false-positive rate due to being (approximately) deterministic.
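Not the actual harness, but the shape is roughly this (assuming x86 and rdtsc; pinning, warm-up, and serialization are elided, and the numbers are made up):

    /* Regression check: the hot function's best-case cycle count must stay
       within a small margin of the value recorded after hand-optimization. */
    #include <assert.h>
    #include <stdint.h>
    #include <x86intrin.h>

    extern void hot_function(void);       /* the carefully optimized kernel */

    #define EXPECTED_CYCLES 120u          /* illustrative, recorded earlier */
    #define MARGIN            8u

    static void check_hot_function(void) {
        uint64_t best = UINT64_MAX;
        for (int i = 0; i < 1000; i++) {  /* best-of-N filters out noise */
            uint64_t t0 = __rdtsc();
            hot_function();
            uint64_t t1 = __rdtsc();
            if (t1 - t0 < best) best = t1 - t0;
        }
        assert(best <= EXPECTED_CYCLES + MARGIN);
    }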


The easy way to achieve that is to freeze the assembly once it is generated and to keep the C around inline as documentation, as well as a way to regenerate the whole thing should it come to that (and then, indeed, you'll need to audit the diff to make sure the code generator didn't just optimize away your carefully unrolled loops).


Is there any tooling for that or are you talking hypothetically?


Just standard unix tooling, what else do you need? It's as powerful a set of text manipulation tools as you could wish for.


If the assembly is that important I'd find a way to put it into my unit tests. Create a "golden" for the assembly and it should trigger a diff if it changes.


Read your article (and Cloudflare's), and as someone who uses musttail heavily I don't understand the hype. As you mentioned in your blog, you can get tail call optimizations with -O2; musttail just gives you that guarantee, which is nice. But the article makes it sound as if it unlocks something that was not possible before and that interpreters will greatly benefit from it, when it's more reasonable to ask your user to compile with optimizations on than it is to ask them to use a compiler that supports musttail (gcc doesn't). Moreover, musttail has a lot of limitations that make it hard to use in more complex interpreter loops.


Ordinarily you're at the whim of the optimizer whether calls in tail position are made in a way that grows the stack or keeps it constant. musttail guarantees that those calls can and are made in a way that does not let the stack grow unbounded, even without other conventional -On optimization. This makes the threaded interpreter design pattern safe from stack overflow, where it used to be you'd have to make sure you were optimizing and looking carefully at the output assembly.

If nothing else, musttail aids testing and debugging. Unoptimized code uses a lot more stack because it hasn't spent the time to assign values to registers, and people often debug unoptimized code precisely because having values live on the stack makes debugging easier. The combination of unoptimized code and calls in tail position not made in a way that keeps stack size constant means you hit stack overflow super easily. musttail means that problem is at least localized to the maximum stack use of each function, which is typically not a problem for small-step interpreters. Alternatives to musttail generally involve detecting somehow whether or not enough optimization was enabled and switching to a safer, slower interpreter if not positive... but that just means your debug and optimized builds work totally differently, not at all ideal!
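For readers who haven't seen it, the pattern in question looks roughly like this (a clang-only sketch; the opcode set and VM struct are made up):

    /* Threaded interpreter where each opcode handler tail-calls the next one.
       With musttail, the "call" is guaranteed to reuse the current frame, so
       the C stack stays flat no matter how long the bytecode is. */
    #include <stdint.h>

    typedef struct { int64_t stack[256]; int sp; } Vm;
    typedef void (*Handler)(Vm *vm, const uint8_t *pc);

    extern const Handler dispatch[256];   /* one handler per opcode */

    #define NEXT(vm, pc) \
        __attribute__((musttail)) return dispatch[*(pc)]((vm), (pc) + 1)

    static void op_push1(Vm *vm, const uint8_t *pc) {
        vm->stack[vm->sp++] = 1;
        NEXT(vm, pc);                     /* replaces this frame, no growth */
    }

    static void op_halt(Vm *vm, const uint8_t *pc) {
        (void)vm; (void)pc;               /* a plain return unwinds everything */
    }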


I do buy it in the case of C++, but C? You can’t even have a proper data structure with it due to it being so inexpressive - and surely data structures are one of the most significant factors of performance.

Besides, the industry hands down chose C++ for the truly performance-oriented workloads.


Yeah, I don't use C any longer unless I absolutely have to. I mostly do embedded, so that happens sometimes. I much prefer C++11 on up.


I just want to thank you for that blog post! I’m currently writing a little bytecode vm for something and this really improves my code! Luckily I’m currently already only targeting clang, so I’m ok with using a non-portable feature.


With modern CPUs, that kind of hand-tuned assembly gets rarer and rarer. Pipelines and caching and branch prediction make it hard to know what's actually going to be faster. And even when you do know enough about the CPU, you only know that CPU -- sometimes only one model of a CPU.

There's still a niche for it, but it's tiny and it keeps getting tinier.


That... really isn't all that true. I find that myth mostly perpetuated by people who don't do any kind of performance work or have any understanding of just how terribly unperformant most code out there is.


That is my observation as a compiler writer.


Of course the problem is that the vast corpus of legacy C code was not written with such aggressive compilers in mind.


> In effect, Clang has noticed the uninitialized variable and chosen not to report the error to the user but instead to pretend i is always initialized above 10, making the loop disappear.

No. This is what I call the "portable assembler"-understanding of undefined behavior and it is entirely false. Clang does not need to pretend that i is initialized with a value larger than 10, there is no requirement in the C standard that undefined behavior has to be explainable by looking at it like a portable assembler. Clang is free to produce whatever output it wants because the behavior of that code is literally _undefined_.

Also, compilers don't reason about code the same way humans do. They apply a large number of small transformations, each of these transformations is very reasonable and it is their combination that results in "absurd" optimization results.

I agree that undefined behavior is a silly concept but that's the fault of the standard, not of compilers. Also, several projects existed that aimed to fully define undefined behavior and produce a compiler for this fully defined C, none of them successful.
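For reference, the kind of code under discussion is along these lines (not necessarily the article's exact listing):

    #include <stdio.h>

    int main(void) {
        int i;                       /* never initialized */
        for (; i < 10; i++)          /* reading i here is undefined */
            printf("%d\n", i);
        return 0;
    }

Since the read of i has no defined behavior, a conforming compiler can emit a main that prints nothing at all, without having to "pretend" that i holds any particular value.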


1. The people creating the C standard were adamant that just following the standard was not sufficient to produce a "fit-for-purpose" compiler. This was intentional.

2. They were also adamant that being a "portable assembler" with predictable, machine-level semantics was an explicit goal of the standard.

3. The C standard actually does have text giving a list of acceptable behaviours for a compiler and "silently remove the code" is not in that list. And this text used to be normative, but was later made non-normative.

So I blame the people who messed with the standard, and guess who those people were?


> The C standard actually does have text giving a list of acceptable behaviours for a compiler

The exact opposite is explicitly stated in the standard (from C11 section 3.4.3):

    undefined behavior

    behavior, upon use of a nonportable or erroneous program
    construct or of erroneous data, for which this International
    Standard imposes no requirements 
The standard then lists some examples of undefined behavior, and it's true that "silently removing the code" is not in the list. Still, I think it's pretty clear that it's acceptable behavior, since the standard just stated it imposes no requirements.


"Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message)."

Note "permissible" and "ranges from ... to".

Again, this used to be normative in the original ANSI standard. It was changed in later versions to no longer be normative. Exactly as I wrote.


Which is logically equivalent to imposing no requirements. "ignoring the situation completely with unpredictable results" does not meaningfully constrain the possible behaviors.


That turns out not to be the case.

“Ignoring” is not “taking action based on”

Ignoring is, for example, ignoring the fact that an array access is out of bounds and performing the array access.

Ignoring is not noticing that there is undefined behavior and removing the access and the entire loop that contains the access. Or a safety check.


> ignoring the fact that an array access is out of bounds and performing the array access

This doesn't mean anything. The standard imposes no requirements that an array access is translated into a memory access by the compiler. And that's a good thing because it enables optimizations that can be critical to achieve decent performance.

Assume for example that the compiler is able to prove (after inlining) that all elements of an array are equal to zero. This array is then used in a loop where in each iteration an index into this array is calculated and the corresponding array element is added to an accumulator. If the standard were to impose a rule that array accesses have to correspond to memory accesses, this code could not be optimized because one of the computed indices might lie outside of the array and the compiler is then forced to produce assembly that performs the memory access.

With UB, however, the compiler is allowed to assume that all accesses to the array are valid and will be able to completely remove the (very cache-unfriendly) loop.
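Sketched out, the scenario described above looks something like this (names illustrative):

    /* table is internal to the file and never written, so after analysis the
       compiler knows every element is zero. */
    static int table[256];

    int sum_selected(const int *idx, int n) {
        int acc = 0;
        for (int i = 0; i < n; i++)
            acc += table[idx[i]];    /* an out-of-range idx[i] would be UB */
        return acc;
    }

Because an out-of-range access is UB, the compiler may assume every idx[i] is in bounds, observe that every in-bounds element is zero, and reduce the whole function to "return 0;". Under a strict "every array access is a memory access" rule, it would have to keep the loop.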


So, generally, optimizations shouldn't be allowed? Dead code removal shouldn't be allowed? Substituting constants shouldn't be allowed? Because "ignoring" UB is pretty much what the compiler did, and then let its optimization passes run to their conclusion.

> ignoring the fact that an array access is out of bounds and performing the array access

The idea that this is even meaningful is precisely the "portable assembler" misconception.


Except the misconception is that it is not.

"Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler:”

https://www.open-std.org/JTC1/SC22/WG14/www/docs/n897.pdf

p10, line 39

"C code can be portable. "

line 30


I think you’re misinterpreting. Maybe it’s clearer if we elide the relative subclause:

  undefined behavior

  behavior … for which this International Standard imposes no requirements
That is an obvious definition of what is undefined behavior. It’s not giving license to do whatever. That said the ship has sailed and what implementors do obviously matters more than what the standard says.


If there are no requirements on what it's doing, how is that not a license to do whatever?

There is not even a requirement that a theoretical program that contains, e.g., only the preceding code would still maintain any invariants. So I don't see what an instance of "whatever" that violates "no requirements" would look like.


Read with either an atypical or malicious degree of literalism, the standard supports ransomwaring your machine and indeed your entire network in the face of undefined behavior.

What it actually means is that the standard doesn’t require doing the impossible, not that a program error is license to generate malicious code. That’s why it gives a (formerly normative) list of possible approaches.


No maliciousness is required. "1 + 2" produces "3" only because the standard defines it so -- it is therefore required that 1 + 2 produce 3. When the standard does not say what to do, there is no requirement.

You said "it's not a license to do whatever" which is only true because it isn't a license. The code can still do whatever, because there is no requirement that it do anything other than "whatever".

You seem to be arguing from a position that the code has meaning unless somehow UB gets involved, but the reality is that these bits and bytes don't mean anything until the spec tells us what meaning it has.


> until the spec tells us what meaning it has.

That's a commonly held belief, but wrong. My first C compiler predated the ANSI C standard. Yet the code very much had meaning, just not in terms of the C standard.

The C standard defines the minimum a C compiler has to do to be in compliance with the C standard. It is not a complete definition, and this is intentional.


> > until the spec tells us what meaning it has.

> That's a commonly held belief, but wrong. My first C compiler predated the ANSI C standard. Yet the code very much had meaning, just not in terms of the C standard.

Much like /* plain English comments */ have meaning, just not in terms of the C standard.

Talking about prestandard C isn't relevant because we're talking about what a standards conforming compiler can do with undefined behaviour per the text of the standard. I do not find your counterexample compelling.

> The C standard defines the minimum a C compiler has to do to be in compliance with the C standard.

The standard does not define what a compiler has to do. Instead, it defines the C abstract machine as well as syntax attached to behaviour in terms of that abstract machine. When a program which is restricted to that syntax is executed, it must have the effects (upon the C abstract machine*) as specified in the standard.

Sometimes the standard doesn't specify anything for a given syntax in a given abstract machine state, and sometimes the standard explicitly specifies that "the behavior is undefined".

> It is not a complete definition, and this is intentional.

I'm not sure what you mean. Compiler extensions exist (and are generally regarded as a necessary evil), is that what you're referring to? Or are you referring to implementation-defined things (all of which are explicitly called out in the standard)? Or something else? A citation, if you can?

* Notably the C language says nothing about the real machine. Supposing you have an I/O port at address 0x9000, accessing "*((char*)0x9000) = 1;" does not need to access memory address 0x9000 at all. Of course, compiler engineers are not silly and implemented it as accessing 0x9000 on the underlying machine. Similarly the standard committee is careful to ensure that the language it defines can be implemented without needed the overhead of a VM or interpreter, though sometimes they get that wrong. (For example, see the issues combining multiple alloca() and C99's "int foo[*];" in the same function.)*


> we're talking about what a standards conforming compiler can do

No. You may be, I am not.

I am talking about what a reasonable C compiler can do, and what the ANSI standards committee intended.

Adhering to the ANSI/ISO standard is at best a necessary condition for producing a useful C compiler. It is most definitely not a sufficient condition. As I've pointed out many times before, this was intentional.

And the existence of pre-standard C compilers that worked is, of course, clear evidence that it is not a necessary condition either. Or at least was not.

The C standard leaves a lot "undefined" or "implementation defined" that is actually well-defined on a concrete machine the compiled code runs on. If you seriously think the intention of all this was to allow demons to fly out of nostrils or for compilers to start mining bitcoins, which is all perfectly legal by the standard, well, I don't really know what to say.

Particularly because the creators of the standards clearly stated so. In the very standard itself. Now they didn't make that language mandatory, so yeah, you can make a "standards compliant" compiler that violates that intent. But it will be a sucky compiler.

C ≠ The ANSI/ISO C standard.


> The C standard leaves a lot "undefined" or "implementation defined" that is actually well-defined on a concrete machine the compiled code runs on.

This assumes there is a one-to-one correspondence between C constructs and machine code, which isn't true. C isn't a "portable assembler" and compilers are luckily able to choose whatever machine code they think will perform best under the assumption that there is no undefined behavior.

> But it will be a sucky compiler.

I don't think gcc or clang are sucky compilers at all. In fact, their ability to aggressively optimize valid code is extremely helpful for producing high-performance programs.

> C ≠ The ANSI/ISO C standard

So C is defined neither by the standard _nor_ by existing implementations and is instead defined to be whatever your headcanon is?


> This assumes there is a one-to-one correspondence between C constructs and machine code

No it does not.

> which isn't true.

Actually, it is true in a vast majority of cases.

> C isn't a "portable assembler"

Not sure why people keep repeating this despite it being so obviously and patently untrue:

"Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler:”

https://www.open-std.org/JTC1/SC22/WG14/www/docs/n897.pdf

p10, line 39

"C code can be portable. "

line 30


> And the existence of pre-standard C compilers that worked is, of course, clear evidence that it is not a necessary condition either. Or at least was not.

The difference is that those compilers didn't have a spec to follow or disobey anyway. They could always just say "whatever happened is right", and you could argue that the compiler did something _unhelpful_, but the compiler broke no promise because it made no promise.

> Particularly because the creators of the standards clearly stated so. In the very standard itself.

They clearly stated the opposite:

""" * Undefined behavior --- behavior, upon use of a nonportable or erroneous program construct, of erroneous data, or of indeterminately-valued objects, for which the Standard imposes no requirements. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message) """

from ANSI C89 (§1.6).

The compiler may ignore the situation completely leading to unpredictable results. If the compiler ignores a situation, such as a signed integer overflow, and that leads to unpredictable results, such as "next time we see an if-statement, execute both the if-block and the else-block", then that's 100% conforming. That's exactly what was intended. This sort of thing can happen, suppose the evaluation of an if-condition ends up in some CPU flags, and the compiler knows the two flags are exclusive because the only way they wouldn't be is if the program had a signed integer overflow.

FWIW, GCC 1.17, released in January 1988, would launch 'nethack' during the compile if it detected a #pragma that it didn't understand. The idea of interpreting it as "anything can happen" is neither incorrect nor new. (Technically in this case, an unknown pragma has implementation-defined behavior which is the same as UB plus a requirement that the behavior must be documented in the compiler's manual.) It was a bad idea though and they removed it because a compiler that does that is, well, not good. We call this property "quality of implementation", but it wasn't a correctness issue.


> didn't have a spec to follow or disobey anyways.

Exactly. Yet they were still C compilers. So the idea that "being a C compiler" is the same as "follows the spec" is clearly nonsense. You can follow the spec and not be a usable C compiler, and you can be a usable C compiler and not follow the spec.

And the C spec not being complete was intentional, because otherwise too many already existing C compiler would have not had a chance of becoming ANSI compliant, and thus the spec would have been meaningless.

> They clearly stated the opposite:

> [..]

> Permissible undefined behavior ranges from ignoring ...

Very clearly gives a list of "permissible behaviors". Now if you believe that one of those options is "do anything you please" when it both clearly doesn't say that (it just says "ignore with unpredictable results") AND it doesn't make logical sense, I'm not sure how to help you.

(The two other options for permissible behavior are clearly completely redundant if one of them permits you to do anything you want whatsoever).

> "next time we see an if-statement, execute both the if-block and the else-block", then that's 100% conforming

It is "conforming" to the spec that has made that part non-binding. It does not conform to "ignore the situation", because the unpredictable behavior mentioned in the spec is that of the environment, not the compiler.


> Very clearly gives a list of "permissible behaviors"

A statement like "we have products ranging from cake decorating to peanut crocheting to jousting lances" does not mean that this is a list of the only three types of items in the store. This construction is called a "false range" in English. When you have a range of something that does not have an order, it means "varied things, left unspecified". It's very clearly not supposed to be an exhaustive list, merely a few examples.

So the standard lists three examples. The first is that the runtime behaviour of the program may do any-unpredictable-thing. The second is that the compiler may 'behave in a documented manner' and maybe issue an error. They wanted these two examples because they didn't want any misunderstanding that UB was limited to what could be shown UB statically at compile time, nor that it was limited to only having effects on the program at runtime.

I'm not honestly sure why they bothered adding the third "oh, the compiler or the program terminates with an error message". I could speculate that this is what they wish would actually happen, and including it in the list improves the chance of that.

> unpredictable behavior mentioned in the spec is that of the environment, not the compiler.

I don't believe that's correct -- I think it appertains to the program not the environment or the compiler -- but it doesn't matter either way. The environment is responsible for supporting execution of the program, so if it's unpredictable then it follows that any unpredictable things can happen to your program -- it would be like trying to run on a CPU that's experiencing physical failures.


> A statement like "we have products ranging from cake decorating to ...

If you have a statement like that, that is likely true. However, this is not a statement like that.

1. It gives a range of 3 permissible options. If one of these "permissible" options is "you can do anything", what are the other 2 options doing on that list?

2. Even worse for your interpretation, the very word permissible only makes sense if there are things that are not permissible. So once again, "you can do anything" makes no sense.

Both of these are nonsensical. Now had they actually written "you can do anything", this would be easy: they wrote nonsense. But they didn't write "you can do anything". What they actually wrote was "ignoring the situation completely with unpredictable results". That this somehow (how?) means "I can do anything" is purely your interpretation. And your interpretation leads to nonsense. So clearly your interpretation is wrong, particularly when there is an alternative interpretation that does not lead to nonsense.

In addition, the "interpretation" that does not lead to nonsense is the one that takes the words literally. "Ignore the situation". Not "act on the situation and then do anything I damn well please".

3. Even a heterogenous range restricts. Yes, "we have products ranging from cake decorating to peanut crocheting to jousting lances" does not mean that those are the only items in the store. But even with your somewhat odd choice of items, if you go into that store and ask for an aircraft carrier, you will get odd looks, because the range of items mentioned clearly restricts the items they stock to non-aircraft carriers.

> it would be like trying to run on a CPU that's experiencing physical failures.

No, it is like reading beyond the range of an array: the machine will attempt to read from that location; that may return a value, we don't know what value, or it may signal a fault. What it does is not defined by the standard, hence undefined behavior.

It's not *that* hard.


> 1. It gives a range of 3 permissible options. If one of these "permissible" options is "you can do anything", what are the other 2 options doing on that list?

Before the list it clearly says, "behavior, [... when UB occurs elided ...], for which the Standard imposes no requirements."

How can it both impose no requirements, yet simultaneously impose a requirement that it come out of that list of three options?

> That this somehow (how?) means "I can do anything" is purely your interpretation.

Purely mine?

* https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...

* https://blog.regehr.org/archives/213

* https://blog.llvm.org/2011/05/what-every-c-programmer-should...

* https://stackoverflow.com/questions/32132574/does-undefined-...

* https://en.wikipedia.org/wiki/Undefined_behavior#Examples_in...

* https://www.youtube.com/watch?v=ehyHyAIa5so

I must have an amazing number of sock puppet accounts.


> Purely mine?

Surely you've observed that bad ideas and even errors are not immune to spread.


That is also true.

And it turns out that the opinion "we can do anything we want" is based on the non-normativity of the section we were debating about, not on misinterpreting it.


You started the discussion by saying that we can understand what the committee meant by simply reading the text in the standard that they wrote. Remember? Why then does it matter if it's non-normative text (aka. informative)? If it's non-normative it's there to explain what they were thinking.

I double-checked my copy of the C89 standard (draft), and as far as I can tell the text is normative. Non-normative text includes footnotes and appendices and editor's notes in square brackets, but I didn't see any of those involved. Sometimes there's a section that states 'the following text is non-normative' but I didn't see any of that either. Why do you think it's non-normative?

If that text is non-normative, where is the normative definition of undefined behavior?

Regardless, I think I understand what's happened here. If I may? You've read the text and you've repeatedly commented that we have to discard the "it allows you to do anything" interpretation because that interpretation is nonsense.

I'll stipulate that when reading English, we regularly have to discard nonsense interpretations. "Out the window, the mountains looked over a beautiful lake", we discard the interpretation where the mountains have eyes and are looking out a window. This is a normal part of reading English.

I posit that you believe that the "anything can happen" interpretation is impossible so strongly that no possible wording would ever lead you to the interpretation intended by the committee.

So instead, how about I explain why the committee chose to define UB this way? How it isn't nonsense?

It's so that a C compiler could use the plain "add" instruction for an addition in C across all the crazy CPU designs. Here, let me make a simple example, this isn't a real CPU. Suppose the CPU has a status register which contains "signed overflow" as one of its bits. This bit is set or cleared when you do an ALU operation, including ADD. The same status register is reused when doing a memory operation, but that bit is reused to indicate whether you're going to the first or second bank of memory. The CPU authors think that this is great, if you're doing pointer math and you add a pointer and an integer then you transparently overflow from the first bank of memory into the second and it looks like a contiguous address space! The system integrator (or, motherboard designer, roughly) decides to use bank selection for a different purpose. There's no way the first bank of memory would ever be completely full (nobody buys or sells that much RAM) so they put the RAM on bank 0 and the I/O ports on bank 1. Their system has memory-mapped I/O! So far, everybody's done something that seems sensible to them. Third, the C programmer writes "*p = x + y;". What happens if x + y are signed ints and the addition overflows? The signed overflow bit gets set, then the STORE instruction accesses the I/O ports instead of the memory!

Is the C compiler buggy for not inserting an extra instruction that clears the sign bit of the register? The committee intentionally decided that no, this is how signed integers should work and if you want potentially slower integers with guaranteed semantics that you should use unsigned integers instead. I think that attaching well-definedness to signed/unsigned was a bad move, but this is what they did. (And it does what you've said you want in other comments: the + in C becomes whatever the machine ADD instruction does!)

The C committee invented undefined behavior as a way to ensure that the compiler really could ignore the situation. (FWIW, real CPU ISAs back then had all kinds of interesting ideas. We hadn't yet agreed that bytes are 8 bits. Or that we should use 2's complement. Some designers looked at division by zero as an invalid operations and thought that this was an excellent feature that should be brought over to other operations like add and mul, hence "trap values" in the C standard.)

C and Unix were a commercial success (CPU firms could skip writing their own OS every time), and starting then, CPU designers made ISAs where C could be easily lowered to efficient assembly on their machines. This notion that there was an efficient lowering from C to the CPU, at the time C was standardised, is an anachronism. C created UB to handle then-contemporary CPUs, then later CPUs created ISAs that matched C syntax. If you don't work in compilers or assembly or CPU design, this might be surprising, but if you aren't intentional about making your ISA work well in C, it's easy to accidentally make one which doesn't. Intel MMX famously couldn't be targeted from compilers because the compilers don't have sufficient information to solve where to put the necessary EMMS instructions. Oops!

Decades down the road, CPU designs evolved and started creating new patterns that don't match C well -- and the C language didn't evolve with them. What expression is PSHUFB in C? Or VPMASKMOVD? The CPU firms knew enough to make sure that the compilers could support their new instructions, so they added these as CPU-specific extensions. If you wanted them, you had to write non-portable code that only worked when compiling to target their CPU and not others.

The compiler engineers believe in C being a portable language. If you write code using SSE or AVX and compile it to ARM with clang, it will compile and port the SSE builtins to ARM Neon vector extensions. Doing this required the compilers to be a whole lot smarter about how the code works, and is a large part of the source of modern complaints about compilers "exploiting" undefined behaviour. I counted 23 ADD instructions in contemporary x86-64, assuming you include fused instructions (LEA) and exclude things like OR (saturating add without carry between bits) and XOR (addition with 1-bit vector lanes, lol).

Finally, C defines an abstract machine. In this context, machine is a "term of art" in computer science; popular types of machines include finite automata (the machine side of the 'regular expression' language), push-down automata and Turing machines. If you've seen those before, you may know that they're usually pictured as a directed graph, with states drawn as nodes and state transitions as directed edges, the edges labelled with the circumstance under which this edge is taken. Now, C's machine and the three I listed have a key difference: those three are all decision machines, meaning they exist to either accept or reject an input string. The C machine is a functional machine: it describes a function that transforms an input to an output. (In this treatment, you may picture side-effects as being part of the output.) The C standard defines how such C abstract machines are written down (the C syntax) and what semantics they have: states, and state transitions. The question is what happens when you are in a state and receive an input for which the standard does not define any particular state transition? In a finite automaton or PDA or Turing machine you have a single state named "error" (or "reject") at which point you reject the input string as not being a member of the set that the machine is deciding (aka., your input string fails to match the regex, and we're done). In a functional machine, we don't traditionally have such a state. By defining UB in the way they did, the C standard is stating that when no state transition is specified, it may go anywhere, including to states that aren't required to exist and on which the standard places no requirements.
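A tiny, concrete instance of that signed/unsigned split (the constant-folding on the signed version is what gcc and clang typically do at -O2):

    /* Signed overflow is UB, so the compiler may assume x + 1 never wraps
       and fold the comparison to a constant: */
    int always_true(int x)       { return x + 1 > x; }   /* typically: return 1; */

    /* Unsigned arithmetic wraps by definition, so the check must survive: */
    int usually_true(unsigned x) { return x + 1 > x; }   /* 0 when x == UINT_MAX */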


If it doesn't change whether the argument is correct or not, why did mpweiher bring it up?


Citation needed. Which people? What did they actually say? What was the text that supposedly forbade this interpretation of UB? Please don't tell me this is again that tired wankery over "permissible" versus "possible". As if the choice of synonym mattered.


It's a rather infamous change between C89 and C99 where the description of UB was changed from basically "don't do this" to "please do this, and compilers can do whatever they want if you do".


The definition of "undefined behavior" did not change in the way you describe between C89/C90 and C99. In both editions, one possible consequence of undefined behavior is "ignoring the situation completely with unpredictable results" -- i.e., compilers can do whatever they want.

There is no "don't do this" or "please do this" in either edition. Both merely describe the possible consequences if you do.

C90: undefined behavior: Behavior, upon use of a nonportable or erroneous program construct, of erroneous data, or of indeterminately-valued objects, for which the Standard imposes no requirements. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

If a "shall" or "shall not" requirement that appears outside of a constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this Standard by the words "undefined behavior" or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe "behavior that is undefined."

C99: undefined behavior behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). EXAMPLE An example of undefined behavior is the behavior on integer overflow.

(Some of the wording in the C90 definition was moved to the Conformance section in C99.)


“Permissible” ≠ “possible”


True -- but how does that affect the semantics?

Both definitions say that undefined behavior can be dealt with by "ignoring the situation completely with unpredictable results". There are no restrictions on what can happen.

(The standard joke is that it can make demons fly out of your nose. Of course that's not physically possible, but it would not violate the standard.)


Ignoring ≠ taking action based on

The standard joke is a joke, because it is wrong.


> The standard joke is a joke, because it is wrong.

No, it's a joke because it's _physically_ impossible but allowed.

That said, clang and gcc are of course not antagonistic and don't make use of UB to anger their users. Instead, they make use of UB to aggressively optimize (valid) programs, which is important because C is used for a lot of high-performance code where every bit of optimization can save a lot of time and money. The fact that this sometimes leads to invalid programs (i.e. programs with no defined behavior according to the standard) being optimized to correct but "absurd" results is just a trade-off compiler writers like to take. Mostly because such programs are erroneous anyway, even under a strict "portable assembler" view, which again the standard does not enforce (or even encourage).


> The standard joke is a joke, because it is wrong.

No, it is a joke because it is silly.

It is correct and intended for pedagogy.


"Permissible" and "possible" are not synonyms.


If they mean the same thing, why was it changed? Hint: they don't mean the same thing. At all.


You have never changed the wording of anything you had written to make your intent clearer?


C is not a "portable assembler".

An assembly language program specifies a series of CPU instructions.

A C program specifies runtime behavior.

That's a huge semantic difference.


"Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler:”

https://www.open-std.org/JTC1/SC22/WG14/www/docs/n897.pdf

p10, line 39

"C code can be portable. "

line 30


> No. This is what I call the "portable assembler"-understanding of undefined behavior and it is entirely false.

"C has been characterized (both admiringly and invidiously) as a portable assembly language" - Dennis Ritchie

The idea of C as a portable assembler is not without its problems, to be sure -- it is an oxymoron at worst, and a squishy idea at best. But the tendency of compiler people to refuse to take the idea seriously, even for a second, just seems odd. The Linux kernel's memory-barriers.txt famously starts out by saying:

"Some doubts may be resolved by referring to the formal memory consistency model and related documentation at tools/memory-model/. Nevertheless, even this memory model should be viewed as the collective opinion of its maintainers rather than as an infallible oracle."

Isn't that consistent with the general idea of a portable assembler?

> I agree that undefined behavior is a silly concept but that's the fault of the standard, not of compilers.

The people that work on compilers have significant overlap with the people that work on the standard. They certainly seem to share the same culture.


> But the tendency of compiler people to refuse to take the idea seriously, even for a second, just seems odd.

It's not taken seriously because it shouldn't be taken seriously. It's a profoundly ignorant idea that's entirely delusional about reality. Architectures differ in ways that are much more profound than how parameters go on the stack or what arguments instructions take. As a matter of fact the C standard bends over backwards in the attempt of not specifying a memory model.

Any language that takes itself seriously is defined in terms of its abstract machine. The only alternative is the Perl way: "the interpreter is the specification", and I don't see how that's any better.


> It's not taken seriously because it shouldn't be taken seriously

I really don't know what you're arguing against. I never questioned the general usefulness of an abstract machine. I merely pointed out that a large amount of important C code exists that is in tension with the idea of an all-important abstract machine. This is an empirical fact. Is it not?

You are free to interpret this body of C code as "not true ISO C", I suppose. Kind of like how the C standard is free to remove integer overflow checks in the presence of undefined behavior.


> As a matter of fact the C standard bends over backwards in the attempt of not specifying a memory model.

I mean, C explicitly specifies a memory model and has since C11


I wonder what the best solution here is, then. A different language that actually is portable assembly, one that has less undefined behaviour or simpler semantics (e.g. RIIR), or making -O0 behave as portable assembly?


Step 1: Define just what "portable assembly" actually means.

An assembly program specifies a sequence of CPU instructions. You can't do that in a higher-level language.

Perhaps you could define a C-like language with a more straightforward abstract machine. What would such a language say about the behavior of integer overflow, or dereferencing a null pointer, or writing outside the bounds of an array object?

You could resolve some of those things by adding mandatory run-time checks, but then you have a language that's at a higher level than C.


> Perhaps you could define a C-like language with a more straightforward abstract machine. What would such a language say about the behavior of integer overflow

Whatever the CPU does. Eg, on x86, twos complement.

> or dereferencing a null pointer

Whatever the CPU does. Eg, on X86/Linux in userspace, it segfaults 100% predictably.

> or writing outside the bounds of an array object?

Whatever the CPU does. Eg, on X86/Linux, write to whatever is next in memory, or segfault.

> You could resolve some of those things by adding mandatory run-time checks, but then you have a language that's at a higher level than C.

No checks needed. Since we're talking about "portable assembly", we're talking about translating to assembly in the most direct manner possible. So dereferencing a NULL pointer literally reads from address 0x0.


> What would such a language say about the behavior of integer overflow

Two's complement (i.e. the result which is equivalent to the mathematical answer modulo 2^{width})

> dereferencing a null pointer

A load/store instruction to address zero.

> writing outside the bounds of an array object

A store instruction to the corresponding address. It's possible this could overwrite something important on the stack like a return address, in which case the compiler doesn't have to work around this (though if the compiler detects this statically, it should complain rather than treating it as unreachable)


The reason not to define these things is exactly so C can be used as a high-level assembler, and the answer is always “whatever it is that the CPU naturally does”

"Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler:”

https://www.open-std.org/JTC1/SC22/WG14/www/docs/n897.pdf

p10, line 39

"C code can be portable. "

line 30


That's an interesting opinion.

But it has very little to do with the C programming language.


> The idea of C as a portable assembler is not without its problems

The main problem is that C is not a "portable assembler". You mainly argue that it should be, but it simply isn't (and hasn't been for a long time if it ever was).

> The people that work on compilers have significant overlap with the people that work on the standard. They certainly seem to share the same culture.

Isn't that beside the point? If you want C to be a "portable assembler" you have to write a standard that specifies its behavior. The compilers will then follow.


> Also, compilers don't reason about code the same way humans do. They apply a large number of small transformations, each of these transformations is very reasonable and it is their combination that results in "absurd" optimization results.

So true.

It baffles me how sometimes I hear colleagues of mine just "assuming" that the compiler will deal with something. Something that would require really high-level reasoning to come true.

It's absurd.


IMO it all comes down to 'Premature optimization is the root of all evil.' That saying was, at its best, not great. But it seems at some point down the road the 'premature' part of that sentence was lost to history and it became something much worse. It goes some way towards explaining why we now need machines that would have been considered supercomputers just a couple of decades prior in order to run a text editor.

Unreal Engine is a beautiful example of this. The source code of that engine is absolutely full of 'premature optimizations.' For instance repeatedly converting a Quaternion to a Rotator (Euler angles) is basically never going to get red-lined on a profiler, let alone be a bottleneck. Yet there's a caching class used in the engine for this exact purpose. It's all of these little 'premature optimizations' that end up creating a system that runs vastly better than most any other engine out there, feature for feature.
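To make that concrete, here is a minimal C sketch of such a conversion cache. This is my own illustration, not Unreal's actual FQuat/FRotator code: the struct names, the cache layout and the Euler convention are all assumptions.

    #include <math.h>
    #include <string.h>

    typedef struct { float x, y, z, w; } Quat;
    typedef struct { float pitch, yaw, roll; } Rotator;

    /* The "expensive" conversion, using one common Euler convention. */
    static Rotator quat_to_rotator(Quat q)
    {
        Rotator r;
        r.roll  = atan2f(2.0f * (q.w * q.x + q.y * q.z),
                         1.0f - 2.0f * (q.x * q.x + q.y * q.y));
        r.pitch = asinf(2.0f * (q.w * q.y - q.z * q.x));
        r.yaw   = atan2f(2.0f * (q.w * q.z + q.x * q.y),
                         1.0f - 2.0f * (q.y * q.y + q.z * q.z));
        return r;
    }

    typedef struct { Quat last_q; Rotator last_r; int valid; } RotatorCache;

    static Rotator cached_to_rotator(RotatorCache *c, Quat q)
    {
        /* Bitwise comparison is enough for a cache key: a different bit
         * pattern just means one extra (correct) recomputation. */
        if (!c->valid || memcmp(&c->last_q, &q, sizeof q) != 0) {
            c->last_q = q;
            c->last_r = quat_to_rotator(q);
            c->valid  = 1;
        }
        return c->last_r;
    }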


It's a dogma pendulum like "gotos are evil," "YAGNI," object-oriented design (is good/bad), etc.

A pattern or anti-pattern gets identified and lessons drawn from it. Then people take the lesson as dogma and drive it to absurd extremes. Naturally this doesn't have the desired effect, and then the baby gets thrown out with the bathwater, people do the exact opposite and take that to absurd extremes.

Caching potentially expensive (relatively speaking) operations isn't "premature optimization," it's a design principle. It's much harder to retrofit something like this into a project after the fact than having and using it from the start.

"Premature optimization" is applying complicated micro-optimizations early on that won't survive any moderate code change.


It depends on the compiler really. C is nice in that you can choose a compiler fit for your purpose, but that makes my next example harder, so I'm going to discuss Go. I was doing leetcode recently and one of the problems was to combine some set of sorted linked arrays into one bigger array. There are a few ways to do this but I ran two of them to see which was better.

1.) I select the lowest from the set of first elements in the arrays, add it to the end of my new list, then increment the iterator for that array.

2.) Dump all of them into one array then sort the new array entirely.

Surprisingly the second option was faster and didn't have a higher memory cost. From what I could tell this was because it used fewer memory allocations, could run better optimizations, and the standard library in go has a good sorting algorithm. One of the greatest skills I think we can develop as programmers is knowing when to trust the compiler and when not to.
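For illustration, here is a rough C sketch of the two strategies (the original experiment was in Go; the function names and the naive linear selection in the first version are my own):

    #include <stdlib.h>
    #include <string.h>

    /* Strategy 1: repeatedly pick the smallest remaining head element. */
    static void merge_kway(int **arrs, const size_t *lens, size_t k, int *out)
    {
        size_t *pos = calloc(k, sizeof *pos);   /* per-array read position */
        size_t total = 0, filled = 0;
        for (size_t i = 0; i < k; i++) total += lens[i];
        while (filled < total) {
            size_t best = k;                    /* k means "none found yet" */
            for (size_t i = 0; i < k; i++)
                if (pos[i] < lens[i] &&
                    (best == k || arrs[i][pos[i]] < arrs[best][pos[best]]))
                    best = i;
            out[filled++] = arrs[best][pos[best]++];
        }
        free(pos);
    }

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    /* Strategy 2: concatenate everything, then sort the whole thing once. */
    static void merge_concat_sort(int **arrs, const size_t *lens, size_t k, int *out)
    {
        size_t off = 0;
        for (size_t i = 0; i < k; i++) {
            memcpy(out + off, arrs[i], lens[i] * sizeof *out);
            off += lens[i];
        }
        qsort(out, off, sizeof *out, cmp_int);
    }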

All of this is to say I agree with you in principle, but there is nuance and a degree to which we can trust the compiler, but only if we've chosen one that we are familiar with and that is well suited for the task at hand.


Not necessarily just a matter of details of the assembly, but also of cache utilization.


That's why the author prefaced the sentence with "In effect". The effect in this case is the same as if the compiler had pretended i was initialized to a high value - the loop is omitted.


It's still fundamentally a misunderstanding of what the compiler is doing, and thinking about it that way is just going out of your way to cause confusion. The compiler saw a use of an uninitialized value, concluded that it must not be reachable as that's the only way for it to be a legal C program, and then deleted the dead code. You can make up all sorts of other behaviors that the compiler could have done which would have had the same result in a specific scenario, but why would you?
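For concreteness, a minimal sketch of the kind of loop being discussed (my reconstruction, not necessarily the article's exact code); an optimizing compiler may delete the loop entirely, because the only way to reach the comparison is to read the uninitialized i:

    #include <stdio.h>

    int main(void)
    {
        int i;                     /* never initialized */
        while (i < 10) {           /* reading i here is undefined behavior */
            printf("%d\n", i);
            i++;
        }
        return 0;
    }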


I don't think the author misunderstands what the compiler is doing, just saying that it's probably unexpected behavior from the perspective of the program's author.


I don't think the compiler's approach is to try and fix the code.

> it must not be reachable as that's the only way for it to be a legal C program, and then deleted the dead code.

Rather, it would make more sense if the approach were more like "this branch is UB so I can do whatever is most convenient in terms of optimization". In this case that was merging the two branches but discarding the code for the UB branch.

But from a behavioral point of view all these formulations describe the same result


The compiler isn't trying to "fix" anything and I'm not sure where that idea came from. The core concept is that the compiler assumes the program is valid and optimizes based on that assumption. If UB would occur if certain values are passed as an argument to a function, the compiler assumes that those values won't be passed. If UB occurs unconditionally in a function, the compiler assumes that the function won't be called.
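A small sketch of that kind of assumption (a hypothetical function, not taken from the article): since the dereference would be undefined for a null pointer, the compiler may conclude the pointer is never null and drop the later check.

    #include <string.h>

    int name_length(const char *name)
    {
        size_t n = strlen(name);   /* undefined behavior if name == NULL...   */
        if (name == NULL)          /* ...so the compiler may assume this test */
            return -1;             /* is always false and delete the branch.  */
        return (int)n;
    }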


> The core concept is that the compiler assumes the program is valid

I was referring to this.

IIRC the spec language is that the compiler is free to assume that UB never happens, but from an operational perspective I believe that the compiler simply stops caring about those cases.

by this I mean that for code like

    #include <stdio.h>

    int main() {
        printf("Hello, ");
        0/0;               /* integer division by zero: undefined behavior  */
        *(int *)NULL;      /* null pointer dereference: undefined behavior  */
        printf("World!\n");
        return 0;
    }
most compilers will just pretend that the bad lines did not exist

https://godbolt.org/#z:OYLghAFBqd5TKALEBjA9gEwKYFFMCWALugE4A...


The compiler is not trying to fix your program. It is just assuming that somehow those lines are never hit, and then is deleting the dead code to make your binary smaller.


I agree, I was only disagreeing with this view:

> > The core concept is that the compiler assumes the program is valid

I believe that it is more correct to say that the compiler is unrestricted in what it does with UB, so it handles it in the most convenient way (often toward optimizing binary size, runtime performance, and/or compilation time)

Taken to its conclusion, assuming that the program is valid would give the same behaviour. I just believe that is not what happens operationally.


> No. This is what I call the "portable assembler"-understanding of undefined behavior and it is entirely false. Clang does not need to pretend that i is initialized with a value larger than 10, there is no requirement in the C standard that undefined behavior has to be explainable by looking at it like a portable assembler. Clang is free to produce whatever output it wants because the behavior of that code is literally _undefined_.

You're kind of lawyering this. Sure, it's "undefined", but is that useful to anyone outside of compiler writers? How useful is it to have a program that's very fast but entirely wrong? If the behavior is undefined, I want an error, not a free license for the compiler to do whatever the hell it wants.


> How useful is it to have a program that's very fast but entirely wrong?

This only affects you if your program has undefined behavior; at that point your program was wrong anyway, you were just lucky that the compiler compiled it into something that works. With a different compiler or a different target architecture you might not be so lucky. So even the old "portable assembly" kind of undefined behavior is actually the opposite of portable.

> Sure, it's "undefined", but is that useful to anyone outside of compiler writers?

It may surprise you that compiler writers are not evil people that take joy in the suffering of their users. They implement these aggressive optimizations because they actually produce better results for valid programs. It just happens that this coincides with more "absurd" results if the program has undefined behavior. But at that point your program was wrong anyway.

If you absolutely want "portable assembler"-semantics for your C program you can do that by using (very) old C compilers. Your program will not get optimized as aggressively, but "portable assembler"-semantics and optimizations are inconsistent with each other anyway.


Doesn't -Wall give you the error you want?


Not always - in some cases you need -Wextra or even -Weverything to get a warning. [Edit: was thinking of GCC here, and yesterday's similar thread - clang may be more forthcoming with warnings, I don't know.]


Exactly, and in addition to that, making the loop disappear is not only good for performance but from a compiler perspective much easier than producing a helpful error message that explains the situation well.


> Clang is free to produce whatever output it wants because the behavior of that code is literally _undefined_.

> They apply a large number of small transformations, each of these transformations is very reasonable

One of those transformations detects that we are reading uninitialized memory and acts accordingly. Given that in most cases we do it by mistake, I think that doing anything other than raising an error (or at least a big warning) is not a very reasonable thing to do. For those cases where such behavior is desired, a compiler flag could be provided.

The fact that the standard allows the current behavior, doesn't mean that the compiler should do it.


> One of those transformation detects that we are reading uninitialized memory and acts accordingly.

This might be pedantic but variables do not have to correspond to memory locations. The compiler could decide to keep the variable in a register or the compiler might realize that the variable never changes and apply constant folding eliminating it entirely. Assuming that variables correspond to memory locations (or really any assumptions about how code is translated into assembly) is a part of the "portable assembler"-understanding of C that simply doesn't apply to (optimizing) C compilers.
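A tiny illustration of that point (my example): the local variable below needs no memory location at all.

    int twice_limit(void)
    {
        int limit = 10;    /* no memory location required: likely a register */
        return limit * 2;  /* and under -O2 this whole function folds to 20  */
    }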

> Given that most of the case where we do it is by mistake, I think that doing anything other than raising an error (or at least a big warning) is not a very reasonable thing to do.

I agree that the compiler should warn if it can prove that a program's behavior is undefined unconditionally. I don't think it's as easy as you make it out to be, but I encourage you to check out the source code of clang and fix this issue if that's possible. That said, in practice undefined behavior is often encountered conditionally (e.g. the behavior of this function is undefined if and only if the first argument is non-null and the second argument is null). Optimizing such programs is trivial (simply assume that undefined behavior cannot happen and optimize accordingly); detecting that the program actually calls that function with invalid parameters, on the other hand, is _literally_ impossible in general, and even a best-effort approach would be computationally expensive and miss a lot of cases.


Is all UB silly? E.g., wouldn't fully defining what happens when one goes beyond the end of an array impose a non-trivial performance hit for at least some code?


Yes. But there's middle ground between fully-defined behavior (lots of slow checks) and what current compiler-writers think UB is (do whatever I want).

Specifically, implement UB the way it is described in the standard: pretend it isn't UB, do it anyway, consequences be damned. That's what "ignore the situation with unpredictable results" actually means.


> compiler-writers think UB is

The current standard is _very_ explicit that undefined behavior is indeed undefined, i.e. "do whatever you want".

> pretend it isn't UB, do it anyway, consequences be damned.

This explicitly isn't a requirement, but even if it were, "ignoring the situation completely with unpredictable results" can be interpreted in numerous ways. One of these ways is "ignoring any cases in which UB is encountered" which is exactly what compilers are doing. Then again, saying "the compiler didn't ignore the situation and as a consequence I got results I didn't predict" isn't a strong argument when the standard specifically told you that you will get unpredictable results.


"The Party told you to reject the evidence of your eyes and ears. It was their final, most essential command."

There's a certain poetic consistency in ignoring arbitrary portions of the standard to justify ignoring arbitrary portions of the input code.


Which part of the standard is ignored? Again, the standard is _very_ explicit about what undefined behavior means. If you don't like that you can either try to change the standard or use the numerous command line options provided by most compilers to tell your compiler that you would like certain undefined behaviors have a defined meaning.

Saying that compilers shouldn't ignore code with undefined behavior is like saying compilers shouldn't ignore the body of an if-statement just because the condition evaluated to false.


You're right on one point: the standard is very explicit.

And because it is explicit—a fact you yourself just admitted—the fact that silent erasure of non-dead code is not a listed option in response to UB means that it is not allowed.


The standard is explicit that the behavior of code with undefined behavior is well... undefined and that implementations can do whatever they want.


Reasonable people can disagree as to whether that interpretation is valid.

No reasonable person can say that it is explicit. It simply, factually, is not. At no point in any version of the C Standard does the text "implementations can do whatever they want" appear.

I have no time for blatant and insulting dishonesty. We're done here.


"3.4.3 undefined behavior

behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements"

It is _very_ explicit. The following note is (as all notes are) not normative. So even if the note would cast any doubt (it really doesn't), it can safely be ignored.


There are enough high-performance languages without undefined behavior and I don't think they suffer heavily for it.


> Also, compilers don't reason about code the same way humans do.

Not these compilers for sure. But I don't agree that all compilers are broken.

> They apply a large number of small transformations, each of these transformations is very reasonable and it is their combination that results in "absurd" optimization results.

Humans use techniques like natural deduction to apply a series of transformations that do not lead to absurd results.


> Humans use techniques like natural deduction to apply a series of transformations that do not lead to absurd results.

  All men are immortal.
  Socrates is a man.
  Therefore, Socrates is immortal.
If you were to give a group of CS students the exercise to determine whether this deduction is valid, they would all answer yes. So no, humans that apply natural deduction can also derive absurd results.

That said, I don't think all C programmers would agree that the result of that optimization is absurd in the first place.


> Clang is free to produce whatever output it wants because the behavior of that code is literally _undefined_

The standard says no such thing. It lists a number of acceptable ways to handle undefined behavior and none of them are do anything you like.


> I agree that undefined behavior is a silly concept but that's the fault of the standard, not of compilers.

While undefined behavior is the standard's fault, its interpretation is up to the compiler.


The reason for leaving integer overflow undefined was not primarily because of one's complement machines. It was for loop optimization.

Consider for (int i = 0; i < ARRAY_WIDTH; i++) out[i] = in[i + offset];

Assuming that i + offset, out + i, and in + i + offset do not overflow allows the loop to be cleanly vectorized without checking for wraparound.

The compiler developers in the 80s were trying to come up with rules that didn't require C to be 5x slower than Fortran on scientific applications, while dealing with the consequences of a[i] being equivalent to *(a + i).
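Spelled out as a compilable sketch (ARRAY_WIDTH is just an assumed constant here): with a signed index the compiler may take i + offset to never wrap, which is what lets it widen the induction variable to pointer size and vectorize without a wraparound check.

    #define ARRAY_WIDTH 1024   /* assumed constant, for illustration only */

    void copy_window(int *out, const int *in, int offset)
    {
        for (int i = 0; i < ARRAY_WIDTH; i++)
            out[i] = in[i + offset];   /* i + offset assumed never to overflow */
    }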


It sure would be nice if modern CPUs had two sets of integer instructions: one that wraps, and one that triggers an exception on wrap, with zero overhead in non-wrapping case.

Then we could compile all code with the latter, except for specifically marked edge cases where wrap is a desired part of the logic.


That doesn't help at all if your loop variable is a 32 bit int that your compiler decided to transform away into vectorized loads from a 64 bit pointer.

But that's exactly one of the transformations that get enabled by assuming undefined overflow.


But in that case, it's not going to be wrapping either. We'll just read beyond the end of the buffer, which a bounds check should catch.

Or perhaps I'm not thinking of the specific sequence that would 1) not wrap during modifying index and 2) not hit bounds check after.

It would need to be a requirement that compilers can't upcast all your ints to 64 bit ones, do all the math, and then write them back - would need specific instructions for each size.


ARM actually kind of has that. The register file has an overflow flag that is not cleared on subsequent operations (sticky overflow). So instead of triggering an exception, which is prohibitively expensive, you can do a series of calculations and check afterwards if an overflow happened in any of them. A bit like NaN for floating point. From what I understand the flag alone is still costly, so we will have to see if it survives.


It doesn't matter if triggering the exception is expensive. At that point overflow has already occured, so your program state is now nonsense, and you might as well just let it crash unhandled. Much better outcome than reading memory at some mysterious offset.

If just having the ability for an exception to occur during an instruction causes overhead, that would be a big problem though.

Edit to add: We need to do the check on every operation. Just going through one iteration of the loop might have already corrupted some arbitrary memory, for example. Manually inserted checks on some flag bits don't scale to securing real programs.


> At that point overflow has already occured, so your program state is now nonsense, and you might as well just let it crash unhandled.

The trick is to check and clear the flag before any instruction that would have a side-effect, that depends on the arithmetic result.

IEEE 754-compliant floating point units have a similar behaviour with NaN that is a bit more versatile: an arithmetic instruction results in NaN if any operand is NaN, but an instruction with side-effect (compare, convert or store) will raise an exception if given a NaN.


Think about it like that. If you allow an exception you essentially create many branches with all their negative consequences. With the sticky bit you combine them to one branch (that is still expensive[1]).

[1] https://news.ycombinator.com/item?id=8766417


That's quite interesting and reasonable


You can compile code with -fwrapv and for most programs the overhead is minimal (the exception being that if you're writing number crunching code, the overhead is going to be huge). For my personal projects I have -fwrapv as part of the default compiler flags, and I remove the flag in opt builds. I honestly haven't caught that many bugs using it, but for the few bugs it did catch it saved me a lot of debugging time.
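As an illustration of what the flag changes (a standard example, not from the parent comment): without -fwrapv a wraparound test like the one below can legally be folded away, while with -fwrapv it keeps its intended meaning.

    int will_wrap(int x)
    {
        /* Without -fwrapv, signed overflow is UB, so the compiler may fold
         * this test to 0.  With -fwrapv it means exactly x == INT_MAX. */
        return x + 1 < x;
    }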



It could be, if the hardware supported it. Consider this quote from that page:

"in some debug configurations overflow is detected and results in a panic"

That's not good enough. We want to always detect it! Many critical bugs are caused by this in production builds too. Solving it at the language level would require inserting branches on every integer operation which is obviously not acceptable.


> That's not good enough. We want to always detect it

So, select the configuration where that's the behaviour? overflow-checks = true

> Solving it at the language level would require inserting branches on every integer operation

Yes, so that's what you have to do if you actually want this, if you won't pay for it then you can't have it.


Well, the whole point of my post was that I would really like a hardware feature that does it without overhead. How that would work behind the scenes, I have no idea. Not a hardware engineer.


If you really want it, use it. Hardware vendors optimise stuff they see being done, they don't optimise stuff that somebody mentioned on a forum they'd kinda like but have never used because it was expensive. Maybe if you have Apple money and can buy ARM or something then you could just express such whims, but for mere mortals that's not an option.

Newer CPUs in several lines clearly optimise to do Acquire-Release because while it's not the only possible concurrency ordering model, it's the one the programmers learned, so if you make that one go faster your benchmark numbers improve.

Modern CPUs often have onboard AES. That's not because it won a competition, not directly, it's because everybody uses it - without hardware support they use the same encryption in software.

The Intel 486DX and then the Pentium happened because it turns out that people like FPUs, and eventually enough people were buying an FPU for their x86 computer that you know, why not sell them a $100 CPU and a $100 FPU as a single device for $190, they save $10 and you keep the $80+ you saved because duh, that's the same device as the FPU nobody is making an actual separate FPU you fools.

Even the zero-terminated string, which I think is a terrible idea, is sped up on "modern" hardware because the CPU vendors know C programs will use that after the 1970s.


MIPS for example has this. It has `addu` for normal integer addition that does not trap and `add` if you want to trap on overflows.


x86 had overflow checking support (via OF flag and INTO insn) but it got slowed down and later dropped from 64 bit mode.


The solution is to have the compiler automatically split iteration into a known in-bounds part and a possibly-out-of-bounds part. In this case, generating an additional check that {ARRAY_WIDTH < (INT_MAX - offset)} would be sufficient to guarantee that {i + offset} doesn't wrap around, enabling further reasoning in a specialized copy of the loop. (In this example, it's unclear what the relation of ARRAY_WIDTH is to in and out.)

The HotSpot C2 compiler (and AFAIK, Graal) do partition iteration spaces in this way.

This does have some complexity cost in the compiler, of course, and it produces more code, especially if the compiler generates multiple copies of the loop. But given that relatively few super-hot loops of this kind are in typical programs, it is worth it.
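A hedged C sketch of that loop-splitting idea (not the actual HotSpot/Graal transformation; ARRAY_WIDTH is an assumed constant): guard a fast copy of the loop with a check that the index arithmetic cannot wrap, and fall back to a slower copy otherwise.

    #include <limits.h>

    #define ARRAY_WIDTH 1024   /* assumed constant, for illustration only */

    void copy_window_guarded(int *out, const int *in, int offset)
    {
        if (offset >= 0 && offset < INT_MAX - ARRAY_WIDTH) {
            /* Fast path: i + offset provably cannot overflow, so the compiler
             * is free to widen the induction variable and vectorize. */
            for (int i = 0; i < ARRAY_WIDTH; i++)
                out[i] = in[i + offset];
        } else {
            /* Slow path: emulate two's-complement index arithmetic explicitly
             * (the conversion back to int is implementation-defined, not UB). */
            for (int i = 0; i < ARRAY_WIDTH; i++)
                out[i] = in[(int)((unsigned)i + (unsigned)offset)];
        }
    }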


>But given that relatively few super-hot loops of this kind are in typical programs, it is worth it.

In C programs these kind of super-hot loops are quite common, because if someone didn't have many such loops they probably wouldn't need to write in C/C++. And if C had the same overhead as Java in these cases, then people who needed to squeeze out every drop of performance would use a different language.


> these kind of super-hot loops are quite common

People say this, then they write microbenchmarks. Then compilers optimize the heck out of these microbenchmarks (and numerical kernels). Rinse and repeat for several decades, and every dirty trick that you can think of gets blessed under the guise of UB.

When in reality, real programs do not spend all their time in one (or even a handful) of super-hot loops. C was designed to let compilers have a field day so that 1980s supercomputers could get 5% more FP performance, by hook or by crook. These kinds of dangerous optimizations do not make much difference for the vast majority of application code, which spends its time chasing pointers and chugging through lots of abstraction layers.


> When in reality, real programs do not spend all their time in one (or even a handful) of super-hot loops.

This paper from 2015 finds that a large amount of CPU in Google data centers is spent in a relatively small number of core components, known as the "datacenter tax": https://static.googleusercontent.com/media/research.google.c...

    > [We] identify common building blocks in the lower levels of the software stack.
    > This “datacenter tax” can comprise nearly 30% of cycles
    > across jobs running in the fleet, which makes its constituents
    > prime candidates for hardware specialization in future server
    > systems-on-chips.
Some of the components they identify are memmove() (definitely a loop), protocol buffers (proto parsing is a loop), and compression/decompression (a loop).


>C was designed to let compilers have a field day so that 1980s supercomputers could get 5% more FP performance, by hook or by crook.

C was designed to help create Unix. It just happened to turn out to be something that could be compiled into more efficient code than most other languages at the time, without being overly difficult to work with.

From what I understand, 80s supercomputers were more likely to run Fortran. Fortran has fewer problems with pointer aliasing than C, so a Fortran compiler could generate better code than a C compiler for the type of code they would be running.


The idea that only a tiny portion of code is "hot" is just not true.

But you're also missing the entire idea anyway. It's not about bounds checking, it's about at what point overflow occurs. `int` is a sized type, so if overflow is defined for it then it has to overflow at exactly that size. This prevents using the native word size of the machine when it's larger than that of `int`, which these days it very often is since int is 32-bit. So you couldn't promote it to a 64-bit relative address, as then it would overflow at 64 bits instead of 32 bits.


> This prevents using native size of the machine if it's larger than that of `int`.

Having implemented multiple compiler backends for 64-bit machines, this is not true. All 64-bit machines I am aware of, and certainly all modern ones, do 32-bit arithmetic just fine.


Yes it is, see x86 example in https://youtu.be/yG1OZ69H_-o at about the 40 minute mark for a real world example


It's actually really annoying to get an example from a video, but nevertheless...

The example shows you still didn't really seem to grok my point, which I will state again. A compiler can introduce additional checks for overflow that guard a large region (e.g. the repeated code in that example plus the loop after--maybe even the whole function!) and then make exactly the same transformations in the guarded region. The result is exactly the same machine code. It will go fast and not go out-of-bounds or wrap around. If the program does go out-of-bounds or wrap around, the guard will dynamically catch that and branch to a slower copy of the code that does exactly the right behavior at the right place. This is exactly how Java JITs approach C code quality. They use all the same tricks, they just have to try harder to handle the edge cases.

And besides, there are several other ways to get that code--instead of using int32_t, it could have been uintptr_t, which replaces UB with platform-specific behavior but nevertheless lets the compiler do the LEA and other addressing mode tricks. No need to resort to UB.
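A sketch of that alternative (my illustration): a pointer-width unsigned index has fully defined wraparound, yet is already 64-bit on x86-64, so the compiler can still fold it straight into addressing modes without reasoning about signed overflow.

    #include <stdint.h>

    void copy_n(int *out, const int *in, uintptr_t offset, uintptr_t n)
    {
        for (uintptr_t i = 0; i < n; i++)
            out[i] = in[i + offset];   /* unsigned: wraparound is fully defined */
    }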

The entire mindset of opening holes in the language specification to allow compilers to optimize without thinking hard is just wrong. Programs have bugs. They go out of bounds. Preserving the exact out-of-bounds behavior is what allows programmers to debug their code. Checking bounds is what keeps programs from getting pwned all the time. It's not an opportunity for magic go-fast at the expense of security and debuggability.

We're here in this muck today because of a persistent and deliberate choice to be hostile to programmers, programs, and people running those programs because compiler designers didn't want to work hard. That doesn't fly in literally any other programming language domain; C is the odd duck and it's frankly ruined people's reasoning centers.


> The entire mindset of opening holes in the language specification to allow compilers to optimize without thinking hard is just wrong.

I would counter that 2's complement overflow is a worse design in practice, as easily 99% of the time overflow happening at all is a bug. Rust almost got this right by making overflow be a panic in debug builds, but then they took the worst option in making it defined 2's complement in release builds.

But also the mindset that everything must be extensively defined is just wrong. Nobody's standard library does this anyway, but also it's not feasible to do. You cannot define the exact outcome of data races, for example. UB is inevitable, it'll always exist in all languages. The debate is as such over where you put that line, not whether or not it exists at all


> UB is inevitable

There's a huge difference between UB (as in C/C++) and nondeterminism, so I don't accept this assertion. UB in C gives no semantics to executions containing it; it is neither spatially nor temporally contained. No other programming language has such a ridiculous and program- and programmer-hostile concept. And for good reason. It makes reasoning about buggy programs a non-starter. And all programs have bugs. We debug them by analyzing their runtime behavior! Crashing immediately upon a bug, exactly the same way on every machine, is the absolute best way to find and fix bugs that cannot be caught statically.


It is more true in good code than it is in bad code. If you use atomic reference counting everywhere or you're constantly allocating memory and freeing it all over the place, like a lot of "modern" code, you do tend to get quite flat profiles. But in good code, which allocates very infrequently, and is designed around flat arrays of data being processed in loops, you naturally do sit in tight loops a lot.


No, it still isn't true. Consider things like a web browser. Huge codebases where damn near the entire thing is "hot." Or OS's like iOS and Android - again, huge codebases where everything is "hot" (everything is either impacting responsiveness or battery life, after all), and vanishingly few if any "super hot tight loops." Same again with game engines like Unreal. It does have data processing hot loops like physics simulations, but it also has audio paths, rendering, AI, etc... that are also all still very much hot. Even the code that does loading, a "rare" event, is still hot because the user is blocked on it.

The "hot data processing loop" definitely exists in some systems, notably those that really just do one single thing, but those are also far from even the majority case. At least not on anything where the end user is directly paying the computational bill (aka, very few client apps fit the "single hot loop" profile)

But if you go pitch Apple with a "hey, we want iOS to be 20-30% slower because some people are uncomfortable with narrow contracts" they would rightly laugh you right out of the room.


Web browsers are the worst offenders for what I described: highly dynamic retained-mode GUIs with huge amounts of memory allocation all over the place. They are bloated and very much badly designed, so even simple websites slow any 5+-year-old computer to a crawl.

If people cared about performance (including latency and power usage) on mobile they wouldn't write such heavy apps. Reality is, people plug their phones in every day and so power usage isn't actually that important. Kept in mind, but clearly not optimised for.


That sounds very unlikely given that unsigned overflow is defined. The original motivation for undefined behaviour was not performance. I'm pretty sure autovectorisation was not a thing in 1989. The author's theory sounds far more likely.

I'd love to see an actual citation though if you aren't just guessing.


Maybe the idea hadn't been invented back then, but it seems obvious to me that the correct response is an iterator protocol. Or at least a hardcoded for-in syntax.


Adding an iterator in C++ means adding at least 2 more objects, multiple function calls, operator overload, templates, the whole package.

You don't trust your compiler to optimize the sane trivial C case, but you trust it to optimize all that garbage away?


If your data is sequential, creating an iterator in C++ is as simple as returning a begin and end pointer, and it will be optimized away at any optimization level other than -O0.

https://godbolt.org/z/WEjzEr5j4


But iterating over pointers is once again optimized with lots of undefined behavior at the corners. So you are replacing one source of undefined behavior with another.


Replacing undefined behavior at the program-level with undefined behavior written and tested as part of the standard library, usually vendored and distributed in concert with the compiler, seems like an obvious net-positive to me.


Pointer arithmetic optimization based on undefined behavior is a problem regardless.

Life is always better after minimizing the total number of types of undefined behavior.


> with lots of undefined behavior at the corners.

What behavior is undefined in incrementing a pointer between a begin and end range?


Of course a basic iteration between begin()/end() will never contain out of range elements, but neither will valid increment between two integers. No need for iterators in that case either.

Say I want to do something fancy, like getting element number 11 from an array.

With an integer index I can pass 11, with random access iterators I can use begin() + 11.

Now my array only has five elements. So I check.

11 < 5? Comparison valid, successfully avoided a crash.

begin() + 11 < end()? How were the rules for pointer comparison again, something about within an allocation and one past the end?


> something about within an allocation and one past the end?

Yeah, I forgot about that. So I agree there is some subtlety which is likely to catch beginners.

Your example could safely be:

   if (std::distance(begin, end) > 5) 
Another approach I would recommend is to write a `guarded_advance` which takes an integer and the end pointer.

Also note that the situation you are describing is still a little unusual because the baseline assumption is it takes linear time to advance an iterator by more than 1 increment.

> but neither will valid increment between two integers. No need for iterators in that case either.

The purpose of an iterator is to abstract data structure access. The coordinate inside a complex data structure may not be representable by an integer.


A pointer is a valid C++ iterator.

> but you trust it to optimize all that garbage away?

Yep, if you learn about compilers you learn what kind of optimizations are easy and hard for them to make. The kind that just flattens a few levels of nested definitions are easier.


Care to explain? An iterator is a nice high level concept, but the CPU still has to do the &in + i + offset arithmetic. I don't see how replacing `i` with syntactic sugar changes the need to check for overflow.


I think the point the GP is making is that with an iterator protocol, the iterator implementation itself is free to make a different choice on implementation strategies, based on the shape of the data and the hardware available, and this is transparent to client code. So for example, a container containing only primitive ints or floats on a machine with a NVidia Hopper GPU might choose to allocate arrays as a multiple of 64, and then iterate by groups of 64, taking advantage of the full warp without needing any overflow checks. Obviously a linked list or an array of strings couldn't do this, but then, they wouldn't want to, and hiding the loop behind an iterator lets the container choose the appropriate loop structure for the format and hardware.

I've heard criticisms of C and C++ that they are simultaneously too high-level and too low-level. Too high-level in that the execution model doesn't actually match the sort of massively parallel numeric computations that modern hardware gives, and too low-level that the source code input into the compiler doesn't give enough information about the real structure of the program to make decisions that really matter, like algorithm choice.

It's interesting that the most compute-intensive machine learning models are actually implemented in Python, which doesn't even pretend to be low-level. The reason is because the actual computation is done in GPU/TPU-specific assembly, so Python just holds the high-level intent of the model and the optimization occurs on the primitives that the processor actually uses.


"It's interesting that a lot of performance-critical code tends to be written in C++, which sometimes pretends not to be that low level. The reason is because the actual performance critical code is really running CPU-specific assembly, so C++ just holds the high level intent of the model and the optimization happens on the primitives that the processor actually uses."


> the iterator implementation itself is free to make a different choice on implementation strategies

That's just UB with more steps. What will the spec say? "Behavior of integer overflow is undefined. Unless the overflow happens within an iterated for loop, in which case the behavior is undefined and the iterator can do whatever it wants".

> I've heard criticisms of C and C++ that they are simultaneously too high-level and too low-level.

I've heard this as well, and I think there is some truth to it, but C is the least-bad offender relative to any other language.

C maps extremely well to assembly. The fact that assembly no longer perfectly captures the implementation of the CPU has nothing to do with C. Every other general purpose[1] language has to target the same abstraction that C does.

Given that reality, C in fact maps better to the hardware than any other language. Because it is faster than any other language. Any higher level language that gives the compiler more information about algorithm choice is slower than C is. That's the bottom line.

[1] This is ignoring proprietary, hardware specific tools like CUDA. That's clearly in a different category when discussing programming languages, IMO.


The reasoning behind the decision to make integer overflow UB matters here. As the thread starter mentioned, that reasoning was loops: you shouldn't need overflow checks on the counter in everyday loops. Take loops out of the equation, along with certain high-performance integer computations where arguably you should have a dedicated FixedInt type, and the logical spec behavior might be silent promotion to BigInt (like JS, Python 3, Lisp, Scheme, Haskell) or a panic (like in Rust).

> [1] This is ignoring proprietary, hardware specific tools like CUDA. That's clearly in a different category when discussing programming languages, IMO.

Arguably they should be part of the conversation. One main reason for the recent ascendancy of NVidia over Intel is that they're basically unwrapping all the layers of microcode translation that Intel uses to make a modern superscalar processor act like an 8086, and saying "Here, we're going to devote that silicon to giving you more compute cores, you figure out how to use them effectively."


> implemented in Python,

A program which constructs an AST out of python classes and spits out GPU code is a compiler. The python is never executed.

Trivially, compilers can generate code faster than their host language, but that doesn't make the host language fast. The compiler would be even faster if it were written in C++.


That's sort of the point. If you want to make programs really fast and really expressive, your target language ought to be as close to the hardware as possible, and your source language ought to be as expressive as possible, and then you just write a compiler to translate between them.


If you don't initialize the initial iterator state, or compare with something that isn't a valid iterator of the same container, you kinda end up with the same issues.

The complaint here is that the warnings/errors for the integer case don't seem to be on by default. With the warnings command line flags enabled, this case is easily detected by the compiler, same as if it was some iterator object.


I don't think that was the historical justification, but regardless, it is a terribly limiting and oblique hack of a convention for guiding the compiler. Why should this guidance only be possible for signed values? Why have such an important distinction be controlled in such an indirect fashion?

Just 2 reasons why I prefer explicit vectorization anywhere that I feel it is important to have vectorized code.


> It would certainly not hurt performance to emit a compiler warning about deleting the if statement testing for signed overflow, or about optimizing away the possible null pointer dereference in Do().

I think that the nature of Clang and LLVM makes this kind of reporting non-trivial / non-obvious. Deletions of this form may manifest as a result of multiple independent optimizations being brought to bear, culminating in noticing a trivially always-taken, or never-taken branch. Thus, deletion is just the last step of a process, and at that step, the original intent of the code (e.g. an overflow check) has been lost to time.

Attempts to recover that original intent are likely to fail. LLVM instructions may have source location information, but these are just file:line:column triples, not pointers to AST nodes, and so they lack actionability: you can't reliably go from a source location to an AST node. In general though, LLVM's modular architecture means that you can't assume the presence of an AST in the first place, as the optimization may be executed independent of Clang, e.g. via the `opt` command-line tool. This further implies that the pass may operate on a separate machine than the original LLVM IR was produced, meaning you can't trust that a referenced source file even exists. There's also the DWARF metadata, but that's just another minefield.

Perhaps there's a middle ground with `switch`, `select`, or `br` instructions in LLVM operating on `undef` values, though.


C and C++ are most often used when performance is the highest priority. Undefined behavior is basically the standards committee allowing the compiler developers maximum flexibility to optimize for performance over error checking/handling/reporting. The penalty is that errors can become harder to detect.

It appears the author is a Go advocate. I assume they are valuing clearly defined error checking/handling/reporting (the author's definition of correctness) over performance. If that's what you are looking for, consider Go.


Nowadays, in 1980's....

"Oh, it was quite a while ago. I kind of stopped when C came out. That was a big blow. We were making so much good progress on optimizations and transformations. We were getting rid of just one nice problem after another. When C came out, at one of the SIGPLAN compiler conferences, there was a debate between Steve Johnson from Bell Labs, who was supporting C, and one of our people, Bill Harrison, who was working on a project that I had at that time supporting automatic optimization...The nubbin of the debate was Steve's defense of not having to build optimizers anymore because the programmer would take care of it. That it was really a programmer's issue.... Seibel: Do you think C is a reasonable language if they had restricted its use to operating-system kernels? Allen: Oh, yeah. That would have been fine. And, in fact, you need to have something like that, something where experts can really fine-tune without big bottlenecks because those are key problems to solve. By 1960, we had a long list of amazing languages: Lisp, APL, Fortran, COBOL, Algol 60. These are higher-level than C. We have seriously regressed, since C developed. C has destroyed our ability to advance the state of the art in automatic optimization, automatic parallelization, automatic mapping of a high-level language to the machine. This is one of the reasons compilers are ... basically not taught much anymore in the colleges and universities."

-- Fran Allen interview, Excerpted from: Peter Seibel. Coders at Work: Reflections on the Craft of Programming


Hum... We have to move further than this citation. The 1980s C was much more secure than our current one.

The undefined behavior paradoxes were only added in the 90s, when optimizing compilers became logic inference engines, fed with the unquestionable truth that the developer never exercises UB.

Just because it was a sane language for kernel development at the 1980s, it doesn't mean it is one now.


Yeah, because the only way to achieve performance in 1980's C on 16 bit platforms was to litter it with inline Assembly, thus UB based optimisation was born to win all those SPEC benchmarks in computer magazines.


Funny thing is that the same thing stopping people from writing those crazy¹ optimizers in the 1980s was exactly the lack of capacity of the computers to run them.

Which means that they appeared exactly at the time the need for them became niche. And yet everybody adopted them due to the strong voodoo bias we all have about computer-related tasks.

1 - They are crazy. They believe the code has no UB even though they can prove it has.


A great quote. She's absolutely right. C has absolutely polluted people's understanding of what compiler optimizations should be. Compilers should make a program go faster, invisibly. C makes optimization everyone's problem because the language is so absolutely terrible at defining its own semantics and catching program errors.


> It appears the author is a Go advocate

A bit of an understatement. The author is the current Golang project lead and a member since its inception.


He’s more like the BDFL.


The undefined behavior I struggle with keeps me from better performance though. I have something like [(uint32_t value) >> (32 - nbits)] & (lowest nbits set). For the case of nbits=0, I would expect it to always return 0: even if the right shift of a 32-bit value by 32 bits is undefined behavior, the bit-wise AND with 0 should make it always result in 0. But I cannot leave it that way, because the compiler assumes that undefined behavior never happens and might optimize out everything.
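Roughly, the code in question looks like the first function below (my paraphrase of the description above); the usual workarounds are to special-case nbits == 0 or to do the shift at a wider width:

    #include <stdint.h>

    /* As described: undefined behavior when nbits == 0, because the shift
     * count becomes 32. For 0 < nbits < 32 everything is well defined. */
    uint32_t get_bits(uint32_t value, unsigned nbits)
    {
        return (value >> (32 - nbits)) & ((1u << nbits) - 1u);
    }

    /* One workaround: branch on the edge case (or shift a 64-bit value). */
    uint32_t get_bits_checked(uint32_t value, unsigned nbits)
    {
        return nbits == 0 ? 0
                          : (value >> (32 - nbits)) & ((1u << nbits) - 1u);
    }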


Exactly. The irony in all of this is that C is not a portable assembler. It'd be better if it were[1]!

If you want the exact semantics of a hardware instruction, you cannot get it, because the compiler reasons with C's abstract machine that assumes your program doesn't have undefined behavior, like signed wraparound, when in some situations you in fact do want signed wraparound, since that's what literally every modern CPU does.

[1] If the standard said that "the integer addition operator maps to the XYZ instruction on this target", that'd be something! But then compilers would have to reason about machine-level semantics to make optimizations. In reality, C's spec is designed by compiler writers for compiler writers, not for programs, and not for hardware.


I think that the undefined behaviour should be partially specified. In the case you describe, it should require that it must do one of the following:

1. Return any 32-bit answer for the right shift. (The final result will be zero due to the bitwise AND, though, regardless of the intermediate answer.) The intermediate answer must be "frozen" so that if it is assigned to a variable and then used multiple times without writing to that variable again then you will get the same answer each time.

2. Result in a run-time error when that code is reached.

3. Result in a compile-time error (only valid if the compiler can determine for sure that the program would run with a shift amount out of range, e.g. if the shift amount is a constant).

4. Have a behaviour which depends on the underlying instruction set (whatever the right shift instruction does in that instruction set when given a shift amount which is out of range), if it is defined. (A compiler switch may be provided to switch between this and other behaviours.) In this case, if optimization is enabled then there may be some strange cases with some instruction sets where the optimizer makes an assumption which is not valid, but bad assumptions such as this should be reduced if possible and reasonable to do so.

In all cases, a compiler warning may be given (if enabled and detected by the compiler), in addition to the effects above.


I wanted to reply that your point 3 should already be possible with C++ constexpr functions because they don't allow undefined behavior. But it seems I was wrong about that, or maybe I'm doing it wrong:

    #include <cstdint>
    #include <iostream>

    // BITBUFFER is a 64-bit constant defined elsewhere in the original code;
    // see the godbolt link below for the full example.

    [[nodiscard]] constexpr uint64_t
    getBits( uint8_t nBits )
    {
        return BITBUFFER >> ( 64 - nBits ) & ( ( 1ULL << nBits ) - 1U );
    }

    int main()
    {
        std::cerr << getBits( 0 ) << "\n";
        std::cerr << getBits( 1 ) << "\n";
        return 0;
    }
The first output will print a random number, 140728069214376 in my case, while the second line will always print 1. However, when I put the ( ( 1ULL << nBits ) - 1U ) part into a separate function and print the values for that, then getBits( 0 ) suddenly always returns 0, as if the compiler suddenly understands that it will AND with 0.

    template<uint8_t nBits>
    [[nodiscard]] constexpr uint64_t
    getBits2()
    {
        return BITBUFFER >> ( 64 - nBits ) & ( ( 1ULL << nBits ) - 1U );
    }
In this case, the compiler will only print a warning when trying to call it with getBits2<0>. And here I kinda thought that constexpr would lead to errors on undefined behavior, partly because it always complains about uninitialized std::array local variables being an error. That seems inconsistent to me. Well, I guess that's what -Werror is for ...

Compiled with -std=c++17 and clang 16.0.0 on godbolt: https://godbolt.org/z/qxxWW93Tx


Unfortunately constexpr doesn't imply constant evaluation. Your function can still potentially be executed at runtime.

If you use the result in an expression that requires a constant (an array bound, a non-type template parameter, a static_assert, or, in c++20, to initialize a constinit variable), then that will force constant evaluation and you'll see the error.

Having said that, compilers have bugs (or simply not fully implemented features), so it is certainly possible that both GCC and clang will fail to correctly catch UB during constant evaluation in some circumstances.
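
To make that concrete, here's a minimal sketch (the BITBUFFER value is made up by me; -std=c++17 assumed) where forcing constant evaluation turns the shift-by-64 UB into a hard compile error:

    #include <cstdint>

    constexpr uint64_t BITBUFFER = 0xF123456789ABCDEFULL;  // made-up value for the sketch

    [[nodiscard]] constexpr uint64_t
    getBits( uint8_t nBits )
    {
        return BITBUFFER >> ( 64 - nBits ) & ( ( 1ULL << nBits ) - 1U );
    }

    int main()
    {
        constexpr auto one = getBits( 1 );      // fine: forced compile-time evaluation
        static_assert( getBits( 1 ) == 1 );     // also forces constant evaluation
        // constexpr auto bad = getBits( 0 );   // error: shift by 64 is UB in a constant expression
        return static_cast<int>( one );
    }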


Ah thanks, I was not aware that these compile-time checks are only done when it is evaluated in a compile-time evaluating context.

To add to your list, using C++20 consteval instead of constexpr also triggers the error.


Eh. The existence of Rust (and Zig, to a lesser extent) proves that you can, in fact, have both: highest performance and safe, properly error-checked code without any sort of UB.

UB is used for performance optimizations, yes, but all of these difficult to diagnose UB issues and bugs happen because C++ makes it laughably easy to write incorrect code, and (as shown by Rust) this is by no means a requirement for fast code.


The Computer Language Benchmarks Game has C++ outperforming Rust by around 10% for most benchmarks. Binary trees is 13% faster in C++, and it's not the best C++ binary tree implementation I've seen. k-nucleotide is 32% faster in C++. Rust wins on a few benchmarks like regex-redux, which is a pointless benchmark as they're both just benchmarking the PCRE2 C library, so it's really a C benchmark.

> because C++ makes it laughably easy to write incorrect code

I was going to ask how much you actually program in C++, but I found a past comment of yours:

> I frankly don't understand C++ well enough to fully judge about all of this


> Rust wins on a few benchmarks like regex-redux, which is a pointless benchmark as they're both just benchmarking the PCRE2 C library, so it's really a C benchmark.

The Rust #1 through #6 entries use the regex crate, which is pure-Rust. Rust #7[rust7] (which is not shown in the main table or in the summary, only in the "unsafe" table[detail]) uses PCRE2, and it is interestingly also faster than the C impl that uses PCRE2[c-regex] as well (by a tiny amount). C++ #6[cpp6], which appears ahead of Rust #6 in the summary table (but isn't shown in the comparison page)[comp], also uses PCRE2 and is closer to Rust #7.

[comp]: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

[detail]: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

[rust7]: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

[c-regex]: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

[cpp6]: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...


I mean, it's outperforming C as well in that particular benchmark.

Lies, damn lies, and benchmarks?

I can at least say, the performance difference between C, C++, and Rust, is splitting hairs.

If you want to write something performant, low level, with predictable timing, all three will work.

I'm spending a lot of time building projects with Rust & C++ these days. The issue/tradeoff isn't performance with C++, but that C++ is better for writing unsafe code than Rust.

https://www.p99conf.io/2022/09/07/uninitialized-memory-unsaf...


> C++ makes it laughably easy to write incorrect code

It also provides a lot of mechanisms and tools to produce correct, safe code, especially modern C++. In most codebases you're not seeing a lot of pointer arithmetic or void pointers or anything of that nature. You hardly even see raw pointers anymore, instead a unique_ptr or a shared_ptr. So yes, you can write incorrect code because it's an explicit design goal of C++ not to treat you like a baby, but that doesn't mean that writing C++ is inherently like building a house of cards.


You can do any rust optimization yourself in C++ (ie. aliasing assumptions), whereas rust makes the other way around very difficult, often forcing you to use multiple layers of indirection where c++ would allow a raw pointer, or forcing an unwrap on something you know is infallible when exceptions would add no overhead, etc. Rust programmers want people to believe that whatever appeases the supposedly zero cost borrow checker is the fastest thing to do even though it has proven to be wrong time and time again. I can’t tell you how many times I’ve seen r/rust pull the “well why do you want to do that” or “are you sure it even matters” card every time rust doesn’t allow you to write optimized code.


> You can do any rust optimization yourself in C++ (ie. aliasing assumptions)

I don’t think that is entirely true. C++ doesn’t have any aliasing requirements around pointers, so if the compiler sees two pointers it has to assume they might alias (unless the block is so simple it can determine aliasing itself, which is usually not the case), but in Rust mutable references are guaranteed to not alias.

This was part of the reason it took so long to land the “noalias” LLVM attribute in Rust. That optimization was rarely used in C/C++ land so it had not been battle tested. Rust found a host of LLVM bugs because it enables the optimization everywhere.


While standard C++ has no equivalent of a noalias annotation, it's wrong to say that it has no aliasing requirements. To access an object behind a pointer (or a glvalue in general), the type of the pointer must be (with a few exceptions) similar to the type of the pointee in memory, which is generally the object previously initialized at that pointer's address. This enables type-based alias analysis (TBAA) in the compiler, where if a pointer is accessed as one type, and another pointer is accessed as a dissimilar type, then the compiler can assume that the pointers don't alias.

Meanwhile, Rust ditches TBAA entirely, retaining only initialization state and pointer provenance in its memory model. It uses its noalias-based model to make up for the lack of type-based rules. I'd say that this is the right call from the user's standpoint, but it can definitely be seen as a tradeoff rather than an unqualified gain.
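
A tiny sketch of what TBAA buys the C++ compiler (the function and names are mine): because counts[i] is accessed as int and *scale as float, the compiler may assume the two pointers don't alias and keep *scale in a register instead of reloading it on every iteration.

    // Under TBAA, stores through `counts` (int lvalues) cannot modify the
    // float object behind `scale`, so *scale can be hoisted out of the loop.
    void scale_counts(int* counts, const float* scale, int n)
    {
        for (int i = 0; i < n; ++i)
            counts[i] = static_cast<int>(counts[i] * *scale);
    }

Rust gets the same freedom from its guarantee that &mut references don't alias, rather than from the types, which is the tradeoff described above.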


Why couldn’t the Rust compiler also assume that dissimilar types don’t alias?


Because existing unsafe code written in stable Rust depends on the ability to convert raw pointers and references from one type to another, as long as their memory layouts match. That's the whole premise of the bytemuck crate [0], and it's the basis for things like the &str-to-&[u8] or &[u8]-to-&str conversions in the standard library.

[0] https://docs.rs/bytemuck/latest/bytemuck/


Isn't that the point of unsafe blocks in rust? So you can write optimized code when you need to and the rust borrow checker won't let you?


Unsafe blocks are subject to the same borrow checking that the rest of the language is.


That is correct. However, raw pointers are not borrow checked, in safe Rust they're largely useless, but in unsafe Rust you can use raw pointers if that's what you need to do to get stuff done.

As an example inside a String is just a Vec<u8> and inside the Vec<u8> is a RawVec<u8> and that is just a pointer, either to nothing in particular or to the bytes inside the String if the String has allocated space for one or more bytes - plus a size and a capacity.


> when exceptions would add no overhead

Isn't the overhead for C++ exceptions quite significant, especially if an exception is thrown?

Exception handling can also increase the size of the binary because of the additional data needed to handle stack unwinding and exception dispatch.

I think a number of optimizations are made quite a bit more complex by exception handling as well.


The argument for the Exception price is that we told you Exceptions were for Exceptional situations. This argument feels reasonable until you see it in context as a library author.

Suppose I'm writing a Clown Redemption library. It's possible to Dingle a Clown during redemption, but if the Clown has already dingled that's a problem so... should I raise an exception? Alice thinks obviously I should raise an exception for that, she uses a lot of Clowns, the Clown Redemption library helps her deliver high quality Clown software and it's very fast, she has never dingled a Clown and she never plans to, the use of exceptions suits Alice well because she can completely ignore the problem.

Unfortunately Bob's software handles primarily dingling Clowns, for Bob it's unacceptable to eat an exception every single damn time one of the Clowns has already been dingled, he demands an API in which there's just a return value from the dingling function which tells you if this clown was already dingled, so he can handle that appropriately - an exception is not OK because it's expensive.

Alice and Bob disagree about how "exceptional" the situation is, and I'm caught in the middle, but I have to choose whether to use exceptions. I can't possibly win here.


Like I said, this argument doesn’t work because you can use options in c++ but you can’t use exceptions in rust. So when there’s an occasion where you want to avoid the overhead of an option or result in rust - well too bad.


Yes you can. They're called panics.


Exceptions cost performance when thrown whereas return values always cost performance.

If all you care about is outright performance, having the option for exceptions is easily the superior choice. The binary does get bigger but those are cold pages so who cares (since exceptions are exceptional, right?)


Given who invented it, Go can be thought of as, "What C might have been if we could have done it."

Go really is in many ways more similar to early C in spirit than modern C is.


Expanding this for those not familiar with Go's history: Ken Thompson, formerly of Bell Labs and co-creator of B and Unix, was deeply involved in Go's early days. Rob Pike, also ex-Bell Labs, was also one of Go's principal designers.

I can't find a source now, but I believe Rob described Russ Cox as "the only programmer I've met as gifted as Ken." High praise indeed.


Go is modelled after Plan 9's C toolchain (the [1-9]c compilers, which is where Go's cross-compiling heritage comes from) and Limbo, so that makes a lot of sense. Both come from the same people, after all: Unix -> Unix v8 -> Plan 9 -> Go.


I think this allowance is a mistake.

I suspect that there is some huge number of developer hours that have been wasted, and huge amount of money wasted, on cleaning up after security breaches and finding and fixing security issues. I suspect that those numbers dwarf any losses that might have arisen due to reduced developer productivity or reduced performance when using a (hypothetical) C-like language that doesn't allow the compiler to do these sorts of things.


"a (hypothetical) C-like language that doesn't allow the compiler to do these sorts of things."

It's not very hypothetical in 2023. There are plenty of languages whose compilers don't do this sort of thing and attain C-like performance. There isn't necessarily a single language that exactly and precisely replaces C right now, but for any given task where you would reach for C or C++ there's a viable choice, and one likely to be better in significant ways.

I also feel like this is missed by some people who defend this or that particular treatment of a particular undefined behavior. Yeah, sure, I concede that given the history of where C is and how it got there and in your particular case it may make sense. But the thing is, my entire point is we shouldn't be here in the first place. I don't care about why your way of tapping a cactus for water is justifiable after all if you consider the full desert context you're living in, I moved out of the desert a long time ago. Stop using C. To a perhaps lesser but still real extent, stop using C++. Whenever you can. They're not the best option for very many tasks anymore, and if we discount "well my codebase is already in that language and in the real world I need to take switching costs into account" they may already not be the best option for anything anymore.


I'm sorry, but the world doesn't really align with these ideas, sometimes that is. It's understandable, but at the same time it really isn't.


I assume you're referring to "my codebase is already in C/C++"? Which I did acknowledge?

Because otherwise, what the world is increasingly not aligning with is using C when you shouldn't be. Security isn't getting any less important and C isn't getting any better at it.


This reasoning is why software keeps getting slower and more bloated, build times increase, and latency goes up despite having orders of magnitude more compute power.


If whatever language you're thinking of does that, it isn't one of the ones I'm talking about. I sure as heck am not talking about Python here. Think Rust, D, Nim, in general the things floating along at the top of the benchmarks (that's not a complete list either).


I don't see how this solves anything, Nim's backend is C, which means it should suffer from the same pitfalls. They probably clean it up and eliminate UB, but it should still exist.


Yeah, but with less bugs, more features, and faster development. I mean I hate Electron with a passion but it means everybody gets a client. Let’s not pretend that it’s all worse rather than a set of tradeoffs.


I'll stop using C when there is a faster alternative. When nanoseconds count, there is no competition.


You don't need one faster. You need one as fast. These options generally exist. Rust seems to have crept its way right up to "fast as C"; it isn't really a distinct event, but https://benchmarksgame-team.pages.debian.net/benchmarksgame/... (it tends to do better on the lower ones so scroll a bit). There are some other more exotic options.

C isn't the undisputed speed king any more. It hasn't necessarily been "roundly trounced", there isn't enough slack in its performance for that most likely, but it is not the undisputed speed king. It turns out the corners it cuts are just corners being cut; they aren't actually necessary for performance, and in some cases they can actually inhibit performance. See the well-known aliasing issues with C optimizations for an example. In general I expect Rust performance advantages to actually get larger as the programs scale up in size and Rust affords a style that involves less copying just to be safe; benchmarks may actually undersell Rust's advantages in real code on that front. I actually wouldn't be surprised that Rust is in practice a straight-up faster language than C on non-trivial code bases being developed in normal ways; it is unfortunately a very hard assertion to test because pretty much by definition I'm talking about things much larger than a "benchmark".


The allowance wasn't "tradeoff performance time with developer time".

The point of undefined behavior is "if we specify this, we cause problems for some of the compile targets for this language".


Yes, but then compiler vendors started abusing UB to increase performance while silently decreasing safety/correctness. If the compiler creates a security bug by optimizing away a bounds check the programmer explicitly put there, that's a problem.

https://thephd.dev/c-undefined-behavior-and-the-sledgehammer...


The author, Russ Cox, was one of the inventors of Go.


  res, err := InterceptNuke(enemyNuke)
  if err != nil {
    fmt.Println("NUCLEAR LAUNCH DETECTED!")
    log.Fatal(err)
  } else {
   fmt.Printf("Phew, we're safe. Nuke intercepted after %d seconds.\n", res)
  }
Very clearly defined errors indeed.


On one hand this is sort of "duh" (none of the examples in the article were surprises to me). But I think it's very useful to phrase it this way. So many people seem to shrug off the dangers in using C (C++ of course has its issues, though I'd argue it's easier these days to use C++ correctly), especially for security-critical code. It may be a helpful argument to point out that C & C++ were not designed with the correctness of programs in mind.


I think DJB at some point expressed the desire for a boring compiler with an absolute minimum of UB, and I concur.

I want some sort of flag that disables all this nonsense.

* Uninitialized variable? Illegal, code won't compile.

* Arithmetic overflow? Two's complement

* Null pointer call? Either won't compile, or NULL pointer dereference at runtime worst case.

Yeah, I'm aware -fwrapv and the like exist. But it'd be nice to have a -fno-undefined-behavior for the whole set.


> Arithmetic overflow? Two's complement

I don't really want this either; I'd rather the program abort. The vast majority of situations where I'm using signed arithmetic, I never want a positive number to overflow to negative (and vice versa).

Unsigned arithmetic is already designed to wrap back to zero; that's useful for things like sequence numbers where wrapping is ok after you "run out".
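
A sketch of the sequence-number idiom being alluded to (the helper name is mine); it relies entirely on the defined wraparound of unsigned arithmetic:

    #include <stdint.h>
    #include <stdbool.h>

    /* Serial-number comparison: unsigned subtraction wraps, so the distance
       a - b stays meaningful even after the counter wraps past zero. True
       when a is ahead of b by less than half the number space. */
    static inline bool seq_after(uint32_t a, uint32_t b)
    {
        uint32_t d = a - b;   /* well-defined modular arithmetic */
        return d != 0 && d < 0x80000000u;
    }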


This is possible, but unless it's accelerated by the processor it comes at a great cost. You'd need a branch after every arithmetic operation, unless the compiler could prove the math wouldn't overflow.

The ARM mode described elsewhere in the thread where there's an overflow flag that persists across operations would help; then you could do a check less frequently.

A mode where you get a processor exception would be great, if adding that doesn't add significant cost to operations that don't overflow. Assuming the response to such an exception is expected to be a core dump, the cost of generating such an exception can be high; of course, if someone builds their bignumber library around the exception, that won't be great.
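
For reference, GCC and Clang already expose the flag-checking version as a builtin; a sketch of an abort-on-overflow add (the wrapper name is mine):

    #include <stdio.h>
    #include <stdlib.h>

    /* __builtin_add_overflow performs the add and reports overflow (typically
       via the CPU's overflow flag), so the extra cost is one well-predicted,
       normally untaken branch per addition. */
    static inline int checked_add(int a, int b)
    {
        int result;
        if (__builtin_add_overflow(a, b, &result)) {
            fprintf(stderr, "signed overflow in checked_add\n");
            abort();
        }
        return result;
    }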


Trapping on integer overflow was completely standard before C came along. Fortran and Pascal compilers did it. (Lisp compilers transparently switched to bignums :)


> You'd need a branch after every math

A predictable branch is basically free. In your case such a branch is almost never taken.


"basically free" is very different then actually "free". Adding a ton of extra branches comes at a cost. Even though it is cheap it still takes an instructions that could be used for something else. You fill your branch prediction target cache must faster, ejecting branches you actually care about. It also makes it harder for the compiler to move code around and harder for the prefetcher to break data dependencies. This all adds up to non-trival overhead. You can tell your C compiler to always check arithmetic, but most don't because of the cost.


Stuff like this sounds great but if you dig into the details of actually trying to implement it, it's not at all simple, and it would inevitably slow down the already-terrible compilation speed of C++.

We have static analysis tools which do a decent job. But like, there's a reason Clang has UBSan to detect undefined behavior at runtime. It's a hard problem.


Not really, at least for two of the three examples mentioned.

Failing compilation on an uninitialized variable is easy. The compiler can already warn about this situation (and gcc and clang at least allow you to promote individual warnings to errors). Making this default would be simple, and not at all a performance concern.

Allowing signed arithmetic to overflow (in a defined 2's complement manner) would be just exactly what the hardware does on modern machines, so there'd be no slowdown there. Sure, the compiler would no longer be allowed to omit code that checks for overflow, but that's fine: if the programmer truly doesn't care, they won't write that check in the first place.

(Changing these two behaviors might have backward-compatibility concerns, though.)

You are of course correct that NULL dereference checking would incur a performance penalty at runtime. However, the compiler should be able to catch some subset of cases at compile time. At the very least, there could be a mode where it warns you at compile time that some dereferences could result in SIGSEGV. Unfortunately, I think it would be hard to get that warning to a point where there weren't a lot of false positives, so such a warning would be routinely ignored to the point of uselessness.


> You are of course correct that NULL dereference checking is would incur a performance penalty at runtime. However, the compiler should be able to catch some subset of cases at compile time. At the very least, there could be a mode where it could at least warn you at compile-time that some dereferences could result in SIGSEGV.

SIGSEGV on null pointer deference is not the problem. It's actually fine, in the sense that it predictably terminates the program. The problem is that modern optimizing compilers don't guarantee that SIGSEGV will happen; they may rewrite your program to do insane shit instead, as the example in the blog post shows. So we don't need NULL checks at runtime; we just need the compiler to stop doing insane optimizations.


> Failing compilation on an uninitialized variable is easy.

On an uninitialised variable, sure, but it's impossible to tell at compile time whether the program is using an uninitialised value.

Compiler authors also appear, to me, to be malicious.

In the past, they worked hard to provide warnings when they detected uninitialised variables.

In the rare case, now, that the compiler is able to tell that a line is using an uninitialised value, instead of issuing a warning, it simply removes the offending code altogether.

If compiler authors now were more like compiler authors of the past, they'd value the user interface enough to make any and all dead code a compile error.

Imagine if past compiler authors had their compiler, when seeing the use of an uninitialised variable, simply go ahead and remove any code that used that variable.

Imagine how poor a user experience that would be considered, and then ask yourself why they feel it is okay now to simply remove dead code.

There is no situation where actual source code lines should be removed rather than warned.

Typing on phone so not going to go into all the data flow explanations.


This already mostly exists. It means building with asan/msan/tsan/ubsan enabled. And... you'll slow your code down by several factors.


Sanitizers are debugging tools; they don't guarantee you'll actually catch issues when you run into them.


Right. This is why I said "mostly" in my post. If you want guaranteed detection of all forms of UB you'll need even higher overheads. Like, I really don't know how you'd design a system that is guaranteed to detect all data races.


With gcc you can use e.g. "-fsanitize=undefined,address -fsanitize-undefined-trap-on-error".

In my opinion, any C or C++ program must always be compiled with such options by default, both for debug and release builds.

Only when an undesirable impact on performance is measured should the compilation options be relaxed, and only for those functions where it matters.


Just turn on all warnings in your C++ compiler, and treat warnings as errors. For example, uninitialized variables are easy to catch as a warning and then turn into an error. More sophisticated compilers can warn about the other issues too.


This handles a few cases but nowhere near all of them. Null pointer dereference, use-after-free, data races, and much much more are all global properties with no hope of the compiler protecting you.


We're only considering UB conditions here, not errors in general that may be impossible to detect. Every UB condition can be detected by the compiler because, after all, the compiler needs to check for UB to generate code. All it takes is for the compiler to generate warnings/errors when this occurs. If this is not done by your compiler, ask them to add this feature instead of just complaining about UB in general.


> , after all, the compiler needs to check for UB to generate code.

No the compiler does not check for UB to generate code. A lot of UB (runtime out of bounds, use after free) are very difficult to detect statically (at least without specialized annotations).

The compiler applies transformations to the code that are valid (in the sense that they produce an equivalent program) only in the absence of UB.

So if the input program contains UB, the transformed program produced by the compiler may or may not be valid.


All of the things I described are UB.


Correctness can come from the programmer in ways that performance cannot.

We usually aim for correctness, and in small programs we often achieve it 100%.

100% maximum performance is rarely achieved in a high level language.

If you try hard at absolute correctness, you can get there; not so with performance.

So, obviously, that must mean performance is harder than correctness.

If you turn a performance aspect of a system into a correctness requirement (e.g. real time domain, whether soft or hard) you have hard work ahead of you.

In programming, we often make things easier for ourselves by sacrificing performance. Oh, this will be executed only three times over the entire execution lifetime of the program, so doesn't have to run fast. Thus I can just string together several library functions in a way that is obviously correct, rather than write loops that require proof.

That said, if you have certain kinds of correctness requirements, C becomes verbose. What is easy to express in C is an operation on inputs which have already been sanitized (or in any case are assumed to) so that the operation will correctly execute. It's not a priority to make it convenient to do that validating, or to just let the operation execute with inputs and catch the problem.

E.g. say that we want to check whether there would be overflow if two numeric operands were multiplied. It gets ugly.
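
For the curious, here is roughly what that pre-check looks like in portable C/C++ (a sketch along the lines of the usual CERT-style recipe; on GCC/Clang the __builtin_mul_overflow builtin hides all of this):

    #include <limits.h>
    #include <stdbool.h>

    /* True if a * b would overflow int. Each division below is chosen so
       that the check itself cannot overflow, which is what makes it ugly. */
    bool mul_would_overflow(int a, int b)
    {
        if (a > 0) {
            if (b > 0) return a > INT_MAX / b;
            else       return b < INT_MIN / a;          /* b <= 0 */
        } else {
            if (b > 0) return a < INT_MIN / b;          /* a <= 0 */
            else       return a != 0 && b < INT_MAX / a;
        }
    }

(C23 finally standardizes this sort of thing as ckd_mul in <stdckdint.h>.)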


Performance can absolutely come from the programmer. It's a question of defaults. One can default to correctness and opt-in to performance (in the places where critically needed, often a small portion of the program) much more easily than one can default to performance and opt-in to correctness.


> Performance can absolutely come from the programmer.

To a point, sure. But your language and tools dictate an upper bound on speed. You are never going to get a trading app written in Python to be faster than your competitor's app written in C.


Yes, performance does come from how a program is written, like choice of data structures and algorithms.

But performance is also

- defined by fairly simple, measurable parameters that everyone agrees on, like execution time, code size or memory use;

- subject to automation: compilers can take a program which, if run literally as written, is slow, and speed it up. (Or make it smaller or require less memory.)

Correctness is not like this.

- There is no universal definition of correctness; it is the degree to which a program matches its arbitrary specification.

- Compilers cannot take a program that is incorrect and optimize it to have more correctness. They don't read the specification.

If I ask someone to write a program which demonstrates how a null pointer deference results in a predictable segmentation fault, then this will be a correct program:

  int main(void) { *((char *) NULL) = 0; }
A correct program could crash in this way even if none of its requirements say that it should exercise null dereference or any other error on purpose.

A correct program could be given incorrect data, for which its behavior is not specified. The program is then being misused; the ill behavior doesn't demonstrate that it's incorrect.

A program which has no requirements for handling unexpected inputs, and does not do so, cannot be safely deployed in certain situations, like when it runs in one security domain, but inputs are coming from a different one. That situation implicitly adds requirements, which that program does not meet; effectively the program is interpreted as incorrect. That shows that there can be misunderstandings and disagreements about what is correct, which boils down to disagreeing on what is and isn't a requirement.

We would never disagree that a program which calculates something in 3 seconds is faster (on that input case) than one which does the same in 5 seconds. (All else being same; hardware, OS, ...)


C = Control

I think in the world of C, the compiler assumes the author knows what they're doing, and historically, you were probably supposed to use a separate tool, a linter [1] (static analysis tool), to help catch mistakes.

HN's hard disks were a recent (2022-07) victim of a firmware developer's lack of understanding of integer overflow exceptions. [2] The firmware was likely written in C, or "C++ C". A fix was released in 2020. Another reminder to update the firmware of these disks. [3]

++crash;

[1] https://en.wikipedia.org/wiki/Lint_(software)

[2] https://en.wikipedia.org/wiki/Integer_overflow

[3] https://www.thestack.technology/ssd-death-bug-40000-hours-sa...


Undefined behavior is the opposite of programmer control. As the examples in the blog post show, you can write code that explicitly dereferences a NULL pointer, or enters an infinite loop, and the compiler will think surely the programmer didn't mean to do that, and literally remove code that you wrote.

It's true that historically C has not done much to protect programmers from their mistakes, but historically mistakes just meant suffering the natural consequences (such as a SIGSEGV on NULL pointer dereference). But these days, when you make a mistake, C compilers will exploit your mistake to the maximum extent possible, including changing the meaning of your programs.


IMHO, having an additional debug mode where all optimisation steps are performed but with asserts inserted to validate all preconditions would mitigate a lot of problems.

For example, adding an "assert(x <= INT_MAX - 100)" in the first example.

Running test suites on this debug binaries would find a lot of problems.


The issue of UB has everything to do with how compilers implement it. If people are having problems, they should complain to compiler writers. They always have the option of creating slower code that checks for obvious problems like uninitialized variables. However, if a company/project writes a compiler that is a little slower than the competitor, people will almost always complain that it is a bad compiler. So the result is what you have nowadays: they're always looking for every small opportunity to generate faster code at the expense of safety.


Why complain to the compiler writers? As you say, people want the fastest compilers for their language, so compilers will prioritize that over other concerns. Users may rant and complain, but they won't use a slower compiler.

If you really want less UB, switch to a different language or change the language! Complain to the standards committee, have them define behavior that is currently undefined, or impose restrictions on allowable behaviors. Compilers are always going to optimize to the extent that the language standard allows, so change the standard.


> However, if a company/project writes a compiler that is a little slower than the competitor, people with almost always complain that it is a bad compiler.

That is bullshit. There are plenty of projects that would gladly trade performance for more correctness. I would go as far as to say most projects would make that choice if articles like this get mindshare.

“It’s mostly as fast as clang but errors upon UB” is an easy sell


"[..] this International Standard imposes no requirements on the behavior of programs that contain undefined behavior. "

What you describe fits for "implementation defined behavior". If you want to write code that works with different compilers, a single compiler with every UB defined, gains you nothing. If you don't need that flexibility, you can just use a language without UB in the first place.


You do. Take the EraseAll example

Ideally it's a compile time error. Less ideally it jumps to a NULL pointer and immediately crashes.

Either means the programmer must fix the code, which also stops being a problem on other compilers.


"Ideally it's a compile time error."

I agree with that, but the standard does not agree with both of us. My point is that choosing C or C++ makes sense if you see an advantage in programming against an ubiquitous and almost universal standard. If you have the freedom to implement against a particular compiler implementation there is no good argument to not also taking the freedom to choose a different language altogether.

I know in the real world this is not so easy. I worked in automotive with their certified C compilers long enough and I wouldn't have had the choice to select another language. Doesn't mean everyone wouldn't have been better off with a language without UB in the first place and I think we are getting there.


Nonsense. The standard literally doesn't bind us in any way whatsoever. UB means there are no rules, anything goes. If the compiler is free to make demons fly out of my nose, it's equally free to produce a nice error message to the effect of "Don't do that".

I'd much prefer if the standards people started defining every possible instance of UB as a fatal error or as an implementation defined behavior, but that's not strictly required.


Some instances of UB are data-dependent, and some would require solving the halting problem to statically distinguish them from non-UB. What you propose is therefore not generally possible at compile time, and at runtime only with considerable performance impact.


Thanks for pointing that out. Detecting UB is hard and sometimes impossible in C and C++. If you design your language accordingly you can avoid that. At least I think that is how Rust deals with it, but I'm happy to be convinced otherwise.


If some form of UB is impossible to detect, then the compiler cannot do anything about it either, making the whole debate useless. Any action taken by a compiler relating to UB must be for some cause that is detectable at compilation time.


That’s not quite true, the compiler could add code to detect all UB at runtime, and then abort or whatever. It’s just that this would also pessimize UB-free code, and most compilers opt to not do that, at least in release mode.


That is not true and very important to understand. The optimizations a compiler does by eliding UB stuff are often very effective and I think no one in the thread has questioned that. It is a fallacy, though, to think that these optimizations could be replaced by diagnostics. That would be hard in most cases and sometimes impossible.


So fix it to the extent possible. Eg, this: https://t.co/Z1Ib2fIyu6

Is easily fixable.

A. static Function Do; -- This is a syntax error, initialization is mandatory.

B. It's initialized to nullptr, and compiled to jmp 0x0


Yes, you can fix it in some cases, but the real errors usually happen in complex code where it is much harder or impossible to detect at compile time. The examples in TFA are very simplified for exposition purposes, they are not representative of ease-of-detection.


I don't see how this particular case could ever be hard to deal with.

Using uninitialized memory is UB, so initialize every byte of memory allocated to NULL. Dereference NULL pointers as-is. Done.

I'd be quite happy with a best effort approach. If you can't be perfect in every case, at least handle the low-hanging fruit.


This approach would also impact UB-free code that initializes the memory later. C compilers don’t want to pessimize such code. The difficulty lies in identifying only code with actual UB.


That's perfectly fine with me. If the compiler can prove the code is initialized twice, then it's free to deduplicate it. Otherwise I'll happily eat the cost and preserve more of my sanity when debugging.


You’ll happily eat the cost, but the target audience of C compilers traditionally doesn’t. That’s the point made by TFA. C prioritizes performance over safety.


I agree with you more than you think.

"if the standards people started defining every possible instance of UB as a fatal error or as an implementation defined behavior,"

We can wish for that, but the standard does not do that and never will.

Writing C or C++, first and foremost, means writing code against the standard and we have to deal with what the standard says.

Of course we are free to give up on the standard and target a particular implementation of C or C++ in a particular compiler. If you also like to call that writing C or C++ I would not argue with you. It is still a very different thing because you give up the primary advantage of C and C++, that you can compile basically everywhere. My point now is that if you give that up it makes no sense to stick with C or C++ in the first place in this day and age.


> Writing C or C++, first and foremost, means writing code against the standard

This was never true and never will be. There are tons of C and C++ code that cannot be compiled in more than a few compilers. The standard is the lowest common denominator, but all compilers have something beyond the standard that is used in practice. Just check the Linux code, for example.

> if you give that up it makes no sense to stick with C or C++

No, the whole point of UB is that every compiler can do what it wants in that situation. C/C++ actively embrace differences between compilers, while trying to standardize the core meaning of the language.


Linux is an excellent example. There has been a fork that made it compile with the Intel compiler and if I remember correctly it was moderately faster. Of course it went nowhere.

Writing standard conformant code is hard and you do it if you have a good reason to do it. A lot of software has.

The Linux kernel hasn't and that is fine. If your project also doesn't have that constraint, good for you. 30 years ago you would still choose C and a particular compiler and you could get things done. Nowadays, why bother? You could choose a language that has no UB, compiles on any hardware that is reasonably common and is still performant.


> My point now is that if you give that up it makes no sense to stick with C or C++ in the first place and in this day and age.

You wouldn't be giving anything up. The standard defining something as UB means you absolutely shouldn't be doing that. So a compiler doing something defined in case of UB can't harm you in any way, and doesn't deviate from the standard, because the standard prescribes no rules in the case you're invoking.


With giving up I meant giving up portability by targeting a particular implementation of C or C++ in a particular compiler. The standard is what it is and discussing a hypothetical standard is moot.


You don't give up any portability.

If the standard says that say, dereferencing a NULL pointer is UB it means you're not supposed to do that ever, on any OS or compiler. A compiler can choose to do something sensible like producing a fatal error message without any downside, since per standard that's not ever supposed to happen anyway.


To not give up portability, all compilers would have to do what you propose. They won't as long as the standard doesn't mandate it. So we are back at the point where we agree that the standard contains some unfortunate things.

Now, standard conformant compilers do exactly what you propose, they are just not C or C++ compilers and the standard is not the C or C++ standard.


> To not giving up portability all compilers would have to do what you propose.

Not in a lot of cases. Eg, let's suppose that GCC defines a NULL pointer dereference to act exactly like a normal one. That is, the compiler dereferences the pointer, and whatever the CPU/OS says is going to happen when you read from 0x0, happens.

Meanwhile, Clang continues with behaviors that can say lead to a function being entirely replaced with "return true".

There's no problem whatsoever with this. You're still not supposed to dereference null pointers. It's just that on GCC in particular you get a predictable error. And being an error you can't really rely on it -- your program still crashes and crashes are still undesirable, and therefore you will fix whatever leads to that outcome. And on Clang maybe you don't crash but the program doesn't do what you expect it to, which is still a bug to be fixed. You have a bug in both versions which manifests in different ways (but that is fine, because UB says anything goes so there's no requirement whatsoever for both compilers to have the program break identically), but it's the same bug that needs the same fix.

After the bug fix, your GCC compiled version doesn't crash and runs correctly, and your Clang compiled version doesn't crash and runs correctly. The behavior in the end is identical.


> If the standard says that say, dereferencing a NULL pointer is UB it means you're not supposed to do that ever, on any OS or compiler.

Says who?

The standard doesn't say that. At most it says that if you're trying to write portable C, you shouldn't do that. But not everyone is trying to write something portable.


The standard does say it. UB means there are absolutely no rules regarding what happens. It's not implementation defined, it's "absolutely anything can happen and it can change unpredictably".


> UB means there are absolutely no rules regarding what happens.

I will acknowledge that the standard says this, but the original post said "you're not supposed to do that ever".

Those are not the same thing.


They are.

First, because the compiler isn't bound to any promise. If in version 2.0, you can still use memory immediately after freeing it, version 2.0.1 is perfectly free to make the program misbehave. No consistency has been promised.

Second, because per the linked article, compilers treat UB in a particular, weird way. Compilers assume you will not invoke it. Hence this example:

    #include <stdio.h>

    int main() {
        for(int i; i < 10; i++) {
            printf("%d\n", i);
        }
        return 0;
    }
Which a real compiler compiles to this:

    #include <stdio.h>

    int main() {
        return 0;
    }

Why? Because usage of an uninitialized variable is UB. UB is something a programmer doesn't intend to invoke. Therefore the compiler concludes "(undefined)i < 10" never happens. Therefore the loop never runs. Therefore the loop can be removed.

UB is not just non-portable, modern compilers treat it as a weird, radioactive state they avoid touching.


I acknowledge all of your points, and agree to everything you've said.

Except for the actual point that the compiler not being bound by any promise means you should never do it.

If you want to walk it back to "it's very rare that intentionally using UB is a good idea", I could agree with that. But never is a bridge too far.


> Except for the actual point that the compiler not being bound by any promise means you should never do it.

> If you want to walk it back to "it's very rare that intentionally using UB is a good idea", I could agree with that. But never is a bridge too far.

No. The point of writing code is having the computer do the thing you want. If there's no predictable outcome, then it should never be done.

BTW, the outcome can also change depending on optimization level. So this can produce code that works in debug mode and breaks in release mode or vice versa. If there's one thing I don't look forward to, it's a program that's impossible to debug because debugging changes its behavior.


> The point of writing code is having the computer do the thing you want.

Correct. We definitely agree here. Most importantly, we didn't talk about the specification at all in this sentence. The behavior of the computer is what we care about. And UB only exists in the spec, it's not a thing compilers or computers do (it's an absence of specification about what a theoretical compiler would do).

> If there's no predictable outcome, then it should never be done.

I also agree here. But, with a particular compiler there is almost always a predictable outcome.

> BTW, the outcome can also change depending on optimization level.

And here, I think you're actually being too conservative. It can change with much more than just the optimization level. If you're doing intentional UB, you'd want to freeze not only your optimization level, but your entire flag set. And changing any of them could lead to months of debugging and fixing things.

This trade-off is almost never worth it. But I maintain that the "almost" belongs in that sentence.


> I also agree here. But, with a particular compiler there is almost always a predictable outcome.

It's potentially variable depending on other things involved, like system headers you might be including, and system updates.

> And here, I think you're actually being too conservative. It can change with much more than just the optimization level. If you're doing intentional UB, you'd want to freeze not only your optimization level, but your entire flag set. And changing any of them could lead to months of debugging and fixing things.

Well, isn't that a pleasant prospect there. As somebody who reviews PRs my answer to anything along those lines is: over my dead body.

> This trade-off is almost never worth it. But I maintain that the "almost" belongs in that sentence.

Okay, I'll grant you it's a fine idea if you want to sabotage a project and drive the other people to madness. Which if you're being paid for it may be actually illegal.

Otherwise, I still go with "never". If you need to do something this bizarrely specific, if it's worth doing at all, then it's time to do it in assembly, where you can do any random weird thing you want without the compiler getting in your way.


I mean, just as an example, if you're working on a weird embedded system (so you can't really change compilers) where certain parts that are UB in the spec are guaranteed to have certain behavior?

Additionally assuming function pointers are the same size as void *, which isn't guaranteed by the spec, when you're working on platform specific code. Although they may start optimizing that one at some point.


"Performance versus correctness" is the same design tradeoff as "Worse is Better":

https://www.dreamsongs.com/RiseOfWorseIsBetter.html

However our tolerance to accept "worse" over "better" is waning since we have more capable hardware, better tools, and "worse" leads to more problems later such as security vulnerabilities.


>For example, a common thing programmers expect is that you can test for signed integer overflow by checking whether the result is less than one of the operands, as in this program:

    #include <stdio.h>

    int f(int x) {
        if(x+100 < x)
            printf("overflow\n");
        return x+100;
    }
>Clang optimizes away the if statement. The justification is that since signed integer overflow is undefined behavior, the compiler can assume it never happens, so x+100 must never be less than x. Ironically, this program would correctly detect overflow on both ones'-complement and two's-complement machines if the compiler would actually emit the check.

My god…


What the article misses is that the code wouldn't work on all hardware. Take MIPS for example, where the signed overflow would generate a hardware exception that the OS might have implemented to do anything from nothing to killing the generating process.

C was never standardized solely on the basis of what's most performant. The vast majority of explicit UB is there because someone knew of a system or situation where that assumption wasn't true.


> C was never standardized solely on the basis of what's most performant. The vast majority of explicit UB is there because someone knew of a system or situation where that assumption wasn't true.

So? They could have called it implementation defined. The reasoning you present was broken then, as it is broken now.

Your MIPS example displays no reason for UB to exist.


Implementation defined means it has a defined behavior on every implementation. In the MIPS case defining that behavior would force the compiler to generate strictly signed instructions for signed values and define how the runtime platform handles these interrupts, rather than leaving the compiler free to generate "whatever works".

Look, a lot of UB is frankly stupid and shouldn't be in a modern language spec, including signed overflow. I'm not defending that, only giving an example where you have to break compatibility to eliminate it.


> In the MIPS case defining that behavior would force the compiler to generate strictly signed instructions for signed values and define how the runtime platform handles these interrupts,

I respectfully disagree: in this particular scenario, it's enough for the compiler vendor to write "generates an interrupt on overflow" with no indication of what the handling should be, and still be well within implementation-defined behaviour.

After all, the standard specifies what `raise()` does, and what `signal()` does, but doesn't specify how the runtime is going to handle `raise(signum)` when the program has not yet called `signal(signum, fptr)`.

This scenario you presented displays exactly why UB should be replaced with IB in the standard. I've yet to see one good reason (including performance reasons) for why the dereferencing of a NULL pointer (for example) cannot be documented for the implementation.

With UB, we have real-world examples of a NULL pointer dereference causing compilers to omit code resulting in a program that continued to run but with security implications. If changed to IB, the compiler would be forced to emit that code and let the dereference crash the program (much better).


I'm not convinced a "common" C program would ever do such a thing. Even the "Effective C" book (one of the best books on C, imo) discusses why this is wrong and why you should take advantage of `limits.h` for bounds checking.

This is just bad programming. The compiler, like usual, is correct because you're not in reality checking anything. You've made an obviously non-sensical statement that the overflowed value will be less than the value. Compiler optimizes it away. You can argue the semantics of UB here but this particular UB is borderline a solved problem in any practical sense.

To be fair, a static analyzer should be able to catch this.


>You've made an obviously non-sensical statement—

Therefore it should fail to compile.

You can even spin it as a performance enhancement: the whole program can be optimized away!


The problem is, that’s not how the logical inference in a compiler/optimizer works. It’s very difficult to translate such an optimization back to “this statement has been optimized away”, in the general case.


If it's so difficult to figure out if the consequences of their optimization game are sensible or not, then they shouldn't play it in the first place.

Who actually wants it to behave this way, other than compiler engineers competing on microbenchmarks? Who is this language even for?


It’s a side effect of desirable optimizations for UB-free code. You can’t have both at the same time.


Desirable for whom? Are the beneficiaries of it going to pay for the externalities they are generating, like polluters should?

You can say "ordinary workaday programmers benefit from speed improvements en passant", but they weren't really given a choice in the matter, were they? When programmers are given an explicit binary choice of "correct, but slightly slower", and "wrong, but slightly faster", they pick the former in practically all cases (or they should, at any rate). But they can't make this choice; the compiler and spec writers go behind their backs and construct these inscrutable labyrinths, then blame everyone else for getting lost in them.


-fwrapv is there for people to use. People don't use it.

Yes, defaults matter, but they matter both ways. People benchmark with the default options, and a compiler with -fwrapv turned on by default will lose those benchmarks, and the "ordinary workaday programmers" still end up trading correctness for speed, since one person at their workplace ran one benchmark once and picked the compiler that won on speed.


You cannot check for overflow like this, because you're causing it. There are other ways to do this without adding first:

  #include <limits.h>

  int safe_add(int a, int b) {
    if (a > 0 && b > INT_MAX - a) {
        /* deal with overflow... */
    } else if (a < 0 && b < INT_MIN - a) {
        /* deal with underflow... */
    }
    return a + b;
  }


This is why I hate it when people describe C as "portable assembler". The Usenet comp.lang.c FAQ was already thirty years ago warning people not to write overflow checks like this.


I think what really annoys me is that this looks like actual malice on the part of the standards writers, and less severe malice on the part of the compiler authors.

I know I should attribute it to stupidity instead but ...

The standards writers could have made all UB implementation defined, but they didn't. The compiler authors could have made all UB implementation defined, and they didn't.

Take uninitialised memory as an example, or integer overflow:

The standard could have said "will have a result as documented by the implementation". They didn't.

The implementation can choose a specific behaviour such as "overflow results depend on the underlying representation" or "uninitialised values will evaluate to an unpredictable result."

But nooooo... The standard keeps adding more instances of UB in each revision, and the compiler authors refuse to take a stand on what overflow or uninitialised values should result in.

Changing the wording so that all UB is now implementation defined does not affect legacy code at all, except by forcing the implementation to document the behaviour and preventing a compiler from simply optimising out code.


The standard does make all definable UB implementation defined. The compiler writers intentionally misread the standard to allow these kinds of optimizations.

C17 Draft, § 3.4.3: "Possible undefined behavior ranges from...

. . . ignoring the situation completely with unpredictable results, to . . .

. . . behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message) . . .

. . . to terminating a translation or execution (with the issuance of a diagnostic message)."

Compiler writers like to language-lawyer this in two ways. First, they interpret "possible" to mean that this is a non-exhaustive list, but that isn't how interpretation of such a document typically works. Expressio unius est exclusio alterius -- the express inclusion of one is the exclusion of others -- means that any list is exhaustive unless explicitly indicated otherwise. In other words, the reality is that, according to the standard, these are the only possible options.

Second, they interpret "ignoring the situation completely" to mean "actively seeking out the situation and deleting whole sections of code based on that." This is quite self-evidently a dishonest interpretation.


The optimization opportunities do indeed come from the first option, "ignoring the situation completely with unpredictable results". That is: the compiler assumes that the undefined behavior will not happen, and optimizes based on it. They do not "actively seek out the situation", it's in fact the opposite, they assume that the situation simply won't happen. And "deleting whole sections of code" is just the normal "unreachable code" optimization: code which cannot be reached on any feasible execution path will be deleted.


That is an obvious misreading.

Assume you have an unsigned char array r[] declared with 100 elements and the compiler encounters the following line of code:

r[200] = 0xff;

"ignoring the situation completely with unpredictable results" means translating that to the appropriate object code without bounds checking and no guarantees are made about whether that's a segfault or what your environment does.

"assum[ing] that the situation simply won't happen" means eliding the code entirely.

The first is required by the standard. The second is prohibited by the standard.

There is no section of any version of the standard that permits compilers to assume that anything which might be UB or predicated on UB is dead code.

Compilers must either ignore the fact that it's UB, or behave in a documented implementation-defined manner, or stop translation.


While you have convinced me that the compiler authors are more culpable than the standards committee, I still feel that they are ultimately responsible - the buck stops with them.

They're free to, and easily able to, add the word "exhaustively" when listing possible behaviours, but they don't. They can even do away with it entirely and replace it with implementation-defined, which makes compilers non-conformant if they optimise out code.


You're quoting from a footnote. The footnotes are not part of the normative text, so how you choose to interpret it is irrelevant.

The actual definition of undefined behavior given by § 3.4.3 is: "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements."

Since no requirement is imposed, compilers can do as they want.


> You're quoting from a footnote. The footnotes are not part of the normative text

I am not quoting from a footnote.

Footnote 1 pertains to § 1.

Footnote 2 pertains to § 3.19.5.

There are no footnotes pertaining to § 3.4.3.

And do you have a citation for the proposition that footnotes or notes of any other kind are not normative?


Apologies, it's a note, not a footnote. But notes in ISO standards aren't normative either.

This is a general principle of ISO standards, see https://share.ansi.org/Shared%20Documents/Standards%20Activi...:

> 6.5 Other informative elements

> 6.5.1 Notes and examples integrated in the text

> Notes and examples integrated in the text of a document shall only be used for giving additional information intended to assist the understanding or use of the document. They shall not contain requirements ("shall"; see 3.3.1 and Table H.1) or any information considered indispensable for the use of the document, e.g. instructions (imperative; see Table H.1), recommendations ("should"; see 3.3.2 and Table H.2) or permission ("may"; see Table H.3). Notes may be written as a statement of fact.


I don't view "You shall not do this" and "If you do this, it's meaningless" as identical statements.

Besides, clang does the same thing with -std=c89, where this provision was emphatically not a note.


I think it's a bit out of line to call this malice.

The C standard was written when people wanted a language that was easy to read and write, but could have similar performance to hand-written assembly. Performance, as the article title points out, was a bigger priority than correctness or eliminating foot guns.

Most of us here weren't around when many/most programs were written in assembly, and when machines were very constrained in how much memory they had and how many CPU cycles they could spare. Nowadays, performance concerns usually come behind development speed and correctness. So it's hard to truly feel what trade-offs the original designers of C (and the original C89 standards committee) had to make.

I'm not convinced implementation-defined behavior is really any better, though. Packagers and end-users should not have to ensure that the code they're compiling is "compatible" with the compiler or hardware architecture they want to use. Developers shouldn't have to put a "best compiled with FooC compiler" or "only known to work on x86-64" disclaimer on their code. That would not be an improvement.

I do agree that, these days, there's no excuse for writing language specifications that include undefined behavior. But C has a lot of baggage.


> I think it's a bit out of line to call this malice.

I apologise, but note (warning: weasel words follow) I was careful to say it looks like malice to me, not that I have any indication that it actually was malice.

> I'm not convinced implementation-defined behavior is really any better, though. Packagers and end-users should not have to ensure that the code their compiling is "compatible" with the compiler or hardware architecture they want to use.

I think it is better, purely because then the resulting code can't simply be omitted when an integer may overflow, the code still has to be emitted by the compiler.

Right now all the worst bits of UB have to do with the compiler optimising out code. With IB replacing UB, the implementation will have to pick a behaviour and then stick with it. Much safer.
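
A concrete sketch of the kind of thing I mean (illustrative only, not taken from the article): with signed overflow as UB, a typical optimizer is allowed to fold a post-hoc overflow check to a constant, whereas with implementation-defined wrapping the check would have to survive.

    /* With signed overflow being UB, the compiler may reason that x + 100
       can never wrap, so this function may be optimized to "return 0".   */
    int would_overflow(int x) {
        return x + 100 < x;
    }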


That is usually a myth.

From the people that were there at the time.

"Oh, it was quite a while ago. I kind of stopped when C came out. That was a big blow. We were making so much good progress on optimizations and transformations. We were getting rid of just one nice problem after another. When C came out, at one of the SIGPLAN compiler conferences, there was a debate between Steve Johnson from Bell Labs, who was supporting C, and one of our people, Bill Harrison, who was working on a project that I had at that time supporting automatic optimization...The nubbin of the debate was Steve's defense of not having to build optimizers anymore because the programmer would take care of it. That it was really a programmer's issue.... Seibel: Do you think C is a reasonable language if they had restricted its use to operating-system kernels? Allen: Oh, yeah. That would have been fine. And, in fact, you need to have something like that, something where experts can really fine-tune without big bottlenecks because those are key problems to solve. By 1960, we had a long list of amazing languages: Lisp, APL, Fortran, COBOL, Algol 60. These are higher-level than C. We have seriously regressed, since C developed. C has destroyed our ability to advance the state of the art in automatic optimization, automatic parallelization, automatic mapping of a high-level language to the machine. This is one of the reasons compilers are ... basically not taught much anymore in the colleges and universities."

-- Fran Allen interview, Excerpted from: Peter Seibel. Coders at Work: Reflections on the Craft of Programming


I fail to see the point of your post in the context of this thread.

What, in the parent post, do you consider a myth?


C's original performance.

Optimizations taking advantage of UB had to be introduced during the 1990s to make it a reality.

Until the widespread adoption of 32-bit hardware, even junior Assembly devs could relatively easily outperform C compilers.

Which is what the interview is about, originally C lacked the optimization tools, relying on developers, with tons of inline Assembly.


> Which is what the interview is about, originally C lacked the optimization tools, relying on developers, with tons of inline Assembly.

I'm afraid that is not the conclusion I draw from the snippet you posted.

It's very clear to me that the HLL in question at that time prevented the programmer from performing the low-level optimisations that the C programmer could optionally do. The debate appeared to be whether to let the language exclusively optimise the code, or just do no optimisation and let the programmer do it.

This is why the sentence "Oh, yeah. That would have been fine. And, in fact, you need to have something like that, something where experts can really fine-tune without big bottlenecks because those are key problems to solve." is in there - it's because the HLL languages literally wouldn't let the programmer optimise at the level that C would.

Honestly, if the proponent for the HLL language didn't add in "Use the HLL, but not for OS and low-level code where you really need to fine-tune", you'd have a point, but they said it, and so you don't.


Right, that is why C code was polluted with inline Assembly until optimizers started taking advantage of UB.

Inline Assembly isn't C.


Of course inline assembly is C. Most languages don't let you do inline assembly. That isn't because they don't like it, but because the way those languages get translated to machine code is highly variable. C is the only language where the semantic model is simple enough that you can see how every language construct would translate to assembly language and write your own inline assembly language in between.
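
For concreteness, here is roughly what that looks like with the GCC/Clang extended-asm syntax on x86-64 (a sketch only; the exact syntax varies between compilers):

    /* Add b into a using one x86-64 instruction. */
    static inline long add_asm(long a, long b)
    {
        __asm__("addq %1, %0"   /* operand 1 is added to operand 0      */
                : "+r"(a)       /* a is read and written, in a register */
                : "r"(b));      /* b is read from a register            */
        return a;
    }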


It definitely isn't.

What page of C standard describes it?

As for the rest of your comment, unless you are using a PDP-11, you will be surprised.


The more important detail is that "standard C" recognises inline asm as a common-as-dirt C extension.

Page 514 (of version n1256) of the ISO C Standard

Moving target, to be sure, but it's been there for 20+ years now.

Annex J.5.10 The asm keyword

to quote a stack overflow answer (that sounds a lot like an answer commonly given back in the day on Usenet and #C IRC channels)

     informative, not normative, so an implementation need not provide inline assembly, and if it does it's not prescribed in which form. But it's a widespread extension, though not portable since compilers do indeed implement it differently.
https://www.open-std.org/JTC1/SC22/WG14/

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf


C predates the C standard, which describes a common subset of C implementations. It is useful for writing code which works across a variety of implementations. For example, you can write code that works on the Byzantine optimising compilers, which will attempt to render your code incorrectly if at all possible, but will make strictly conforming programs run very quickly. But it is not necessary to write code to the C standard, and many C implementations do not conform to it. The Plan9 C compiler, for example, expects 'void main(void)' for a program's main function, while the standard requires 'int main(void)'.


Let's make it easier then, what page of K&R C describes inline Assembly?


A book about C isn't a comprehensive guide to every C implementation.


Likewise, compiler-specific extensions that aren't part of the respective standard don't define universal truths about languages.

Had you been clever, you would have pointed out that inline Assembly is defined in the ANSI C++ standard, with the mention of asm (content) language construct, leaving the meaning of content implementation defined.

As it is also defined in the Ada language standard, Modula-2 standard revision, and Oakwood Guidelines for Oberon-2 implementation.

That is the problem: mixing up C with compiler-specific features, confusing the language with its standard, and assuming no other language on the planet offers similar capabilities.


I am not remotely interested in what the ISO standard says about C, and I am even less interested in what the ANSI standard says about C++, a completely different language.

C is not defined by or constrained to the standard. It predates it and every implementation extends it, as is intended by the standard.

The person confusing the standard with the language is you.


My recollection is that trying to keep up with Fortran's performance on numeric code was an impetus for taking advantage of UB (variations on what eventually became the "restrict" keyword were discussed but rejected for C89, and that idea comes directly from Fortran, which can assume arguments do not alias).


> The standards writers could have made all UB implementation defined, but they didn't.

This is not possible without performance impact, because, in the general case, whether a program constitutes UB depends on data and/or previous program flow (halting problem), and thus would require additional runtime checks even for code that happens to never run into UB.
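
As a sketch of what that means in practice (illustrative only): making, say, out-of-bounds access implementation-defined rather than undefined would in general force a check like the one below onto every access whose index the compiler cannot prove to be in range.

    #include <stddef.h>

    /* Hypothetical "always defined" indexing: the extra branch is paid even
       by callers that never pass an out-of-range index.                     */
    unsigned char load(const unsigned char *buf, size_t len, size_t i)
    {
        if (i >= len)   /* run-time check that cannot always be proved away */
            return 0;   /* some documented fallback behaviour               */
        return buf[i];
    }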


> less severe malice on the part of the compiler authors.

This is optimization run amok and I'd call it malice. If C/C++ are designed to let me shoot myself in the foot, the compiler should let me shoot myself in the foot instead of erasing my code.


This is why I implemented two's-complement with unsigned types, that don't have UB on overflow.


like x > 2147483547u in this case?


Correct. It will still overflow, but it's not UB and won't be optimized away.


by 'overflow' do you mean 'set the high bit'


Correct.

I have routines that take the unsigned values and interpret them as signed, two's-complement values.

So you would write this:

    val = temp1 + temp2;
    if (y_ssize_lt(val, temp1)) y_trap();
That `y_ssize_lt()` function computes whether `val` is less than `temp1` as though they were signed, two's-complement integers. But because they are actually unsigned, the compiler cannot be malicious and delete that code.
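
One way such a comparison helper can be written (a sketch only, assuming 64-bit values):

    #include <stdint.h>
    #include <stdlib.h>

    /* Compare two unsigned 64-bit values as though they were signed
       two's-complement integers, using only unsigned arithmetic (no UB).
       Flipping the sign bit maps the signed ordering onto the unsigned one. */
    static int y_ssize_lt(uint64_t a, uint64_t b)
    {
        const uint64_t sign = UINT64_C(1) << 63;
        return (a ^ sign) < (b ^ sign);
    }

    static void y_trap(void)
    {
        abort();   /* stand-in for whatever trap mechanism is actually used */
    }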


i see, thanks

maybe we should develop a nonmalicious compiler



This has to be the worst part of UB: That the compiler can assume it never happens. This way UB can affect your program even if the code that would exhibit the behavior is never executed.


If anyone tried to pull this kind of language lawyering in my job they would be fired

Maybe it's about time to fire the C compiler developers


-fwrapv and it's all good :)


It really isn't optional as a C developer to turn on warnings and get your code warning-free. This will eliminate the most common undefined behavior issues, like uninitialized variables, though unfortunately not all of them.


linux, qemu, postgres and many other projects written by "C developers" use and rely on fwrapv, fno-strict-aliasing and others.


They also turn on warnings, so they catch uninitialized variables and the like.


This is why people who put "C/C++" on their resume don't usually know either language. In C++, integers are defined to be two's complement now, so that check is acceptable.


That, or they know that C23 includes two’s complement[1].

[1]: https://en.cppreference.com/w/c/23


Integer overflow is still undefined behavior in C23.


Not true, as the post explains.


C++20 made the switch. See P1236. It appears C23 has followed suit.


If P1236 (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p12...) is really what it says ("alternative wording") then I don't believe that's true. Certainly P0907R3 (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p09...) is clear:

> Status-quo If a signed operation would naturally produce a value that is not within the range of the result type, the behavior is undefined. The author had hoped to make this well-defined as wrapping (the operations produce the same value bits as for the corresponding unsigned type), but WG21 had strong resistance against this.

My understanding is that the 2s complement change in C++20 ended up defining bitwise operations but not arithmetic.


Two's complement does not mean signed overflow became defined. So the check is still wrong. And no, I do not think making it defined would lead to more correct programs. You then simply have really difficult wrap-around bugs instead of signed overflow, which can be found more easily with the UB sanitizer or turned into traps at run-time.
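
For example (a sketch), a deliberate overflow like the one below is reported at run time when built with -fsanitize=signed-integer-overflow, or can be turned into a trap with -ftrapv:

    /* build e.g. with: cc -O2 -fsanitize=signed-integer-overflow overflow.c */
    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        int x = INT_MAX;
        x = x + 1;            /* UBSan reports the signed overflow here */
        printf("%d\n", x);
        return 0;
    }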


It made the switch, sure, but overflow remains undefined.


Maybe I'm completely misunderstanding things, but this article mentions arithmetic overflow as being a problem in C and C++ but isn't this exactly a problem in Go as well? There's no checked arithmetic in Go, right?

I guess the difference between Go and C/C++ is that Go at least won't optimize away the overflow check? But it still doesn't insert it for you either.


In Go, overflow is not checked, but it is well-defined in the spec including `programs may rely on "wrap around"`: https://go.dev/ref/spec#Integer_overflow


>The problem is that C++ defines that every side-effect-free loop may be assumed by the compiler to terminate. That is, a loop that does not terminate is therefore undefined behavior. This is purely for compiler optimizations, once again treated as more important than correctness.

It is crazy that people decided to do it. It's not only weird, but also wrong.

There's no such thing as "side-effect-free" code.

There are always side effects like CPU usage, temperature, mem usage, etc.

There is a whole class of attacks based on those.


> There's no such thing as "side-effect-free" code.

This is kind of like “Ceci n'est pas une pipe.” In that it’s useful to remind people that there is no such thing as side-effect-free code, strictly speaking, but as soon as you remind people, you go back to talking about whether a particular piece of code has side effects. Someone points at the object depicted in the painting and asks what it is, and you say “It’s a pipe.”

Basically, to reason about code, it’s useful to ignore certain classes of side effects, most of the time.


> There are always side effects like CPU usage, temperature, mem usage, etc.

If you are writing a loop in your code with no classic side effects hoping to regulate temperature, that’s probably a bad and unreliable design to start with, whether or not the compiler optimizes it away.

If you read a temperature sensor to validate that the loop is doing the right thing and adjust frequency based on the result, it's still a little crazy, but it's less completely insane and no longer free of classic side effects.


All I'm saying is that code doesn't have to perform stuff like prints, file writes, or HTTP calls in order to generate side effects.

You can run heavy computation and not do anything with the result, just in order to increase CPU usage, right? (which then should result in a temp increase ;))


The C abstract machine isn't a physical machine, so temperature is not a side effect :)


What is meant are side effects as defined by the C abstract machine. It has no concept of CPU speed, temperature, etc.


Side-effect is a Word of Power in the context of the standard with very specific meaning that doesn't include, for example, temperature increase. Even if you use it as a Control key replacement.


This is a cool summary of UB, but speaking to the title... Well yeah. Did anyone suggest otherwise?


It should be "C and C++ Implementations" because nothing in the standard requires UB to be exploited for optimization instead of adding run-time checks.


The standard also doesn't require UB to have run-time checks, which is probably for the goal of performance. There was an implementation in mind when it was designed.


There were C implementations with bounds checking or implementations that trap on use of an invalid pointer or on read of uninitialized variables etc. A large part of UB was introduced for such reasons and not for performance. The standard was (and still is) about allowing a wide range of implementations.


To me this article is interesting because

* it is thorough, detailed and very thoughtful

* rsc’s essays usually end up with a major Go language change, so there’s a good chance this article is the seed of some change to Go and undefined behavior, or correctness or performance.

Even if it’s not a major upcoming change (I bet it is, tho) rsc is an extremely insightful dev.


There is no major upcoming Go change related to this post. This is just a much longer version of https://research.swtch.com/plmm#ub that I drafted years ago and finally cleaned up enough to post.


It’s fine, I’m not sad or anything


Yeah, I agree. I was editing my comment to praise the article's body as you were replying.


A periodic reminder is good.


Clearly we can have compilers and static analyzers that can catch a lot of what's UB in the standards.

To me the only real question is: what should the default be for your compiler? I don't want to be flooded with warnings when I'm working on a small file of handcrafted near-assembly code but I'm willing to change my compiler/tool options away from some more conservative, non-standard default settings.


It isn't super obvious as someone interested in C or C++ what compiler flags and static analyzers I should turn on.

I mean (and I'm asking because I'd genuinely like to bookmark someplace) is there a group that keeps an up-to-date list of aggressive flags and static analyzers to use?


I'm not sure about static analyzers, but here are the clang warnings I use for my personal C++20 projects. You will get an enormous number of warnings from typical C++ code. I fix all of them, but doubt it would make sense to do so in a commercial environment.

-Weverything - this enables every clang warning

-Wno-c++98-compat - warns about using newer C++ features

-Wno-c++98-compat-pedantic - warns about using newer C++ features

-Wno-padded - warns about alignment padding. I optimize struct layout, so this warning only reports on cases I couldn't resolve.

-Wno-poison-system-directories - I'm not cross-compiling anything.

-Wno-pre-c++20-compat-pedantic - warns about using newer C++ features


For static analysis I use CodeChecker; it's a wrapper on top of the Clang static analyzer and clang-tidy (linter). It now also supports cppcheck, but I disabled it (too many false alarms). It's free and open source, and I find it useful. Make sure you use it with a version of LLVM/Clang built with support for Microsoft's Z3 solver (that's the case in Debian stable, so it should be OK in most distros).

For the flags I would start with "-Wall -Werror", then maybe disable some warnings based on the code base / use.

All this assuming a GCC/Clang friendly code base.


Not that I know of, probably something you'll have to dig into based on your use case. I've used Coverity for static analysis, which is great but pricey. We have a corporate license so no-brainer for me.


-Wall -Werror (in clang and, I think, on GCC) is a good way to start. In general, clang is great at warning about UB. Also, AddressSanitizer is a great tool.


Regarding uninitialized variables, there is a proposal to make them default to zero-initialized:

http://wg21.link/P2723

Under "5. Performance" section, it claims the performance impact to be negligible, and there is also a mechanism to opt-out.


I guess we are rediscovering the 40 year old wisdom: https://www.dreamsongs.com/RiseOfWorseIsBetter.html


There are a lot of warning and error options. Turn the guardrails and sanitizers on during development and testing. Turn the optimizations on when you ship.

Check out: -Wall -Werror https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html


This is likely in response to this relevant related submission discussed yesterday:

A Guide to Undefined Behavior in C and C++ (2010)

https://news.ycombinator.com/item?id=37165042 (166 comments)

A worthwhile read, IMO. Cheers.


This needs a NSFW tag! Good lord...


The C++ standard library prioritises backward compatibility first and foremost, over performance or correctness. That's why some elements of the C++ standard library are slower than their counterparts in slow, interpreted languages.


rsc should really know better, he has been around long enough.

C and C++ have as a basic design rationale

    "as close to machine performance as possible"

    "don't pay for what you don't use"

    "don't break backwards compatibility"
The last one is responsible for most of the issues. It would be nice to fix stuff like UB, but the reason that C/C++ is so popular and still alive is precisely because it cares more about compatibility than about fixing stuff like that.

Breaking existing code is extremely uncool for long-lived codebases.


Undefined behavior permits backwards compatibility to be broken, because when a compiler adds a new optimization that exploits UB, existing programs can change behavior, breaking code that previously worked. The blog post cites infinite loops as an example. Thus, eliminating UB would improve backwards compatibility.
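
A hedged sketch of the kind of loop in question (the body has no side effects and the condition isn't a constant expression, so the compiler is allowed to assume the loop terminates):

    void spin(unsigned n)
    {
        while (n != 0) { }   /* never terminates when n != 0, yet a compiler
                                may assume it does and remove the loop       */
    }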


Compatibility is defined in terms of a contract between program and language implementation. If the program violates the contract by containing UB, then compatibility does not apply.

Taking your argument, compilers couldn’t even add any benign optimization without breaking compatibility, because that would change the runtime of the affected code.


> Compatibility is defined in terms of a contract between program and language implementation. If the program violates the contract by containing UB, then compatibility does not apply.

Indeed, furthering my point that having UB in a language undermines backwards compatibility.

> Taking your argument, compilers couldn’t even add any benign optimization without breaking compatibility, because that would change the runtime of the affected code.

That does not follow from what I said.


The point is, you have to define compatibility with regards to what. Otherwise nothing can ever change. The purpose of the C standard is to provide that reference point, via its definition of what constitutes a conforming program and a conforming language implementation. You only get compatibility problems when you step out of that scope.


the basic premise is wrong: it's not a tradeoff, which would require someone to say "doing this correct and expected thing blows up because another correct and expected thing can go faster".

the problem is mixing in incorrect or unexpected things (like someone altering a global via a debugger). to some extent, it's fair to complain about archaic nonportabilities (such as whether dereferencing NULL will fault, or long-gone numeric representations) - the standards committees should take a deep breath and DTRT in those cases.


> C and C++ do not require variables to be initialized on declaration (explicitly or implicitly) like Go and Java. Reading from an uninitialized variable is undefined behavior.

Took me some time to understand that part :)

I always thought that an unassigned variable of type slice T was deemed uninitialized. (but then, what of assigning nil to a previously non-nil variable? Is it considered deinitialized?)

In fact, at the language level it could be considered uninitialized/deinitialized. That's the variable typestate. (cf definite assignment, ineffectual assignment analysis)

For the compiler, it still "points" to a memory location so it is initialized.

Am I right? (more than 344 comments, no one will find this one question lol :-)


Note that in Zig, unsigned overflow is also UB in release unchecked mode. Don't like it? Don't use unchecked mode, or use wrapping operators.


> To some extent, all languages do this: there is almost always a tradeoff between performance and slower, safer implementations.

Well, probably except Rust, from which you get both performance and safety. Sometimes it comes at a cost in ergonomics, though. Some people, including Rust's creator himself, would trade performance for better ergonomics in the case of asynchronous programming. They prefer non-zero-cost green threads to the current state of async programming in Rust.


Rust does tradeoff performance for safety. Example: integer division [1].

[1]: https://godbolt.org/z/71sxnfff5


Not always all of the performance. Rust encourages array access with superfluous bounds checks, unnecessary UTF8 validation, and uses tagged sum types over the more memory efficient raw union types.


Professional C/C++ coders tend to recognize the history of cross-platform compatibility issues, and tend to gravitate to a simpler subset of the syntax to avoid compiler ambiguity (GNU gcc is ugly, but greatly simplified the process).

Try anything fancy, and these languages can punish you for it later on down the life-cycle. However, claiming they lack correctness shows a level of ignorance.

Best of luck, =)


If only we had a modern language that did both... (which was strangely omitted from this article)


Masterful conclusion in the last paragraph!


one day, a bug like this will take down an airliner or a power grid and kill thousands of people. and there will be a congressional hearing about it. and some ageing compiler engineers will have to explain to a panel of irate and disbelieving senators the concept of UB. they will have to explain that, while computers can add two numbers correctly, we choose to make them not do that. for performance.


I have some opinions of my own about how it should work.

> Uninitialized variables

The compiler should be allowed to do one of:

1. Assume any arbitrary value, as long as it remains consistent until a new value is assigned (which might or might not depend on the previous value). (However, the exact value need not be known at compile-time, and might not even be known at run-time if the program doesn't care about the exact value (in the example if the loop is removed then the exact value doesn't matter at run-time). It might be different every time the program runs.) Removing the loop in the example is valid; it would also be valid to start at a negative number (since you did not specify an unsigned type), but if for example you declare i before the loop instead of inside and then you add printf("%d\n",i); after the loop, then the program is required to print a number which is at least 10 and can fit in the int type; if standard library optimizations are enabled then it would be allowed to e.g. replace it with puts("42"); instead (but if the loop is not removed, then the number printed by this must be 10).

2. Read the value from a register or memory which is then used as the storage for that variable (for whatever time it needs to, just as though it was given a value which was returned from a function with no side-effects), but without initializing it.

3. Do whatever the target instruction set does (which seems very unlikely to me to do anything other than the above).

> Arithmetic overflow

The C specification should specify two's complement (it is the only good representation of integers anyways). Fortunately GCC has -fwrapv and -ftrapv, and I often use -fwrapv to avoid problem with signed overflow.

> Infinite loops

There is a rationale that makes some sense, but it would nevertheless be better not to optimize out the loop entirely like that. However, if standard library optimizations are enabled then it should be allowed to replace the loop with:

  for(;;) pause();
> Null pointer usage

In this case, the program should just actually attempt to read, write, or call the null address, which should result in a segfault or other run-time error if possible (instead of being optimized out). However, the compiler is allowed to assume that such a call can never return, if it wishes (whether or not that is actually true on the target computer).

In the case of a read or write that is not marked as volatile, the compiler may substitute any value for a read and may optimize out a write, although it does not have to; it may also do what is specified above.

However, the optimization shown in the article should be allowed if Do is a local variable which is uninitialized, rather than being initialized as null. (You can disable optimizations (or possibly change some options) if you don't like that.)


If you don't like undefined behavior then avoid it. It's not hard, unless you're not paying attention.


This is a really, really misguided take on C and C++...

They don't prioritize anything. They are just bad languages with a random assortment of features mostly reflecting how people thought about computers in the 70's.

Just think about this: vectorization is one of the obvious ways to get better performance in a wide range of situations. Neither C nor C++ has any support for that. Parallelism -- C++ kinda has something... C has zilch. Memory locality anyone?

I mean, Common Lisp has a bunch of tools to aid compiler in optimizing code in the language itself, whereas C has like... "inline" and the ability to mess with alignment in structs, and I cannot really think about anything else. Stuff like UB isn't making the language perform better or easier to optimize. It's more of a gimmick that compiler authors for C and C++ found to produce more efficient code. It's a misfeature, or a lack of feature, that allowed for accidental beneficial side-effects. Intentional optimization devices in a language are tools that prove the code to be intentionally correct first, and that proof allows the compiler to elide some checks or to generate a more efficient equivalent code, based on the knowledge of the correctness of the proof (or, at least, upon explicit instructions from the code author to generate "unsafe" code).


> vectorization is one of the obvious ways to get better performance in a wide range of situations.

Or, well, in a narrow range of situations where you have certain kinds of hardware with vectorization support.

> C has zilch

C has operating systems written in it which use parallel processing internally and make parallel processing available to applications. True, that may be done using techniques that are not described in ISO C.

Common Lisp has undefined behavior, a familiar example being (rplacd '(1 . 2) 3): clobbering a literal object.

Optimizations in Common Lisp are exactly like UB in C: you make declarations which specify that certain things hold true, and in an unsafe mode like (declare (optimize (safety 0) (speed 3))), the compiler blindly trusts your assertions and generates code accordingly. If you asserted that some variable is a fixnum, but it's actually a character string, that character string value's bit pattern will likely be treated as a fixnum. Or other consequences.

Common Lisp is nice in that you can control speed and safety tradeoffs on a fine granularity, but you do trade safety to get speed.

Common Lisps (not to mention Schemes) have extensions, just like C implementations. In plenty of code you see things like #+allegro (do this) #+sbcl (do that).


> certain kinds of hardware

This certain kind of hardware exists on virtually all consumer-grade and server-grade CPUs produced in the last decade. This is where optimizations matter in the first place.

It doesn't matter that operating systems are written in C: it has no concurrency in the language. Here are examples of concurrency in the language: in Java you have "synchronized", in Go you have "select", in Erlang you have "receive", in Ada you have "task", and so on. These languages provide some way to deal with concurrency, and language implementations must have concurrency primitives.

There's nothing like that in C. You don't have to implement C in such a way that it supports concurrency in any way. No part of the standard requires it (unlike in C++, where you have threads since C++11).

> Optimizations in Common Lisp are exactly like UB in C:

My guess is that you have never even seen Common Lisp. You just wrote absolute nonsense.


> This is where optimizations matter in the first place.

It's also where optimizations often don't matter.

> My guess is that you have never even seen Common Lisp

(invoke-restart 'next-guess)

> C: it has no concurrency in the language.

That is now literally false. ISO C now has threads and atomics.

Also, C has supported signals for eons: C code can be interrupted to run a signal handler, which requires reentrancy.

However, it is correct that concurrent systems have been written in C without support in the language. In C, concurrency can be in a library/module, and you can develop that as part of the system from the ground up.

This has enabled OS researchers to explore new concurrency mechanisms. The stagnant backwaters of Eiffel and Ada would not have come up with the Futex, for instance. Or RCU.

In C, when you write a function that works only with its local variables, that is understood to be reentrant (with pretty much every compiler on the planet except maybe something for an 8 bit microcontroller with a kilobyte of RAM.)

So that is actually one very important, basic piece of language-level support for concurrency.

Code can be written in C to run on bare hardware such that with only a minimal amount of glue, C functions can handle interrupts.

When a C function accesses external resources, that is obvious, and those accesses can be protected using external means also. That's generally how it works.


If optimization doesn't matter, then why are you talking about optimization?

This is just plain stupid... Vectorization is a huge subject in many different kinds of software: RDBMSs, scientific computing / HPC, even advertising... and then you come and tell all those people spending billions of dollars a year on optimization that it... doesn't matter? How much does it have to matter before it "really" matters?

Well, then you've never understood what you were dealing with. Or your reading-comprehension skills are so bad that you cannot understand it when someone describes the difference between something that happens intentionally and something that happens through oversight... But my guess is that you are just pretending to be stupid, because you enjoy trolling, and that's your way to do it.

C signals aren't a required part of the language... (a "standard compliant" implementation doesn't have to have them), but the C language is just poorly designed. There shouldn't have been such a thing as a "not required part of the language", but it happened out of the immediate convenience of someone who didn't want to apply themselves more to making a better design.

Exactly the same is true for threads and atomics. A "standard compliant" implementation doesn't have to have them. Portable code shouldn't use this stuff.

Read it again: this discussion is about the language, not libraries. If you think that the ability to make a system call makes a language concurrent, then it's a worthless definition, because it blurs the important distinction between languages that are designed to be concurrent and those which aren't. Everything short of stuff like HTML becomes a concurrent language, but you cannot have any meaningful discussion about the benefits or downsides of an approach to concurrency, because it's "outsourced" to something outside the language.

> C functions can handle interrupts.

What... like processor interrupts? What on Earth are you talking about... Are you getting your text from a chatbot or something?


> Vectorization is a huge subject in many different kinds of software, RDBMs, scientific computing

A lot of which is written in C. C compilers have support for vector instructions.

Vector instructions can also be used via language extensions (that can look like a macro library and can be made to work without vectorization).
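
One common shape of such an extension is the GCC/Clang vector type syntax (a sketch only; not ISO C):

    /* A 16-byte vector of four floats, using a compiler extension. */
    typedef float v4sf __attribute__((vector_size(16)));

    v4sf add4(v4sf a, v4sf b)
    {
        return a + b;   /* typically compiles to a single SIMD add */
    }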

> C signals aren't a required part of the language.

ISO C defines signals, which are a required part of a hosted implementation; same as printf or malloc. It is not a "full blown" signal specification like in POSIX. sigprocmask isn't in C, neither is sigsetjmp/siglongjmp.

> like processor interrupts?

Yes. Interrupt handling requires some processor-specific code to which the interrupt is vectored first; it does certain necessary things like save the machine state and acknowledge the interrupt. From that, C code can be dispatched. The meat of the interrupt handling is in C. Lots of systems are written on the "bare hardware" like that.

> this discussion is about language, not libraries.

Since I'm an actual Lisp person, I do not care about this difference.

"Language" is just the core special forms handled by an interpreter or compiler; library is everything else, including the run-time support needed by those special forms.

The "special forms" part of C brings little to the table in terms of concurrency other than the ability to write reentrant code. From that departure point, you can do a lot.

> There shouldn't have been a thing like "not required part of the language"

Interfaces and constructs that are described in a standard, but optional, still exist. There is a difference between that and not standardizing it at all.

Optional or not is mainly just a game of words: what can be implemented and what can be left out such that the implementation can still call itself "conforming".

If you make, say, threads required, then what it means is that C implementations for small systems will opt out of implementing that anyway, and just not be able to call themselves conforming.

Nothing changes; just the game of words and labels plays out differently.

> Everything short of stuff like HTML becomes a concurrent language

That is false. For instance Awk isn't a concurrent language. We can't write a re-entrant function in Awk that can be called safely at interrupt time.

Most of what defines a concurrent language is in the model of the run-time. For instance, the concurrency in Java actually comes from the JVM, and there are other languages on the JVM which borrow the concurrency model.

> Exactly the same is true for threads and atomics. A "standard compliant" implementation doesn't have to have them. Portable code shouldn't use this stuff.

If a program uses ISO C threads or atomic and needs to ported to some system/compiler where they are not available, it is likely possible to implement them in the program itself. I could probably knock it off in under a week.


Wow, don’t get so invested in hating a technology, it’s just a tool.



