> would need to know that the loop range matches or subtends the array bound
Some compilers have pretty sophisticated analyses aimed at just that: determining affine relations to statically bound indexed accesses. Failing that, some compilers resort to loop versioning: generating two versions of the loop, partitioning the iteration space into a definitely-in-bounds range and a possibly-out-of-bounds range, then selecting which loop runs on which portion via dynamic checks up front. Couple all of that with peeling and unrolling, and bounds checks start disappearing or getting amortized away.
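The loop-versioning transformation can be sketched by hand; a compiler emits something morally equivalent. This is an illustrative sketch (the function name and shape are made up for the example), with one up-front test selecting between an unchecked fast loop and a fully checked fallback:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hand-written "loop versioning": one dynamic check up front picks
// between a definitely-in-bounds loop and a checked fallback loop.
long sum_first_n(const std::vector<long>& v, std::size_t n) {
    long total = 0;
    if (n <= v.size()) {
        // Definitely-in-bounds version: no per-iteration check needed.
        for (std::size_t i = 0; i < n; ++i)
            total += v[i];
    } else {
        // Possibly-out-of-bounds version: keep the per-access check.
        for (std::size_t i = 0; i < n; ++i)
            total += v.at(i);  // throws std::out_of_range on overrun
    }
    return total;
}
```

The fast path is exactly what the compiler wants to prove it can run: the per-iteration check has been hoisted into a single comparison before the loop.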
Unless libraries are receiving a copy of the meta representation of the program and running integer equality relations over the dataflow chains, then no, not really.
The library author has certain knowledge of what the library is meant to achieve, whereas the compiler is obliged to guess from whatever tea leaves it can find to read.
In particular, the library author knows that the container won't be changing size over the duration of the loop, something the compiler would have difficulty proving.
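As a minimal sketch of that point (names are illustrative): the author knows the container is never resized inside the loop, so the bound can be read once, whereas the compiler would have to prove that nothing aliasing the vector mutates its size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The library author knows v is not resized during the loop, so the
// bound is hoisted out by hand; the compiler would need an alias
// analysis to justify the same transformation.
long dot_with_self(const std::vector<long>& v) {
    const std::size_t n = v.size();  // invariant: fixed for the whole loop
    long acc = 0;
    for (std::size_t i = 0; i < n; ++i)
        acc += v[i] * v[i];
    return acc;
}
```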
What's special about compilers, then? Compilers are code, and therefore, as you say, buggy.
Library authors know things about what their code is meant to be doing that compilers cannot deduce and so cannot act on; the library author can. And a heavily used library benefits from more thorough testing than typical application code gets.
You're right that compilers tend to have bugs, but in practice, compiler bugs are rarely the cause of software issues. The same cannot be said of libraries. Major SSL/TLS libraries for instance tend to be written in C, and all of them have had memory-safety bugs.
> Library authors know things about what their code is meant to be doing that compilers cannot deduce, so cannot act on. But the library author can.
I don't see your point here.
> A library, according to how heavily it is used, benefits from more thorough testing than generic application code gets
This doesn't generalise. There's plenty of very widely used application-specific code, and plenty of little-used library code. And widespread use does not imply a high level of scrutiny, even if we're talking only about Free and Open Source software.
Anyway, that's all a sidetrack. The benefits of memory-safe languages aren't up for debate, even for well-scrutinised codebases. We continue to suffer a stream of serious security vulnerabilities arising from memory-safety issues in code written in unsafe languages. The go-to example is Chromium, where 70% of serious security issues are due to memory safety. [0]
That is how the trope goes. But looking at the actual faults, we see them in bad old C-like code, so not interesting in the present context. Modern C++ code, for example, will always have exactly zero use-after-free bugs. OpenSSL is old C code, so of even less relevance here.
But the topic was not CVEs. It was optimization. An optimization opportunity that would be missed by a compiler can be explicitly spelled out for it by a library author. Whether the library author is obliged by the compiler to write "unsafe" next to it has exactly zero effect on the correctness of the code written in that place: you can easily write incorrect code there. If you don't, it was not because the compiler provided any help.
> Modern C++ code, for example, will always have exactly zero use-after-free bugs.
Not so. C++ is not a safe language, and never will be. Even if you avoid raw pointers you aren't safe from use-after-free bugs as the standard library makes them possible even then, with std::string_view (and perhaps other functionality). [0][1][2]
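A minimal sketch of the std::string_view footgun (the function names here are invented for the example). No raw pointers appear, yet a dangling view is trivially easy to create because string_view does not own its characters:

```cpp
#include <cassert>
#include <string>
#include <string_view>

std::string make_name() { return std::string("temporary"); }

// Use-after-free with no raw pointers: the temporary std::string is
// destroyed at the end of the initializing expression, so sv dangles
// immediately. Reading through it afterwards is undefined behaviour;
// this sketch deliberately never dereferences it.
std::string_view dangling() {
    std::string_view sv = make_name();  // temporary dies here
    return sv;                          // any read through this view is UB
}

// The correct pattern: the owning string outlives the view.
std::string_view fine(const std::string& owner) {
    return owner;
}
```

Nothing in the type system distinguishes the two functions; that is exactly the gap a borrow checker closes.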
There is no safe subset of C++. People have tried, e.g. the MISRA folks, but they're unable to find a subset of C++ which is both safe and usable. The only way to guarantee the absence of undefined behaviour in a C++ codebase (or a C codebase) is to use formal analysis tools, which are a tremendous burden.
If it were possible to get decent safety guarantees out of C++, Mozilla wouldn't have bothered inventing a whole new language in the hope of improving Firefox.
I do agree though that modern C++ code is likely to have fewer memory-safety issues than 'C-style' C++ code.
> OpenSSL is old C code, so of even less relevance here.
It isn't irrelevant; our conversation wasn't specifically about the C++ language. I was responding to your suggestion that using well-known libraries written in unsafe languages is a reliable way to avoid memory-safety issues. We know this isn't the case.
> An optimization opportunity that would be missed by a compiler can be explicitly spelled out for it by a library author.
Sure, but this whole thread is about how bounds checks are, in practice, generally inexpensive on modern hardware, except in cases like SIMD optimisations being precluded by the need for checks. I suspect this extends to other runtime safety checks too, but I don't have hard numbers to hand.
> An optimization opportunity that would be missed by a compiler can be explicitly spelled out for it by a library author.
Sure, that's an advantage of low-level languages. It doesn't negate the importance of memory-safety though.
Runtime checks are unlikely ever to have zero performance cost, sure, but the cost can be close to zero, and the fallout of removing checks from buggy code can be considerable.
> Whether the library author is obliged by the compiler to write "unsafe" next to it has exactly zero effect on the correctness of the code written in that place: you can easily write incorrect code there.
If it were a simple boolean matter of correct vs incorrect, then sure, but it often isn't. In practice, it can mean the difference between an exception being thrown, and undefined behaviour running riot, possibly leading to serious security issues.
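The contrast is concrete in C++ itself (a sketch; the helper function is invented for illustration): the same bad index either throws a catchable exception or invokes undefined behaviour, depending only on which accessor was used.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Same out-of-bounds index, two very different failure modes:
// v.at(i) throws std::out_of_range (well-defined, catchable), while
// v[i] with i >= v.size() is undefined behaviour.
bool read_checked(const std::vector<int>& v, std::size_t i, int& out) {
    try {
        out = v.at(i);   // checked access
        return true;
    } catch (const std::out_of_range&) {
        return false;    // the bug is contained, not corrupting memory
    }
}
```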
> If you don't, it was not because the compiler provided any help.
Runtime checks are very helpful during development.
You keep trying to change the subject. But I have not promoted "subsetting" as a means to safety, and safety is anyway not interesting to real people. People want their programs to be useful. To be useful, a program must be correct, and every correct program is implicitly safe.
But the actual topic was not that. The actual topic you have tried to steer away from is optimization. The point I made was that the author of a library can take up responsibilities that some people insist only the language, via the compiler, can perform. The library author can perform optimizations the compiler fails to, and can define interfaces that can only be used correctly, and safely. To the programmer using a library it makes no difference, except that they may be unable to use some new, immature language, but can easily pick up and use a good library.
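An "interface that can only be used correctly" can be sketched along these lines (the `BoundedIndex` type is hypothetical, invented for this example): the only way to obtain an index is to pass the bounds check, so the accessor that consumes it needs no further check.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical correct-by-construction interface: a BoundedIndex can
// only be created by passing the bounds check, so the accessor below
// performs no check of its own. A real design would also tie the index
// to a specific container, not just a size; this is a minimal sketch.
class BoundedIndex {
    std::size_t i_;
    explicit BoundedIndex(std::size_t i) : i_(i) {}
public:
    static std::optional<BoundedIndex> make(std::size_t i, std::size_t size) {
        if (i < size) return BoundedIndex(i);
        return std::nullopt;
    }
    std::size_t value() const { return i_; }
};

int get(const std::vector<int>& v, BoundedIndex idx) {
    return v[idx.value()];  // check already paid at construction
}
```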