I'm afraid this might be too little, too late. But regardless of whether I'm right or wrong, we should all welcome a widely used language trying to become a little safer.
On the other hand, it appears to have a great bang/buck. Link existing programs against the hardened libc++ and get those checks for "free" (other than the perf hit).
I'm not at all clear on how exactly that would work. The entire program would have to be rebuilt from scratch, would it not? Things like std::vector::operator[] are not runtime library calls, they are almost always inlined by the compiler. You could not simply take an existing program, replace its runtime library, and get bounds-checked vector accesses.
Of course, it is a compiler feature: you will have to recompile and manually (or automatically) apply all suggested changes to the source code.
If you just link against a hardened library, only the implementations of the (non-inlined) library calls will be hardened; the user code itself won't see much benefit.
Of course these days we have binary-level optimizers, so who knows what is possible in principle?
Sorry, I used 'link' very liberally there, but you cleared it up. My point was that (from what I remember, it's been a few days) you'd mostly just need to recompile. As opposed to, you know, Rewrite It In Rust (TM).
Even if not inlined at -O0, when a template is instantiated for a custom some_type, the standard library .so won't contain any code for it; everything will be in the application binary.
It's difficult to imagine which applications would find the performance of an out-of-line vector access acceptable. Generally vector::operator[] is represented in built programs by a single x86 mov; a call would wreck performance. Languages with bounds-checked vectors don't do it with library calls.
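As a rough sketch of why relinking alone doesn't help (hypothetical snippet, the type and function names are invented):

    #include <cstddef>
    #include <vector>

    struct some_type { int x; };

    // The body of operator[] is instantiated and inlined here, in the
    // application binary, typically as a single indexed load; no call into
    // libc++.so / libstdc++.so happens for this access.
    int get(const std::vector<some_type>& v, std::size_t i) {
        return v[i].x;
    }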
I have practically always used bounds checked vectors and strings.
In the old days, all C++ frameworks bundled with compilers tended to have bounds checking enabled by default (Turbo Vision, OWL, VCL, MFC, PowerPlant, CSet++, Qt,...).
Nowadays I always set _ITERATOR_DEBUG_LEVEL to 1 on VC++, unless I am not allowed to.
Mostly, the performance problems that had to be dealt with were due to badly chosen algorithms and data structures.
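For anyone unfamiliar with that MSVC knob, a minimal sketch, assuming the checked-iterator behaviour documented for _ITERATOR_DEBUG_LEVEL (exact diagnostics vary by toolset version):

    // Build on MSVC with /D_ITERATOR_DEBUG_LEVEL=1, or define it before any
    // standard header. The checked iterators then report the out-of-range
    // iterator below at run time instead of silently reading out of bounds.
    #include <vector>

    int main() {
        std::vector<int> v(4);
        auto it = v.begin() + 10;   // past the end: flagged by checked iterators
        return *it;
    }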
Given that in many domains C++ will keep being the language to use, and most wannabe replacements depend on GCC and LLVM infrastructure (written in C++), any little bit of improved safety helps.
If C++ could adopt two improvements, with some respect and acknowledgement that others have gotten this right:
* library/language side: some domain of bulletproof memory
* compiler/language side: move build-dependency handling into the compiler, or into something that front-ends the compiler à la Go/Rust, with a flag to enforce it so it stays optional. Here, don't let perfect be the enemy of the good. There'd need to be some new keywords or namespaces to make this work.
That'd be superb.
I have all kinds of chances and opportunities to learn Rust, but our ecosystem is endless C/C++, so I'd rather not deal with another complicated language. There's no near-term or medium-term timeline to move even 20% into Rust. I bail out to Go for I/O-bound work.
There are multiple frameworks which give bulletproof memory usage and other fixes through annotations, the most common being SAL (https://learn.microsoft.com/en-us/cpp/code-quality/understan...). People just don't use them besides kernel devs, and even then they don't really use it.
I believe the entire MS kernel is written using this.
a) Not just the Windows kernel - Office uses SAL, too, in numerous places. Mostly in legacy places predating more modern C++.
b) It's not true to say that kernel devs don't really use it. They use it a lot, though there are attempts at obsoleting some of it[0].
Oh nice, I didn't know that about Office; it makes sense though. I don't work for Microsoft and mainly work in the antivirus space, so I only occasionally hear stories from friends about their internal stuff. When I said driver devs don't always use it, I mainly meant third-party vendors writing kernel code. I know a lot of the overseas places don't, and I've rarely had jobs which use it.
Oh, I'd buy that in a heartbeat. SAL is, shall we say, most useful in the presence of non-public tooling, and it's also in general a fairly awkward language extension. Doesn't surprise me at all to hear that third-party devs haven't taken it up.
But programming is multiplication. A programmer's work multiplies. If someone once writes code that ends up leaking users' data, it might leak a thousand users' data in a way that cannot be unleaked. That broken code then multiplies through time.
You would not say: Let those [civil engineers] who like unsafe bridges build unsafe bridges
Because the people using them might not be able to judge the unsafety of the system. Because the work of programmers multiplies, we have a huge responsibility and a huge chance to improve the world.
Not everybody works in web development. Generally, C++ is mostly used for anything where performance is paramount. Personally, I write audio code. I rarely have memory issues because most of it is just number crunching. I am working in timespans of microseconds, so speed is the ultimate concern.
In fact, we have had bounds checking C and C++ extensions for decades, and practically no one uses them save for debugging purposes. This kind of shows how much interest there really is outside of some smaller circles where it may be relevant.
I really doubt we are ever going to see a "hardened C++" making any inroads, but at least I find that slightly more plausible than any other new language displacing it.
And it is one of the drivers for having hardware memory tagging enabled in all devices, just as Microsoft is doing with Pluton, Oracle has been doing for a decade with Solaris SPARC, and Apple is adding to their ARM flavours as well.
Yes, a comment mentions something that I have seen more than once in practice: a project uses bounds-checked access on std::vector, but it is too slow, so instead people turn to calling data() to get a raw pointer and using operator[] on that, which removes every possible form of safety.
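A hypothetical illustration of that pattern (the function is invented, not taken from any particular project):

    #include <cstddef>
    #include <vector>

    int sum(const std::vector<int>& v) {
        int s = 0;
        // Checked access, deemed "too slow":
        //   for (std::size_t i = 0; i < v.size(); ++i) s += v.at(i);

        // Workaround seen in the wild: drop to a raw pointer, which discards
        // every check the container (or a hardened library) could have provided.
        const int* p = v.data();
        for (std::size_t i = 0; i < v.size(); ++i)
            s += p[i];
        return s;
    }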
They won't have any alternative when running on platforms with hardware memory tagging, and it will only be a matter of time until all new mainstream OSes are available only in such form.
Obviously they would have an alternative, you can just mark all your memory with the same tag. Object tagging in hardware is practically as old as... lisp?
Certainly you still can. It's not like the kernel is going to forbid a user process from using its memory to arrange a heap (or anything else) in any way it wants.
I think we are miscommunicating here, because you are claiming something which is obviously impossible. What is going to prevent a process from _writing to AND reading from its own memory_, exactly? Its "own memory" may be anything, including potentially a large object of size "MAXMEM" that it just got from the system, which the program is then going to arrange in any way it likes, and which is going to include a heap and vectors and whatever. This is not a ridiculous, over-the-top example; this is literally how most programs under most operating systems work these days...
If your system provides a special allocator that actually does tell the hardware which allocations correspond to which object then that's nice (and not a new thing... many lisp machines had this, plus x86 segmentation, etc.), but it really does not and cannot prevent a process from using the allocated memory however it sees fit. The only thing that could is throwing Turing-completeness out of the window.
Yes, (ancient) x86 segments are a form of memory tagging. I always found it ironic that many people complain about "the PDP-11 memory model" and yet we already had a popular architecture trying to get away from it... only to result in one of the most hated memory models ever. While you can definitely do better, I am skeptical that any new type of memory tagging is going to improve significantly on it.
Which is still object-granularity memory tagging, and under no circumstances can it prevent a program from using its own memory any way it wants (e.g. by simply never requesting more than one object), and they obviously don't even claim to do so.
In order to create a system where you can't have a way to avoid bounds checking (as the above post was claiming), you would basically have to throw Turing-completeness away, e.g. by preventing the implementation of arrays, lists, etc. The moment you can implement practically any data structure whatsoever, you can use it to build an allocator on top of it (through a heap... or not), and then proceed to access each (sub)object allocated by it without any bounds checking whatsoever. From the point of view of the hardware/OS, you will only have one large object. This is literally how most user programs work today (from the point of view of the hardware/OS).
You can provide a system allocator that tags objects separately as you allocate, but there is no way to prevent a user program from managing its own memory as it sees fit.
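A minimal sketch of that point, assuming nothing more than one large allocation from the system: to the OS and to any tagging hardware this is a single object, yet the program subdivides it however it likes, with no per-sub-object checks anywhere.

    #include <cstddef>
    #include <cstdlib>

    // One big "object" from the OS/hardware point of view.
    static char* arena = static_cast<char*>(std::malloc(1 << 20));
    static std::size_t offset = 0;

    // Toy bump allocator carving untracked sub-objects out of that single blob.
    // (No alignment, exhaustion or freeing handled in this sketch.)
    void* my_alloc(std::size_t n) {
        void* p = arena + offset;
        offset += n;
        return p;
    }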
How would any of these prevent a process from doing whatever it wants with its own memory, for example in the way I described in my previous post? Also, Microsoft Pluton at least does not do memory tagging.
If a process wants to commit suicide, there is hardly anything that can be prevented I guess.
These measures prevent access to the tagging metadata from userspace; that is how they prevent it, and the MMU usually isn't accessible to userspace.
> By using spare bits in the cache and in the memory hierarchy, the hardware allows the software to assign version numbers to regions of memory at the granularity of a cache line. The software sets the high bits of the virtual address pointer used to reference a data object with the same version assigned to the target memory in which the data object resides.
> On execution, the processor compares the version encoded in the pointer and referenced by a load or store instruction, with the version assigned to the target memory.
Now if one goes out of their way to work around a security mitigation and enjoy the freedom of corrupting their data, maybe they should ask themselves if they are in the right business.
> However it is now being combined with ARM technology
You mean PAC. PAC does not really do what you think it does; the original program can still corrupt its data as much as it wants (even due to bugs). And I don't think it has anything to do with Pluton.
> If a process wants to commit suicide, there is hardly anything that can be prevented I guess.
> Now if one goes out of their way to work around security mitigation and enjoy the freedom of corrupting their data, maybe they should ask themselves if they are on the right business.
No; what I have been trying to do for the entire comment chain is to show that this is how things work _right now_ (the entire process' memory being a single opaque object/blob with the same tag), and that for obvious reasons you cannot forbid that and just claim that "there will be NO alternative", because you will always be able to continue doing what everyone is doing _right now_, barring a significant upheaval in the traditional definitions of computing.
If you provide an allocator that integrates more closely with the hardware then that is all fine, but you just _cannot_ claim there will be no alternative, especially when the alternative is just to continue doing what you're doing right now.
Again, we've had many architectures with memory tagging, hardware bounds checking, whatever you can think of. E.g. even x86 had memory tagging at one time (386 segmentation, which was globally panned), hardware bounds checking (MPX, completely ignored due to the reduced performance), etc.
I'm currently working with very good researchers on formal verification of a part of the software I'm writing, in rewriting logic with Maude. Formalizing a fairly simplified version of the execution engine algorithms (maybe 0.005% of the C++ codebase, 2500 lines of C++ at most) has taken roughly a year so far. That's not really viable for general development, especially with requirements which evolve all the time.
No construction software for bridges will stop the engineer from building unsafe bridges. Why should we need a compiler that will do the same with software?
What stops the engineer from building unsafe bridges is legal regulation and their work environment. These regulations and work environments exist in programming too.
The crucial difference between construction engineering and software engineering is the level of accidental complexity it allows. Any junior software engineer can create and ship unmaintainable architectural monstrosities, and most stakeholders might be completely oblivious of that fact and what it means until much later. Software architecture and programming concerns can be difficult to communicate to non-technical people.
Regulations and work environments seem not to exist to a sufficient degree in programming to prevent the sort of errors that efforts like the one in TFA are directed against. The attitude of many programmers towards testing would be considered reckless in many other domains.
On the other hand, compilers and IDEs have very much become part of our work environment and should help as much as feasible in avoiding as many errors ahead of time.
We might also put construction engineering on too much of a pedestal here. Bridges are very well-defined artifacts that humans have been building for centuries. How well exactly are we doing with other infrastructure projects?
Bad comparison. Brain surgeries are a method of last resort. They are unsafe because there are significant knowledge gaps in our understanding of the brain that might take centuries more to fill. Every time a neurosurgeon cuts, they are potentially doing irreversible damage to an important part of a person's consciousness, abilities and memories.
Edit: and there is no way to prevent that damage except by detailed planning and by stopping the cutting when the patient (under local anesthesia) ceases talking or performing their craft. Other damage can only be detected later, though.
Is surgery often a last resort, done when less invasive things have been ruled out or cannot be used? Yes. Is the human body the most complex “machinery” that we know, demanding our utmost respect? Is the brain the most delicate part of the body? Probably, yes.(?) Is brain surgery in turn a very delicate procedure that should be approached with a thousand-fold more caution than mere tinkering on human artifacts, such as programming? Yes.
It’s like you have gone out of your way to prove the opposite point of what you were ostensibly trying to prove.
If I'm reading this correctly, this has two components:
- a bounds-checked libc++, which is not particularly exciting; libstdc++ has had _GLIBCXX_ASSERTIONS for a while, for example. In fact it seems that libc++ has had it as well.
- clang warnings to flag potentially unsafe code, plus fix-it suggestions to convert it to safer code using the bounds-checked libc++.
The latter is quite interesting; especially the fix-its might make it easier to incrementally "harden" a large C++ code base.
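A hypothetical before/after of the kind of rewrite such fix-its aim for (function names invented; the relevant warning is along the lines of Clang's -Wunsafe-buffer-usage):

    #include <cstddef>
    #include <span>

    // Before: raw pointer arithmetic, the sort of code the new warnings flag.
    int sum_before(const int* p, std::size_t n) {
        int s = 0;
        for (std::size_t i = 0; i < n; ++i) s += p[i];
        return s;
    }

    // After: the span carries its own bounds, which a hardened libc++ can check.
    int sum_after(std::span<const int> data) {
        int s = 0;
        for (std::size_t i = 0; i < data.size(); ++i) s += data[i];
        return s;
    }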
The first one also sounds a bit like -D_FORTIFY_SOURCE=2 in glibc which fortifies memcpy etc. Many Linux distros have had this enabled by default for years. (https://stackoverflow.com/a/16604146)
Agree the second one both sounds interesting and also quite disruptive to existing code.
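A rough illustration of what that fortification does, assuming glibc with -O2 and -D_FORTIFY_SOURCE=2 (function name invented):

    #include <cstddef>
    #include <cstring>

    // The memcpy below is routed through a checked variant (__memcpy_chk) that
    // aborts at run time if n exceeds the known size of buf.
    unsigned char first_byte(const char* src, std::size_t n) {
        char buf[16];
        std::memcpy(buf, src, n);
        return static_cast<unsigned char>(buf[0]);
    }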
_GLIBCXX_ASSERTIONS is slightly different because it uses bounds information that is already available. The historical expectation is that operator[] doesn't perform bounds checking by default; you need to use the at(size_type) members for that. Hence _GLIBCXX_ASSERTIONS for opting in. It's basically a library-only thing, much easier to implement than source fortification.
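For concreteness, a small sketch of that library-level opt-in (libstdc++ shown; libc++'s hardening modes are the analogous mechanism):

    // Compile with e.g.: g++ -O2 -D_GLIBCXX_ASSERTIONS oob.cpp
    // Without the define, v[10] is silent undefined behaviour; with it, the
    // library assertion aborts the program at the point of the bad access.
    #include <vector>

    int main() {
        std::vector<int> v(4);
        return v[10];
    }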
This is unfortunately ABI-altering, so it won't be possible to enable it selectively for projects that want it. A distribution either has to bite the bullet and say “we are going to pessimize all C++ software”, or err on the safe side and leave it disabled.
The warning about pointer arithmetic is most likely suppressable per compilation unit (for things like tagged pointers).
It is mentioned that the fixit will create forwarding overloads (marked deprecated) automatically, so the old interface is preserved for API and ABI compatibility.
As of today, vector's iterator is just a pointer, and you need more than just a pointer to detect out-of-bounds accesses. This makes old binaries incompatible with new ones. It also means that you can't mix and match; it must be either one or the other.
Hence you can’t make the change incrementally, which is a big risk.
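To make the ABI point concrete, here is an invented layout for a checked iterator, not any actual implementation: today the iterator is typically just a T*, while anything that can detect out-of-bounds access has to carry extra state, so its size and layout necessarily change.

    // Invented illustration of why checking changes the iterator's size/layout.
    template <class T>
    struct checked_iterator {
        T* current;
        T* first;   // extra state: the valid range
        T* last;

        T& operator*() const {
            if (current < first || current >= last)
                __builtin_trap();   // out-of-bounds dereference
            return *current;
        }
    };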
Concerning Linux distributions, I seriously doubt that they have the resources to assess the performance impact across their wide range of software.
Concerning performance impact, we end up with more instructions and more memory accesses and increased register pressure. This has a chance to make things slower.
If the functions are separate overloads, there is no need for versioning in the first place.
On the other hand, if you change the layout of structures (as would be the case for bounds-checked pointers), it is much, much harder. GCC has ABI tags, but they only help a bit.
In any case, nobody is going to bounds-check iterators by default; checked iterators already exist in whatever debug standard library your compiler provides, and they are orders of magnitude slower than unchecked ones.
> For example, accessing a std::span or a std::vector outside of its bounds would abort the program, and so would accessing an empty std::optional. This can be done while staying Standards-conforming because undefined behavior implies that the library can do whatever it wants, which includes aborting.
sickosyes.jpg
Getting bad operations to abort execution (e.g. by throwing an exception) seems impossible to ship, not least due to certain, uh, entities really being against it. But abusing UB to allow people to turn on checks is excellent. This is why we should be very careful about what we choose to standardize: a common request is making signed overflow wrap instead of being UB, which would prevent its diagnosis in this way.
Making out-of-bounds access on explicitly bounded types abort is an easy memory safety win.
The fact that UB is still being used as an excuse to clearly undermine reasonably expected behaviour remains a BS trope, and WG21's continued acceptance of new features and specifications that declare perfectly specifiable behaviour UB instead of unspecified is a continuing source of avoidable security bugs.
It is inexcusable for new features to be added to C++ that have UB in any case other than the undefinable (invalid memory access, etc.). Removing the copious unnecessary UB from the existing specs should be a higher priority than many of the new features being proposed.
Much that is UB deserves to be defined. And some that is defined deserves to be UB. Can you make it illegal? Then do not be too eager to deal out behaviors in standardization.
I don't actually oppose, in the long run, slowly picking some behaviors and changing them to be standardized. But I do vehemently oppose many of the suggestions that are brought up. The kinds of things that should be moved out of being undefined are things like "you named your function isfoo" and not "you passed NULL to memset".
It's an LoTR joke: "Many that live deserve death. And some that die deserve life. Can you give it to them? Then do not be too eager to deal out death in judgement"
Dereferencing a pointer to memory that doesn't point to a live valid object is undefined behaviour. You couldn't possibly specify what this does without instrumenting every single access to memory, like asan or valgrind does.
The difference is that implementation-defined behavior must be documented and consistent; you get rid of code magically disappearing (null checks after a null pointer deref, for example) and becoming security problems when you update your compiler.
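The classic shape of that "magically disappearing" check, as a hypothetical snippet:

    // Because dereferencing a null pointer is UB, the compiler may assume
    // p != nullptr after the first statement and delete the later check as
    // dead code, so the "defensive" branch silently disappears.
    int read_or_default(int* p) {
        int x = *p;            // UB if p == nullptr
        if (p == nullptr)      // may be optimized away entirely
            return -1;
        return x;
    }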
> you get rid of code magically disappearing (null checks after null pointer deref for example) and becoming security problems when you update your compiler.
You're not going to get that. It would make these proposed safety checks unshippably slow. Languages that have bounds-checked array access by default rely on the compiler/JIT being able to recognize in the common case that those branches are dead code and eliminate them. Those "magically disappearing null checks" are in the same category. Mandating that they cannot be optimized away, even though the compiler can "prove" they are not necessary, isn't going to fly.
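A sketch of the kind of elimination being described; whether a given compiler actually manages it depends on the code and the optimizer:

    #include <cstddef>
    #include <vector>

    long sum_first_n(const std::vector<long>& v, std::size_t n) {
        if (n > v.size()) return 0;   // one up-front range test...
        long s = 0;
        for (std::size_t i = 0; i < n; ++i)
            s += v.at(i);             // ...can let the per-element checks be proven redundant
        return s;
    }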
First up: bounds checks are simply not expensive. They were a decade or two ago; they are not now. Similarly, null checks are not expensive either. Now, the standard C++ "I'll throw an exception now" model for errors is expensive for myriad reasons, and the general mode of development for many projects is to just outright disable exceptions entirely at build time due to the costs that come from the "zero cost" exceptions.
That's why this proposal isn't suggesting throwing an exception; think brk or int3, which is essentially what other bounds-checked languages like Rust do.
This also isn't mandating that the check can't be thrown away. What it is doing is changing UB to unspecified/implementation-defined behavior. The difference is important: when something is UB, the compiler is free to say "UB is not valid, therefore any preceding branches that would result in UB cannot happen" and then remove them. With unspecified behaviour, the compiler merely loses the ability to make assumptions based on "this UB path cannot be taken".
C and C++ both have a problem of overusing undefined rather than unspecified behaviour. Take integer overflow: there is no reason that should be UB; on all hardware it has well-defined behavior, and the impact of pretending otherwise has been numerous security exploits over the years. Unspecified behaviour does not (by definition) mean that the spec has any prescriptive behavioral rules beyond "this must be consistent across all sites". For example, integer overflow should always be two's complement or it should always trap; the compiler doesn't get to mix and match, and doesn't get to pretend overflow doesn't happen.
> First up - bounds checks are simply not expensive, that was true a decade or two ago, it is not now. Similarly null checks are not expensive either
They're not expensive because they are optimized away. Branches are still relatively expensive in general. Prediction only goes so far.
> Take integer overflow: there is no reason that that should be UB
Yes there is: it allows the type to be widened to match the current architecture, like promoting a 32-bit index to a 64-bit index for optimal array accesses on a 64-bit CPU, which you can't do if you need to ensure it wraps at 32 bits.
It's not how it overflows that's the problem, it's when it overflows. Defining it means the overflow has to happen at the exact size of the type, instead of at the optimal size for the usage and registers involved.
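A commonly cited illustration of that widening argument (not from the article; the function is invented):

    // With <= as the condition, i could in principle overflow when n == INT_MAX.
    // Because that overflow is UB, the compiler may assume it never happens,
    // treat the trip count as n + 1, and widen i to a 64-bit induction variable
    // for the a[i] address computation. With defined wrap-around (e.g. -fwrapv)
    // it would have to allow for i wrapping back to INT_MIN, inhibiting that.
    long sum_inclusive(const long* a, int n) {
        long s = 0;
        for (int i = 0; i <= n; ++i)
            s += a[i];
        return s;
    }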
> and the impact of pretending that this isn't the case has been numerous security exploits over the year.
And many of those exploits would still exist if overflow were defined, because the fact that the overflow happened was the security issue, not the resulting UB failure mode. UB helps here because the overflow can now categorically be called a bug, and you can have things like -ftrapv. That unsigned wrap-around is defined and not UB is actually also a source of security bugs: things like the good ol' `T* data = malloc(number * sizeof(T));` overflow bugs. There's no UB there, so you won't get any warnings or UBSan traps. Nope, that's a well-defined security bug. If only unsigned wrapping had been UB, that could have been avoided...
But if you want code that is hyper consistent and behaves absolutely identically everywhere regardless of the hardware it's running on (which things like crypto code does actually want), then C/C++ is just the wrong language for you. By a massive, massive amount. Tightening up the UB and pushing some stuff to IB doesn't suddenly make it a secure virtual machine, after all. You're still better off with a different language for that. One where consistency of execution is the priority, abstracting away any and all hardware differences. But that's never been C or C++'s goal.
C/C++ do have problems with UB. But int overflow isn't actually an example of it. And unsigned overflow being defined is also really a mistake; it should have been UB.
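Spelling out that well-defined overflow bug as a hypothetical snippet:

    #include <cstddef>
    #include <cstdlib>

    struct T { char bytes[16]; };

    T* make_array(std::size_t number) {
        // For a large enough 'number', number * sizeof(T) wraps around, which is
        // perfectly defined for unsigned arithmetic, so a buffer far smaller than
        // intended is returned and later writes through it run off the end.
        // No UB at this line, so no UBSan trap and no warning.
        return static_cast<T*>(std::malloc(number * sizeof(T)));
    }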
You are right, but OP asserted that IB (implementation defined behavior) would be needed for this. IB would also allow this, but it is not necessary.
All of undefined/unspecified/implementation-defined allows this.
edit:
Whether unspecified or implementation-defined would allow trap depends on the exact wording. If the wording is along the lines of "[on overflow] the value of the expression is unspecified/implementation-defined" then trap is not allowed. But it can be also "[on overflow] the behavior of the program is implementation-defined", which would also allow trap. Point being, unspecified/implementation-defined also have a defined scope, while undefined behavior always just spoils the whole program.
> the behavior of the program is implementation-defined
You're basically defining undefined behavior. The way to do this "properly" would be "the program is allowed to do x, y, z, or also trap"; there's really not much else you can do.
Well, it is not that easy. Specifying a subset of allowed outcomes might still prevent future evolution.
For example, bounds checking, while fairly cheap for serial code, makes it very hard to vectorize code effectively. Then again, so does IEEE FP math, and that's why we have -ffast-math...
In the end, it is not impossible, it is just tradeoffs.
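A hedged illustration of the vectorization tension; whether the compiler actually vectorizes either loop depends on the target and flags:

    #include <cstddef>
    #include <vector>

    // Unchecked loop: a straightforward auto-vectorization candidate.
    void scale(std::vector<float>& v, float k) {
        for (std::size_t i = 0; i < v.size(); ++i)
            v[i] *= k;
    }

    // Per-element checked loop: at() may throw mid-iteration, a side effect whose
    // ordering must be preserved, which tends to get in the way of vectorizing.
    void scale_checked(std::vector<float>& v, float k) {
        for (std::size_t i = 0; i < v.size(); ++i)
            v.at(i) *= k;
    }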
I'm generally of the opinion that there should be as little UB as possible, but this is probably better done by compilers that push the limits of static and dynamic checking technology instead of the standard mandating it.
Right, that’s my point at the top of the thread :) I’m just saying if you don’t end up with something that looks like what I just mentioned you’re really not specifying anything, so you might as well go for undefined behavior. This is fine.
Unspecified behavior and implementation defined behavior are distinct in theory. The latter requires the behavior to be documented by the implementer, while the former does not.
In practice I dare you to find compiler documentation for each implementation defined behavior.
No idea how complete it is. Also a lot of stuff is architecture/platform specific, not compiler specific, so you won't find it in the general compiler docs but you have to look at the psABI.
This is great news. Such hardening can already be compiled into C code, and it is even shipping in some projects; e.g. the Linux kernel has a CONFIG_UBSAN option which activates compiler sanitizers for things like array bounds checks that panic on failure. It makes sense to extend similar behaviour to projects based on C++ codebases.
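A userspace analogue of that kernel option, as a sketch using Clang's sanitizer flags (not anything specific to the article):

    // Compile with e.g.: clang++ -O2 -fsanitize=bounds -fsanitize-trap=bounds oob.cpp
    // The out-of-bounds store below then traps instead of silently corrupting memory.
    int buf[4];

    int main(int argc, char**) {
        buf[argc + 10] = 1;   // index is at least 11 for any normal argc; buf has 4 elements
        return 0;
    }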