
There's nothing wrong with simple usages of goto.

The strxcpy family on the other hand is complete garbage and should never be used for any reason. I'm horrified that they're used in the kernel at all. All of those functions (and every failed attempt at "fixing" them) should have been nuked from orbit.


This is the approach taken in git: https://github.com/git/git/blob/master/banned.h


> There's nothing wrong with simple usages of goto

Indeed, I like a few gotos here and there for doing cleanup toward the end of a function.


Or to break out of nested loops. The problem is with unstructured goto spaghetti making the code impossible to follow without essentially running it in your head (or a debugger).

Goto + Switch (or the GCC computed goto extension) is also a wonderful way to implement state machines.
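A minimal sketch of the pattern, using a made-up example (a tiny state machine that strips //-style line comments; the names are illustrative):

    #include <stdio.h>

    /* Goto + switch state machine: each label is a state,
       each goto is a transition. */
    void strip_line_comments(const char *s)
    {
        size_t i = 0;

    code:
        switch (s[i++]) {
        case '\0': return;
        case '/':  goto maybe_comment;
        default:   putchar(s[i - 1]); goto code;
        }

    maybe_comment:
        switch (s[i++]) {
        case '\0': putchar('/'); return;
        case '/':  goto comment;
        default:   putchar('/'); putchar(s[i - 1]); goto code;
        }

    comment:
        switch (s[i++]) {
        case '\0': return;
        case '\n': putchar('\n'); goto code;
        default:   goto comment;
        }
    }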


What's wrong with `strncpy`?


strncpy won't always write a trailing nul byte, causing out of bounds reads elsewhere. It's a nasty little fellow. See the warning at https://linux.die.net/man/3/strncpy

strlcpy() is better and what most people think strncpy() is, but still results in truncated strings if not used carefully which can also lead to big problems.


Speaking of strlcpy, Linus has some colorful opinions on it:

> Note that we have so few 'strlcpy()' calls that we really should remove that horrid horrid interface. It's a buggy piece of sh*t. 'strlcpy()' is fundamentally unsafe BY DESIGN if you don't trust the source string - which is one of the alleged reasons to use it. --Linus

Maybe strscpy is finally the one true fixed design to fix them all. Personally I think the whole exercise is one of unbelievable stupidity when the real solution is obvious: using proper string buffer types with length and capacity for any sort of string manipulation.


> the real solution is obvious

If it were obvious it would have been done already. Witness the many variants that try to make it better but don't.

> using proper string buffer types with length and capacity

Which you then can't pass to any other library. String management is very easy to solve within the boundaries of your own code. But you'll need to interact with existing code as well.


> If it were obvious it would have been done already. Witness the many variants that try to make it better but don't.

Every other language with mutable strings, including C++, does it like that. It is obvious. The reason it is not done in C is not ignorance, it is laziness.

> Which you then can't pass to any other library. String management is very easy to solve within the boundaries of your own code. But you'll need to interact with existing code as well.

Ignoring the also obvious solution of just keeping a null terminator around (see: C++ std::string), you should only worry about it at the boundary with the other library.

Same as converting from utf-8 to utf-16 to talk to the Windows API for example.


> The reason it is not done in C is not ignorance, it is laziness.

Of course not. C has been around since the dawn of UNIX and the majority of important libraries at the OS level are written in it.

Compatibility with such a vast amount of code is a lot more important than anything else.

If it were so easy why do you think nobody has done it?

> Ignoring the also obvious solution of just keeping a null terminator around

That's not very useful for the general case. If your code relies on the extra metadata (length, size) being correct and you're passing that null-terminated buffer around to libraries outside your code, it won't be correct since nothing else is aware of it.


> If it were so easy why do you think nobody has done it?

People have done it, there are plenty of strbuf implementations to go around. Even the kernel has seq_buf. How you handle string manipulation internally in your codebase does not matter for compatibility with existing libraries.

> That's not very useful for the general case. If your code relies on the extra metadata (length, size) being correct and you're passing that null-terminated buffer around to libraries outside your code, it won't be correct since nothing else is aware of it.

You can safely pass the char* buffer inside a std::string to any C library with no conversion. You're making up issues in your head. Don't excuse incompetence.


> People have done it, there are plenty strbuf implementations to go around.

Precisely!

Why plenty and why is none of them the standard in C?


The TL;DR on that is basically "lazy, security unconscious assholes keep shutting it down".

Dennis Ritchie strongly suggested C should add fat pointers all the way back in 1990. Other people have pointed out the issues with zero terminated strings and arrays decaying into pointers (and the ways to deal with them even with backwards compatibility constraints) for years.

One of the most prominent was Walter Bright's article on "C's Biggest Mistake" back in 2009 and he was a C/C++ commercial compiler developer.

There is no excuse.


It is easy to document mistakes in hindsight, since hindsight is 20/20.

It is very easy to write your own one-off secure string handling library. This is a common assignment in intro to C programming classes.

So why isn't it standard in C already?

You offer a theory that there is a gang of "security unconscious assholes [who] keep shutting it down". This gang is so well organized that they have managed to block an easy improvement for many many decades for unknown reasons. That's a pretty wild theory.

Or Occam's razor suggests a different answer: It's actually difficult.

No, not the writing code part, that's easy. It's the seamlessly integrating with ~60 years of mission critical codebases part that's hard.


There's no need to integrate with 60 years of mission critical codebases, you're making up a problem in your head that doesn't exist.

Nothing needs to be fixed, all it takes is to stop doing the stupid thing.

It does not take a "coordinated gang" to shut down C standard proposals, them getting shut down is the default.

You seem to be familiar with neither the nature of the problem nor the struggle that is getting anything passed through ISO standardization. I don't mean to belittle you by saying this, I just hope to make you understand that you are assuming things that are simply not based in reality.

It doesn't even need to be in the standard btw. Just write your own. It's a few lines of code. As you say, a beginner exercise. Yet there is code written after the year 2000 that still uses the strxcpy family, long after the issues, and their solution, were known.

"Backwards compatibility" is a total red herring. C++ has the solution right there in its standard library. A backwards compatible string buffer implementation.


> Nothing needs to be fixed, all it takes is to stop doing the stupid thing.

Well we'll just agree to disagree I suppose, as I'm equally convinced that you're not grasping what the problem actually is.

All I can say is that if this were as easy to fix as you assert and "all it takes is to stop doing the stupid thing" and we both agree that writing code for the better thing is super easy, then consider why it has not been possible to fix in the C universe.


I don't know what to tell you. Look at the git codebase, they downright ban any usage of the strcpy family, going so far as to hide them under macros so people can't use them.

Banning them outright was not possible in old codebases from before the internet got really popular and people were pointing out how bad these functions were, but they sure could have stopped using them in any new code written in those codebases. That's what code review is for.

Any C code written after 2010 has absolutely no excuse to use these functions. They are inefficient, unsafe and more annoying to use than a strbuf implementation that takes half an hour to write.

So why have people continued to use them?

Option a) they were already there, the codebase is over 30 years old, and replacing the code entirely would be too much work. This is a valid reason.

Option b) ignorance, they don't know how to write a strbuf type. This one is downright impossible, any C dev knows how to do it, and like I said, literally every other language does it the same way.

Option c) laziness. This is for me the only real reason. As awful as these functions are, they're in the stdlib. You still see people saying "simple usages of strncpy are fine". They are not fine.

If you can think of an option d) I'd love to know, because I honestly can't think of anything else. Note that interfacing with existing 30 year old codebases does not count, as how you internally manipulate strings has no bearing on that, all you need to ensure is the 0 terminator at the end.

You get a mutable char* from the old function. You shove it in a struct strbuf { size_t capacity; size_t length; char *data; }. Done.

You get a constant char* from the old function. You call strlen followed by malloc and memcpy into a new buffer for the strbuf. Or if you don't need to actually mutate the string, you store it in a non-zero terminated struct strview {size_t length; char* data}.
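A minimal sketch of both cases, under the obvious assumptions (the names are illustrative and allocation-failure handling is elided):

    #include <stdlib.h>
    #include <string.h>

    struct strbuf  { size_t capacity, length; char *data; };
    struct strview { size_t length; const char *data; };

    /* Case 1: adopt a mutable, nul-terminated char* we now own. */
    struct strbuf strbuf_adopt(char *s)
    {
        size_t len = strlen(s);
        return (struct strbuf){ .capacity = len + 1, .length = len, .data = s };
    }

    /* Case 2a: copy a const char* into a fresh buffer we own. */
    struct strbuf strbuf_dup(const char *s)
    {
        size_t len = strlen(s);
        char *data = malloc(len + 1); /* check for NULL in real code */
        memcpy(data, s, len + 1);
        return (struct strbuf){ .capacity = len + 1, .length = len, .data = data };
    }

    /* Case 2b: just borrow it when no mutation is needed. */
    struct strview strview_of(const char *s)
    {
        return (struct strview){ .length = strlen(s), .data = s };
    }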

So what is the challenge here? Why is usage of strcpy not banned in any codebase less than 20 years old?


> you should only worry about it at the boundary with the other library.

If this mitigation were applied consistently, it would solve all problems with nul-terminated strings: do strict and error-checked conversions to nul-terminated strings at all boundaries of the program, and then nul-terminated strings and len-specified strings are equivalently dangerous (or safe, depending on your perspective).

The problem is precisely that unsanitised input makes its way into the application, bypassing any checks.


It's impossible to avoid "sanitizing" input if you have a conversion step from a library provided char* to a strbuf type. Any use of the strbuf API is guaranteed to be correct.

That's very different from needing to be on your toes with every usage of the strxcpy family.


> It's impossible to avoid "sanitizing" input if you have a conversion step from a library provided char* to a strbuf type. Any use of the strbuf API is guaranteed to be correct.

I agree: having a datatype beats sanitising input (I think there's a popular essay somewhere about parsing input vs sanitising input which makes pretty much the same point as you do), but it's still only partially correct.

To get to fully correct you don't need a new string type, you need developers to recognise that the fields "Full Name" and "Email address" and "Phone number", while all being stored as strings, are actually different types and to handle them as such by making those types incompatible so that a `string_copy` function must produce a compilation failure when the destination is "EmailAddressType" and the source is "FullNameType".

Developers in C can, right now, do that with only a few minutes of extra typing effort. Adding a "proper" string type is still going to result in someone, somewhere, parsing a uint8_t from a string into a uint64_t, and then (after some computation) reversing that (now overflowing) uint64_t back into a uint8_t.

If you're doing the right thing and creating types because "Parse, Don't Validate", a better string type doesn't bring any benefits. If you're doing the wrong thing and validating inputs, then you're going to miss one anyway, no matter the underlying string type.


Sure but now we're talking about a universal problem across languages, rather than a C-specific problem.


> Sure but now we're talking about a universal problem across languages, rather than a C-specific problem.

Of course, but that's my point - C already gives you the ability to fix the incorrect typing problem, using the existing foundational `str*` functions.

A team that is not using the compiler's ability to warn when mixing types is still going to mix types when there is a safe strbuf_t type.

The problem with the `str*` functions can be fixed today without modifying the language or its stdlib.

Most C programmers don't do it (myself included). I think that, in one sense, you are correct in that removing the existing string representation (and functions for them) and replacing them with len+data representation for strings will fix some problems.

Trouble is, a lot of useful tokenising/parsing/etc string problems are not possible in a len+data representation (each strtok() type function, for example, needs to make a copy of what it returns) so programmers are just going to do their best to bypass them.

Having programmers trained to create new string types using existing C is just easier, because then you solve the whole 'mixing types' problem even when looking at replacements for things like `strtok`.

Or ... maybe I'm completely off-base and the reason that programmers don't create different types for string-stored data is because it is too much work in current C-as-we-know-it.


For me the "real" solution looks something like this:

    ssize_t strxcpy(char* restrict dst, const char* restrict src, ssize_t len)
Strxcpy copies the string from src to dst. The len parameter is the number of bytes available in the dst buffer. The dst buffer is always terminated with a null byte, so the maximum length of string that can be copied into it is len - 1. strxcpy returns the number of characters copied on success, but can return the following negative values:

    E_INVALID_PARAMETER: Either dst or src is NULL or len < 1; no data was copied
    W_TRUNCATED: len - 1 bytes were copied but more characters were available in src.
strxcat would work similarly. I have not decided if the return value should include the terminating null or not.
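A minimal sketch of one way this could be implemented (my reading of the spec above; this version's success return value excludes the terminator):

    #include <stddef.h>
    #include <sys/types.h> /* ssize_t */

    #define E_INVALID_PARAMETER (-1)
    #define W_TRUNCATED         (-2)

    ssize_t strxcpy(char *restrict dst, const char *restrict src, ssize_t len)
    {
        if (dst == NULL || src == NULL || len < 1)
            return E_INVALID_PARAMETER;

        ssize_t i = 0;
        for (; i < len - 1 && src[i] != '\0'; i++)
            dst[i] = src[i];
        dst[i] = '\0'; /* dst is always terminated */

        return src[i] != '\0' ? W_TRUNCATED : i;
    }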


How is this useful though? I mean yes, it is useful in avoiding the buffer overruns. But that's not the only consideration, you also want code that handles data correctly. This just truncates at buffer size so data is lost.

So, if you want the code to work correctly, you need to check the return code, reallocate dst, and call the copy again. But if you're going to do that you might as well check the length of src and allocate dst correctly before calling it so it never fails. And if you're already doing that, you can call strcpy just fine and never have a problem.


Sometimes truncation is fine or at least can be managed. Yes, strdup() is a better choice in a lot of situations, but depending on how your data is structured it may not be the correct option. I would say my version is useful in any situation where you were previously using strncpy/cat or strlcpy/cat.


Wow, yeah, this thread summarizes well the usual API flakiness and reshuffling in C.

It seems people keep coming up with "one more improvement" that's broken in one way or the other.


The problem with strlcpy is the return value: it returns the full length of the source string, so it has to scan all of src even when it only copies a few bytes. You can be burned badly if you are using it to, for example, pull out a fixed chunk of string from a 10TB memory-mapped file, especially if you're pulling out all of the 32 byte chunks from that huge file and you just wanted a function to stick the trailing 0 on the string and handle short reads gracefully.

It's even worse if you are using it because you don't fully trust the input string to be null terminated. Maybe you have reason to believe that it will be at least as long as you need, but can't trust that it is a real string. As a function that was theoretically written as a "fix" for strncpy it is worse in some fundamental ways. At least strncpy is easy enough to make safe by always over-allocating your buffer by 1 byte and stuffing a 0 in the last byte.


strncpy() also zero pads the entire buffer. If it's significantly larger than the copied string you're wasting cycles on pointless move operations for normal, low-security string handling. This behavior is for filling in fixed length fields in data structures. It isn't suitable for general purpose string processing.


#define strncpyz(d,s,l) *(strncpy(d,s,l)+(l))=0

Of course this one is unsafe for macro expansion. But well, it's C :)


I'd rather put the final nul at d+l-1 than at d+l, so that l can be the size of d, not "one more than the size of d":

  strncpyz(buf,src,sizeof buf);
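A sketch of that variant (l is still evaluated twice, so no side effects in the arguments):

    #define strncpyz(d,s,l) (*(strncpy((d),(s),(l)-1)+((l)-1))=0)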


As others have already pointed out, it doesn't guarantee that the result is null-terminated. But that's not the only problem! In addition, it always pads the remaining space with zeros:

    char buf[1000];
    strncpy(buf, "foo", sizeof(buf));
This writes 3 characters and 997 zeros. It's probably not what you want 99% of the time.


It's not possible to use it safely unless you know that the source string fits in the destination buffer. Every strncpy must be followed by `dst[sizeof dst - 1] = 0`, and even if you do that you still have no idea if you truncated the source string, so you have to put in a further check.

    strncpy (dst, src, (sizeof dst) - 1);
    dst[(sizeof dst) - 1] = 0;
    int truncated = strlen (dst) != strlen (src);
Without the extra two lines after every strncpy, you're probably going to have a hard-to-discover transient bug.


if you really want to use standard C string functions, use instead:

    int ret = snprintf(dst, sizeof dst, "%s", src);
    if (ret < 0 || (size_t)ret >= sizeof dst)
    {
        /* failed */
    }
or as a function:

    bool ya_strcpy(const char* s, char* d, size_t n)
    {
        int cp = snprintf(d, n, "%s", s);
        return cp >= 0 && (size_t)cp < n; /* true only if the whole string fit */
    }


snprintf only returns negative if an "encoding error" occurs, which has to do with multi-byte characters.

I think for that to possibly happen, you have to be in a locale with some character encoding in effect and snprintf is asked to print some multi-byte sequence that is invalid for that encoding.

Thus, I suspect, if you don't call that "f...f...frob my C program" function known as setlocale, it will never happen.


> Thus, I suspect, if you don't call that "f...f...frob my C program" function known as setlocale, it will never happen.

Of all the footguns in a hosted C implementation, I believe setlocale (and locale in general) is so broken that even compiler and library developers can't work around it to make it safe.

The only other unfixable C-standard footgun that comes close, I think, are the environment-reading-and-writing functions, but at least with those, worst-case is leaking a negligible amount of memory in normal usage, or using an old value even when a newer one is available.


I see that in Glibc, snprintf goes to the same general _IO_vsprintf function, which has various ominous -1 returns.

I don't think I see anything that looks like the detection of a conversion error, but rather other reasons. I would have to follow the code in detail to convince myself that glibc's snprintf cannot return -1 under some obscure conditions.

Defending against that value is probably wise.

As far as C locale goes, come on, the design was basically cemented in more or less its current form in 1989 ANSI C. What the hell did anyone know about internationalizing applications in 1989.


I actually do use `snprintf()` and friends.


Except no one does that return-code check, and worse, they often use the return code to advance a pointer when concatenating strings.


`strncpy` is commonly misunderstood. Its name misleads people into thinking it's a safely-truncating version of `strcpy`. It's not.

I've seen a lot of code where people changed from `strcpy` to `strncpy` because they thought that was safety and security best practice. Even sometimes creating a new security vulnerability which wasn't there with `strcpy`.

`strncpy` does two unexpected things which lead to safety, security and performance issues, especially in large codebases where the destination buffers are passed to other code:

• `strncpy` does NOT zero-terminate the copied string if it limits the length.

Whatever is given the copied string in future is vulnerable to a buffer-read-overrun and junk characters appended to the string, unless the reader has specific knowledge of the buffer length and is strict about NOT treating it as a null-terminated string. That's unusual C, so it's rarely done correctly. It also doesn't show up in testing or normal use, if `strncpy` is used "for safety" and nobody enters data that large.

• `strncpy` writes the entire destination buffer with zeros after the copied string.

Usually this isn't a safety and security problem, but it can be terrible for performance if large buffers are being used to ensure there's room for all likely input data.

I've seen these issues in large, commercial C code, with unfortunate effects:

The code had a security fault because, under some circumstances, a password check would read characters past the end of a buffer due to the lack of a zero-terminator that authors over the years assumed would always be there.

A password change function could set the new password to something different than what the user entered, so they couldn't log in afterward.

The code was assumed to be "fast" because it was C, and avoided "slow" memory allocation and a string API when processing strings. It used preallocated char arrays all over the place to hold temporary strings and `strncpy` to "safely" copy. They were wrong: It would have run faster with a clean string API that did allocations (for multiple reasons, not just `strncpy`).

Those char arrays had the slight inconvenience of causing oddly mismatched string length limits in text fields all over the place. But it was worth it for performance, they thought. To avoid that being a real problem, buffers tended to be sized to be "larger" than any likely value, so buffer sizes like 256 or 1000, 10000 or other arbitrary lengths plucked at random depending on developer mood at the time, and mismatched between countless different places in the large codebase. `strncpy` was used to write to them.

Using `malloc`, or better a proper string object API, would have run much faster in real use, at the same time as being safer and cleaner code.

Even worse, sometimes strings would be appended in pieces, each time using `strncpy` with the remaining length of the destination buffer. That filled the destination with zeros repeatedly, for every few characters appended. Sometimes causing user-interactions that would take milliseconds if coded properly, to take minutes.

Ironically, even a slow scripting language like Python using ordinary string type would have probably run faster than the C application. (Also Python dictionaries would have been faster than the buggy C hash tables in that application which took O(n) lookup time, and SQLite database tables would have been faster, smaller and simpler than the slow and large C "optimised" data structures they used to store data).


It doesn't guarantee that the output is null terminated. Big source of exploits.


Can't read the pdf right now but I'm a big fan of property based testing.

One thing I find that people struggle with is coming up with "good properties" to test with.

That's the wrong way to think about it. The properties you want to test are the function's contract. Checking that contract is the goal.

You can be as specific with the contract as you want. The more specific the more bugs you'll find.

Property-based tests are just one way to check the contract, as are hand-written unit tests. You could use a static analyzer or a model checker as well, they're all different approaches to do the same thing.

EDIT: by contract I mean the guarantees the function imposes on its output. A contract for a sorting function could be as simple as the length of the output being the same as the input. That's one property. Another is that every element in the output is also in the input. You can go all the way and say that for every element at index i, the element at index i+1 (if any) is larger.

But you don't need a perfect contract to start with nor to end with. You can add more guarantees/properties as you wish. The more specific, the better (but also slower) the tests.
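As a minimal sketch of checking such a contract with random inputs (qsort stands in for the function under test; a real property-testing library would add shrinking and smarter generators):

    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    int main(void)
    {
        srand(42);
        for (int iter = 0; iter < 1000; iter++) {
            int in[100], out[100];
            size_t n = (size_t)(rand() % 100);
            for (size_t i = 0; i < n; i++)
                in[i] = rand() % 50;
            memcpy(out, in, n * sizeof *in);

            qsort(out, n, sizeof *out, cmp_int); /* function under test */

            /* Property: adjacent elements are ordered. */
            for (size_t i = 0; i + 1 < n; i++)
                assert(out[i] <= out[i + 1]);

            /* Property: output is a permutation of the input. */
            for (int v = 0; v < 50; v++) {
                int before = 0, after = 0;
                for (size_t i = 0; i < n; i++) {
                    before += in[i] == v;
                    after  += out[i] == v;
                }
                assert(before == after);
            }
        }
        puts("all properties held for 1000 random inputs");
        return 0;
    }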


Already better than GTK4


https://github.com/B00merang-Project/Mac-OS-9 includes a GTK+4 theme. If you like the "High Contrast" look, check out https://github.com/B00merang-Project/System-4 too.


Very apt username


You are very likely right. I can't find sources for this. Sad to see.


All great features, but C# has had most of those for a very long time. Java really stagnated for a while. Glad to see that's no longer the case.


Rust's Arc is like ARC, this is more like a generational pool of Box<dyn Any>.


It's 30x faster than 50 step SD with the same quality, unlike LCMs which can result in substantially lower quality. The actual page of the work shows the difference.


Transmission only uses GTK on Linux.


There is a Qt and CLI client, too.


And a web UI! (My preferred method as I run it on my home server)


It would still be worthwhile to greatly reduce the number of vulnerabilities coming out of new C and C++ code, which are likely to be with us for a long time still. At the very least as updates/fixes to existing codebases.


Yes, no doubt - reducing the number of vulnerabilities is a good thing. What I'm worried about is that they merely reduce the number of CVEs, and call it a win for their safety initiative. It becomes a PR exercise more than technological improvement.


Well, pay attention and hold them accountable.

But Microsoft (for instance) certainly has incentives to avoid being the next Boeing or Volkswagen with respect to being excellent box checkers that end up missing the mark on the outcomes those checkboxes are supposed to protect against. It doesn't matter if C and C++ have fewer CVEs as such if Microsoft tools and platforms gain a reputation as being insecure or unsafe.


It saddens me that smart people like Herb and Bjarne have good ideas on how to make C++ a much safer language but the actual output of the standards committee is so far behind.

Herb mentions std::span as a safety improvement, but the [] operator doesn't do bounds checks and the .at() method isn't even there yet!

Shameless plug but I discuss this issue on my blog: https://btmc.substack.com/p/memory-unsafety-is-an-attitude-p...

Herb and Bjarne have the right attitude, sadly it doesn't seem enough C++ devs or people in the standards committee do. Same applies to C.


At $work, the standard solution to ASAN reporting use-after-free issues is to... not run ASAN builds. The fact that builds in CI exhibit random inexplicable crashes regularly every week doesn’t seem to make anyone have any second thoughts. A colleague once claimed that there is nothing wrong with out-of-bounds accesses as long as we don’t crash. The same bunch is also religiously opposed to using debuggers and regularly break debugger builds by removing debug information from builds at multiple levels, blocking ptrace debugging in system images through sysctl... This is all so toxic.


> A colleague once claimed that there is nothing wrong with out-of-bounds accesses as long as we don’t crash.

I need to find the source, but someone pointed out that the safety advantages of Rust are, in part, cultural and I increasingly agree. People use Rust because they care about memory safety and that care is reflected in the programs they write.


I keep getting code reviews with manual new/delete calls despite unique_ptr being 11 years old.

Weirdly, I see this most often from programmers who started college less than 11 years ago. Our academics are not helping.


The optimist in me would like to delude themselves thinking that most of the people smart/experienced enough to make the jump to unique_ptr from new/delete realized this is closing a porthole on the Titanic and made the jump to something that isn’t C++.


I am not sure. C++ is a tool. I use what my company and companies in my domain use. I wouldn't mind using Rust, but there's just very little momentum. So meanwhile I do my best with what we have.

Personally, I care more about what I do than which tool I am using.


> I wouldn't mind using Rust

Rust is not the only thing that “isn’t C++”. Go is not appropriate for every domain either but you can bet your bottom dollar that it has taken market share from C++ - which I think the world is on the overall balance better off for, and I don’t particularly like Go as a language. Someone is making gobs of money off of OCaml.

> I care more about what I do than which tool I am using.

I don’t agree with the implication that these are independent factors.

And I do get it. The Rust ecosystem founders in many areas and the RIIR meme crew on forums is annoying. That doesn’t forgive the failings of the C++ ecosystem.


We have a lot of existing C++. I still cannot figure out how to mix it with anything else. Nothing else wants to deal with std::vector, for instance.


My personal opinion is that if companies would get real punishments for all of these goddamn security breaches, and the CTO's head was on the line, you'd see a shift in attitude real quick.


This in fact happened to Cloudflare with Cloudbleed and their decision was to switch to Rust.

There is a human factor at work here. Rationally speaking, Herb is right that a 98% reduction is sufficient. But when the CTO's head is on the line, they won't listen and will switch to Rust.


Not all pieces of software are created equal. A desktop CAD application that doesn't do any networking and doesn't manipulate sensitive user data isn't worthy of binary exploitation. If there is adequate security at the system OS layer, at worst it will corrupt a user's file.

Infrastructure network code that runs on millions of servers worldwide is a completely different story. Being able to send a sequence of bytes that unlocks funny side-effects of a network service can be weaponised on a mass scale.


> Not all pieces of software are created equal. A desktop CAD application that doesn't do any networking and doesn't manipulate sensitive user data isn't worthy of binary exploitation. If there is adequate security at the system OS layer, at worst it will corrupt a user's file.

That software is almost certainly running on a network-connected machine though and likely has email access etc. A spear-phishing attack with a CAD file that contains an RCE exploit would be an excellent way to compromise that user and machine, leading to attacks like industrial espionage, ransomware, etc.


If you've fallen victim to phishing you're hosed anyway as a malicious process can read and write to the address space of another process, see /proc/$pid/mem, WriteProcessMemory(), etc.


There's a spread of things that can happen in phishing; I would expect that it's a lot harder to get a user to run an actual executable outright than to open a "data" file that makes a trusted application become malicious.


In order to read or write /proc/pid/mem your process needs to be allowed to ptrace() the target process. You can’t do that for arbitrary processes. Similar story for WriteProcessMemory().


Above your security context, no, but you can definitely WriteProcessMemory any other process that is in your same security context or lower (something similar holds for ptrace, though remember that SUID/SGID binaries are not running at the same security context).


Those are increasingly rare. Nowadays you have all these apps requiring subscriptions and expecting users to login and what not.

But I agree it depends heavily on exactly what application are we talking about. Is it running on server? Definitely needs to be security conscious. Is it a library that might at some point be used by an application running on a server? Needs to be more hardened than a nuclear bunker.


They are all pieces to a puzzle. If you can add or modify a CAD file to a location a user of the desktop software will access, a defect in file parsing could provide user level remote access to you. Then if any other application or library has a privilege escalation, you have rooted the box. And even if there is no privilege escalation on that box, how many more CAD files can that user modify to spread the remote access?


You're assuming they have security breaches due to C++.

I'm betting here they don't; if they have security breaches it's due to '1q2w3e' being the password to their world-accessible PHP admin panel, and not because of C++ code.


Using C++ doesn’t mean you must have security issues. It means that you have to do more things right in your other work to avoid them, and we have several decades of experience showing that even very good teams struggle with that. The more separate concerns teams need to manage, the more likely it is that someone will make a mistake with one of them – and since time is finite, the attention spent on memory management is taking away time which could be spent on other classes of error.


For every 1 security breach due to C++ memory management, there are at least 100000 due to shitty PHP code that doesn't escape strings or uses plaintext passwords that never change. (This is a conservative estimate.)


Can you cite your sources on that analysis? Be sure to include the relative affected numbers so we don’t count an exploit in Chrome the same as a PHP exploit affecting a dozen people using someone’s obscure WordPress plugin.

Another way of thinking about this, why are all of the browser teams who have some of the best C++ developers in the world and significant resources adopting memory-safe languages? Nobody at that level is doing that because it’s cool, so there might be something to be learned from their reasoning.


> why are all of the browser teams who have some of the best C++ developers in the world and significant resources adopting memory-safe languages?

They aren't. Even Mozilla abandoned their Rust-in-Firefox project.


PHP (the language) has long since moved past awful practices like that, and we can definitely tell people to stop doing that and use the provided safe alternatives instead. In fact, the PHP docs do just that. PHP is no longer to blame here.

Also that number is greatly exaggerated. It's simply not true anymore, check the CVE website if you don't believe me.


Here is Dennis Ritchie's proposal for fat pointers in C.

https://www.bell-labs.com/usr/dmr/www/vararray.pdf

It is a culture thing; eventually the authors no longer have the last word, once they let the community rule the language design and their voice is equally one vote.


> This is a version of a paper published in Journal of C Language Translation, vol 2 number 2, September 1990

This says it all really. Nothing more needs to be said. Unfortunate.


I haven't read the linked paper, but both CPU speed and RAM available have increased about 100x since 1990, and nobody then had uttered the words "threat model". Some approaches that are sensible now were reasonable to overlook in 1990 for being too heavy.


Check when Morris worm came out.

And by the way,

"A consequence of this principle is that every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to--they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980 language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law."

-- C.A.R Hoare's "The 1980 ACM Turing Award Lecture"


The Morris worm affected around 2000 VAX machines a couple of years previously, and was the first ever such incident on that scale. In other words, almost nobody in 1990 had been affected by a computer security incident. It didn't make sense in 1990 to prioritise this security threat over efficiency concerns.

Insisting on memory safety back then would be like insisting on code being accompanied by checkable formal proofs of correctness now: It's a technique that can be applied right now and that does improve safety, but it comes at such a cost that the tradeoff only makes sense for a handful of niche applications (aerospace, automotive, medical devices).


Yeah, that is why we didn't have to buy anti-virus software, duh.


Viruses in 1990 propagated by people running .EXE files they copied from somewhere, or booting floppy disks they found somewhere.

Tell me how bounds checks on array accesses would have prevented that.



> 01 JUN 2004

Got anything relevant?


Yes but the same story kept repeating over the years. C89 had a good excuse. C99 was iffy with the VLA stuff instead of proper slices. What excuse did C11 have?


> but the actual output of the standard's committee is so far behind.

That criticism misunderstands the actual way the C++ committee works. They're not a supreme legislative group that can dictate what a new C++ should be and everybody else is required to just obey. In contrast, Apple can dictate what the next version Swift will do. Microsoft can do the same with C#.

Instead, what happens in C++ is that somebody or some company makes a proposal with some concrete working code for others to evaluate and vote "yes" on. So in reality one of the teams from MSVC, gcc, clang, Intel C++, etc. has to take the lead with a Safe C++ alternative that convinces everybody else to implement it.

To Herb Sutter's credit, he did make a "C++2" implementation: https://github.com/hsutter/cppfront

But his side project at Microsoft didn't gain traction with gcc, clang, etc and everybody else in the industry. So at this point, the C++ committee will be perceived as "so far behind" ... because there's nothing for them to vote on.

Similar situation happened with "breaking ABI compatibility". Google wants to break ABI but others didn't.


And that is why, while C++26 is being discussed, C++20 modules are still full of warts, working on Visual C++ (kind of) and not really anywhere else, as ISO C++ has long stopped being about standardizing stuff with actual field experience.


>And that is why [...] C++20 modules are still full of warts, working on Visual C++ (kind of), and not really anywhere else, as ISO C++ has long stopped being about standardizing stuff with actual field experience.

Yes, but your complaint about flawed C++ standards or incomplete implementations is orthogonal to what I was writing about.

I'm just explaining that the C++ committee doesn't have the power to impose changes that some people think it does. Basically, I'm saying "the sky is blue" type of statement. The C++ committee is a reflection of what the members' compiler teams want to collectively do. The committee doesn't have unilateral supreme power to dictate standards to compiler teams not interested in it. Instead, they collect feedback from people sharing proposals in papers and put things to a vote. The compilers' teams have the real power and not the committee. (What I've stated in this paragraph should be uncontroversial facts but the downvoters disagree so I'd love to hear them explain exactly what power and agency they think the C++ committee actually has.)

If one understands the above situation, then the following different situations shouldn't have any mystery as to cause & effect:

- How did the std::chrono get added to the standard library? Because Howard Hinnant made a proposal with working code, and others liked it enough to vote it in.

- Why is there no _standard_ cross-platform GUI and audio library in C++? Why is there no standardized Safe Syntax C++ like Rust? Because nobody has made a proposal that convinced enough of the other members to agree to a x-platform GUI and audio framework.

- Why does the C++ committee add features and prioritize things I don't care about? Because the <$feature$> you cared about wasn't proposed for them to discuss and convince others enough to vote it in.

But yes I do understand the "warts" complaint you're talking about. It's frustrating that the std:regex with bad performance got approved. In similar examples, N Josuttis in multiple conference videos has complained about the problems with ranges and views. He says it was wrong for the C++ committee to approve it. (footnote: The original author of the proposal tried to explain his technical rationale: https://old.reddit.com/r/cpp/comments/zq2xdi/are_there_likel... )

To reiterate, I'm not trying to explain away bad language standards. New features that will have flaws will continue to happen in the future whether it's created by a singular corporation like Apple(Swift) or a cooperative group like the C++ committee.

I'm just explaining why some "wishlist desirable C++ feature" isn't going to be in the standard if there's no proposal that convinces the other members to vote it in.

EDIT to reply: >When we complain about the "committee" [...] the things they choose to propose and vote for.

The C++ committee members are not static but the webpage has list of names : https://isocpp.org/wiki/faq/wg21

Clicking on various PnnnnR.pdf proposals that motivated each feature in the conformance table shows most authors are not from the actual committee members: https://en.cppreference.com/w/cpp/compiler_support

Using the above workflow to address your complaint about std::span and at(), I found this comment from the original author Tristan Brindle who proposed it and why he thinks the committee voted no:

2019-10-18T22:55:30z https://old.reddit.com/r/cpp/comments/djqdu2/why_is_stdspan_...


I can't edit anymore so sending a second reply.

That reddit link is actually showing the problem to be worse. It's not that someone forgot, it's that the committee are absolute goddamn clowns.

Incredible.

Since the committee is a reflection of the larger C++ community, it's not even a case of a few bad apples spoiling the bunch, it's more like there are a few really good apples that are being bombarded with fungal spores on a daily basis by the rest.

Their justification for not having .at() makes absolutely no sense! Contracts, had they made it in, would have been for fixing []. Since that didn't happen, .at() was pretty much mandatory to have (and the clowns are adding it in C++26).

Severe attitude problem.


When we complain about the "committee" we're not complaining about some amorphous entity but rather the people that make it up and the things they choose to propose and vote for.


Not a C+++?


Even if not mandated by the standard, concrete standard library implementations do provide bound checking on span (and vector, and optional, etc.), but, even when meant for production use, are disabled by default.

And I don't see a big push in the community to enable them. I think the committee is just an expression of the community on this front.


> Even if not mandated by the standard, concrete standard library implementations do provide bound checking on span (and vector, and optional, etc.), but, even when meant for production use, are disabled by default.

That's a choice though. You can enable these in your production builds if you want (with libstdc++ at least) and some Linux distributions have chosen to do just that.
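For example, with GNU libstdc++ (other standard library implementations have similar switches):

    g++ -O2 -D_GLIBCXX_ASSERTIONS main.cpp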

The thing though is that these checks are NOT free and the overhead is not justified for all use cases so forcing them on everyone is not appropriate for C++.


> The thing though is that these checks are NOT free and the overhead is not justified for all use cases so forcing them on everyone is not appropriate for C++.

Well, that's why they should be a flag. The question is whether it should be enabled by default or not.


It should be enabled by default, and if you want to index without bounds checking you should have to write something like a.unsafe_at(i)


> Herb mentions std::span as as safety improvement but the [] operator doesn't do bounds checks and the .at() method isn't even there yet!

You mean this implementation? https://en.cppreference.com/w/cpp/container/span/at

To quote: "Returns a reference to the element at specified location pos, with bounds checking.

If pos is not within the range of the span, an exception of type std::out_of_range is thrown."


You missed the Std column => (C++26)


> ...the actual output of the standard's committee is so far behind

The committee just publishes documents. It is actually far ahead of C++ implementations.

The committee would probably move faster if there were more attention (and funds and volunteer work) spent on advancing C++ implementations. This especially seems true for safety and security concerns as they tend to have more tangible problems to solve than the other kinds of standards proposals.


You're not far ahead if you're running in the wrong direction.

The standard is prioritizing the wrong things. It's normal that implementations are struggling when they need to implement something as complicated as C++ modules for example. There's no excuse for the .at() method being missing from std::span.

On the C side of things the problem is more egregious, it took over 30 years to standardize typeof after every compiler ever had already implemented something of the sort. GCC's __attribute__((cleanup)) should have definitely been standardized ages ago with so many libraries and codebases relying on it.
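For reference, a minimal sketch of what that extension gives you (the AUTOFREE macro name is made up for illustration):

    #include <stdio.h>
    #include <stdlib.h>

    /* Called with a pointer to the variable as it goes out of scope. */
    static void free_charp(char **p) { free(*p); }

    #define AUTOFREE __attribute__((cleanup(free_charp)))

    int main(void)
    {
        AUTOFREE char *buf = malloc(64);
        if (!buf) return 1;
        snprintf(buf, 64, "hello");
        puts(buf);
        return 0; /* buf is freed automatically here */
    }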

What does the C standard give us instead? _Generic. It's just silly at this point.


> There's no excuse for the .at() method being missing from std::span.

The issue is that there are two camps. One believes that precondition failures should not be recoverable and should abort the application, and thus thinks that 'at' is an abomination. The other believes that throwing exceptions in the face of precondition failures is appropriate.

Hence what goes into the standard depends on how many people from each camp are present at each committee vote. This is also one of the reasons why the contracts proposal is not yet in the standard.

On a more practical side, .at does not help in any way to bound check the hundreds of billions of existing lines of C++.


std::span came out in C++20, so by that logic neither did it help in any way...

Personally, I think operator[] should abort by default, because otherwise it is redundant with .at().


Of course aborting in span::operator[] wouldn't be enough. But bound checking in operator[] for vector, deque, std::array and normal arrays would help (I think it is infeasible to do it for arbitrary pointers).


> (I think it is infeasible to do it for arbitrary pointers)

I think there's a viable path that could solve it well enough for a safe compiler mode to be feasibly mandated in secure coding standards.

Pointer values can come from one of several operations: pointer offsetting, int-to-pointer, address of known object (including especially array-to-pointer decay), uninitialized values, function parameters, struct members, and globals. Safe bounds information is automatic for uninitialized values and addresses of known objects, and pointer offsetting can trivially propagate known bounds. If you had annotations ("the size of this array may be found in variable X"), you could usually get pretty reliable information for the last three categories.

The only truly difficult case is int-to-pointer, but from what I've seen in other contexts, it's likely that int-to-pointer in general is just an inherently cursed operation that auto-safe code just shouldn't support.


Well, the point is safety for existing code. If you can annotate pointers to match them with their bound you can as easily replace them with a span and avoid needing compiler heroics.

Edit: unless you absolutely need the change to be ABI stable, but even then there are ways around that.


> Shameless plug but I discuss this issue on my blog:

> “First make them care, then make it easy for them to do the right thing.”

I would say first make it easy to do the right thing, making them care will be an Eternal September.


Sadly even if you make it easy to do the right thing, without the right attitude to match it matters little. Some people still concatenate unsanitized input with raw SQL strings despite the abundance of libraries that make creating safe queries easier.
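For contrast, a minimal sketch of the easy right thing using SQLite's prepared statements (error handling abbreviated; db and email are assumed to come from elsewhere):

    #include <sqlite3.h>

    int find_user_id(sqlite3 *db, const char *email)
    {
        sqlite3_stmt *stmt = NULL;
        if (sqlite3_prepare_v2(db, "SELECT id FROM users WHERE email = ?1;",
                               -1, &stmt, NULL) != SQLITE_OK)
            return -1;

        /* The input is bound as data, so it can never be parsed as SQL. */
        sqlite3_bind_text(stmt, 1, email, -1, SQLITE_TRANSIENT);

        int id = -1;
        if (sqlite3_step(stmt) == SQLITE_ROW)
            id = sqlite3_column_int(stmt, 0);

        sqlite3_finalize(stmt);
        return id;
    }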

At this point I think heads need to roll before people take the problem seriously.


I wasn’t clear: we do agree that attitude is the crucial piece, I just disagreed on the order in which it should be done.

I think that implementing in compilers the mechanisms for doing the right thing can be done first, and relatively quickly. Which now that I think more about it, would need the right attitude on the part of the compiler vendors and standards committee.


Being able to use a foreach loop is a decent improvement.

And since you now have a proper access API, you could in theory enable bounds checks via compiler flag, even for operator[].

