
>> I cannot consistently write safe C/C++ code.

I think it's going to depend on your attack surface. Could I sit down and write a web-facing server from scratch and have a reasonable expectation of safety? Probably not. Can I write a piece of firmware for an embedded device with no internet access and a very limited serial protocol, and have a reasonable expectation of that being 'safe'? That seems more likely, particularly if I follow good practice and use available analysis tools where possible.

I think the biggest problem with these languages is that when something goes wrong (as it so often does), the keys to the kingdom are up for grabs: the whole of process memory can suddenly be manipulated or read out arbitrarily (e.g. Heartbleed).




It almost goes both ways. Yes, being on an embedded device with no Internet access reduces your external attack surface. But it also limits your ability to use lots of newer language features designed to improve safety, because those features often cost CPU cycles or memory.

My (completely uninformed) guess is that most of any net benefit from being on an embedded system comes from the simple fact that "embedded" implies "constrained", which implies "less code," which, in turn, implies "fewer opportunities to create bugs in the first place."


I don’t think that developing for embedded devices limits you in the use of newer language features, or in the use of more safety-focused languages.

E.g. Ada, despite being on the older side for programming languages, was specifically designed with embedded programming and safety in mind, and does this quite well.

It also supports OOP if you want that sort of thing and, more crucially, has pretty good concurrency support built into the language while still being reasonably safe. With SPARK/Ada you even get a language variant that can be formally proven, to name some more modern use cases.


But this wasn't talking about using other languages with more of a safety focus. It was talking about using newer features of C++ that are meant to support safer programming. Most of these come in the form of abstractions with non-zero cost.


OK, but...

I see three options. 1) Use the non-zero-cost abstraction to do X safely. 2) Write the equivalent abstraction to do X safely, which will cost you the time to write and will probably be buggier than C++'s version. 3) Don't do X.

The fourth option - do X but in an unsafe way - should not be on the table for a professional developer.

Now, C++'s philosophy is that you can't write your own thing to do X more efficiently than C++'s builtin version. They may not perfectly achieve that in all cases, but they come fairly close. That plus bugs means that I won't reach for option 2 unless I have really good reason to think I can do better.

That leaves using the builtins to do X safely, or not doing X. At least that's how it looks to me...


It certainly plays into it - fewer moving parts means less opportunity for things going wrong.


>Could I sit down and write a web-facing server from scratch and have a reasonable expectation of safety? Probably not.

It is very surprising to me that people think this. I am not a fantastic programmer, but I don't see writing safe code as difficult at all; you just have to understand the basics of how attacks work, and follow some basic rules when writing code.

In order for a piece of software to be exploitable, you need to have an input channel that is not isolated from the functionality. Something like taking the parameters in the URL and passing them directly to a shell is a massive failure of isolation.

As long as you do proper input sanitization, where the input must have a certain format, a certain length for each part, and a certain range of allowed values for each part, the attack surface goes down to an absolute minimum. For example, of all the C/C++ code that I have seen that deals with the web, only once do I remember seeing a check that the characters in the passed data are in the set [0x0A, 0x0D, 0x20-0x7E] (CR, LF and printable ASCII), which should be the first check performed on any HTTP data passed to it, even before you start validating it as HTTP.
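
To make that concrete, something like this minimal sketch (the function name is made up) is the kind of first-pass byte filter I mean:

    #include <stddef.h>

    /* Hypothetical first-pass filter: reject any byte outside CR, LF and
     * printable ASCII (0x20-0x7E) before the data reaches the HTTP parser. */
    static int is_printable_ascii(const unsigned char *buf, size_t len)
    {
        for (size_t i = 0; i < len; i++) {
            unsigned char c = buf[i];
            if (c == '\r' || c == '\n')
                continue;              /* CR and LF are allowed */
            if (c < 0x20 || c > 0x7E)
                return 0;              /* control byte or high-bit byte: reject */
        }
        return 1;                      /* every byte is in the allowed set */
    }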

As far as memory safety goes, never rely on null-terminated strings; always copy the exact amount of data up to a statically defined max, and make the buffer that holds the data have length max+1, with the extra byte always set to null on every copy into the buffer. For malloc/free, design your code so that the malloc and the free are in the same function, at the beginning and the end, and always run memcheck and valgrind before deploying to production.
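
As a rough sketch of that copy discipline (MAX_FIELD and the function name are made up, not from any particular codebase):

    #include <string.h>

    #define MAX_FIELD 256   /* a statically defined maximum, chosen for illustration */

    /* Copy at most MAX_FIELD bytes into a fixed buffer of MAX_FIELD + 1 bytes
     * and always write the terminating null, so later code never depends on
     * the source being null-terminated. */
    static void copy_field(char dst[MAX_FIELD + 1], const char *src, size_t src_len)
    {
        size_t n = src_len < MAX_FIELD ? src_len : MAX_FIELD;  /* clamp to the max */
        memcpy(dst, src, n);
        dst[n] = '\0';                                         /* the extra +1 byte */
    }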

Between all the other mechanisms in place, like a non-executable stack/heap, stack canaries, ASLR, and tools like memcheck and valgrind, the possibility of you making exploitable software is as low as coding it in a "proven safe" language (if such a thing exists) and messing up some parser input in a way that leads to unexpected behavior. And of course, there is still the possibility of using a library that has an exploit, or coding an exploit into higher-level logic, but that is a danger for any language.

I would bet that if taking an exploitation class were a hard requirement for CS degrees, so that people understood the stack, the heap, and how the different types of exploits work, the outlook on this would change. I also wonder whether teaching people C++ instead of C, relying on built-in memory management mechanisms instead of making them write all the allocations by hand, leads to less understanding of what goes on under the hood.


> In order for a piece of software to be exploitable, you need to have an input channel that is not isolated from the functionality

And prove that your input validation is correct for every possible value, and that under stress the system doesn't start doing something weird, and there are no possible integer overflows that might cause something funny to happen and ...

> For example, of all the C/C++ code that I have seen that deals with the web, only once do I remember seeing a check that the characters in the passed data are in the set [0x0A, 0x0D, 0x20-0x7E] (CR, LF and printable ASCII), which should be the first check performed on any HTTP data passed to it

Unless it's Unicode of course, which it really should be these days if we're talking about the web, in which case you really want a Unicode parsing library to validate it, and then you're reliant on that having no exploits, and anything of significant complexity usually turns out to have some somewhere.

> "Between all the other mechanisms in place, like non executable stack/heap, stack canaries, ASLR, and the tools like memcheck and valgrind, the possibility of you making an exploitable software is as low as coding it in a "proven safe" language"

This is provably false though, and that's part of the point of the article: even well-looked-after software written by experienced people with an eye on security suffers from problems, and unsafe languages permit classes of attack that are simply not possible in safer languages. These keep on happening regardless of the apparent skill level of the practitioner. The author has been doing it for 25 years and doesn't think they could do it reliably. I was a C programmer for 15 and I agree: past a certain level of complexity it looks like there are just going to be errors of this sort somewhere in your code.

Sure, programs written in 'safe' languages are not bug-free or exploit-free, but for public-facing, networked code they do seem to be better as they eliminate a huge number of damaging attack types straight off.


>And prove that your input validation is correct for every possible value

You don't need to check against every single value; you just need to check formatting, length, and data ranges. For your integer overflow example, you made two mistakes already: first, anything that is expected to be positive should be an unsigned integer; and second, when you parse the string, you fail to check against the maximum integer value.
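
For example, a parse along these lines (the function name and the `max` parameter are made up for illustration) covers both points:

    #include <ctype.h>
    #include <errno.h>
    #include <stdlib.h>

    /* Illustrative sketch: parse a decimal string into an unsigned value no
     * larger than `max`.  Returns 0 on success, -1 on malformed input or overflow. */
    static int parse_uint(const char *s, unsigned long max, unsigned long *out)
    {
        char *end;
        if (!isdigit((unsigned char)s[0]))   /* rejects leading '-', '+', spaces */
            return -1;
        errno = 0;
        unsigned long v = strtoul(s, &end, 10);
        if (errno == ERANGE || v > max)      /* overflow, or above the allowed range */
            return -1;
        if (*end != '\0')                    /* trailing junk */
            return -1;
        *out = v;
        return 0;
    }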

>Unless it's unicode of course

Then validate Unicode, lol. All this stuff has defined rules. Furthermore, if you design your code not to support Unicode and someone inputs a name in non-ASCII, the request will be rejected, which is a much better issue to have, security-wise, than blindly accepting any and all input without validation.

>This is provably false though

Based on what, an opinion of some programmers? That's not really "proving" anything. If you want to make the statement that most programmers don't understand things like why you should never rely on strlen, I'd probably agree.

But I'm not really concerned with the distribution of skill among modern programmers; I'm arguing against the assumption that "you should not write stuff in C because it's very hard to write safe C code".


> You don't need to check against every single value

I said prove for every value, not check every value.

> first, anything that is expected to be positive should be an unsigned integer; and second, when you parse the string, you fail to check against the maximum integer value.

I wasn't necessarily confining myself to input validation when talking about integer overflow; there are other places and ways it can be caused, and there are other vulnerabilities in systems than pure invalid input. Maybe I pass perfectly valid input that causes a small bug somewhere deep inside your system to behave weirdly. The point is it's not as simple as "I validate my input and everything's fine".

> Then validate Unicode, lol.

That's the point I was making there: you can't just validate ASCII these days, and validating Unicode is not as trivial as you're making out, plus you now have a Unicode parser which may contain any of these problems too.

> Furthermore, if you design your code not to support Unicode

I think your ideas are about two decades out of date here. There's more to the world than ASCII, and there has been for a long time.

> Based on what, an opinion of some programmers?

Based on where exploits happen and how bad they are.

> But I'm not really concerned with the distribution of skill among modern programmers, I'm more interested in the assumption that it is hard to write safe C code in comparison to other languages,

Do you not see the inherent contradiction in the way you've stated that? "It doesn't matter if most people can't do it, that doesn't make it difficult"

Plus, you've told us yourself, there are a ton of extra tools and techniques needed to even attempt to make 'safe' code in C compared to other languages where these classes of errors are just impossible by design. Does this not say "harder to make safe" to you?

> and learning that is beyond reach of most people.

The point in the article is that it's quite likely beyond basically everyone, and this is the conclusion a C programmer has come to after 25 years of bitter experience.


I'm really starting to hate the internet more and more every day.

You say validating Unicode is not trivial, and mention using a parser which can contain bugs.

Or, you can just look at a specification like this one:

https://docs.oracle.com/cd/E18283_01/server.112/e10729/appun...

And easily write your own validator to check against valid byte ranges.
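
As a rough sketch of what I mean (following the standard well-formedness byte ranges from RFC 3629; illustrative only, not production code):

    #include <stddef.h>

    /* Sketch of a byte-range validator: returns 1 if the buffer is
     * well-formed UTF-8, 0 otherwise (ranges per the RFC 3629 table). */
    static int utf8_valid(const unsigned char *s, size_t len)
    {
        size_t i = 0;
        while (i < len) {
            unsigned char b = s[i];
            size_t need;                         /* continuation bytes expected */
            unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of the 2nd byte */

            if (b <= 0x7F) { i++; continue; }               /* ASCII */
            else if (b >= 0xC2 && b <= 0xDF) need = 1;      /* 2-byte sequence */
            else if (b == 0xE0) { need = 2; lo = 0xA0; }    /* no overlong forms */
            else if (b >= 0xE1 && b <= 0xEC) need = 2;
            else if (b == 0xED) { need = 2; hi = 0x9F; }    /* no surrogates */
            else if (b >= 0xEE && b <= 0xEF) need = 2;
            else if (b == 0xF0) { need = 3; lo = 0x90; }    /* no overlong forms */
            else if (b >= 0xF1 && b <= 0xF3) need = 3;
            else if (b == 0xF4) { need = 3; hi = 0x8F; }    /* stay <= U+10FFFF */
            else return 0;                                  /* 0x80-0xC1, 0xF5-0xFF */

            if (len - i - 1 < need) return 0;               /* truncated sequence */
            if (s[i + 1] < lo || s[i + 1] > hi) return 0;   /* restricted 2nd byte */
            for (size_t k = 2; k <= need; k++)
                if (s[i + k] < 0x80 || s[i + k] > 0xBF) return 0;
            i += need + 1;
        }
        return 1;
    }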

I'll just leave this conversation with "Agree to Disagree".


The point is it's not as trivial as you say, and there's more to safety than pure input validation.

The author's point was that even people who think they're doing it right don't catch everything. If you think you do then I wish you the best of luck, but I also wouldn't want to work on safety-critical systems with you.


> As far as memory safety goes, never rely on null-terminated strings; always copy the exact amount of data up to a statically defined max, and make the buffer that holds the data have length max+1, with the extra byte always set to null on every copy into the buffer. For malloc/free, design your code so that the malloc and the free are in the same function, at the beginning and the end, and always run memcheck and valgrind before deploying to production.

Aren’t you just proving the author’s point here? Even your fairly simple heuristic that works in most cases is easy to screw up due to a typo / brain fart, and it’s quite likely that even experienced developers will write code that memcheck/valgrind complains about (which is why the tools exist!)

So writing memory-safe code in C/C++ is brittle and not easy to do consistently without additional tools and checks. The author’s point wasn’t that it’s impossible or conceptually deep, but that it’s tricky and painstaking work which is easy to mess up.


>Even your fairly simple heuristic that works in most cases is easy to screw up due to a typo / brain fart,

This argument can be made for "safer" languages just as easily. Most of the web exploits that exist, like request smuggling, parser abuse, or a generic exploit involving a sequence of API calls, are all due to brain farts. At a certain level you have to expect some competency.

My main argument is that the competency for writing C code isn't that much higher than for other languages.


> My main argument is that the competency for writing C code isn't that much higher than for other languages.

I think that goes against observed reality, and the consequences of these errors are worse in unsafe languages. Look at Heartbleed: a bounds-checked language would not have allowed that problem to be anything more than an exception; instead, process memory was open for reading.
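
To illustrate the class of bug (a sketch of the pattern, not the actual OpenSSL code): the reply length in Heartbleed came from the request itself, so the copy read past the real payload into adjacent process memory.

    #include <string.h>

    /* Sketch of the Heartbleed-style bug class, not the real OpenSSL code. */
    struct heartbeat {
        unsigned short claimed_len;    /* length field taken from the request */
        const unsigned char *payload;  /* actual payload, possibly much shorter */
        size_t payload_len;            /* real length of the payload */
    };

    static void build_reply_unsafe(const struct heartbeat *hb, unsigned char *reply)
    {
        /* BUG: trusts claimed_len without comparing it to payload_len, so the
         * copy can read whatever happens to sit after the payload in memory. */
        memcpy(reply, hb->payload, hb->claimed_len);
    }

    static int build_reply_checked(const struct heartbeat *hb, unsigned char *reply)
    {
        if (hb->claimed_len > hb->payload_len)  /* the missing bounds check */
            return -1;
        memcpy(reply, hb->payload, hb->claimed_len);
        return 0;
    }

In C the unsafe version compiles and runs happily; in a bounds-checked language the equivalent read would fail at the out-of-range access instead of leaking memory.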


Easy counterexample: most of the computers worldwide run the Linux kernel, which is written in C, and kernel exploits, especially those reachable from the web, are significantly rarer than the higher-level exploits in CRUD APIs or LAMP stacks that operate at the HTTP protocol level and in the authentication state machines.

This argument doesn't make sense. Yes, programmers make mistakes. Yes, they shouldn't make those mistakes. My point is that it's not difficult to write safe C code if you follow some basic rules.


> Easy counterexample:

The Linux kernel is not really a counterexample here, and neither would I call the decades of security hardening that's gone into it "easy"...

> Yes, programmers make mistakes.

And in some languages those mistakes have far worse consequences than others, and more types of mistakes are possible.

> My point is that it's not difficult to write safe C code if you follow some basic rules.

Except it is more difficult, for a start because you need more rules and tools to make it safe, if that's even possible. As you yourself have said.



