C: A Technological Landmine (expandrive.com)
30 points by jonshea on July 31, 2009 | 57 comments



Ugh, this is close to being a troll post. Yes, C has its weaknesses and domains appropriate to its use. But sentences like

"Lacking a strong and expressive type system, C not only permits but encourages its programmers to sacrifice correctness, safety, robustness, testability, and maintainability in favor of some highly underdeveloped and ill-measured ideas about “performance”. Much of the infrastructure of the Internet is built out of this garbage."

and especially words like "garbage" only expose the author as someone who doesn't know what he is writing about (OK, I could have used the shorter word "fool" here, but ...).

The "infrastructure of the internet" (including the underlying operating systems) is one of the domains in which C shines.

There is good reason that even today, large chunks of "infrastructure" code are written in C/C++.

"anybody who considers C for high-level application development at this point in history, is in a grievous state of sin"

With "high level" being conveniently undefined and without any examples, that statement means next to nothing.

What a terrible, ill-thought-out article.


Indeed, large swathes of the Internet are written in C.

Large swathes of the internet have also had buffer overflow after buffer overflow after buffer overflow after buffer overflow after buffer overflow after buffer overflow after buffer overflow after buffer overflow after buffer overflow after buffer overflow after buffer overflow after buffer overflow after buffer overflow... for decades.

At some point you have to stop blaming "lazy" programmers and start pointing fingers at the language that let the same bug through thousands upon thousands of times; a bug whose very name has almost no meaning in other languages. How do you "buffer overflow" Java or O'Caml? (It isn't quite impossible, necessarily, but it's multiple orders of magnitude less likely.)
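
To make that concrete, the classic mistake looks something like this (a sketch with a hypothetical function; the pattern, not any particular incident):

    #include <string.h>

    void handle_request(const char *input)  /* attacker-controlled bytes */
    {
        char name[64];
        /* If input holds more than 63 bytes, strcpy writes past the end
           of name, clobbering whatever lives next on the stack (often
           the return address). C neither checks nor complains. */
        strcpy(name, input);
    }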

A safe variant of C wouldn't have been that hard to create, but the culture wouldn't permit it. So... I also point fingers at the culture that has valued performance over correctness: not just valued performance highly, not just valued it alongside correctness, but valued it over correctness.

Anyway, the whole "the internet is written in C" thing is a very weak argument in C's favor; it certainly fit a culture and can't be argued to be totally impractical, but a lot of the sharp edges aren't actually necessary, even with performance as a primary goal, and those sharp edges have claimed far too many victims. I totally respect the original authors, but C should have been incrementally improved into something much safer, instead of being frozen at its 1970s level of development.


> A safe variant of C wouldn't have been that hard to create, but the culture wouldn't permit it.

C is a non-language. It's a tool used to give developers complete control of the hardware. It's just one step up from Assembler (to make the developer's job a little bit easier).

Adding type checking, buffer overflow mitigation, or any other extraneous domain-specific feature is completely outside the scope of the language. If you need those features, write a language, dialect, or compiler extension which supports them.


Hi, culture! There you are!

Anyhow, I seriously object to calling "buffer overflow mitigation" a "domain-specific feature", since there's hardly a domain that hasn't been hammered by its absence.

Furthermore, obviously the C that is not C is not the true C. I'm not trying to play semantics. My point is that "not the true C" should have been written decades ago and swiftly replaced C completely. There's an old saying about how you can make any program run quickly if there's no requirement for it to be correct. I don't care how small your processor is or how special your embeddedness makes you; ample evidence says that letting input data segfault your program or, worse, get executed as code, is a real and pressing problem, vs. the almost-always hypothetical problem of a little bit of overhead when dealing with buffers (to do it properly, no less). Start with correct code.

By the way, as I use this word "correct" I'm feeling a bit like an academic wonk, but bear in mind that I'm not talking "provable" or anything. I'm talking about making tens of thousands of critical security bugs over the past several decades go away, with programmers actually spending less time on bugs, too. It's not exactly a hypothetical consideration.


> Anyhow, I seriously object to calling "buffer overflow mitigation" a "domain-specific feature", since there's hardly a domain that hasn't been hammered by its absence.

Numeric computing, for one.




My preference would have been a switch which would enable things like bounds checks on arrays.
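
For what it's worth, something close to that switch exists as compiler instrumentation rather than as part of the language - e.g. gcc/clang's -fsanitize=address (toolchain-specific, and a runtime checker, so this is an approximation of the idea, not the thing itself):

    #include <stdio.h>

    int main(void)
    {
        int a[4] = {0, 1, 2, 3};
        /* Out-of-bounds read: plain C silently returns garbage here.
           Built with instrumentation (e.g. cc -fsanitize=address), the
           same program aborts with a diagnostic pointing at this line. */
        printf("%d\n", a[4]);
        return 0;
    }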


Michael taught both the SICP and the Software Design and Implementation classes at an Ivy League school for several years. I’m fairly certain that he knows what he’s talking about.

I’ll admit that the post was a little ranty. It was quickly reworked from an internal email conversation, and it’s not intended to be taken as if it came from a conference proceeding.

With that in mind, do you deny that C/C++ “sacrifice correctness, safety, robustness, testability, and maintainability in favor of… ‘performance’”?


C is advanced assembler. It is an absolutely great language for system level stuff, especially because everything is explicit, no relying on side effects or stuff hidden from view.

The thread of execution is extremely easy to follow.

The only thing I would change if we could revisit the past is that I would add a string primitive to the language with a half decent set of string operators. That would have made my life a lot easier at some point in the past.
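
Something as simple as a counted string would have done it. A hypothetical sketch (the names and the single "operator" are made up, error handling omitted):

    #include <stdlib.h>
    #include <string.h>

    /* A counted string: the length travels with the data, so there is
       no terminator to forget and no scan just to find the end. */
    struct str {
        size_t len;
        char  *data;
    };

    /* One string "operator": concatenation. A real language feature
       could have spelled this s1 + s2. */
    struct str str_concat(struct str a, struct str b)
    {
        struct str r;
        r.len  = a.len + b.len;
        r.data = malloc(r.len);            /* error handling omitted */
        memcpy(r.data,         a.data, a.len);
        memcpy(r.data + a.len, b.data, b.len);
        return r;
    }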

The funny thing is that most languages that people use that criticize 'C' are usually at the core levels written in C.

There is probably a good case to be made for the claim that Unix would not exist if it weren't for the C language.


> The funny thing is that most languages that people use that criticize 'C' are usually at the core levels written in C.

The difference is that the people using these higher level languages only depend on a single set of maintainers who need to get the primitives right once instead of every random coder needing to manage buffers, garbage collecting unused memory, threading, and a host of other landmines on each and every project.


I think those 'random coders' would grow up quite rapidly if they had to learn how to program for real.

It's the high level language cruft that is a prime source of all the endless layering of glue on top of other glue that we're stuck with.

Less scripting, more binaries.

I cringe every time I install some minor system level package and I have to include perl, awk, php and python or some other combination of stuff.


Funnily enough, the proliferation of those scripting languages has made it easier to get a picture of what's going on if you do get rooted by the consequences of running a flaky C app. In the old days the md5 binary would be broken and you were pretty much screwed if the other guy was good. But these days you can hammer out manual md5 algorithms and do other stuff using perl, python, and anything else at your disposal - things that are deployed by default on contemporary unix distributions. Better still, if you've got a book on unix system calls you can absolutely fly through troubleshooting techniques that weren't practical before.

(Sure, the bad guys could just delete your VMs, but then that is a clear sign that the system has had it - and that signal is useful in and of itself. One of the difficult things in a situation like that is making a call on how far gone your system is, so you can set short-term priorities in the lead-up to rebuilding everything.)


Good thing those coders wading around what you seem to consider the shallow end of the coding pool are not writing things like our nameservers, web servers, operating systems, or anything else important. Oh, that's right, they are. And they keep fucking up. Repeatedly.

When this keeps happening over and over again it is time to consider the possibility that the problem isn't the craftsman but the tool.

The advantage of using lots and lots of glue languages is that it forces the important bits to be loosely coupled, adds flexibility to the development and deployment process, and leads to systems that are easier to comprehend and reason about. The only things that should be opaque binaries are system libraries and VMs/runtimes. Everything else should be "glue".


Given the number of lines of C code out there, the number of fuckups is a lot lower than you'd expect. Because C is usually used in gatekeeper situations (operating systems, compilers, servers, network stacks), when there is a breach it is serious.

I find it hard to conceive of a POSIX-compliant OS written in a higher-level language, especially because of the lack of deterministic behaviour when handling interrupts and allocating memory. C is mind-numbingly simple at that level, which is exactly why it is used in these situations.

But while every language makes it possible to write insecure code, C has its own unique challenges:

It isn't that long ago that somebody managed to get an exploitable scenario out of UTF-8; it took me a long time looking at the code to see how it was even possible. In a higher-level language that sort of thing is more difficult to achieve, that's for sure.
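
For the curious: I can't say whether it was the same bug, but the classic UTF-8 hole is the overlong encoding. A naive decoder (sketched here for two-byte sequences only) happily turns the illegal sequence 0xC0 0xAF into '/', so "../" can sneak past a filter that only inspects the raw bytes:

    /* Naive 2-byte UTF-8 decode: ((b0 & 0x1F) << 6) | (b1 & 0x3F).
       For b0 = 0xC0, b1 = 0xAF this yields 0x2F, i.e. '/', even though
       the canonical encoding of '/' is the single byte 0x2F. A correct
       decoder must reject overlong forms (lead bytes 0xC0 and 0xC1 are
       never valid); this one doesn't, which is the whole exploit. */
    unsigned decode2(unsigned char b0, unsigned char b1)
    {
        return ((b0 & 0x1F) << 6) | (b1 & 0x3F);
    }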


> The funny thing is that most languages that people use that criticize 'C' are usually at the core levels written in C. There is probably a good case to be made for the claim that Unix would not exist if it weren't for the C language.

Check out Oberon (both the language(s) and the operating system(s)) for a look at an alternate universe in which C doesn't exist and Unix never happened. And people are happy. And anything that can be done in C can be done. Although some mistakes are significantly more difficult to commit.


You and I must use different definitions for “side effects or stuff hidden from view” if you think those are things you avoid by programming in C.


The only really 'hard' thing I ever did in C was write an OS with interrupt handlers that passed control to user space.

Debugging that took me more time than I care to remember, I hung that box so often that I hooked the reset switch up to a foot pedal :)

Other than that C has been pretty good to me in terms of showing where the bugs are. Post mortem debugging would usually give you the location of your crash and that plus a little bit of deduction will tell you what you're doing wrong.

Memory overwrites (and stack overwrites) are things that belong to a certain phase of coding where you are still learning the ropes and you have not yet mastered the idiomatic solutions that people will use to avoid such problems. Setjmp and longjmp are also an excellent source of hard to find bugs, as well as multi-threading (the multi-threading problems are not unique to C, some languages are simply better at doing that sort of thing, occam for instance).
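
The canonical setjmp/longjmp trap, as a sketch (made-up names, not from any real codebase): longjmp-ing to a buffer whose setjmp frame has already returned is undefined behaviour, and it tends to "work" just long enough to ship:

    #include <setjmp.h>

    static jmp_buf env;

    static void arm(void)
    {
        if (setjmp(env) == 0)
            return;    /* the stack frame holding env's context is now dead */
    }

    void boom(void)
    {
        arm();
        longjmp(env, 1);   /* undefined behaviour: jump into a dead frame */
    }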

I've heard C compared to a race car: it's great as long as you stay between the lines, but take a turn too fast and you're heading straight for trouble. Experience (as in bumping your head a couple of times) will make sure that you know how far you can push it.

C is like any other power tool (think chainsaw or plasma cutter): you can easily get hurt, but once you've mastered the tool it is quite amazing what you can do with it.

I got into C after a long period of writing almost nothing but assembly language (x86, 68K, and a bunch of 8-bitters), and I remember thinking of C as a very 'high level' language. My perspective has changed a bit since then...


> The funny thing is that most languages that people use that criticize 'C' are usually at the core levels written in C.

The JVM and MS CLR are largely written in C++.


The SSL bug in this case happened because the programmers had a wrong assumption about the network input (namely, that it doesn't contain the zero byte value).

This type of wrong assumption can be made in any language. And this bug doesn't make buffer overflows or any other exploitable activity possible, so it is not dangerous in the way that 'usual' C code can be dangerous.

Still, it is a serious threat given the widespread usage of that particular library.
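
To illustrate the assumption (a simplified sketch, nothing like the actual OpenSSL code): the CN arrives as length-counted ASN.1 data, but the validation code treated it as a C string, so everything after an embedded zero byte was invisible to the comparison:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* CN as encoded in the certificate: ASN.1 carries an explicit
           length (24 here), so the embedded NUL is just another byte. */
        const char cn[] = "paypal.com\0.attacker.com";
        size_t cn_len = sizeof cn - 1;                  /* 24 */

        /* Treating it as a C string "validates" the wrong identity: */
        printf("%d\n", strcmp(cn, "paypal.com") == 0);  /* prints 1 */

        /* A length-aware check does not: */
        printf("%d\n", cn_len == strlen("paypal.com")); /* prints 0 */
        return 0;
    }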


What's your issue with the quote? C does not have a proper type system. C code is inevitably loaded with unsafe cast operations.

> C/C++

He was talking about C, not C++. C++ has a good type system and also makes it easy to avoid bounds checking problems.


For a long time people built huge buildings with very, very thin measures in place for worker safety. Buildings cost a lot less and went up a lot faster - but it came at the cost of workers' lives.

OSHA's rules make it much more expensive and tedious for American cities to grow - but the growth isn't coming on the backs of construction workers. It's a trade off we've decided to make because we value safety and we value not getting our pants sued off for negligence.

You can write some well-designed quick-and-dirty C code that does what you want, and does it fast. But once in a while you'll make a mistake that you probably won't notice, and it might cost you your company.


Ironically, the article referenced by the author does not blame the C language for this problem. Instead, it blames the CA for issuing the certificates in the first place:

"Marlinspike said since there is no legitimate reason for a null character to be in a domain name, it’s a mystery why Certificate Authorities accept them in a name."


Yes, and there's no reason for browsers to accept more than one domain name in a CN field. However, a quick look through RFC 3280 and an ASN.1 reference makes me think it is a less-than-trivial task to figure out what would and would not be a legal termination for a string encoded in the Subject field of a certificate. But it is perfectly reasonable to expect the CA to check for that.


The reason I like C is that every action is so specific. Yes, that means it's not suited for "high level" applications, like web apps, and situations in which development time needs to be cut. But that specificity and control over every action is exactly why it's good for network and hardware programming. I haven't seen C used outside of those realms in a long time.

edit: Furthermore, its low-levelness makes it very versatile. It centers around the universal abstractions used in Unix - the ability to open, read, write, and close files. That, combined with structs, unions, and its basic data types, allows you to use it for virtually ANY protocol.


> That, combined with structs, unions, and its basic data types, allows you to use it for virtually ANY protocol.

Sadly, when you use C to implement low-level binary wire protocols, you quickly discover that structs, bit-fields, and unions are nearly useless because they are incompletely defined. Byte order is undefined. Structure layout is mostly undefined -- you pick field order, but you can't choose packing, alignment, or padding rules. The sizes of the integer types vary by platform and compiler. Bit field layout, packing, and alignment are almost completely undefined.

What you're left to work with are unsigned characters, pointers, and bitwise operations. You have to pack and unpack everything manually, or your code won't port. It's enough to get the job done, but it's like using a wrench to pound in screws.
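
For what it's worth, "pack and unpack everything manually" ends up looking like this (a sketch; a 32-bit big-endian field is just the usual example):

    #include <stdint.h>

    /* Read a 32-bit big-endian integer from a wire buffer. Shifting
       bytes explicitly works on every platform, regardless of the
       host's byte order, struct padding, or bit-field layout. */
    uint32_t get_be32(const unsigned char *p)
    {
        return ((uint32_t)p[0] << 24) |
               ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] <<  8) |
                (uint32_t)p[3];
    }

    /* And the corresponding write: */
    void put_be32(unsigned char *p, uint32_t v)
    {
        p[0] = (v >> 24) & 0xFF;
        p[1] = (v >> 16) & 0xFF;
        p[2] = (v >>  8) & 0xFF;
        p[3] =  v        & 0xFF;
    }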

I could do with a little less specificity of action, myself.


> I haven't seen C used outside of those realms in a long time.

For an example of modern high-level C, check out GTK+. It's a sprawling cross-platform GUI library that provides the foundation for arguably the most popular Linux desktop environment (GNOME). Although GNOME apps are frequently written in higher-level languages using bindings, GTK+ itself is plain C.


One thing this guy doesn't mention, that I would think relevant to the discussion: every language currently used by more than 4 people has a notion of FFI via C. This is nice as it allows for the old "profile it and write the slow bits in C" style of programming. I particularly like that style, because in the end you only need C-style intensity for a small bit of code. Over time, the number of these small but useful bits accumulates, and the result is a decent, bottom-up library, without the pain of having started in C. (It also helps avoid the cruft...)
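
The C side of that pattern is usually nothing more than an exported function in a shared object; the host language loads it through its FFI. A sketch (the function, file name, and build line are all made up for illustration):

    /* hot.c -- the profiled-and-rewritten "slow bit".
       Build as a shared library, e.g.: cc -shared -fPIC -o libhot.so hot.c
       Then call it from Python/Ruby/Lisp/etc. via their C FFI. */
    double dot(const double *a, const double *b, unsigned long n)
    {
        double s = 0.0;
        for (unsigned long i = 0; i < n; i++)
            s += a[i] * b[i];
        return s;
    }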


This article in particular crystallized a thought I've had about this site, and sites like this in general. I would love to see a wiki-editable block attached to each submission that tries to describe, in as few words as possible, what information the article contributes.

Think of it as compression, where a basic knowledge of computing is assumed. More interesting articles would have a lower compression ratio, which might be a fun thing to filter on. This article might go down to, "C is generally unsafe, and you probably aren't skilled enough to make it safe, so don't use it". Or maybe, "I needed to write something for my company blog, so I found a recent security hole and added some vaguely related platitudes".


C is used for low-level libraries for its lean-and-mean performance. It sacrifices checks and safety features for this, and allows the programmer full control. Do you see professional race cars with ABS and automatic stabilization? No, you give the driver FULL and TOTAL control; same with C and other low-level languages. C has only a few data types, which are as basic as you can get - I mean, what do you expect it to use, something like STL strings?

If you start adding type checking and various other easy-to-code, child-safety features, you are bloating the low-level libraries and giving up performance; if this happens, imagine what the performance at the higher application level would be.


I'd prefer my libraries be rock solid secure even if I lose some (or even a lot of) performance for it.

Hardware is always getting cheaper.

Losing data integrity and the trust/confidence of your users is extremely expensive, and can be fatal for a startup.

Performance is not the most important metric for a lot of applications.

I'd prefer the safe-but-slow, 5-star crash-test-rated sedan with a good alarm over the race car that is going to blow up after a few races, in library terms.


> I'd prefer my libraries be rock solid secure even if I lose some (or even a lot of) performance for it.

Then don't use C.

People don't seem to understand that C is just a step up from Assembly Language. There are no "types" in the sense that they exist in higher-level languages. And thus, there are no ways to check those types (unless you want to write a type-checking system yourself). Let's not forget that most modern programming languages with type checking and "rock solid secure" libraries were, in fact, written in C.


Which languages? SBCL is written in CL, GHC is written in Haskell, etc. Sure, there is some C to talk to the C-based API of the OS it's running on, but all the important stuff is self-hosted. This is a rather common pattern, as nobody really wants to write important application code in C.


> Hardware is always getting cheaper.

This is a poor justification. A few years ago a house was a good investment because 'housing prices will always be going up.'


Please provide even a single deluded fantasy in which the price/performance ratio for a particular piece of hardware or component in the hardware stack will not continue to trend in the direction of more bang for the buck.


I don't know if this counts as "deluded", but how about: resource exhaustion of raw materials required in hardware manufacture.

See, for example: http://blogs.wsj.com/informedreader/2007/05/25/a-metal-scare... which talks primarily about LCD displays (we're fast running out of Gallium/Hafnium/Indium), but points out that copper is likely to get significantly more expensive throughout this century.

That's going to increase the price/performance ratio of practically everything.


Yep, that's a deluded fantasy all right. See "Ultimate Resource 2", "The Resourceful Earth", or "State of Humanity" by Julian Simon or "Earth Report 2000" by Ronald Bailey.


Silicon makes up much more of the cost of a computer than copper; it's not even close. Silicon is >50 cents a gram, copper is less than a cent per gram.


That's what the "significantly" was all about. We're not really going to run out of silicon any time soon, but we actually might with copper.

At the moment, both those prices are strongly dominated by the cost of processing, I would imagine. At some point, it could well be the scarcity you're paying for with copper, though.

Scarcity increases price, and two orders of magnitude is hardly inconceivable.


"People always need a place to live so real estate/housing prices will always be going up. There is only a limited amount of land on the Earth, but the population continues to grow."


Relying on single-thread (execution context/core) performance continuing to increase is a mistake.


Yes, but I can't imagine any realistic scenario where this performance will start to backslide. The rate of performance increase will continue to slow down, and the number of cores that you get per unit will increase, but "the hardware" in the most general sense will keep getting faster and cheaper over time.


First, you said an upward trend - I submit that any meaningful upward trend in single-core performance has ceased. Further, it has backslid in many cases. In order to increase the number of cores per processor, each core is getting simpler, by reducing or eliminating single-threaded optimizations like branch prediction and out-of-order execution.


That, and not every program runs on ever-faster hardware.


Actually, C strings are a speed-for-memory trade-off: in exchange for saving a few bytes of memory per string, most operations (strlen, for example) take O(n) time. If C strings were in the form of <length><data> instead of <data><\0>, this bug would have been avoided.

This is not a performance issue, only bad design and convention. (The other problem was using memcpy and strcmp on the same data. You can't treat blocks of memory as strings; a type system would eliminate this confusion at no runtime cost.)
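
A sketch of what the <length><data> layout buys you (hypothetical struct, C99):

    #include <stddef.h>

    /* <data><\0>: finding the length means walking the whole string. */
    size_t cstr_len(const char *s)
    {
        const char *p = s;
        while (*p)
            p++;        /* O(n), and an embedded \0 lies about the real length */
        return (size_t)(p - s);
    }

    /* <length><data>: the length is one field read, O(1), and a \0 in
       the data is just data, so the certificate trick can't happen. */
    struct pstr {
        size_t len;
        char   data[];  /* C99 flexible array member */
    };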


Professional race cars absolutely have ABS. In fact, ABS systems are so effective that their use has been restricted in the likes of Formula 1, etc.


"For example, professional cooks don't use consumer grade safe knifes, they use hardcore deadly knives for their performance."

They also wear cut-resistant gloves.


I suppose that butchers might use cut-resistant gloves, but in the professional kitchens I've shared I've never seen them worn by a chef.


Heh, just did a ninja edit and replaced it with the race car analogy after realizing that. HN comments need to be ACID.


I think the cooking analogy holds. Sure, those people DO use knife-proof gloves, and for good reason: they need to be able to cut through things of the same basic structure as human meat! So it goes with code: if you are working on low-level libs and languages, you should be protecting yourself with well-thought-out data and exception handling. If you don't, you might just cut your hand - I mean, data - off!


Use the right tool for the job. C is the right tool for some jobs; if you jam it into a place where it doesn't belong, you probably don't have a deep enough understanding of it to use it safely.

There are two groups you find misusing something. Those that really know what they are doing, have weighed the risks/rewards, and have decided that misusing the tool to get the job done is worth the associated risk. Then you have people who don't know what they are doing. They are going to have problems, but don't blame the tool.


Anyone care to guess at what he means by a strong and expressive type system?


I’m pretty sure he’d point to Haskell as the best example.


Common Lisp, {,S,O?CA}ML, and even C# also make the list.


Ada's


I wrote my own sprintf handler which checks string length and truncates if necessary.

    unsigned long lsprintf(unsigned long max_length, char *dest, char *fmt, ...);

    char buffer[1024];
    lsprintf(1024, buffer, format);

Slightly overkill.


C99 has snprintf() for this.
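
For example (a minimal sketch):

    #include <stdio.h>

    int main(void)
    {
        char buffer[64];
        /* snprintf never writes more than sizeof buffer bytes, always
           NUL-terminates (for size > 0), and returns the length it
           would have produced, so truncation is detectable. */
        int n = snprintf(buffer, sizeof buffer, "value = %d", 42);
        if (n >= (int)sizeof buffer) {
            /* output was truncated */
        }
        return 0;
    }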



