
C's second biggest mistake, IMO, is NUL-terminated strings. The combination of these two mistakes is much worse than either of them alone. Since you don't know where a string ends without scanning it, and nothing prevents you from writing off the end of the allocated space, the familiar 'strcpy' etc. are loaded guns with hair triggers.
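A minimal sketch of the hazard (function names here are illustrative, not from any real codebase): strcpy has no way to know the destination's capacity, so the safer idiom is to pass that capacity explicitly.

```c
#include <stdio.h>
#include <string.h>

/* Dangerous: strcpy cannot see how big dst is. If src (plus its NUL)
   is longer than 8 bytes, this writes past the end of the buffer. */
void risky(const char *src) {
    char dst[8];
    strcpy(dst, src);
    (void)dst;
}

/* Safer: pass the capacity explicitly. snprintf truncates to fit
   and always NUL-terminates (when cap > 0). */
void safer(char *dst, size_t cap, const char *src) {
    snprintf(dst, cap, "%s", src);
}
```

Nothing in the language stops a caller from handing `risky` a long string; the overflow is silent until something crashes or leaks.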



I'd argue that nul-terminated strings are a direct consequence of arrays decaying to pointers when passed to functions.


They're clearly closely related, but it would technically have been possible to represent strings using a header with a length field followed by the characters in the string. That would have required people to pass the location of an arbitrary character as two explicit values, the string itself and an offset, which would certainly not have been in the spirit of the way C takes advantage of pointers, but nonetheless would have been possible.
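To make the alternative concrete, here is a hypothetical length-prefixed representation sketched in C (the `lstr` type and its helpers are invented for illustration). Note how a position inside the string must be carried as a (string, offset) pair rather than a bare char pointer:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation: a length header followed by the bytes.
   This is NOT how C strings actually work. */
struct lstr {
    size_t len;
    char   data[];   /* C99 flexible array member */
};

struct lstr *lstr_new(const char *src, size_t n) {
    struct lstr *s = malloc(sizeof *s + n);
    if (s) {
        s->len = n;
        memcpy(s->data, src, n);
    }
    return s;
}

/* An interior position is (string, offset), two explicit values,
   rather than a single pointer into the middle of the data. */
char lstr_char_at(const struct lstr *s, size_t offset) {
    return s->data[offset];   /* caller is assumed to check offset < s->len */
}
```

This works, but as the comment says, it cuts against the grain of C's pointer arithmetic, where `p + i` is itself a perfectly good reference into a string.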

BTW I must take issue slightly with your characterization of the problem. Arrays decay to pointers immediately when referenced, not just when passed to functions. I suppose a C implementation could do bounds checking on non-pointer-mediated accesses to local arrays, but I've never seen an implementation do that (except those that do bounds checking more generally). Did yours?

In fact, the way I put it once was, "C does not have arrays. It has pointers and notations for allocating and initializing memory; it does not have arrays." (Meaning, of course, "array" in the sense of a first-class object, with bounds checking.)


Length-prefixed strings have another disastrous problem - strings cannot be sliced. A substring would always have to be copied.

Arrays decay to pointers when dereferenced, but this wouldn't impede array bounds checking. I didn't bother putting in static array bounds checking in Digital Mars C because pretty much all the interesting cases were buffers passed to a function.

I know C doesn't have first class arrays. That's what my enhancement suggestion was for.


> Length-prefixed strings have another disastrous problem - strings cannot be sliced. A substring would always have to be copied.

Slicing is one common operation that's O(n) with a simple vector representation. I agree it's nice if it can be made O(1), but concatenation is still O(n) with that representation. This is why I'm a fan of ropes or other more exotic sequence data structures that can do all the common operations in log time or better.

> I know C doesn't have first class arrays. That's what my enhancement suggestion was for.

Fair enough. I think it's a good suggestion, but I'm skeptical that it would see wide use. Here's why. It has been straightforward for a long time to declare a struct type that holds a base and bounds, pass that struct by value, and then do one's own bounds checking. (The K&R-era compilers didn't have struct passing by value, but it was introduced in Unix v7, as I recall. That was a long time ago.) This isn't quite as convenient as what you're proposing, but it's not hard, and I don't think I've ever seen anyone do it. (Not that I spend my time reading other people's C code.) It certainly hasn't become standard practice.
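The base-and-bounds struct described above is easy to sketch (the `span` type and `span_get` are illustrative names, not a real library): a "fat pointer" passed by value, with the bounds check done by hand.

```c
#include <stddef.h>

/* Hypothetical "fat pointer": base address plus element count,
   cheap to pass by value (supported since Unix v7-era compilers). */
struct span {
    char  *base;
    size_t len;
};

/* Manually bounds-checked read. Returns 1 on success, 0 if the
   index is out of range. This is exactly the do-it-yourself
   discipline that, in practice, almost nobody has adopted. */
int span_get(struct span s, size_t i, char *out) {
    if (i >= s.len)
        return 0;
    *out = s.base[i];
    return 1;
}
```

The inconvenience isn't the struct itself; it's that every access either goes through a checking function like this or silently bypasses it.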

Still and all, if you can persuade the committee to put it in the language, and also define new standard library functions that make use of it, I agree it would be a good thing.


Aren't strings immutable in most languages? In JavaScript, for example, String.prototype.slice() returns a copy; it does not mutate the original string.

I don't disagree with you, but I don't see how it's a disastrous problem, either.


In D, which has ptr/length arrays, strings are immutable and sliceable without copying. This can make them very fast.
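A D-style ptr/length slice can be sketched in C to show why slicing is O(1) under that representation (the `slice` type and `sub` function are assumed names for illustration): a subslice is just a narrower view of the same bytes, with no copy.

```c
#include <stddef.h>

/* Sketch of a D-style slice: a view into existing storage. */
struct slice {
    const char *ptr;
    size_t      len;
};

/* O(1) slicing: adjust the pointer and length, share the bytes.
   Assumes lo <= hi <= s.len (unchecked here for brevity). */
struct slice sub(struct slice s, size_t lo, size_t hi) {
    return (struct slice){ s.ptr + lo, hi - lo };
}
```

Because D's strings are immutable, many slices can safely alias the same underlying storage, which is what makes this fast.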


C's NUL-terminated strings were intended for mutating strings in char buffers (where the buffer's capacity was known at compile-time), not for passing around. More detail here: https://news.ycombinator.com/item?id=7475866


That's completely bogus. C doesn't have first-class arrays, so by that logic the entire C standard library provides no way to pass strings to functions.


Did you read the link? The point was that in idiomatic C, you pass a string to a function by passing a raw pointer to a char-array buffer together with an external length.


Yes I read the link. It makes the unfounded claim that heap allocated strings were never intended to use the NUL character as the means of determining length. Get a 1st edition of K&R and see what fraction of functions in the standard library pass a length with a string.

Get an SVR4 API manual and do the same. After doing this I don't find this argument to be compelling.


In the end, it's about efficiency and redundancy. If a function is going to have to do something with every character of a string (e.g. copy it into a new buffer), and you don't have very many registers reserved in your calling convention (especially true for e.g. system calls), then it's silly to pass an explicit length if you know you've already NUL-terminated the string in its buffer.

All Unix system-call wrapper functions were specified this way, because it is guaranteed that they will need to move the string data from a user-space data structure to a kernel-space data structure (copying it wholesale in the process) before manipulating it. The NUL-terminated string can, in this case, be thought of as an encoded wire format for passing a string between the userland process and the kernel.

And because the Unix system-call wrappers--the most visible and useful parts of the C stdlib--followed this convention, C programmers generalized it into a convention for passing strings in general, even between functions in the same memory-space that can afford whatever registers they like.

If you're not going to iterate over everything in the passed buffer, though, C intended you to do things more like sprintf: take a destination buffer, and explicit size; write to the destination buffer, without scanning for a NUL first as a place to begin, and without appending a NUL afterward; and return an explicit size in bytes written.
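The convention described here can be sketched with snprintf (the `render_pair` function is an invented example, not part of any library): the callee takes a destination buffer and its capacity, writes without scanning for an existing NUL, and reports how many bytes it produced.

```c
#include <stdio.h>
#include <stddef.h>

/* Illustrative example of the buffer-plus-size convention:
   write into dst (up to cap bytes, truncating if needed) and
   return the length of the full formatted output, excluding the NUL. */
size_t render_pair(char *dst, size_t cap, int x, int y) {
    int n = snprintf(dst, cap, "(%d,%d)", x, y);
    return n < 0 ? 0 : (size_t)n;
}
```

The explicit return value lets the caller detect truncation by comparing it against the capacity, rather than re-scanning the buffer for a NUL.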


http://en.wikipedia.org/wiki/Tony_Hoare#Quotations

"I call it my billion-dollar mistake. It was the invention of the null reference in 1965."


I think people would prefer that openssl seg-faulted than silently allow theft of private data.

I.e. the array-decay-to-pointer is a more expensive problem.



