Some Were Meant for C (2017) [pdf] (kent.ac.uk)
116 points by chauhankiran on April 25, 2019 | 176 comments



I don't like how people on the internet hate on C all the time. To me C is a wonderfully simple and elegant language in which I can, most of the time, very clearly express my thoughts. I find that this is not the case with many other languages. This is the reason I use C. Of course compiler writers are trying to sabotage this, but I don't write any critical code, so if I get hit by an occasional bug it's not a big problem. I think C is misunderstood, perhaps because it is used everywhere and standards for critical pieces of software are applied to non-critical ones. E.g. never freeing memory, even leaking it by destroying all references to it, can be totally fine if your program just runs for a few seconds. Just allocate what you need, throw it away, and let the OS clean up when you're done. This is perfectly fine, but I feel like many people wouldn't accept this kind of code because it's not acceptable in other circumstances.

Anyway, the author makes a few good points that are important but often overlooked, but I don't think I'd agree with him on the safety aspect in section 6. Making a safe C implementation doesn't really appear possible. If you allow casting integers to pointers, how do you implement these safety checks he's talking about?


"There are only two kinds of languages: the ones people complain about and the ones nobody uses." (Bjarne Stroustrup)

I guess that doesn't only apply to C++, but also C. :)


I stopped writing C because string manipulation sucked.


I agree with that. Recently I started reading "Writing an interpreter in Go" and thought I'd follow along using C.

From the first chapter, the Go code starts using strings as a shortcut to represent tokens. In other languages this is trivial because strings are very easy to create, resize, change, etc. Using C, this became an issue though, as strings became a roadblock where I started having to implement different solutions rather than focusing my attention on the contents of the book.


Using strings for tokens is probably not how you would ever do this in C. Make the tokens enums.
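For illustration, a minimal sketch of what that might look like (the token names here are made up, not from the book):

  #include <stddef.h>

  /* enum-based tokens: no string allocation at all */
  typedef enum {
      TOK_EOF,
      TOK_IDENT,
      TOK_INT,
      TOK_ASSIGN,
      TOK_PLUS
  } TokenType;

  typedef struct {
      TokenType   type;
      const char *start;  /* points into the source buffer, no copy */
      size_t      len;
  } Token;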


Oddly, this is also true for Go...


I never really got that complaint; I'd like to see some examples of what people consider so ugly about C strings.


1 - string contents and actual length are handled in separate variables without correlation

2 - no enforcement that a null terminator actually exists in the string

3 - C brags about performance and is probably the slowest language to compute string length

4 - manipulating strings requires very careful handling of buffers, usually forcing everyone to use the heap as the easier way out


Storing the length of a string alongside the string is a viable option in C, it's up to you to do it yourself though (whereas Java and almost every other language does it for you). See https://github.com/phusion/nginx/blob/master/src/core/ngx_st..., used by NGINX.
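The linked type boils down to an explicit length plus a data pointer, roughly (u_char is NGINX's alias for unsigned char):

  typedef struct {
      size_t         len;   /* explicit length, no strlen() needed */
      unsigned char *data;  /* not necessarily NUL-terminated */
  } ngx_str_t;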


Which breaks down every time a C API needs to be called, and that linked API still has plenty of functions with separate pointer and length parameters.


You can do it like in Free Pascal/Delphi: store the string both length-prefixed (for fast length access and bounds checking) and zero terminated (for passing to functions that expect zero terminated strings).
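A minimal sketch of that dual representation in C (the type and function names are made up):

  #include <stdlib.h>
  #include <string.h>

  /* length-prefixed AND NUL-terminated, so data can still be
     handed to APIs that expect plain C strings */
  typedef struct {
      size_t len;
      char   data[];  /* flexible array member; data[len] == '\0' */
  } PStr;

  PStr *pstr_new(const char *s, size_t n) {
      PStr *p = malloc(sizeof *p + n + 1);
      if (!p) return NULL;
      p->len = n;
      memcpy(p->data, s, n);
      p->data[n] = '\0';
      return p;
  }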


And then the callee corrupts it.


If the callee is meant to corrupt it, then it is your fault for misusing it (unless the corruption is intentional and you plan to recalculate the length after calling it). If it wasn't meant to corrupt it, then it is a bug; if the bug is in your code, it is your fault for not using the string functionality you already have in place to avoid it. If it is not in your code, then as long as you had to use that code you'd have the bug regardless of what language or framework you used, since it is out of your control and there isn't anything you can do about it.


Aka, C's community version of "you are holding it wrong".


Is there a language that doesn't allow any abuse of an API, including APIs that were not written in that language?


Yes, any systems language that doesn't need to depend on the existence of C.

If you are going to mention Assembly as a possibility, check ClearPath, where there is no Assembly; NEWP has full control over the hardware stack.


>Storing the length of a string alongside the string is a viable option in C, it's up to you to do it yourself though

Obviously, since many other languages that do so are implemented in C.


Also sds, used by redis: https://github.com/antirez/sds


I was actually more looking for practical issues. The code that I write usually doesn't even handle strings a lot. Maybe I'm just using other languages when I do that, or maybe I'm using other approaches where others would use strings, or maybe I just subjectively don't find them as bad as others do. I'd just like to see exactly what people are complaining about, so I could find out why I usually don't.


>I was actually more looking for practical issues.

Those are not practical? Billions of dollars have been wasted on issues stemming from this...


I'd just like to see a concrete example for once.


Like, apart from the tons of well-known bugs and vulnerabilities caused by string manipulation? How exactly did you miss the news items, reports, posts, university lectures, books, and even your own personal experience with them?

Or is the insinuation that we are hand-wavy about it, and you doubt the existence and scale of the problem? It's a well researched, well established problem, known for almost half a century.

https://en.wikipedia.org/wiki/C_standard_library#Buffer_over...

https://security.web.cern.ch/security/recommendations/en/cod...

http://www.informit.com/articles/article.aspx?p=430402&seqNu...

https://randomascii.wordpress.com/2013/04/03/stop-using-strn...

https://courses.cs.washington.edu/courses/cse341/04wi/lectur... (null termination)

https://www.geeksforgeeks.org/why-strcpy-and-strncpy-are-not...

https://www.owasp.org/index.php/Reviewing_Code_for_Buffer_Ov...:

And let's not even get into format string issues...


> 3 - C brags about performance and is probably the slowest language to compute string length

I have a feeling that of all the axes of performance, C cares the most about memory overhead. Then the obvious idea is to have it at exactly one byte per "simple" string, and you get to pick the class of programs that can't get away with that default string type:

• One-byte terminator: complicated text-handling application with a lot of (longer than a couple of pointers on average) string slices.

• One-byte length: anything that needs strings longer than 255 chars.

And then of these two solutions you pick the obviously more general one. What could possibly go wrong?


Anything that needs strings longer than 255 chars already had a solution in existing systems programming languages back when C was born.

Character arrays with open length, bounds checked.

Naturally it requires better compiler support than C authors were willing to implement.


>Naturally it requires better compiler support than C authors were willing to implement.

Which is still the case with many things in Go, a language of close origin to C (though this time not about strings).


It is interesting how in both cases they disregarded what was being made around them.


What was made around them was only ever the gorilla with jungle thing. Fine for small programs, or larger ones within walled gardens. But not fine for infrastructure work.


Yet Multics was deemed safer than UNIX as per a DoD security assessment.

I guess security is not relevant for infrastructure work.


Oh, and what do they use today?

> I guess security is not relevant for infrastructure work.

...


Ada and even Java (PTC/Aonix) when security matters.


These seem like tradeoffs which are straightforward to understand, and which allow for a simpler ABI & runtime. Sure, the "UX" of the language suffers compared to e.g. Python, but at least the mechanics are easier to understand. If you want Python-style string handling in C you could just use the Python C-API.


And yet I'm betting your language is run through a C program on its way to being interpreted or compiled.


That is the unfortunate reality of UNIX being widespread.


Windows is the fault of UNIX? I'm skeptical.


Windows is not a pile of C code, but rather a mix of C, C++ and .NET.

And nowadays C code is considered legacy, with C#, Rust and constrained C++ as the road to the future.


It's not unfortunate. If it was a bad thing, UNIX wouldn't be so widespread.


>If it was a bad thing, UNIX wouldn't be so widespread.

Strange argument. JavaScript is widespread, COBOL was widespread, Windows is widespread, x86 is widespread. Widespread doesn't mean good. UNIX was a disaster, and the whole family of UNIX-like OSes spent decades just mitigating its errors and faults.


I guess we have someone here that enjoys using PHP, JavaScript and Perl.


As a UNIX professional programmer, no. Other than Perl, which I don't use, why would you think that?


Same set of language design qualities, and widespread due to historical accident.


And our money goes through some COBOL program on its way to our bank/insurance.

And our data through some JS monstrosity.

So?


About point 3: Computing the string length is fast in C. The point is: in other languages you always have the length around, so you never have to count it.

All the listed weaknesses also have benefits. For instance, it is easy to get a substring without needing to copy.

But yes, many bugs in C software originate from string buffer overflows.


> All the listed weaknesses also have benefits. For instance it is easy to get a substring without need to copy.

To make substring in some other languages, you need to store pointer to beginning of the substring and length of substring.

To make substring in C, you need to store pointer to beginning of the substring and put '\0' into original string.
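A small illustration of the two approaches (just a sketch):

  #include <stdio.h>
  #include <stddef.h>

  int main(void) {
      char s[] = "hello world";

      /* pointer + length: non-destructive, original stays intact */
      const char *p = s + 6;
      size_t n = 5;
      printf("%.*s\n", (int)n, p);  /* prints "world" */

      /* the C-string way: terminate in place, mutating the original */
      s[5] = '\0';
      printf("%s\n", s);            /* prints "hello" */
      return 0;
  }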


> To make substring in some other languages, you need to store pointer to beginning of the substring and length of substring.

Plus a pointer to the beginning, plus a reference counter, as the user expects it to manage lifetime. In C this is the user's job. Where they know the lifetime is guaranteed, they can optimize.


I coded in C for over a decade. Never did I need a reference counter.


No, you store offset + length.


You can't use any of the stdlib string functions if you don't have that \0 at the end though, right?


Is there any function you would miss? Of those, which one couldn't you recreate in 5 straightforward lines of code?
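For instance, equality over pointer-plus-length strings takes a few lines (a sketch, assuming you carry the lengths around):

  #include <stddef.h>
  #include <string.h>

  /* compare two length-delimited strings without relying on '\0' */
  int str_eq(const char *a, size_t alen, const char *b, size_t blen) {
      return alen == blen && memcmp(a, b, alen) == 0;
  }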


O(N) is not the same as O(1).

That substring won't remain valid without copying it.


> That substring won't remain valid without copying it.

Unless it's not modified? And unless, when modified, that shouldn't be its new value?

C strings (NUL-terminated) are the right approach for static storage of small static strings (like string literals in the source code), since they have low overhead and "substrings" aren't second-class citizens.

For dynamically allocated strings that won't be modified after creation, the right approach is using a large memory chunk that is shared between many such strings, plus two indices for offset / length (or just offset if it's text that can be terminated with a sentinel).

Having a short string of about 10 characters allocated as a dynamic object in its own allocation is wasteful. Slow to allocate and has about 2x to 3x overhead. This approach isn't good for applications that store a large amount of data.

> O(N) is not the same as O(1).

Don't call strlen() in situations where the strings are large and you need to know the length ahead of time, and running time is paramount. Instead, store the length.
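A rough sketch of the shared-chunk idea described above (growth and error handling elided; the names are made up):

  #include <stddef.h>
  #include <string.h>

  /* one large buffer holds many immutable strings; each string
     is just an (offset, length) pair into it */
  struct pool { char *buf; size_t used; size_t cap; };
  struct sref { size_t off; size_t len; };

  /* assumes the pool has enough capacity left */
  struct sref pool_add(struct pool *p, const char *s, size_t n) {
      struct sref r = { p->used, n };
      memcpy(p->buf + p->used, s, n);
      p->used += n;
      return r;
  }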


The typical micro-optimization while typing, without any profiler feedback, just gut feeling, as is prevalent across the community.


You don't need profiler feedback for these things, just back-of-the-envelope calculations. How much data you would like to store and process is not something a profiler can answer for you.

I can tell you that in one of my projects (a SAT solver in Java, handling a few million variables), the difference between garbage-collected strings and optimized ones (actually, strings converted into unique integer handles immediately) was something like a second until completion vs. a couple of minutes before the garbage collector finally dies due to lack of oxygen, losing all data computed up to that point.


I agree, but C (the language) doesn't even have the concept of a 'string'. It's just the convention how some C standard library functions interpret an array of bytes with a zero at the end.

At least in C it's quite obvious that strings are not trivial if you want both an intuitive way to work with strings and high performance. The C++ std::string type is neither intuitive to work with, nor does it allow writing high-performance code.

For string processing it's really better to use another language with different trade-offs.


> I agree, but C (the language) doesn't even have the concept of a 'string'

It has string literals, so yes it does.

> The C++ std::string type is neither intuitive to work with

Many would disagree.

> nor does it allow to write high-performance code

True, but only due to backwards compatibility with C - std::string operations have to add a null terminator for no other good reason.


> Many would disagree.

Many would also agree; that means nothing. Personally I dislike C++'s strings... and the rest of the STL, which I view as one of the worst standard library APIs in wide use.


> True, but only due to backwards compatibility with C - std::string operations have to add a null terminator for no other good reason.

That's the least of the reasons why std::string is inefficient.


C++ std::string is better and more secure than anything that C ever produced.

As for string processing in general, I do agree that other languages are better suited.


More often than not I find myself missing C-type strings in other languages. Being able to just walk through the characters and manipulate them is something I found rather ugly in Python, for instance. The NUL character is in my experience not so terrible; you typically have null pointers at the end of a linked list or whatever as well, and nobody complains about that. Now I have to admit I had a bug recently that took me longer to fix than I would like to admit because I wasn't walking a string right, but usually I have very little trouble with them.
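The kind of in-place walk being described, for illustration:

  #include <ctype.h>

  /* walk a C string in place, uppercasing as we go;
     the NUL terminator is what ends the loop */
  void upcase(char *s) {
      for (; *s; s++)
          *s = (char)toupper((unsigned char)*s);
  }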


What's a character? UTF-8 makes that a bit difficult to answer. If you want arrays of ASCII bytes, you can have those in most programming languages.


1 to 4 bytes.

How does UTF-8 make that difficult to answer?

When was the last time you iterated over a string of Unicode code points and said: you know what would be handy right now? If these code points were split up into arbitrary and unusable bytes of memory.


Well, ä is a character in German. You can either write it as LATIN SMALL LETTER A WITH DIAERESIS, or you can use COMBINING DIAERESIS and a. When you iterate over the German word Mädchen as Unicode code points you might be confused. Other languages do much crazier things.
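For illustration, both encodings below are valid UTF-8 for the same visible word (the byte values are the standard UTF-8 encodings of U+00E4 and U+0308):

  #include <stdio.h>
  #include <string.h>

  int main(void) {
      /* precomposed: LATIN SMALL LETTER A WITH DIAERESIS (U+00E4) */
      const char *nfc = "M\xC3\xA4" "dchen";   /* 7 code points, 8 bytes */
      /* decomposed: 'a' followed by COMBINING DIAERESIS (U+0308) */
      const char *nfd = "Ma\xCC\x88" "dchen";  /* 8 code points, 9 bytes */
      printf("%zu vs %zu bytes, byte-equal: %d\n",
             strlen(nfc), strlen(nfd), strcmp(nfc, nfd) == 0);
      return 0;
  }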


That doesn't make it hard...


It's been a long time since I wrote C, but the main problem in my recollection is that the standard library is not intuitive. Something simple like taking a couple of arbitrary strings, concatenating them, and returning the result without leaking memory and without causing buffer overflows is not as trivial as it should be.

I don't think it's a huge problem per se, though; you can just use a string library.

See the confusion here for an example: https://stackoverflow.com/questions/308695/how-do-i-concaten...

snprintf looks like it is the easiest way out.
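For instance, a hedged sketch of the measure-then-write idiom with snprintf (passing NULL with size 0 to get the required length is standard C99):

  #include <stdio.h>
  #include <stdlib.h>

  /* concatenate two strings into a freshly allocated buffer */
  char *concat(const char *a, const char *b) {
      int n = snprintf(NULL, 0, "%s%s", a, b);
      if (n < 0) return NULL;
      char *out = malloc((size_t)n + 1);
      if (out) snprintf(out, (size_t)n + 1, "%s%s", a, b);
      return out;  /* caller must free() */
  }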


>More often than not I find myself missing C-type strings in other languages. Being able to just walk through the characters and manipulate them is something I found rather ugly in Python, for instance

If what you're saying is that you want mutable strings, many languages have those, and you don't need anything like "NUL" to have them (and you can use a bytearray of the string in Python, though Unicode complicates this).

>The NUL character is in my experience not so terrible; you typically have null pointers at the end of a linked list or whatever as well, and nobody complains about that

That's not the same thing at all. The linked list is comprised of structs with next fields, that can be null or point to something. Your program can handle either just fine, as both are valid cases (a linked list expects to find the NULL guard at the end but also expects a non-NULL next pointer if the node is not the last one, so will handle both).

OTOH, if an incoming string doesn't have a NUL byte, your program will crash/corrupt memory/worse. On top of that, you need to remember to add it / make space for it in most string manipulations. Strings are expected to end with NUL, and when they don't, there's no way you can mitigate it, except to set arbitrary limits on how many characters you consider.


> OTOH, if an incoming string doesn't have a NUL byte, your program will crash/corrupt memory/worse.

how are linked lists different? if the last node contains garbage for its next pointer, the outcome will be exactly the same. it's a bit more rare to encounter an "unterminated" linked list, but I've seen it happen plenty of times deserializing a linked list from disk or if the programmer just forgot to initialize the pointer. c strings basically are linked lists with an implicit next pointer.


This is why I don't like other languages. They have no concept of functions to interpret an array of bytes with a zero at the end.


Actually (and I know this is a lame troll), it's trivial to make such functions in most non-C languages.


Why is this comment being downvoted? Not everybody here is familiar with C string manipulation; if you downvote or complain, at least give more detail than "it sucks".

@aap_, I asked a similar question some time ago and got some answers, you can check the thread here: https://news.ycombinator.com/item?id=19302581

The direct answer I got was:

> I'm guessing because an off-by-one or an extra skip might mean you miss the end of the string and go off into la-la land feeding whatever garbage happens to be in memory to your parser? That would mostly be a C issue (as it has no string abstraction at all).


>Why is this comment being downvoted? Not everybody here is familiar with C string manipulation; if you downvote or complain, at least give more detail than "it sucks".

Well, if someone is not familiar, why do they read a subthread on the matter?

Shouldn't they rather start with a tutorial on C and C strings?

Even if people on this thread gave arguments, how would they (being unfamiliar with C and C strings) evaluate them? The arguments could be totally bogus.


I am finding it hard to imagine not seeing the difficulty here, so instead I'm just gonna point out that simple operations like stripping whitespace, splitting strings on a character pattern, changing case, dealing with character encodings, and regex matches all require manually iterating and mutating or copying strings, and in the case of regexes require compiling and auditing various libraries. The abstractions other standard libraries have used, such as Rust's, make it much easier to simply express the string operations as high-level operations and spend your time elsewhere, while retaining relatively high levels of performance. Often, string processing is not in the inner loop and does not benefit from things like combining multiple string operations into a single pass, traditionally a thing that might make C perform better, all other things being equal.


Is this a joke?

Length not known, so prone to overflows at any time, atrocious standard library, ... (and let's not even go into the Unicode situation).


You basically have to write your own high level string library just to approximate the features of just about any other language...


C was already hated on BBS and USENET during the early 90's.

You will find my nick in those archives, on the C++/Pascal side of the trenches.


Mysteriously, whenever I'm forced to work on an existing C++ code base, it has far more bugs than all C code bases that I've worked on.

And yes, apart from the myriads of usual C++ leaky abstraction bugs that "just" cause wrong results, this includes segfaults.


And how many of those are actually caused by features inherited from C copy-paste compatibility?


Fads have always come and gone. C was already 20 years old in the 90s so it was a fair target for hating in favour of something newer and shinier.


In the early 90's C was still hardly a thing outside the UNIX computer lab.

Mainframes were written in a mix of Assembly and safe systems languages (IBM i, z and Unisys ClearPath are the survivors).

8 and 16 bit home micro OSes were largely written in Assembly, and all systems languages were on equal footing as platform guests.


>In the early 90's C was still hardly a thing outside the UNIX computer lab.

That's not what I remembered. Back in 1989, I bought the very popular Microsoft C 5.1 compiler (in shrinkwrapped box) from the Egghead retail store. Egghead was basically a smaller version of CompUSA.

Also, non-UNIX publications like PC Magazine and BYTE had ads for Borland Turbo C. Watcom C was also a competitor.

In the late 1980s and early 1990s, if one wanted to write COTS software for DOS on IBM PC compatibles, it was either assembly or C Language. Other languages like dBASE/Clipper/4GL were for LOB apps instead of COTS. In 1985, both Lotus 123 v3 and MS Excel were written using C.

It was more like the early 1980s before the personal computer revolution when the C Language was mostly tied to UNIX.


It was very regional.

Back in the Iberian Peninsula, Turbo Basic, Turbo Pascal, TASM, NASM reigned on the PC for COTS.

Amiga demoscene was all about AMOS and 68000 Assembly.

I've been coding since 1986 and touched my very first C compiler in 1992.

By then I already had used Timex 2068 Basic, 48+ Basic, CP/M on +3 A, Z80, Turbo Basic, Forth, Turbo Pascal 3 - 6, Dbase III Plus, 8086 with TASM.

Thankfully also in 1992, I got hold of Turbo C++ 1.0, and I was settled on my opinion about C.


I feel like C has to go in the bucket of 'bad languages' these days, because it doesn't have sensible Unicode support built in.

But that doesn't mean you can't have fun with it, or write good computer programs.

It just means that user inputted text is always going to be a pain.


To be fair, it doesn't really have strings 'built in' particularly well either. It directly manipulates patches of memory, and the string support that is there is mostly a very simple abstraction on top of that.

If you want support for unicode etc, there are libraries.


But then that brings you compatibility issues, overhead ...


If you want to avoid development overhead, absolutely, C is a terrible language to pick.

I'm not sure what you mean by compatibility issues in this context?


I'm thinking about different string representations used by different libraries. Is this not true?


Honestly, all I've encountered is default strings (and people shy away from those now) and LibICU stuff. I think ICU has become a de facto standard, but I have been out of C for a few years.

This tends to be important for strings you're going to show to the user (or input you gather from them), maybe most of the C in the world isn't aimed at interfaces like that...


Actually the best way to deal with Unicode in almost all situations is to use UTF-8, and dealing with UTF-8 in C is trivial.

For other encodings, well you need to decode explicitly (or just go for wide chars). Shouldn't come as a surprise, and not a big deal either.
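As a small example of why UTF-8 can stay simple in C: counting code points only requires skipping continuation bytes (a sketch, assuming valid UTF-8 input):

  #include <stddef.h>

  /* continuation bytes have the form 10xxxxxx;
     everything else starts a new code point */
  size_t utf8_count(const char *s) {
      const unsigned char *p = (const unsigned char *)s;
      size_t n = 0;
      for (; *p; p++)
          if ((*p & 0xC0) != 0x80)
              n++;
      return n;
  }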


Everybody's brain works differently. Maybe C is the language for you. Maybe your tradeoffs are different from mine and you're OK with dealing with segfaults and security exploits. AFAIC, if I never again have to write string handling code in C it'll be too soon.


There are standard C libraries that handle strings very well. Are you saying other languages don't have security exploits? And in 35 years of writing in C, I have only had a very few segfaults while writing new code, and never in production, so I don't know why you think it happens so often, unless your knowledge is only from reading Reddit headlines.


> There are standard C libraries that handle strings very well.

I'm not aware of any. I've never encountered a language that is more frustrating than C to manipulate strings in.

> Are you saying other languages don't have security exploits?

No, I'm saying they have fewer. Far fewer.

> And in 35 years of writing in C, I have only had a very few segfaults while writing new code but never in production

Good for you. In 25 years of writing C and working with dozens of other programmers writing C in that time, I've lost count of how many segfaults and memory corruption errors I've had to debug.

I don't miss those days and never want to go back.

> unless your knowledge is only from reading reddit headlines.

It is not. See above.


[flagged]


> Yeah, language without ... decent comprehensive library

I'm sorry, did you just say C doesn't have a comprehensive library?! I agree with your other points (but posit that they are not always negative; some of the 'flexibility' is useful in the domains where C dominates). Most of the shortfall can be covered by tooling, linting and standards.

It's been almost a decade since I last used C, but I think the complaints are overblown. Some tools have sharper edges than others, but they are there to accomplish a goal.


>I'm sorry, did you just say C doesn't have a comprehensive library?!

Data structures? Lists, hashtables, maps, strings, refcounted pointers, weak pointers, ephemerons, iterators for the aforementioned? The C library is a joke even by 1970 standards.

>Most of the shortfall can be covered by tooling, linting and standards.

The amount of CVEs proves you wrong. Besides, a typical C coder considers himself smarter than the type checker, otherwise he would be using another language (ML, Rust, Ada, C++).


I will always have a soft spot for C. It was my first language. I wrote my first linked list in C. 30 seconds later I had my first segfault. C was the entrance to the rabbit hole for me. I went down through assembly language, down through logic design, circuit design, and transistor theory, all the way down to fabrication and quantum mechanics. It was a long time before I came back from that trip. Today, I wouldn't use C for everything, or even for many things, but I'm glad I know it and I'm glad it's the substratum that holds most everything else together.

I'm glad because C is an idea you can hold in your head. I can look at old C, I can look at new C, and it's all pretty sensible. (Excepting macro abuse, deliberate obfuscation, or gross incompetence.) It's hard to say that about most any other language.


What is lovely about C is its paper-thin layer atop the syscall pokey bits: signal, vfork, mmap, etc. No surprise that other languages either start by using libc, or find their way there eventually.


The "killer feature" of C is the simple semantics of the language (despite all the dark undefined corners), which can be implemented relatively straightforward on a variety of ISAs. There isn't any need for runtime support because of the lack of fancy language features.

That makes it the most natural way to call into the operating system, without any concerns that the chunk of memory returned by mmap has to be somehow managed by the language runtime.


The trouble is you can step from simple semantics to dark corners through such simple things as "adding two numbers together" and "accepting user input". I do agree that the lack of need for a managed runtime is a huge advantage on embedded or small systems.


> I do agree that the lack of need for a managed runtime is a huge advantage on embedded or small systems.

Not only in embedded systems. It is a huge advantage when interfacing with the operating system from within a userland process.

Lack of a runtime means that a call to a kernel function like, say, `sched_setscheduler(2)` would not interfere with any such thread scheduling policies of the host runtime. Or that the runtime is grabbing file descriptors out of your control, making it hard to reason about resource usage.


I wonder how bi-directional this has been since the rise of C, i.e. CPU manufacturers defining ISAs that are well suited for use with C?


I think that the fact that we essentially have instructions on most architectures well suited for the implementation of certain operations common in C answers this pretty well.


This is only true for UNIX based OSes, given that libc is naturally the door into the OS, and contains much more than what ISO C compliant libc is supposed to have.


And not even all unices: MacOS has several APIs where the specified interface is the Objective-C interface, not the C one.


That's why I love Object Pascal/Delphi - a good balance between abstraction and the underlying system, compiling to native code, with a clean, easy language syntax:

https://castle-engine.io/modern_pascal_introduction.html


If C didn't have (as much) undefined behavior it would be practically perfect. There's not much else that I would change about it.


Buffer overflows. That's far from perfect.

It's much too easy to produce buffer overflows in C, be it due to the design of C strings or just because of the manual handling of dynamic memory.


Buffer overflows of your own data are your responsibility. The main problem is that there is no effective way in the language to control (prevent, or at least detect) stack overflows. For example, when you reach a recursion limit or when you allocate too many automatic variables. For scientific computing, the ability to fearlessly declare a local matrix like "float A[n][n];" for arbitrary values of "n" would be invaluable.
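For reference, the construct in question is a C99 variable-length array; a sketch of why it can't be used fearlessly (for large n the stack silently overflows, and the language gives no way to detect or recover from it):

  #include <stddef.h>

  void fill(size_t n, float value) {
      float A[n][n];  /* stack storage: large n overflows silently */
      for (size_t i = 0; i < n; i++)
          for (size_t j = 0; j < n; j++)
              A[i][j] = (i == j) ? value : 0.0f;
      /* ... use A ... */
  }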


The BERI CPU, the CHERI ISA and CheriBSD have an interesting approach to this, where they have capability-based addressing at the pointer level, supported by hardware.

https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/


There is no string class/type in C. You could create a string type and make it safe but slower; you need to keep in mind when C was created and for what purpose it was created.


When C was created there was already 10 years of experience writing safe OSes in ESPOL, NEWP, PL/I, PL/S, Mesa, BLISS and a couple of other variants, outside Bell Labs.

The big difference was that those OSes, versus UNIX, cost real money.


So those OSes were great but nobody wanted to pay for the greatness? I am not that old, so I do not know that history, but at university we learned programming with C because the hardware and C are close.

Any idea why there was no popular safe array or safe string library for C? Maybe at that time there was no internet and everyone had to create their own abstractions?


Some of those OS are still being sold, like IBM i, IBM z and Unisys ClearPath MCP.

Bell Labs was forbidden to sell their research, so they offered UNIX for a symbolic price to universities (vs what other OS used to cost), alongside source code tapes and a liberal license.

This gave birth to several startups that tried to create a business using UNIX instead of the alternatives, given the authors experience with UNIX at the university, e.g. Sun and SGI among others.

Later, when the US government dictated the AT&T split, AT&T was allowed to charge for UNIX and that is when they decided to go after BSD, because 10 years later, under such conditions, UNIX was slowly eating mainframes, given the success of Sun, SGI, Aix, ....

There are no safe array or safe string libraries, because they all fall apart under scrutiny, given C's approach to secure code.


>There are no safe array or safe string libraries, because they all fall apart under scrutiny, given C's approach to secure code.

I do not understand, maybe give an example.

About the history, I am not sure that proves that the UNIX and C advantage was only because of that; it could be a factor, but there are merits to C and UNIX that you can't disprove just by mentioning that history. I mean, the Windows API used C.


> I do not understand, maybe give an example.

Because those libraries are not built-in types, you always need to convert back to ptr and length values at some point when interoperating with other C code.

And given the lack of bounds checking, you are back at square one.

> I mean Windows API used C.

Windows API used C, because by the time Windows came around UNIX was already well established in the enterprise.

C spread outside UNIX, because many of us were using C dialects, e.g. Small-C, on personal computers when bringing work home from job/university.


> Because those libraries are not built-in types, you always need to convert back to ptr and length values at some point when interoperating with other C code.

And that's exactly why C wins, in my opinion. Because that means interoperability.

Storing a pointer + length pair in a fixed layout struct is bad from a normalization standpoint. They are independent data. If you don't separate them you will end up with data redundancies as soon as you have parallel arrays. If you use dynamic vectors in C++ or similar languages and ask yourself which object you should call .size() on, that's when you notice that it's morally wrong to bundle pointer + length.


They are not independent, hence why, 30 years after the Morris worm, we keep increasing the amount of memory corruption exploits in C code.

Or more precisely, 68% of Linux kernel exploits according to Google's talk at Linux Kernel Summit 2018.


I didn't say class or type but "C strings".

You can't rip them out and replace them with something better without massive code legacy issues. And then you still haven't dealt with other sources of buffer overflows in C.


There are no C strings, there are C arrays that can be made to be of char data. C++ has a string type, Qt has a QString type, and you can interoperate with code that wants a C char array. If you were creating the C language and compiler at that time, with that hardware, would you have not implemented pointers and instead created Java?


Honest question. What has stopped people from creating a standard with no undefined behavior? Is such a thing impossible? Have people done it?


See for example here https://blog.regehr.org/archives/1287

> After publishing the Friendly C Proposal, I spent some time discussing its design with people, and eventually I came to the depressing conclusion that there’s no way to get a group of C experts — even if they are knowledgable, intelligent, and otherwise reasonable — to agree on the Friendly C dialect. There are just too many variations, each with its own set of performance tradeoffs, for consensus to be possible.


> "What has stopped people from creating a standard with no undefined behavior? Is such a thing impossible?"

I don't understand the "undefined behavior" bandwagon.

1. We have a perfectly defined list of "undefined behaviors".

2. Said list also happens to be relatively small and scoped.

3. "Undefined behaviors" exist because the language can't make certain runtime guarantees which are largely dependent on compiler/os/platform/hardware-specific promises. If you have to, just roll in your own runtime checks. C won't force those on you...


The language could make those guarantees if it wanted to. This might add overhead on some architectures, but would be possible. An example is integer overflow. If we limit ourselves to machines using two's complement (thus any machine architecture used during the last thirty years) this could be fully defined easily. And if C is ever used on another architecture they could build a workaround using some form of overflow trap. (Since CPU design would take C into consideration.)

Or evaluation order - `i = ++i;` could easily be defined in some way. But that might prevent some niche optimisations by the compiler.

Of course, by C's nature there are limits (C won't be able to detect use-after-free or similar without changing the language notably), but there is room where UB could be reduced, if it was seen as necessary.
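A classic illustration of the overflow case (the behavior depends on compiler and flags, which is exactly the problem):

  #include <limits.h>
  #include <stdio.h>

  int main(void) {
      int i = INT_MAX;
      /* signed overflow is undefined, so a compiler may assume
         i + 1 > i always holds and delete this check entirely */
      if (i + 1 < i)
          printf("overflowed\n");  /* may never print, even if it wraps */
      else
          printf("no overflow detected\n");
      return 0;
  }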


> "The language could make those guarantees if it wanted to"

> "Of course by C's nature there are limits"

You seem to acknowledge the fact that most of the undefined behaviors in C are essentially born out of compromise. Those compromises were driven by principles such as "keep the power in the hands of developers", "don't impose unnecessary restrictions", "keep the language clean", "avoid hidden runtime magic". The end results reflect that.

As I've mentioned in my previous comment, there's no "one size fits all", so the language makes it trivial for you to roll out your very own runtime magic (à-la Zig/Nim) which suits you best. Why is that a bad thing?


You don't actually have to make the undefined behaviour go away completely, which is difficult.

An easier approach would be to put a bound on what is permissible undefined behavior.

Sounds a bit like an oxymoron, doesn't it? After all, it is "undefined", right?

However, the C standard does exactly that!

Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

(section 3.4.3)

In fact, in the first version of the ANSI/ISO C standard, this was a binding part of the standard. In later versions, it was made non-binding, though the language is still in the standard.

Yes, the standard has language that says what permissible undefined behavior is, but you are free to ignore it and still call yourself compliant. Which is what just about everybody does nowadays.

Make it binding again and most of the mess disappears.

https://blog.metaobject.com/2018/07/a-one-word-change-to-c-s...


It doesn't fix anything. In most compilers, the behavior of compilers with respect to undefined behavior that everyone complains about boils down to "the compiler creates implicit __builtin_assumes that undefined behavior does not happen." That is "behaving during translation [...] in a documented manner characteristic of the environment," (especially given that the constant discussion of how undefined behavior is treated in the compiler by compiler writer means it's more well-documented than other mandatory-documented things, such as the actual implementation choices for implementation-defined behavior). It's just that the environment isn't the environment people thought it was.


> characteristic of the environment

Setting "the compiler" = "the environment" and therefore anything the compiler does is part of the environment and thus legit seems at best the type of sleight of hand the compiler writers use to justify their actions.

When defining the behaviour of the compiler, "the environment" obviously cannot be "the compiler".


> What has stopped people from creating a standard with no undefined behavior?

It's a lot of work.

> Is such a thing impossible? Have people done it?

It's possible; Ada has a really good take -- in that standard, there's a class of errors called "bounded errors" which on the surface look like "undefined behavior", but are a lot different in that they list out the possible results and thus preclude the "nasal demons" problem of C -- see: http://www.ada-auth.org/standards/2xaarm/html/AA-1-1-5.html


> What has stopped people from creating a standard with no undefined behavior?

Because making C safer would introduce a lot more complexity in the runtime environment and instrumentation, and that is hard to achieve correctly, with varying performance implications, across all the ISAs that C compilers currently target.

Consider out-of-bounds indexing. To determine that an instruction is touching a memory region that is not, in abstract terms, a C array or a memory allocation by malloc and friends, you now need to insert traps in memory, and even then, there is nothing stopping you from accessing array `b` through, for example, `a + 42`.
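Illustrating that last point (the snippet is itself undefined behavior, which is exactly what a checker would have to catch):

  int main(void) {
      int a[10];
      int b[10];
      (void)b;
      int *p = a + 42;  /* already undefined: points far outside a... */
      *p = 1;           /* ...yet may silently land in b's storage,
                           and no type information says otherwise */
      return 0;
  }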


Removing undefined behavior is not actually the same as making C safer. In fact, in a sense, well-defined C exists (many versions, in fact): it's whatever your C compiler spits out with whatever combination of flags has been passed in.


If you were to write this standard, how would you define what happens when you write to a random address in memory?


The only reason C is as popular as it is is that it was used to write Unix, which was widely distributed and implemented. The rest is just following on that. It's just like the reason human (and other mammals) have their optic nerve passing through the retina creating a blind spot: pure historical accident.


When C was created there were already higher-level, more complex, safer, etc. languages. The language ecosystem wasn't much different from today, just with fewer choices.

Having a language that's simple, has low runtime overhead, and can be implemented in a simple compiler is a quality that quite obviously is still valuable. Otherwise C would have vanished like a lot of other languages that went out of fashion.

I don't think UNIX has much to do with it, otherwise C wouldn't have survived the 80's and 90's when other operating systems (Windows!) ruled supreme and UNIX operating systems were the underdog.


Windows never ruled supreme on the server room.

That place always belonged to mainframes and later UNIX, due to its "cheap" price versus mainframes.


Perhaps. But C didn't rule on the mainframes, either.

Your story seems to be that C won because of UNIX, which won because of the server room. But the server room was a small chunk of landscape, and code was being written for the whole landscape, not just for the server room. And C won all over the place - the server room, the desktop, and the embedded space. It didn't win on the desktop because of influence from the server room.


By your definition JavaScript won not because of the browser, but because it is such a wonderful programming language.


It is useless to have a conversation with those who will not listen.


Yep, UNIX is the IDE for C. UNIX contains a C compiler by default, which makes it very easy to distribute programs written in C.


Yes and no. It can explain why C dominates over other low-tech languages, like e.g. Pascal. But when people talk about replacing C they usually mean replacing it with some feature-rich language. And why C survives and dominates certain areas despite lacking all the fancy features is a totally different question.


Hello, why do you call Pascal a low-tech language? Recently I heard about 'low-tech programming'; not sure what it actually means.


Pascal isn't a low-level language, it's a teaching one.


It's worth checking out the current state of Pascal:

https://www.embarcadero.com/products/delphi

https://www.freepascal.org/

It's just as capable as any other language. I don't know what your definition of "low level" is, but I suppose you mean inline assembly, manual memory management, and pointers.

Pascal has inline assembly: http://docwiki.embarcadero.com/RADStudio/Rio/en/Inline_Assem...

Pascal has manual memory management: http://docwiki.embarcadero.com/RADStudio/Rio/en/Memory_Manag...

Pascal has pointers: http://docwiki.embarcadero.com/RADStudio/Rio/en/Pointers_and...


That's an age old discussion.

The practical Pascal variants like Turbo Pascal and Quick Pascal, which compiled to native code, are essentially on par with C regarding the features. That includes access to absolute memory addresses, port I/O and the speed of the resulting code. I've done over a decade of Turbo Pascal coding, and going on three decades of C, and I agree they feel like essentially much the same thing.


Hmm, OK, but that sounds risky. I can recall PET programming where you would poke raw assembly code, and BBC BASIC could inline assembly; not sure that means BASIC is a low-level language.


Obligatory (and hilarious) read: "The Night Watch"[0] by James Mickens.

> You might ask, “Why would someone write code in a grotesque language that exposes raw memory addresses? Why not use a modern language with garbage collection and functional programming and free massages after lunch?” Here’s the answer: Pointers are real. They’re what the hardware understands. Somebody has to deal with them. You can’t just place a LISP book on top of an x86 chip and hope that the hardware learns about lambda calculus by osmosis.

[0]https://www.usenix.org/system/files/1311_05-08_mickens.pdf


Got me LOL’ing several times! Thanks for sharing


It's dumb that in 2017 the author doesn't say anything specific about Rust, which is by far the strongest "C replacement" candidate.

The author doesn't deal with the problem that the "direct access to memory without abstractions" style they love so very easily drifts into undefined behaviour. It also optimizes very poorly due to lack of aliasing information, unless you enable type-based alias analysis in which case accidental undefined behavior is catastrophic.

The "dynamically checked C" they propose as a solution to C's safety issues is no solution at all until you can get it to work in practice. A lot of really great people, including the researchers he cites, have tried really hard to make it work and have failed in practice. He doesn't seem to understand why they failed or have any insight into how to overcome the problems.



I would change it to: some were meant to c


From the intro it's a reference to a song lyric (which I wasn't familiar with):

  While some were meant for sea, in tug-boats
  ’Round the shore’s knee,
  (Milling with the sand,
  and always coming back to land),
  For others, up above
  Is all they care to think of,
  Up there with the birds and clouds,
  and Words don’t follow.

  —Tiny Ruins, from “Priest with Balloons”


That came to mind immediately. :)


The other discussion mentions D as a better C.

Anyone ever try to build "C the good parts"?


They did. Many times.

First they made "C the good parts" by gathering all the various C bits then in use to make something portable and called it C89.

Then they made "C the good parts" by removing all the parts that made C slow to write and called it Perl.

Then they made "C the good parts" by making C easier but still fast and called it Java.

In the meantime, we've gotten Objective-C and C++, which were attempts to make C better while preserving backwards compatibility. C++ has gone on to spawn its own legacy of "No actually, these are the good parts" with D, Clay, Rust, and various half-steps along the way that want some of the features of C++ but not all of the features of C++.

I don't think they'll stop, because as it turns out, people use tools for different reasons and it's very rare that a general purpose tool solves your very specific problem perfectly. I'm inherently skeptical of any "X the good parts" because the "good parts" are domain specific.


I agree. If you take the tool perspective, it is easy to understand that different tools are needed for different tasks. Don't use a hammer for sawing. No single language scales to all corners of programming problems.

Actually I really don't get this endless "silver bullet" discussion. C has its place and for very good reasons. Also the author makes really good points about the integration aspects of C.


I read this thought quite often here on Hacker News, but it is a bit alien to me. To me, the charm of programming is the ability to---at least in theory---write anything. This is also embodied in the idea of Turing completeness.

There is a continuum between the possibilities of a programming language and the possibilities of a configuration file. As soon as one says that a programming language is suitable for a particular purpose one has moved a few steps towards the configuration file end of this continuum.

This is, of course, a personal preference but I very much prefer to enjoy the power of programming language as opposed to the lack thereof of configuration files. Therefore, I like programming languages that attempt to be useful for any purpose. My favorite language is still C++ and if I were to switch to something else I would be inclined towards Rust.


A bit of rope is charming because you can build practically anything out of rope. For example, you can build skyscrapers out of bamboo and rope. But if you want that skyscraper to be useful, durable, safe, etc it may be better to use the rope for what rope is best at, and use other materials and tools for what they're best at.


Somehow I was expecting Lisp instead of C++ after all you've said. Any specific reason why you're not using Lisp?


Well, the ability to program anything includes the possibility to do tasks that require high performance. Also, I do like the fact that the compiler guards the type correctness of the program.


If your requirements are only high performance and improved type checking, then you are all set. The only language you need is C++. But the criteria above does not define all programming problems.

C++ is too complicated. "Nobody" writes C++, but a subset of it. Select your subset.

Dynamic typing in Lisp (and the like) is really nice for quick prototyping where program correctness is not key. You are exploring what you want to achieve with the program, be it algorithm level or architecture. C++ is not well suited for that, since type checking and memory worries are slowing you down, too many details to drag along. Hence no "silver bullet".


Sure, C++ is complicated. And the way templates are compiled is positively weird, which in turn sometimes leads to long compilation times. The 'select your subset' thing is often repeated, but I do not see much of it. I am a bit worried, though, that C++ seems to keep growing to the point where it really might get too big and complicated.

Quick prototyping is not something that I do, nor am I very much interested in it. At my place of work I have seen it done around me, after which it was my task to turn the Python prototype into C++. The most surprising thing there was how far the prototype turned out to be from what was actually needed, to the point where I very much question whether the prototyping exercise was useful at all. YMMV regarding this, of course.


A python prototype is useful for specifying the correct program behavior. It's useful as a guide for later implementors. Better than a spec in a word document.


Being the language used to write UNIX clones, that is all there is to it.


So by your definition the Windows kernel is UNIX?


Compared to the architecture of an OS targeting, say, hardware with native capability handling, Windows and Unix are virtually indistinguishable. They're both just "that ugly bag of compromises you make to abstract over a register machine that can peek and poke at an unprotected address space, with the OS managing an MMU through CPU-interrupt-triggered context switches to enable a kind of multitasking where each task perceives itself as living on a single core and is unaware of being pre-empted unless it checks for it after the fact."

There are examples of OSes that don't work like that, despite targeting x86. For example, bare-metal Forth.


No, because Windows has plenty of C++ on it and Microsoft Security Team advocates using C#, Rust and constrained C++ (Core Guidelines) as the future of systems programming on Windows.

https://github.com/Microsoft/MSRC-Security-Research/blob/mas...

They are also very clear that C is done on Windows, and compatibility is only to the extent required by ISO C++ and a couple of key FOSS projects.

https://herbsutter.com/2012/05/03/reader-qa-what-about-vc-an...

UNIX is C's platform; there is hardly any reason to use it on non-UNIX OSes.

Plenty other languages offer system programming features, with better type safety and equal portability.


> Then they made "C the good parts" by making C easier but still fast and called it Java.

That one doesn't feel right. More like "figure out how to nudge the C++ crowd into the general direction of Smalltalk without them noticing".


Is Java really Smalltalk under the hood? Ruby certainly is. I always got the feeling that Java was C++ with a bit more religion about OO, a well-defined and platform-independent standard library, and no multiple inheritance.


> I always got the feeling that java was c++ with a bit more religion about OO

No way. C++ was always much more into the functional programming paradigm. (See the STL, for example.)

The connection between C++ and OOP is because OOP was the insane hype at the time when C++ was being invented. The OOP lipservice was mandatory in order to be taken seriously by the fashion-driven programming industry, but real C++ programmers always looked down on OOP and considered it a code smell and crutch.


> The connection between C++ and OOP is because OOP was the insane hype at the time when C++ was being invented.

Not really, the C++ OOP features were directly modeled after very similar features in Simula. C++ was specifically designed as a way of bringing these sorts of features to C, although it did include other improvements to the language as well. Templates and the STL as we know it were a relatively late addition to the language.

It's also worth mentioning that there's only a handful of things about OOP that could be genuinely considered "a code smell"; in fact you could restrict that concern to one feature, viz. implementation inheritance. Object-based programming which follows the "composition over inheritance" guideline can still broadly tap into the improved-modularity benefits that 'objects' are generally known for.


> C++ was always much more into the functional programming paradigm. (See the STL, for example.)

Always? The STL was a last-minute addition to the standard library before C++98. Just a few years before, template implementations in compilers were buggy. There's a reason Qt has its own containers: because it's that old.

> The connection between C++ and OOP is because OOP was the insane hype at the time when C++ was being invented

No, it was because Bjarne wanted features from Simula whilst still generating fast code.

> real C++ programmers always looked down on OOP and considered it a code smell and crutch.

Absolutely not. Again, look at Qt. Look at CERN's ROOT. Java looks a lot like it does because that's how C++ code was written at the time. Even in the early 2000s I was getting funny looks from people when I told them to default to putting variables on the stack.


> The STL was a last-minute addition to the standard library before C++98.

C++ without a standard C++ library is not really C++.

> No, it was because Bjarne wanted features from Simula whilst still generating fast code.

Yeah, but Simula is somewhat its own thing, before the OOP madness.

> Again, look at Qt. Look at CERN's ROOT. Java looks a lot like it does because that's how C++ code was written at the time.

Only because C++ was the only thing available at the time, so people twisted it into 'OOP', despite the fact that C++ was a very poor fit for 'OOP'.


>> The STL was a last-minute addition to the standard library before C++98.

> C++ without a standard C++ library is not really C++.

You seem to be missing the point, which is that there was a time (two decades!) when the C++ standard library existed, but didn't include the STL.

> Only because C++ was the only thing available at the time, so people twisted it into 'OOP', despite the fact that C++ was very a poor fit for 'OOP'.

You are completely mistaken on your history here. C++ was intended to be "C with classes" from day one.


"OOP" is not "X with classes".

"Classes" is a low-level thing that you'd need for implementing many language features. Including things like 'abstract data types' of the ML kind.

Good C++ style has always viewed "OOP" as something highly suspect and hacky.

(This didn't apply to "classes" in the C++ vein, which are mostly about pre/post-conditions and RAII.)


Java is sneaky. It's a Smalltalk-y (ish) engine covered in a C++-y (ish) frontend language. This great talk (the first 20 minutes, really) describes how the original design was conceived: https://youtu.be/Dq2WQuWVrgQ


Is it really the case that calling a method in Java is the equivalent of passing a message under the hood? (I don't know how the VM is structured)


I am not sure how these two philosophical views differ, but the JVM is typed, so calling a non-existent method will result in an error when the class of the callee is loaded.


> It's a Smalltalk-y (ish) engine covered in a C++-y (ish) frontend language.

That sums it up quite well.


I'm skeptical about X the good parts, but hey "JavaScript the good parts" is really a thing in my experience.


I actually meant like a book like JavaScript: The Good Parts by Douglas Crockford. But thanks for your insight!


The best thing about C (apart from speed) is that for every issue with the language there is a tool that fixes the problem. It can do everything everywhere even if it's sometimes not the best choice.



