
And therein lies a major problem with C and C++ compilers:

It's effectively impossible to write bug-free code, and bugs in C and C++ usually trigger undefined behavior. It is therefore effectively impossible to write a conforming program, which makes any guarantees in the spec meaningless.

I've hit heisenbugs like these that only trigger when optimized and that resist write(), out(), fflush(), etc., and it's infuriating.

Or even worse: programs that no longer work when compiled on a newer compiler. With other languages, you are at least spared from this kind of code decay.

But everyone's writing compilers to the spec so tough :/




This is a myopic viewpoint. Undefined behavior is critical for code that needs to be fast. The premise of languages like C and C++ is that the airgap between the language's abstract execution and memory model and the hardware's is thin to non-existent.

*(edit accidentally submitted early)

In this case, the UB is due to the compiler's ability to reorder statements. This is such a fundamental optimization that I can't imagine you're really suggesting that a language without this optimization capability is a "problem." Rearranging instructions is critical for pretty much any superscalar processor (which all the major ones are), and I hate to imagine the hell I'd be in if I had to figure out the optimal load/store ordering myself.


> In this case, the UB is due to the compiler's ability to reorder statements.

No, the undefined behavior arises because dividing by zero is undefined. Languages wishing to avoid this particular bug can make division by zero defined to trap or have some other guaranteed behavior, after which the compiler is required not to reorder those statements. In this case reordering the instructions is legal because having undefined behavior in your program makes literally anything the compiler does legal.
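To make that concrete, here is a minimal, hypothetical sketch (not taken from the article) of why reordering becomes legal once the divisor can be zero:

    /* Hypothetical sketch: because num / divisor is UB when divisor == 0,
       the compiler may assume divisor is never 0 and is free to hoist the
       division above the call, so the message may never be observed. */
    #include <stdio.h>

    int f(int num, int divisor) {
        fprintf(stderr, "about to divide\n");  /* may be reordered past the trap */
        return num / divisor;                  /* UB if divisor == 0 */
    }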


My point still stands in that I don't want the compiler to check for division by zero if I don't ask it to.


Sure, then use C or C++ and check for it yourself. But if you mess up your program is invalid, so there's that. If you don't like that, write your own assembly by hand to convert this to implementation-defined behavior instead of undefined behavior.


> The premise of languages like C and C++ is that the airgap between the language's abstract execution and memory model and the hardware's is thin to non-existent.

Which is not really the case for processors from the last 20+ years.

> In this case, the UB is due to the compiler's ability to reorder statements. This is such a fundamental optimization that I can't imagine you're really suggesting that a language without this optimization capability is a "problem." Rearranging instructions is critical for pretty much any superscalar processor (which all the major ones are), and I hate to imagine the hell I'd be in if I had to figure out the optimal load/store ordering myself.

UB is not necessary for that though. E.g. define that integer division by zero leads to the process receiving a terminal signal. That could be implemented just as efficiently (trapping on division by zero is free at the hardware level in modern processors, and the C standard gives broad freedom for signals to be received asynchronously, so instructions could still be reordered), but it would close the door to silent memory corruption and arbitrary code execution: unless the programmer has explicitly defined how the signal is handled, their program is noisily terminated.
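A rough sketch of what that could look like on a POSIX system today (assuming x86-style hardware trapping; illustrative only, not what ISO C specifies):

    #include <signal.h>
    #include <unistd.h>

    /* Integer division by zero already raises SIGFPE on typical hardware;
       a language could define the behavior as "signal delivered, process
       terminated noisily unless a handler was installed". */
    static void on_div_by_zero(int sig) {
        (void)sig;
        const char msg[] = "integer division by zero\n";
        write(STDERR_FILENO, msg, sizeof msg - 1);  /* async-signal-safe */
        _exit(1);                                   /* noisy termination */
    }

    int main(void) {
        signal(SIGFPE, on_div_by_zero);
        volatile int zero = 0;
        return 1 / zero;   /* still UB in ISO C, but traps on common hardware */
    }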


My point is that it is humanly impossible to write a bug-free program. In C and C++, bugs usually manifest themselves in UB.

To make matters worse, compilers, ever searching for diminishing returns on performance improvements, have been steadily making the CONSEQUENCES of UB worse, to the point that even debugging is getting harder and harder. These languages are unique in their growing user hostility.


> growing user hostility

You have excellent tooling (*sanitizer, static analysis, valgrind, ${WHATEVER}) and abstractions provided to do the handholding for you (e.g. unique_ptr).

Most of that was created in last few years.


Completely wrong.

Lint was created in 1979, and during the mid-90's we already had Insure++, Purify and others, ages before the free-beer alternatives.

Yet being free still doesn't help the large majority of C and C++ developers to actually use them, as shown at one of the CppCon talks, where a tiny 1% of the attendees confirmed using any kind of analysers.


Lint is not a whole program static analyzer and being free is a big deal.

Your -second- third paragraph is, unfortunately, still correct though.


Somehow other professionals manage to pay for what goes into their toolboxes, music case, kitchen knives, ....


How well do the attendees of CppCon represent the developer base of C++ projects which are in dire need of these tools?

Maybe they are so skilled the need does not arise?


One of the most eye-opening papers in this regard was the integer overflow checking paper (which Regehr was coauthor on), which found that every library tested invoked undefined signed integer overflow. This includes SQLite, infamous for its comprehensive testing, and various libraries whose sole purpose was to check if an operation would overflow without invoking undefined behavior.
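For reference, a correct pre-check is possible using only defined operations; a sketch of the well-known idiom (the point of the paper being how often real code gets this wrong):

    #include <limits.h>
    #include <stdbool.h>

    /* Returns true if a + b would overflow, without ever performing an
       overflowing (and therefore undefined) signed addition. */
    bool add_would_overflow(int a, int b) {
        if (b > 0) return a > INT_MAX - b;   /* INT_MAX - b cannot overflow here */
        if (b < 0) return a < INT_MIN - b;   /* INT_MIN - b cannot overflow here */
        return false;
    }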

Belief that you are skilled enough to write C/C++ code that doesn't exercise undefined behavior either shows that you don't know what is undefined behavior or that you believe you are the best programmer to have ever lived.


Even people who work on the language spec can't really avoid writing code that hits UB without the help of sanitizers.


The number of CVEs found per month in highly skilled projects with deep review processes, like the Linux kernel, proves otherwise.

As for how well, usually conferences like CppCon have the top of the tops.


On Linux side, these slides I got from Alex Gaynor illustrate your point really well:

https://events.linuxfoundation.org/wp-content/uploads/2017/1...

There hasn't been much in terms of change. Languages immune to classes of vulnerabilities by default and/or sound checkers that can catch all of them seem necessary. And by "seem necessary", I mean there is massive, empirical evidence that most developers can't code safely without such alternatives, even on critical, widely-used, well-funded projects.


Well don't use it. You aren't the target customer because for me, fixing the performance bottleneck is a lot harder than finding a divide by zero, and I certainly don't want to pay for the compiler to check things like divide-by-zero without me asking it to. When I don't care about performance, I reach for a scripting language or something. It's a tool, don't get all emotionally worked up about it.


We are entitled to be emotional about it, because we all have to use tools which have the misfortune to be written in C derived languages.

Even if I don't touch those languages for production code, the safety of my whole stack depends on how well those underlying layers behave, and how responsible the developers were about writing secure code.

Which, as proven by buffer overflows in IoT devices, is not much.


So where is your production code written in Ada or Pascal?


Ada cannot tell.

Pascal was replaced by Java and C#.


Ada absolutely can tell. Divide by zero throws an exception and buffer overflows are caught by essentially every modern compiler.


Sure it can, but that wasn't the question. Rather what happened to the code I have written.

NDAs mean we can't talk about that stuff.

What magic variant of a C or C++ compiler are you using that throws errors on buffer corruption, unless you are speaking about shipping code with debugging mode enabled in production instead of a proper release build?


> buffer overflows are caught by essentially every modern compiler.

If you use high-level arrays with bounds checking, which are not always fast enough and can become a maintenance burden. And if they are to be absolutely secure (std::vector, for instance, isn't - it can be moved behind your back), they also require GC.


> don't get all emotionally worked up about it.

Not constructive


I was always able to write fast code in languages like Object Pascal and Ada, while being safe from C and C++ UB cargo cult.


Cannot speak for Ada. Only looked through a few tutorials once or twice and did not pursue further, I think mainly because I did not like the verbosity.

> Object Pascal

FWIW I've worked for 6 months with a large Delphi code base (which is Object Pascal right?) and I really wanted to like it (and I did like quite a few aspects to it). Note that I was employed to improve performance, and I was competent enough about performance to have achieved speedups of 100x to 1000x for the parts I was working on. So, I'm not saying you can't write performant code in Delphi. But here are a few annoyances I can remember:

- Need to typedef a Pointer-To-Type for anything before using it in function arguments: type Foo = record .. end; type PFoo = ^Foo; type PPFoo = ^^Foo. This is not only annoying, it also makes function signatures like function Foo(a: PFoo; b: PPBar) extremely hard to read compared to function Foo(a: ^Foo; b: ^^Bar), IMHO.

- Pointer types not really type-checked. Why? It would be so easy to do.

- No macros, which are incredibly important (need experience to use them well? Yes, but still).

- Object-oriented culture pervasively used in libraries (for example, deep, deep inheritance trees), leading to bad maintainability and bad performance. Tons of badly thought out features, weird syntactic tricks. Reminds me of C++.

Pretty sure there were more. As I said there are good things in it that are not in C, like better compilation performance and some parts to the syntax. But especially the first three are show stoppers for me. (The OO culture is painful as well, but you don't need to buy into it if you can do without the libraries).

Of course, the Delphi code I wrote is just as unsafe as if it was in C.


> Delphi code base (which is Object Pascal right?)

Object Pascal was created by Apple with feedback from Niklaus Wirth for Lisa and Mac OS implementation.

Other Pascal vendors eventually copied the extensions, most notably Borland.

When they released Delphi, they kept calling it Object Pascal, although most of what was in Apple's MPW and Turbo Pascal variants is now legacy.

As for the rest I will have to agree to disagree.

- I love that I already had those OOP features and modules in 1992 vs bare bones C;

- I consider proper TDD (Type Driven Programming) a good practice;

- pointers are type checked, not sure what you mean here

- No macros is a plus; a large part of the ISO C++ effort goes into creating safer alternatives for each use case

- OOP is quite useful in many use cases. I loved Turbo Vision and OWL.

> Of course, the Delphi code I wrote is just as unsafe as if it was in C.

Naturally one can disable all the safety buttons and go full speed, but herein lies the beauty of the Algol lineage of languages.

Type safe by default and if one really requires that extra mile, then escape hatches are in place.

Thing is, for 99% of most applications that is largely unnecessary.


> - pointers are type checked, not sure what you mean here

They weren't here with Borland. Maybe it was one of the many optional compiler switches, so I'll take that back.

> No macros is plus, the large majority at ISO C++ is creating safer alternatives for each use case

That's just wrong. Learn how to use the tool and use it when it makes sense. There are a LOT of situations where the easiest thing for maintainability by far is to abstract at the syntactic level. The Rust guys acknowledge this as well. Even the Haskell compiler (GHC) has a C-style preprocessor built in. For example, I use macros for data definitions, to create identifiers, to create strings from identifiers, to automatically insert additional function call arguments like size information or source code location...

> that extra mile

You mean 100x - 1000x in speed?

> Thing is, for 99% of most applications that is largely unnecessary.

Your initial comment was explicitly about performance. And I disagree that 99% of applications should accept a 100x - 1000x decrease in speed (and harder maintenance by far, if you ask me!) (corollary: less purity and joy in programming, by far), or even a 10x for that matter, just to get some more safety. I mean, safety is nice and all, but it's not all the reasons why I'm doing this. YMMV.

EDIT: I now understand that you mean "99% of most applications", while I read it "99% of (or most) applications". I disagree with the implicit statement that you should write 99% in a "safe" language and the rest in a systems language. It is well known that you can never easily know where the bottleneck is or where the next will be. And it is well known that it is very laborious to accomodate multiple languages, or incompatible code styles, in a single project. (I've also heard some negative stories about integrated scripting languages, for example from the games industry, but I don't work there...)

And in the end, speed is actually not the primary reason why I prefer C...


Macros can be easily replaced by other means, that is what Java and .NET do via annotations and compiler plugins.

Rust macros are tree based and integrated into the compiler, not a text substitution tool running before the compiler, which cannot reason about what is happening.

I don't get where that 100x - 1000x factor comes from; most compiled languages aren't that much slower than C, especially if you restrict yourself to ISO C.

If C extensions are allowed in the benchmarks game, then those other languages also have a few tricks up their sleeves.

For example, I can disable all checks in Ada via Suppress pragmas and in the end there will hardly be any difference from the code a C compiler would generate.

The big difference is that I am only doing that in that function or package that is required to run like a 1500cc motorbike at full throttle to win the benchmark game, while everything else in the code is happy to run under allowed speed limits.


> Macros can be easily replaced by other means, that is what Java and .NET do via annotations and compiler plugins.

There are cases where text substitution is the right thing to do. How do you do the things I mentioned, for instance? Java in particular is an infamous example, requiring lots of hard-to-maintain boilerplate. Tooling helps in some cases with writing it, but it can't help with reading it, right?

Some examples from my current code

    /* table entry that stringizes its own identifier (data definitions) */
    #define MAKE(x, y, z) [x] = { #x, y, z }

    /* inject the call site (file/line) into diagnostics automatically */
    #define MSG_AT_EXPR(...) _msg_at_expr(__FILE__, __LINE__, __VA_ARGS__)

    /* log the current function name when debug logging is enabled */
    #define PARSE_LOG() \
            if (doDebug) \
                    MSG_AT(lvl_debug, currentFile, currentOffset, \
                           "%s()\n", __func__);

    /* grow a buffer, inferring the element size and recording the call site */
    #define BUF_RESERVE(buf, alloc, cnt) \
            _buf_reserve((void**)(buf), (alloc), (cnt), sizeof *(buf), 0, \
                         __FILE__, __LINE__);

    /* infer object/element sizes instead of passing them by hand */
    #define CLEAR(x) mem_fill(&(x), 0, sizeof (x))
    #define SORT(a, n, cmp) sort_array(a, n, sizeof *(a), cmp)

    /* paste the buffer name into its identifying constant */
    #define RESIZE_GLOBAL_BUFFER(bufname, nelems) \
            _resize_global_buffer(BUFFER_##bufname, (nelems), 0)
In Delphi I've had to write all these expansions manually, resulting in less maintainable code. Go look in the Linux kernel; I'm sure there are tons of examples that you'd be hard pressed to replace with cleaner or safer specialized means.

> I don't get where that 100x - 1000x factor comes from, most compiled languages aren't that slower than C, specially if you restrict to ISO C.

It's not so much about the language, but what you do with it and how you structure your code. Or, actually, how you structure the data. OOP culture is bad for performance.

If I were to choose the single best resource for this kind of argument, it would be Mike Acton's talk from CppCon 2014 on YouTube, if you want to watch that. Note that I'm not his fanboy; these are experiences I've made on my own to a large degree. And the arguments apply just as well to maintainability, if you ask me.

> The big difference is that I am only doing that in that function or package that is required to run like a 1500cc motorbike at full throttle to win the benchmark game, while everything else in the code is happy to run under allowed speed limits.

And so the rebuttal is: No. If your code is full of OOP objects you can micro-optimize a certain function like crazy, but the data layout and the order of processing are still wrong.

To give another piece of anecdata: for my bachelor's thesis I had to write a SAT solver for clauses of length <= 3 in Java. I modeled clauses as POD objects holding 3 integers (the 2nd and 3rd of which could be -1). My program could handle about 10M clauses before memory got tight, and then it did nothing but GC for at least a minute before it finally died. Note that all objects are GC'ed reference types in Java (as you probably know).

I then converted it to highly unidiomatic Java by allocating 3 columns of unboxed integers of length #clauses, instead of allocating #clauses POD objects. The object overhead went away, so I could do about twice as many clauses before memory was used up. And when it was used up, since there was basically no GC overhead, the program died immediately (after a few seconds of processing, without a minute of GC). The downside was that maintainability was drastically worse since I was using only the most primitive building blocks of Java, and none of its "features".

If that had been in C, I could have just stayed with the first approach, since C has only "value types". It would have been performant from the start. C would have yielded a more maintainable program since I would not have had to fight the language. I could also have chosen the second approach, and it would have been easier to write and read than the Java code (which required tons of boilerplate).
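In C terms the two layouts from the anecdote look roughly like this (names are made up for illustration; both are plain value types with no per-object header):

    /* Layout 1: array of structs - already compact in C, unlike Java objects */
    struct clause { int lit1, lit2, lit3; };      /* lit2/lit3 may be -1 */
    struct clause *clauses;                       /* malloc(n * sizeof *clauses) */

    /* Layout 2: struct of arrays - one column of unboxed ints per literal */
    struct clause_columns {
        int *lit1;                                /* each malloc(n * sizeof(int)) */
        int *lit2;
        int *lit3;
    };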


You know that Mike Acton is now working on Unity's new C# compiler team, right?

And yes he is also having a role on the new ECS stack, which is just another form of OOP, for anyone that bothers to read the respective CS literature.

Had you implemented your SAT solver in a GC language like Oberon, Modula-3, Component Pascal, Eiffel or D, among many other possible examples, then you wouldn't have needed such tricks, as they also use value types by default, just like C.


I know and as far as I know he's trying to improve performance there. If you actually bother to watch the video you will find him ranting hard against mindless OOP practices.

ECS as I understand it is pretty much not OOP. My idea of it I would call aspect-oriented, i.e. extracting features from artifacts and stuffing them into global tables, which of course separate data by shape. If you look on Wikipedia, the first association you will find is also data-oriented programming (the term from the talk; it is about table-shaped, cache-aware programming, and I believe it was also coined by Mike Acton).

Data-oriented programming stands particularly opposed to OOP which the games industry has found to scale miserably.


Yes, I did watch that talk back when he did it. I always follow CppCon talks.

Then you should also watch the talks he did later at Unite, after joining Unity.

As I mentioned regarding ECS, it is covered in the CS literature.

For example,

"Component Software: Beyond Object-Oriented Programming"

https://www.amazon.com/Component-Software-Object-Oriented-Pr...

First edition (1997) used Component Pascal, C++ and Java, while the 2nd edition replaced Component Pascal with C#.

"Component-Based Software Engineering: Putting the Pieces Together"

https://www.amazon.com/Component-Based-Software-Engineering-...

ECS and Data-oriented programming aren't the same thing.


"Unity at GDC - A Data Oriented Approach to Using Component Systems" https://www.youtube.com/watch?v=p65Yt20pw0g


But the authors of large, popular code bases chose C or C++.

Anyone is free to rewrite Apache in Ada, but for some reason it isn't happening.


Do so many people write code in JavaScript today instead of countless other high level languages because JavaScript is technically superior and better designed than any other language, or because browsers and the web provide an ubiquitous runtime platform?


Ever heard of this little thing called money?

Since free UNIX brought C to the masses, and Bjarne made C++ as a means to never have to touch bare C after his encounter with BCPL, many people have chosen these languages because they came with the OS SDK.

So now unless we get some nice lawsuits, companies will keep picking the easy path.


So, are you saying it's been difficult to get Ada or Free Pascal or Java up and running on a Unix system in the last 10 or 20 years?


What I am stating is that to re-write existing systems, regardless how rotten they might be, someone needs to pay for the work to happen.

Which is what many tend to forget in those "X rewritten in Y" posts.

Pay Per Hour * Total Hours Effort = Money Spent on Rewrite

Additionally what I am saying is that languages that come with the OS SDK have first class privileges and experience shows that 99% of the companies won't look elsewhere.

For example, in commercial UNIX days, you would get C and C++ compilers as part of the base SDK. The vendors that had support for additional languages like Ada, had them as additional modules.

So any dev wanting to push for language X needed to make a case why the company needed to put extra money on the table instead of going with what they already got.

A similar process happens on mobile and Web platforms nowadays, you either go with what is available out of the box or try to shoehorn an external toolchain and then deal with all integration issues and additional development costs ($$$) that might arise.


Many many free software projects are started by people who don't get paid for it. Those people start their project in whatever language they want. If someone wants to write webserver software in Java or an OS in Object Pascal, they can do it.

Successful projects may get financial support from companies later. I doubt that these companies are overly selective against "obviously bad languages". I don't buy that there are any mechanisms in place to get cynical or outraged about. Maybe it _is_ just that some languages are more productive.


| It's effectively impossible to write bug-free code.

What does that even mean? It's impossible to have bug-free code in any language. Bugs in the libraries, the compiler, in the OS, in the hardware...


It's actually good that your code breaks with new compilers, as it means the code was bogus anyway. The alternative to UB is either a strict spec that will be dog slow on $arch, or abstracting everything away and making it complex.


Being dog slow is very much dependent on the use case.

Yes it will be slower than taking every shortcut in the name of performance.

What really matters is: do the execution speed and memory footprint meet the requirements?

If the user is happy to get their data in 100ms, with a requirement of 300ms max, getting it in 10ms is hardly an advantage.


If you can deliver responses 10x faster than required, you can scale with 10x fewer resources (or at least presumably some factor of fewer resources) and still stay within the user requirements.

I'd say that's a nice advantage.


These most recent optimizations are nowhere near 10x faster. The only one that even comes close is autovectorization, and that can be done without heavy reliance on UB.

If your loop is worth autovectorizing, the two instructions needed to check for pointer aliasing and other showstoppers are not material.
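As a sketch of one way to sidestep the aliasing question entirely (the compiler can also emit its own runtime overlap check), ISO C's restrict lets the programmer state the no-overlap guarantee up front:

    /* With restrict, the caller promises dst and src do not overlap
       (violating that promise is itself undefined), so the compiler can
       vectorize freely without guessing about aliasing. */
    void add_arrays(float *restrict dst, const float *restrict src, int n) {
        for (int i = 0; i < n; i++)
            dst[i] += src[i];
    }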


Not everyone is going to be the next Google, Facebook, Crytech, EA, ...

There are better ways to waste money than spending it on YAGNI features.


Consider Parkinson's law... Compare git to Mercurial... Consider how many successful Java command-line programs there are.


There are plenty of them at the enterprise level.

Just a couple of months ago I re-wrote several Korn shell scripts doing ETL related tasks into a couple of saner Java CLI programs at customer request.


And the responsiveness is good? Isn't there always this terrible startup lag? Could we rewrite the C implementation of e.g. git in Java to get a program that is just as fast (e.g., instantaneous response for most operations)?


Surely, you know that there are native code compilers for Java, right?


Is this practical? What are the limitations? Is it in the Java spirit? Why doesn't everybody use it? What does it do about startup time?

My initial point, in the meantime, was only that performance does matter. And that for some reason I cannot recall any (open source or free software) CLI programs written in Java off the top of my head. While there are free Java implementations easily available, no?


Many don't use them, because they are commercial tools, and many developers nowadays don't like to pay for software.

The only limitation is that for reflection code one needs to whitelist which classes end up in the binary.

All major third-party commercial JVMs have always had the capability of AOT compilation to native code; it was just taboo at Sun.

Oracle has another point of view and thus kept the Maxine project alive, rebranded it into Graal, and now those that don't like to pay for developer tools can also enjoy AOT compilation to native code via SubstrateVM, GraalVM and the Graal integration into OpenJDK.

Only Windows support is not yet fully done for the time being.

Their long term roadmap is to rewrite the C++ parts of OpenJDK in Java itself, also known as Project Metropolis.

OpenJDK 11 also adopted a feature already common in other commercial JVMs, which allows a JVM to dump a JIT image before exiting, allowing AOT-like startup from the 2nd execution onwards.

Also, Java isn't the only safe alternative to C; those that don't mind the lack of generics can just pick Go instead of dealing with C.

And we already have several high-profile projects using it, including Google's exploratory microkernel-based OS.


Correct, it depends on the use case. Nobody is saying you should write everything in C++. But there is plenty of software that actually uses your CPU for data processing where speed is absolutely crucial. If you know that every second (or 100 ms) saved in every functionality your software offers will eventually matter, would you still choose the more pessimistic performance guarantee?

Did you know (at the point where the entire architecture was chosen) that you will get 100 ms with a safer language? What if you got 1000 ms and C++ got 100 ms?


Thing is, C and C++ aren't the only languages with those capabilities, e.g. using the CPU for data processing.

Want to live in the danger zone and disable bounds checking in, e.g., Turbo Pascal?

Surround the critical performance code path with {$R-} and {$R+}, while enjoying safe bounds checking everywhere else.


This isn't necessarily true. Many compiled languages these days have a strong specification that guarantees a lack of undefined behavior for the vast majority of code, yet remain relatively performant–think Swift, Rust, and the like.


"Undefined behavior" doesn't mean "buggy". It simply means stuff that's CPU-specific or compiler-specific. C++ has a standard, unlike languages like Rust or Python. This is a good thing, because compiler-specific crap doesn't magically go away if you avoid standards and just declare one implementation as a "reference".


> "Undefined behavior" doesn't mean "buggy". It simply means stuff that's CPU-specific or compiler-specific.

Yes, it does. The behavior you are describing is "implementation specific", and it is ok to have this in your program provided you know what your implementation will do. It is illegal to have any undefined behavior in a well-formed C/C++ program.


The standard doesn't define well-formedness, nor does it necessarily consider undefined behaviour illegal. It simply has nothing to say about what happens when undefined behaviour is invoked.

And it is OK to invoke it if you know what your implementation will do. The standard even gives documenting the behaviour as an option.

(I wonder how many people worry about supplying clang a file that doesn't end in a new line? That is undefined behaviour, and yet you know exactly what's going to happen: you'll get a warning, if compilation continues the code will build as if the new line were there, and clang won't delete your source file, even though it would be perfectly within its rights to.)


> I wonder how many people worry about supplying clang a file that doesn't end in a new line? That is undefined behaviour

It no longer is, since C++11. Check Phase 2.2 here: https://en.cppreference.com/w/cpp/language/translation_phase...

> clang won't delete your source file, even though it would be perfectly within its rights to.

UB allows the execution of the compiled program to wipe your hard drive, but it certainly does not give your compiler that right. I mean, the standard doesn't say what side effects invoking a compiler is allowed to have (because that's out of scope), so none of it can be predicated on UB.

(Mandatory mention: https://github.com/munificent/vigil)


Thanks for the clarification. An outbreak of good sense? I hope it's contagious. It is still undefined behaviour in C11.

Not sure I agree with you about UB - missing line-endings is a parse-time issue, so the intent appears to be that the compiler (or interpreter) is free to do what it likes even at this stage, before your program is even ready to run.


When people talk of UB in C++ they always mean "implementation specific".


No, they don’t. These are separate concepts with different behavior. “Undefined behavior” is illegal to have in C/C++ programs and includes things like division by zero and out-of-bounds array access. “Implementation specific” behavior is legal but allowed to differ, such as querying the size of an int.


You really don't need to go very far to get UB instead of implementation-defined behavior: signed int overflow is already UB, and the program is permitted to do literally anything if you ever accidentally make one of your ints too large.
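A classic illustration, sketched (compilers commonly perform this exact folding):

    /* Because signed overflow is undefined, the compiler may assume x + 1
       never wraps, so this "did it wrap?" check can be folded to return 0. */
    int wrapped(int x) {
        return x + 1 < x;
    }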





