This is a myopic viewpoint. Undefined behavior is critical for code that needs t...

saagarjha · on Nov 4, 2018

> In this case, the UB is due to the compiler's ability to reorder statements.

No, the undefined behavior arises because dividing by zero is undefined. Languages wishing avoiding this particular bug can make division by zero defined to trap or have some sort of guaranteed behavior, after which the compiler is required to not reorder those statements. In this case reordering the instructions is legal because having undefined behavior in your program makes literally anything the compiler does legal.

banachtarski · on Nov 4, 2018

My point still stands in that I don't want the compiler to check for division by zero if I don't ask it to.

saagarjha · on Nov 4, 2018

Sure, then use C or C++ and check for it yourself. But if you mess up your program is invalid, so there's that. If you don't like that, write your own assembly by hand to convert this to implementation-defined behavior instead of undefined behavior.

lmm · on Nov 4, 2018

> The premise of languages like C and C++ is that the airgap between the language's abstract execution and memory model and the hardware's is thin to non-existent.

Which is not really the case for processors from the last 20+ years.

> In this case, the UB is due to the compiler's ability to reorder statements. This is such a fundamental optimization that I can't imagine you're really suggesting that a language without this optimization capability is a "problem." Rearranging instructions is critical for pretty much any superscalar processor (which all the major ones are), and I hate to imagine the hell I'd be in if I had to figure out the optimal load/store ordering myself.

UB is not necessary for that though. E.g. define that integer division by zero leads to the process receiving a terminal signal. That could be implemented just as efficiently (trapping on division by zero free at the hardware level in modern processors, the C standard gives broad freedom for signals to be received asynchronously so instructions could still be reordered), but would close the door to silent memory corruption and arbitrary code execution: unless the programmer has explicitly defined how the signal is handled, their program is noisily terminated.

kstenerud · on Nov 4, 2018

My point is that it is humanly impossible to write a bug-free program. In C and C++, bugs usually manifest themselves in UB.

To make matters worse, compilers, ever searching for diminishing returns on performance improvements, have been steadily making the CONSEQUENCES of UB worse, to the point that even debugging is getting harder and harder. These languages are unique in their growing user hostility.

llukas · on Nov 4, 2018

> growing user hostility

You have excellent tooling (*sanitizer, static analysis, valgrind, ${WHATEVER}) and abstractions provided to do handholding for you (ie. unique_ptr).

Most of that was created in last few years.

pjmlp · on Nov 4, 2018

Completely wrong.

Lint was created in 1979, during the mid-90's we already had Insure++, Purify and others, ages before of the free beer alternatives.

Yet being free still doesn't help the large majority of C and C++ to actually use them, as proven at one of CppCon talks, where a tiny 1% of the attendees confirmed using any kind of analysers.

gpderetta · on Nov 4, 2018

Lint is not a whole program static analyzer and being free is a big deal.

Your -second- third paragraph is, unfortunately, still correct though.

pjmlp · on Nov 4, 2018

Somehow other professionals manage to pay for what goes into their toolboxes, music case, kitchen knives, ....

minipci1321 · on Nov 4, 2018

How well the attendees of CppCon represent the developer base of C++ projects which are in dire needs of these tools?

Maybe they are so skilled the need does not arise?

jcranmer · on Nov 4, 2018

One of the most eye-opening papers in this regard was the integer overflow checking paper (which Regehr was coauthor on), which found that every library tested invoked undefined signed integer overflow. This includes SQLite, infamous for its comprehensive testing, and various libraries whose sole purpose was to check if an operation would overflow without invoking undefined behavior.

Belief that you are skilled enough to write C/C++ code that doesn't exercise undefined behavior either shows that you don't know what is undefined behavior or that you believe you are the best programmer to have ever lived.

esrauch · on Nov 4, 2018

Even people who work on the language spec can't really avoid writing code that hits UB without the help of sanitizers.

pjmlp · on Nov 4, 2018

The amount of CVEs found per month on highly skilled projecs, with deep review processes, like the Linux kernel proove otherwise.

As for how well, usually conferences like CppCon have the top of the tops.

nickpsecurity · on Nov 4, 2018

On Linux side, these slides I got from Alex Gaynor illustrate your point really well:

https://events.linuxfoundation.org/wp-content/uploads/2017/1...

There hasn't been much in terms of changes. Languages immune to classes of vulnerabilities by default and/or sound checkers that can catch all of them seem necessary. And by seem necessary, I mean massive, empirical evidence that most developers can't code safely without such alternatives even on critical, widely-used, well-funded projects.

banachtarski · on Nov 4, 2018

Well don't use it. You aren't the target customer because for me, fixing the performance bottleneck is a lot harder than finding a divide by zero, and I certainly don't want to pay for the compiler to check things like divide-by-zero without me asking it to. When I don't care about performance, I reach for a scripting language or something. It's a tool, don't get all emotionally worked up about it.

pjmlp · on Nov 4, 2018

We are entitled to be emotional about it, because we all have to use tools which have the misfortune to be written in C derived languages.

Even if I don't touch those languages for production code, my whole stack safety is dependent how well those underlying layers behave, and how responsibile the developers were towards writing secure core.

Which as proven by buffer oveflows in IoT devices not much.

xmiller · on Nov 4, 2018

So where is your production code written in Ada or Pascal?

pjmlp · on Nov 4, 2018

Ada cannot tell.

Pascal was replaced by Java and C#.

jacoblambda · on Nov 4, 2018

Ada absolutely can tell. Divide by zero throws an exception and buffer overflows are caught by essentially every modern compiler.

pjmlp · on Nov 4, 2018

Sure it can, but that wasn't the question. Rather what happened to the code I have written.

NDAs make us not able to tell about stuff.

What magic variant of C or C++ compiler are you using that throws errors on buffer corruption, unless you are speaking about using code with debugging mode enabled in production instead of using a proper release build.

jstimpfle · on Nov 4, 2018

> buffer overflows are caught by essentially every modern compiler.

If you use high-level arrays with bounds-checking, which are not always fast enough, and can become a maintenaince burden. If they should be absolutely secure (like, std::vector isn't - it can be moved behind your back), they also require GC.

dagenix · on Nov 4, 2018

> don't get all emotionally worked up about it.

Not constructive

pjmlp · on Nov 4, 2018

I was always able to write fast code in languages like Object Pascal and Ada, while being safe from C and C++ UB cargo cult.

jstimpfle · on Nov 4, 2018

Cannot speak for Ada. Only looked through a few tutorials once or twice and did not pursue further, I think mainly because I did not like the verbosity.

> Object Pascal

FWIW I've worked for 6 months with a large Delphi code base (which is Object Pascal right?) and I really wanted to like it (and I did like quite a few aspects to it). Note that I was employed to improve performance, and I was competent enough about performance to have achieved speedups of 100x to 1000x for the parts I was working on. So, I'm not saying you can't write performant code in Delphi. But here are a few annoyances I can remember:

- Need to typedef Type of Pointer-To-Type for anything before using it in function arguments: type Foo = record .. end; type PFoo = ^Foo; type PPFoo = ^^Foo. This is not only annoying. It's also extremely hard to read function signatures like function Foo(a: PFoo; b: PPBar) compared to function Foo(A: ^Foo; b: ^^Bar) IMHO.

- Pointer types not really type-checked. Why? It would be so easy to do.

- No macros, which are incredibly important (need experience to use them well? Yes, but still).

- Object oriented culture pervasively used in libraries (for example, deep deep inheritance trees), leading to bad maintainability and bad performance. Tons of badly thought out features, weird syntactic tricks. Reminds of C++.

Pretty sure there were more. As I said there are good things in it that are not in C, like better compilation performance and some parts to the syntax. But especially the first three are show stoppers for me. (The OO culture is painful as well, but you don't need to buy into it if you can do without the libraries).

Of course, the Delphi code I wrote is just as unsafe as if it was in C.

pjmlp · on Nov 4, 2018

> Delphi code base (which is Object Pascal right?)

Object Pascal was created by Apple with feedback from Niklaus Wirth for Lisa and Mac OS implementation.

Other Pascal vendors eventually copied the extensions, most notably Borland.

When they released Delphi, they kept calling it Object Pascal, although most of what was in Apple's MPW and Turbo Pascal variants is mostly legacy.

As for the rest I will have to agree to disagree.

- I love that I already had those OOP features and modules in 1992 vs bare bones C;

- I consider proper TDD (Type Driven Programming) a good practice;

- pointers are type checked, not sure what you mean here

- No macros is plus, the large majority at ISO C++ is creating safer alternatives for each use case

- OOP is quite useful in many use cases. I loved Turbo Vision and OWL.

> Of course, the Delphi code I wrote is just as unsafe as if it was in C.

Naturally one can disable all safety buttons and go full speed, but here lies the beauty of Algol linage of languages.

Type safe by default and if one really requires that extra mile, then escape hatches are in place.

Thing is, for 99% of most applications that is largely unnecessary.

jstimpfle · on Nov 4, 2018

> - pointers are type checked, not sure what you mean here

They weren't here with Borland. Maybe it was one of the many optional compiler switches, so I'll take that back.

> No macros is plus, the large majority at ISO C++ is creating safer alternatives for each use case

That's just wrong. Learn how to use the tool and use it when it makes sense. There are a LOT of situations where the easiest for maintainability by far is to abstract at the syntactic level. The Rust guys acknowledge this as well. Even the Haskell compiler (GHC) has a C-style preprocessor built-in. For example, I use macros for data definitions, to create identifiers, to create strings from identifiers, to automatically insert additional function call arguments like size information or source code location...

> that extra mile

You mean 100x - 1000x in speed?

> Thing is, for 99% of most applications that is largely unnecessary.

Your initial comment was explicitly about performance. And I disagree that 99% of applications should accept a 100x - 1000x decrease in speed (and harder maintenance by far, if you ask me!) (corollary: less purity and joy in programming, by far), or even a 10x for that matter, just to get some more safety. I mean, safety is nice and all, but it's not all the reasons why I'm doing this. YMMV.

EDIT: I now understand that you mean "99% of most applications", while I read it "99% of (or most) applications". I disagree with the implicit statement that you should write 99% in a "safe" language and the rest in a systems language. It is well known that you can never easily know where the bottleneck is or where the next will be. And it is well known that it is very laborious to accomodate multiple languages, or incompatible code styles, in a single project. (I've also heard some negative stories about integrated scripting languages, for example from the games industry, but I don't work there...)

And in the end, speed is actually not the primary reason why I prefer C...

pjmlp · on Nov 5, 2018

Macros can be easily replaced by other means, that is what Java and .NET do via annotations and compiler plugins.

Rust macros are tree based and integrated into the compiler, not a text substitution tool running before the compiler, which cannot reason about what is happening.

I don't get where that 100x - 1000x factor comes from, most compiled languages aren't that slower than C, specially if you restrict to ISO C.

If C extensions are allowed in the benchmarks game, then those other languages also have a few tricks up their sleeves.

For example, I can disable all checks in Ada via Unchecked_Unsafe pragmas and in the end there will hardly any difference to generated C code.

The big difference is that I am only doing that in that function or package that is required to run like a 1500cc motorbike at full throttle to win the benchmark game, while everything else in the code is happy to run under allowed speed limits.

jstimpfle · on Nov 5, 2018

> Macros can be easily replaced by other means, that is what Java and .NET do via annotations and compiler plugins.

There are cases where text substitution is the right thing to do. How do you do the things I mentioned, for instance? Java in particular is an infamous example, requiring lots of hard to maintainable boilerplate. Tooling helps in some cases to write it, but can't help reading it, right?

Some examples from my current code

    #define MAKE(x, y, z) [x] = { #x, y, z }

    #define MSG_AT_EXPR(...) _msg_at_expr(__FILE__, __LINE__, __VA_ARGS__)

    #define PARSE_LOG() \
            if (doDebug) \
                    MSG_AT(lvl_debug, currentFile, currentOffset, \
                           "%s()\n", __func__);

    #define BUF_RESERVE(buf, alloc, cnt) \
            _buf_reserve((void**)(buf), (alloc), (cnt), sizeof *(buf), 0, \
                         __FILE__, __LINE__);

    #define CLEAR(x) mem_fill(&(x), 0, sizeof (x))
    #define SORT(a, n, cmp) sort_array(a, n, sizeof *(a), cmp)

    #define RESIZE_GLOBAL_BUFFER(bufname, nelems) \
            _resize_global_buffer(BUFFER_##bufname, (nelems), 0)

In Delphi I've had to manually write all these expansions, resulting in less maintainable code. Go look in the linux kernel, I'm sure there are tons of examples that you'd be hard pressed to replace by a cleaner or safer specialized means.

> I don't get where that 100x - 1000x factor comes from, most compiled languages aren't that slower than C, specially if you restrict to ISO C.

It's not so much about the language, but what you do with it and how you structure your code. Or, actually, how you structure the data. OOP culture is bad for performance.

If I were to chose the single best resource for this kind of argument, that would be Mike Acton's talk from CppCon 2014 on youtube. If you want to watch that. Note that I'm not his fanboy. These are experiences I've made on my own to a large degree. And the arguments apply just as well to maintainability if you ask me.

> The big difference is that I am only doing that in that function or package that is required to run like a 1500cc motorbike at full throttle to win the benchmark game, while everything else in the code is happy to run under allowed speed limits.

And so the rebuttal is: No. If your code is full of OOP objects you can micro-optimize a certain function like crazy, but the data layout and the order of processing are still wrong.

To give another anecdata, for my bachelor's thesis I had to write a SAT solver for clauses of length <= 3 in Java. I modeled clauses as POD objects holding 3 integers (the 2nd and 3rd of which could be -1). My program could do about 10M clauses before memory was getting tight and it was doing only GC for at least a minute before it would finally die. Note that all objects are GC'ed reference types in Java (as you probably know).

I then converted it to highly unidiomatic Java by allocating 3 columns of unboxed integers of length #clauses, instead of allocating #clauses POD objects. The object overhead went away, so I could do about twice as many clauses before memory was used up. And when it was used up, since there was basically no GC overhead, the program died immediately (after a few seconds of processing, without a minute of GC). The downside was that maintainability was drastically worse since I was using only the most primitive building blocks of Java, and none of its "features".

If that had been in C, I could have just stayed with the first approach, since C has only "value types". It would have been performant from the start. C would have yielded a more maintainable program since I would not have had to fight the language. I could also have chosen the second approach, and it would have been easier to write and read than the Java code (which required tons of boilerplate).

pjmlp · on Nov 5, 2018

You know that Mike Acton is now working on Unity's new C# compiler team, right?

And yes he is also having a role on the new ECS stack, which is just another form of OOP, for anyone that bothers to read the respective CS literature.

Had you implemented your SAT solver in a GC language like Oberon, Modula-3, Component Pascal, Eiffel or D, among many other possible examples, then you wouldn't need such tricks as they also use value types by default, just like C.

jstimpfle · on Nov 5, 2018

I know and as far as I know he's trying to improve performance there. If you actually bother to watch the video you will find him ranting hard against mindless OOP practices.

ECS as I understand it is pretty much not OOP. My idea of it I would call Aspect-oriented, i.e. extracting features from artifacts and stuffing them in global tables, which of course separate data by shape. If you look on wikipedia, the first association you will find is also Data-oriented programming (the term from the talk; it is about tables-shaped and cache-aware programming and I believe it was also coined by Mike Acton).

Data-oriented programming stands particularly opposed to OOP which the games industry has found to scale miserably.

pjmlp · on Nov 5, 2018

Yes, I did watch that talk back when he did it. I always follow CppCon talks.

Then you should also watch the talks he did later at Unite, after joining Unity.

As I mentioned regarding ECS, on CS literature.

For example,

"Component Software: Beyond Object-Oriented Programming"

https://www.amazon.com/Component-Software-Object-Oriented-Pr...

First edition (1997) used Component Pascal, C++ and Java, while the 2nd edition replaced Component Pascal with C#.

"Component-Based Software Engineering: Putting the Pieces Together"

https://www.amazon.com/Component-Based-Software-Engineering-...

ECS and Data-oriented programming aren't the same thing.

jstimpfle · on Nov 5, 2018

"Unity at GDC - A Data Oriented Approach to Using Component Systems" https://www.youtube.com/watch?v=p65Yt20pw0g

xmiller · on Nov 4, 2018

But the authors of large, popular code bases chose C or C++.

Anyone is free to rewrite Apache in Ada, but for some reason it isn't happening.

the_why_of_y · on Nov 4, 2018

Do so many people write code in JavaScript today instead of countless other high level languages because JavaScript is technically superior and better designed than any other language, or because browsers and the web provide an ubiquitous runtime platform?

pjmlp · on Nov 4, 2018

Ever heard of this little thing called money?

Since free UNIX brought C into the masses, and Bjarne made C++ as a means to never have to touch bare C after his encounter with BCPL, many people have choosen this languages because they were a language with an OS SDK.

So now unless we get some nice lawsuits, companies will keep picking the easy path.

jstimpfle · on Nov 4, 2018

So, are you saying it's been difficult to get Ada or Free Pascal or Java up and running on a Unix system in the last 10 or 20 years?

pjmlp · on Nov 4, 2018

What I am stating is that to re-write existing systems, regardless how rotten they might be, someone needs to pay for the work to happen.

What many tend to forget on those "X re-written in Y" posts.

Pay Per Hour * Total Hours Effort = Money Spent on Rewrite

Additionally what I am saying is that languages that come with the OS SDK have first class privileges and experience shows that 99% of the companies won't look elsewhere.

For example, in commercial UNIX days, you would get C and C++ compilers as part of the base SDK. The vendors that had support for additional languages like Ada, had them as additional modules.

So any dev wanting to push for language X needed to make a case why the company needed to put extra money on the table instead of going with what they already got.

A similar process happens on mobile and Web platforms nowadays, you either go with what is available out of the box or try to shoehorn an external toolchain and then deal with all integration issues and additional development costs ($$$) that might arise.

jstimpfle · on Nov 4, 2018

Many many free software projects are started by people who don't get paid for it. Those people start their project in whatever language they want. If someone wants to write webserver software in Java or an OS in Object Pascal, they can do it.

Successful projects may get financial support from companies later. I doubt that these companies are overly selective towards "obviously bad languages". I don't buy that there are any mechanisms in place to get cynic or outraged about. Maybe it _is_ just that some languages are more productive.