Knowing exactly how sequence points work might be an advanced topic, but I am not sure I'd hire a junior C/C++ programmer who has not heard of sequence points. Anyone who has clearly won't write code like this. I am not 100% sure how they work either (I haven't programmed in C for 4 years), so I just avoid this kind of unnecessary complication in the code, if not for myself, then for the person who comes after me and who, on a bad day, may inadvertently cause some trouble.
Really, that's one of the reasons why Java still works well at big companies. The code is usually so verbose and 'un-smart' that it's quite hard for programmers with an inferiority complex to obfuscate it when they're trying to prove how smart they are. It's a shame, however, that when the language is primitive you can always create a big framework to obfuscate stuff. :)
> I am not 100% sure how they work either (I haven't programmed in C for 4 years), so I just avoid this kind of unnecessary complication in the code
Hear, hear. I am considered the "language lawyer" of my embedded group and often get asked questions about C minutiæ. It isn't uncommon that my answer is "I don't know how that works, because I would never write something that requires an answer to that".
Modern compilers are blind to shorthands etc.; the only useful knowledge about "C" (really C compiler & hardware) minutiæ relates to (a) how to convince the compiler to optimize certain high-level constructs (e.g. loop unrolling), (b) how to convince the compiler to emit certain low-level constructs (e.g. SIMD instructions), and (c) how the hardware behaves (e.g. the cache model). Generally anything else – you can rewrite it so you don't have to think too hard about how C works.
EDIT: Understanding type promotion is an exception to this. The integer type hierarchy is unfortunately (a) deeply baked into C and (b) mostly brain-dead: C conflates physical integer width with modular arithmetic and provides no non-modular integer types, which often leads to subtle software bugs (see last week's story about binary search).
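A minimal sketch of the kind of bug that binary-search story was about, assuming the classic midpoint-overflow case (the function names here are mine, purely for illustration):

#include <limits.h>
#include <stdio.h>

/* With large enough indices, lo + hi exceeds INT_MAX; signed overflow is
 * undefined behavior, and in practice the midpoint can come out negative. */
int midpoint_naive(int lo, int hi)
{
    return (lo + hi) / 2;            /* overflows when lo + hi > INT_MAX */
}

/* The usual fix keeps every intermediate value in range. */
int midpoint_safe(int lo, int hi)
{
    return lo + (hi - lo) / 2;
}

int main(void)
{
    int lo = INT_MAX - 10, hi = INT_MAX;
    printf("%d\n", midpoint_safe(lo, hi));   /* well defined */
    /* midpoint_naive(lo, hi) would overflow here: undefined behavior */
    return 0;
}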
EDIT: So is understanding const-ness. At least you can ignore this if you don't get it (or if C doesn't, as is the case with certain nested const types).
Const-ness is really different. It's a concept I really miss in other languages, as it _adds_ more semantic information to the code. It does not obscure, it clarifies. It's a way to give orders instead of giving recommendations.
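A tiny sketch of what that added information looks like (the function and variable names here are just illustrative):

#include <stddef.h>

/* The signature itself gives the order: this function promises not to
 * modify the buffer it is handed, and the compiler enforces it. */
size_t count_spaces(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (*s == ' ')
            n++;
        /* *s = '_';   <- would be rejected at compile time */
    }
    return n;
}

/* Where the const sits matters:
 *   const char *p;         pointer to read-only chars
 *   char * const p;        fixed pointer to writable chars
 *   const char * const p;  both                          */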
I really liked const-ness as well; in functional languages const-ness comes for free, as all state is immutable by default -- one of the reasons I jumped to Scala.
Just FYI for readers, the term "sequence point" has been deprecated in C++11. There is now a richer notion of operations being sequenced before/after each other or unsequenced. The motivation for this was to improve C++'s memory model in the face of concurrency. (It does not change the results of this discussion, though :) )
What an annoying post. Talks like he knows a lot, when he doesn't know this most basic of basic points of how C works. It's understandable to not know everything, even to have the odd hole in your basics like this. But the know-it-all voice (and for such a basic topic) "I explained operator precedence..." is just going to make you someone nobody wants to work with. This is not undefined behavior, it's one of the first things you learn about C after encountering the pre/postfix operators IF you pay attention to the details, which, with C, you should know to do.
> This is not undefined behavior, it's one of the first things you learn about C after encountering the pre/postfix operators IF you pay attention to the details, which, with C, you should know to do.
What exactly is not undefined behavior? The post describes a statement where a variable is modified multiple times between sequence points. What does the standard say about this?
N1256, 6.5 says
Between the previous and next sequence point an object
shall have its stored value modified at most once by the
evaluation of an expression.
J.2 Undefined behavior
The behavior is undefined in the following circumstances:
[..]
* Between two sequence points, an object is modified more
than once, or is modified and the prior value is read
other than to determine the value to be stored (6.5).
I think those are the only parts of the standard relevant to the article. The introductory example is just one example of how not knowing about sequence points leads to bugs. There are many more, and it makes for a much more interesting read to first learn about sequence points and then apply that knowledge to various examples.
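To make the quoted clauses concrete, here is a minimal example of the kind of statement they rule out (what it prints is whatever a given compiler happens to do; the standard allows anything):

#include <stdio.h>

int main(void)
{
    int i = 5;

    /* i is modified twice with no intervening sequence point, and its prior
     * value is read for more than determining the value to be stored:
     * undefined behavior per 6.5 / J.2 above. */
    i = i++ + ++i;

    /* Different compilers have been observed to print different values here,
     * and none of them is "the right answer". */
    printf("%d\n", i);
    return 0;
}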
There was a popular myth way back in the '90s that preincrement was always faster than postincrement because the latter could generate a temporary. As if this were the case:
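Roughly the following, i.e. as if every postincrement carried an extra copy even when its value is never used (a sketch of the idea; the exact snippet isn't reproduced here):

/* The supposed cost of i++: save the old value in a temporary ... */
int postincrement(int *p)
{
    int temp = *p;   /* the "extra" copy the myth worries about */
    *p = *p + 1;
    return temp;
}

/* ... versus ++i, which needs no temporary. */
int preincrement(int *p)
{
    *p = *p + 1;
    return *p;
}

In practice, when the result of a bare i++; is never used, any optimizing compiler emits the same increment for both forms.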
I always thought this "myth" was confined to C++ code (iterators) where there may in fact be code that does something analogous to what your "postincrement()" does.
Compilers are sometimes smart enough to remove the unnecessary operation in C++ (e.g. to switch to ++x themselves), but I always use ++i for the same reason people keep their C simple between sequence points --- it's an easy transformation, and why tempt fate?
The compiler can't necessarily optimise i++ away in C++, because operator++ or the destructor of the temporary iterator object it returns may have side effects. It's still not quite equivalent to ballard's postincrement(), though, because the C++ standard allows that temporary copy to be optimised away.
I was actually asked in an interview at Google in 2011 whether pre- or postincrement was faster in a snippet of code where the expression's value wasn't used. When I said that I was pretty confident that the compiler would produce the same assembly in both cases, the interviewer insisted that pre- was faster based on the same logic you mentioned.
To her credit, she didn't penalize me for disagreeing with her and I still got the offer.
This is one of those things I've had in my head for years, and as a result of this I habitually use pre-increment in code, even in other languages. I think I haven't actually believed the "myth" myself, though, for quite a while.
It's a little weirder, though:
#include <stdio.h>

int main(int argc, char *argv[])
{
    int a = 0;
    printf("%i %i %i\n", a, a++, a++);
    return 0;
}
prints "0 0 1" with gcc 4.2.1, even though the increments "shouldn't" happen until after the ; if you read postincrement as strictly post-statement. I think your rule would expect "0 0 0", with a being 2 after the printf.
The issue here is that the order of evaluation of function arguments is unspecified. If the commas weren't inside a function call they would be comma operators, which are sequence points. So, this would be well defined:
int a = 0, b, c;
b = a++, c = a++;
But, there is no defined order in which to evaluate function arguments.
The example given in the article shows all the code, but your example involves a function so the code in question is hidden. The behaviour partially depends on how the function is written.
"0 0 1" is what I would expect from your code, and identifier a should end up as 2.
Practically every new version of gcc adds a new optimization that recognizes some new form of undefined behavior and then rewrites your function to do whatever it wants.
Classic example: signed integer overflow. It worked for decades. Then one day it didn't.
That isn't really an accurate description of the issue: it's not the case that it was "working" and was then "broken" by the GCC maintainers. It has always been unsupported and has not worked in specific situations, but for 99% of code it appeared to GCC users that it was supported. The signed-integer-overflow behaviour was never guaranteed by old versions of GCC, and code exploiting it was not always compiled with the "expected" behaviour. It's just that as the GCC optimizer has improved, there are more circumstances in which it performs optimisations that hinge on that assumption.
Compiler users generally want something that "just works" and doesn't do anything unexpected. But in a low-level language like C, doing away with undefined behavior would essentially mean pessimistically forgoing many optimisations on the 99%+ of straightforward, reasonable code out there, in exchange for not doing anything surprising on the remaining fraction of dubious code that depends on certain things happening in scenarios where behavior is undefined according to the C standard. There are languages that make that choice, but C isn't one of them.
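A sketch of the kind of optimisation being described, assuming a modern optimizing compiler (what actually happens depends on the compiler and flags, which is exactly the point):

#include <limits.h>
#include <stdio.h>

/* The programmer intends an overflow check.  Because signed overflow is
 * undefined, the compiler is allowed to assume x + 1 > x always holds and
 * reduce this function to "return 1" when optimizing. */
int addition_is_safe(int x)
{
    return x + 1 > x;
}

int main(void)
{
    /* May print 1 even for INT_MAX once the check is optimized away, or 0
     * without optimization -- there is no guaranteed answer. */
    printf("%d\n", addition_is_safe(INT_MAX));
    return 0;
}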
That is my point. De facto working code is not working code.
I have mixed feelings about the integer overflow issue because it's so easy to trigger, unlike contrived triple-postincrement examples. And it usually results in a security problem. For very little benefit, IMO.
Why? What reason could there possibly be that such well known, gaping holes in the standard have persisted for all these years and multiple standard revisions?
C90, C99, C11 and now C14... If performance or backward compatibility is a concern, surely an optional macro like __STDC_STRICT__ could be suggested in the standard with clear expectations that could be relied on.
That's a reasonable question. It's a shame this comment gets downvoted instead of the various people falling all over themselves to show off how smart they are by posting incorrect guesses about how undefined behavior doesn't matter.
Here's a good article that addresses the question "Why have undefined behavior?"
People seem to be forgetting that an implementation includes an execution environment, not just a compiler, and that a perfectly valid (and, to my mind, very much recommended) option for undefined behaviour is to behave in some documented manner characteristic of the platform. I don't see this "no obligation" stuff as a great idea, not when it results in so much surprising behaviour - and especially not for simple situations where there's something obvious the compiler could do that would surprise absolutely nobody familiar with the system in question.
I always imagined undefined behaviour was there to avoid tying implementations' hands, by having the standard not mandate things that vary in practice. But it seems that people are assuming it's there to give compiler writers carte blanche to do whatever they like, and then point at the standard as justification. Given how much stuff could potentially be added to C to improve it, I don't know why people are spending all this time trying to figure out all the ways in which the letter of the law allows them to confuse the programmer.
I observed this back when I was in school, using a simpler expression:
i = 0;
i = i++;
The answer will be different with different compilers. I've used this as an interview question ever since, not to get the right answer but to understand the candidate's thought process in solving the problem and his/her understanding of operator precedence :-)
One of the things we train every newcomer to our company to do is to never do what this man does: a complex single line whose expected behavior rests on your own (arbitrary) conventions.
You always write multiple lines, with a comment on each one.
It sounds ridiculous, but this simple thing makes certain kinds of bugs impossible: the ones that are right in front of you yet you can't see in a million years. The atomic operation in code is the line; you can't debug a complex line(1).
It is painful forcing people to do that, since people tend to hate being told what to do, but at the same time they love the outcome. In the end everybody loves it.
1. With assembly you are debugging one instance of your code, not what will be created by any compiler, any OS or any architecture.
While I agree the code presented is, plain and simple, bad, and that the first step towards a solution is splitting it up, I really don't get why you want comments on every line. Or do you not mean literally every line?
I mean suppose I split the first part of the statement like this:
int r = a[ i ];
++i;
These lines are self-explanatory. It's pretty hard to argue they need a comment? Especially when used in a function that already should have a name/comment explaining what it does?
I like using it when I'm filling an array with an indeterminate number of elements. I've used this pattern a few times:
foo foos[255];
int ct = 0;

if (<condition for adding first element>)
    foos[ct++] = foo1;
if (<condition for adding second element>)
    foos[ct++] = foo2;
...

for (int i = 0; i < ct; i++) {
    /* do something with foos[i] */
}
IMHO, this is more concise and readable than separating the increment and assignment into two lines.
The split version would count as three lines for purposes of that rule, and a bare i++ as one. I could have been more precise, but I thought it would be understood.
The point is, to be on the safe side, don't use ++/-- in any line (in the sense of ;{}-delimited statement) that is also doing something else.
> Don't forget that =, +=, ... can all be abused in the same way.
I didn't, and people should use the same safeguards around them, i.e. don't mix them into lines that do other things, "cleverness" or "C golf" be damned.
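For concreteness, the kind of rewrite being advocated, using a made-up example rather than the article's code:

#include <stdio.h>

int main(void)
{
    int values[]  = { 1, 2, 3, 4 };
    int weights[] = { 5, 6, 7, 8 };
    int idx = 0, total = 0;

    /* "Clever": idx is modified twice in one statement, which is exactly
     * the undefined-behavior pattern discussed above:
     *
     *     total += values[idx++] * weights[idx++];
     */

    /* Dull but safe: one side effect per statement. */
    total += values[idx] * weights[idx + 1];
    idx += 2;

    printf("total=%d idx=%d\n", total, idx);
    return 0;
}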
That's really all there is to it, isn't it? I've never understood why some people feel the urge to modify the same variable multiple times in the same statement. It doesn't help readability, it's harder to reason about, and worst of all it leads you directly into the land of undefined behavior, where everything that can go wrong will go wrong.
Yeah, the need to shove as much code as possible into one line seems to go away with experience. Once you have to debug and maintain that crap, you realize very quickly that one-liners are not necessarily something to strive for.
I used to strive for clever solutions. The shorter the better. I still like to look at them but I want them as far away as possible from my code base. Sure, if a solution is clever and efficient, while still readable, maintainable and easy to understand when documented properly, I'd use it. However, my general rule of thumb these days is: "Write this code, so an utter idiot can read it". Because that's what's going to happen: One day I'll have to read it again, and I'll feel like an idiot, trying to understand what once went through my brain.
That's something that took me way too long to learn and I feel that novice programmers should embrace that more. We all like to feel like Einstein sometimes, but a big code base is not the place for solutions only we can understand... temporarily.
I think I stumbled onto this once, but didn't have the time at that moment to check what the problem was... I simply rewrote the code to remove the in-expression increments. The good thing about C is that you can decide not to shoot yourself in the foot.
Isn't this actually called "implementation-defined" behaviour in the standard rather than "undefined" behaviour? I generally get a segfault for undefined behaviour.
It's undefined. That means the compiler and runtime can choose to segfault, do something that looks right, silently corrupt your program, launch the nuclear missiles, whatever. And there's no requirement that this behavior be repeatable, so the code can work with debug flags turned on but fail in a release build.
You can never rely on your compiler's implementation of undefined behavior.
> You can never rely on your compiler's implementation of undefined behavior.
I wish I could somehow get my coworker to understand this. He's mostly "Yeah, well, I know it's undefined, but there's really no way this could be anything else than <foo>. I know what the CPU does there."
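A concrete case where "I know what the CPU does there" breaks down, sketched under the assumption of an optimizing gcc/clang build (exact behaviour varies by compiler and flags):

#include <stdio.h>

/* CPU-level reasoning: "if p is NULL the load traps, and otherwise the
 * check below catches it."  Compiler-level reasoning: dereferencing p
 * means p cannot be NULL, so the check is dead code and may be removed. */
int read_flag(int *p)
{
    int value = *p;      /* compiler may now assume p != NULL          */
    if (p == NULL)       /* ... so this branch can be optimized away   */
        return -1;
    return value;
}

int main(void)
{
    int flag = 42;
    printf("%d\n", read_flag(&flag));
    return 0;
}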
I believe this is undefined because the standard doesn't say what a "sequence point" is, rather than saying "a sequence point is implementation-defined". I'm going from second-hand knowledge, and it's been years since I looked at C (and I never got past a high-school level of understanding).
No. You might be right if we were talking about "proper use" in English as a spoken language, but "undefined behavior" is a technical term defined by the C and C++ ISO standards.