Named Booleans prevent C++ bugs and save you time (raymii.org)
112 points by signa11 on Feb 18, 2023 | 164 comments



The only problem with this approach is that you evaluate all the conditions, even if you don't need to. Sometimes that's inefficient, and sometimes it's invalid, as in

    char *p = NULL;
    ...
    if (p && *p != ' ') { ... }
If you write

    bool p_non_null = (p != NULL);
    bool char_non_space = (*p != ' ');
    if (p_non_null && char_non_space) { ... }
Oops.


Back in the mid-00s I was thinking about languages and features. One of the keywords I still miss in mainstream languages is “alias”. An alias is a symbol that only evaluates lazily.

  alias not_null = (bool)p;
  alias not_space = (*p != ' ');

  if (not_null && not_space) {
This may bite you in another way, but living in a world where everything gets evaluated as soon as possible isn’t syntactically easy either.

It would also be nice to have this:

  alias __cache foo = <heavy(expr())>;

  if (foo && foo.length && foo[0].x) {
`foo` only evaluates once.
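
In current C++ you can get close to `__cache` with a memoizing lambda. A rough sketch, with stand-in types since heavy/expr are hypothetical:

  #include <optional>
  #include <vector>

  struct Item { bool x = true; };
  using Result = std::vector<Item>;            // stand-in for heavy's result type
  Result heavy() { return Result{ Item{} }; }  // stand-in for heavy(expr())

  int main() {
    // Memoizing lambda: heavy() runs at most once, and only when first needed.
    auto foo = [cached = std::optional<Result>{}]() mutable -> Result& {
      if (!cached) cached = heavy();
      return *cached;
    };
    if (!foo().empty() && foo()[0].x) { /* heavy() evaluated exactly once */ }
  }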


I do this in c++ quite often with short lambdas

  auto not_null = [&]{ return (bool)p; };
  auto not_space = [&]{ return (*p != ' '); };

  if (not_null() && not_space()) {
It's not quite as ergonomic but it works.

As an aside: I was curious whether the compiler would optimize the short circuit back in even if you extract the eagerly evaluated named booleans. However, I was pretty sure the rules of C++ are against you, and tests show this to be the case.

But if you simulate the alias keyword with lambdas, the results after optimization are the same:

https://godbolt.org/z/3cczcnK4s


>I was curious if the compiler will optimize in the short circuit even if you extract the eagerly

It will optimize, but not necessarily the way you wanted... Because null dereference is UB (with default compiler flags), the compiler is allowed to elide the null check.
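
To make the hazard concrete, a minimal sketch (getPointer is hypothetical):

    #include <cstddef>

    char *getPointer();  // hypothetical; may return NULL

    void demo() {
        char *p = getPointer();
        bool p_non_null     = (p != NULL);
        bool char_non_space = (*p != ' ');  // UB when p is NULL...
        if (p_non_null && char_non_space) {
            // ...so the compiler may assume p != NULL and drop the check
        }
    }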


Isn't alias handled by the C preprocessor's #define directive?

  #define NOT_NULL(x)  ((bool)(x))
  #define NOT_SPACE(x) (' ' != *(x))
  // ...
  if ( NOT_NULL(v) && NOT_SPACE(v) ) { ... }


Technically, this is true, but most C++ programmers are very reluctant to use macros except for things like include guards or NDEBUG.

The reason is mainly that the many language constructs of C++ cover most traditional use cases of macros in C, while at the same time being much less error-prone than macros.

A sibling comment shows how one can easily achieve the very same thing using short lambdas instead. That variant is also bound to the scope one is currently in; to have that safety with a macro, one would need to undefine the macros afterwards. One would also need to check that they are not yet defined beforehand, and if so, restore the previous macro afterwards, so as not to break completely unrelated code in case of macro name clashes.
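
For the record, the bookkeeping that implies looks roughly like this (a sketch; push_macro/pop_macro are non-standard pragmas, though GCC, Clang and MSVC all support them):

  // Save and restore any pre-existing definition so unrelated code isn't broken.
  #pragma push_macro("NOT_NULL")
  #undef NOT_NULL
  #define NOT_NULL(x) ((bool)(x))

  // ... use NOT_NULL(v) here ...

  #undef NOT_NULL
  #pragma pop_macro("NOT_NULL")  // restores whatever was defined before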


Yep, #defines are non-lexical and add way too much clutter to be useful in this case, even when you’re non-reluctant in general.


It is not a direct implementation, and may end up being clunky, but I expect a fairly small library of helper functions could improve ergonomics quite a bit:

Define your booleans as functions of no args. A sample in Python:

    var = 'abc'
    var_contains_a = lambda: 'a' in var
    var_shorter_than_5 = lambda: len(var) < 5

    if var_contains_a() and var_shorter_than_5():
        print(var)


One challenge with the chosen example language, Python, is that functions are truthy. So if you forget to call the function that computes your boolean test, you end up with a default true.

Taking the last top-level statement, the `if`:

    if var_contains_a() and var_shorter_than_5: #note the lack of parens for the second
        print(var)
This will print a string of any length that contains 'a' (or any other collection that implements `in` and does, in fact, contain an element, 'a'). The second function is passed directly as a boolean expression, and being a function, it resolves to true.


I was curious about the concept of inverting this – requiring that all variables lazily evaluate, and in the case that evaluation is needed now, some keyword obtains it. But when I think about it, that's sort of just `await`.

I'm curious as to whether it's feasible to syncretise these two aspects of variable evaluation into a shared feature that makes async and sync code feel alike.


It’s called lazy evaluation but it isn’t really anything like await. Await blocks on a computation that is already running; it doesn’t begin evaluation only when you await it.


> I was curious about the concept of inverting this – requiring that all variables lazily evaluate, and in the case that evaluation is needed now, some keyword obtains it.

That's how Haskell works. Everything is lazily evaluated, and in the case that evaluation is needed now, the "seq" function obtains it (https://wiki.haskell.org/Seq).


Scala has lazy val


In any sane language you can just do this using a lambda. Unfortunately C is just broken so :(


The correct rewriting would be:

    bool char_non_space = p_non_null && (*p != ' ');


but then 'char_non_space' is a misleading name, as it could denote the presence of a space due to p being null


I understand your "invalid" point. Your example dereferences a potentially null pointer.

But I don't understand your "inefficient" point. At the end of the day, everything gets turned into SSA. So expression-per-line and nested expressions generate the same code, right?


It's the same idea: that the && short-circuits.

    bool dirty = ...;
    bool success = doExpensiveCalculation(...);
vs

    if (dirty && doExpensiveCalculation(...))
    {


Not necessarily; in the `&&` case, the stuff after the `&&` will definitely not be run. In the named-variable case, it depends on how the code is optimized, and there's no guarantee that things moved to intermediate variables won't still be evaluated even if they're never used. Since the compiler presumably can't know the value of the boolean at runtime, it would need to reorder things to avoid doing the unnecessary work, and I wouldn't think that's something you could just assume.


Future programming languages could have (IIRC some current PLs allow for such syntax, some of them I believe using 'let'):

    if (p_non_null = (p != NULL)) && (char_non_space = (*p != ' ')) { ... }
But where the IDE only shows the LHS of the expression as:

    if p_non_null && char_non_space { ... }
But also where the IDE makes it easier to write the earlier full expression above.

Also, the initial code:

    bool p_non_null = (p != NULL);
    bool char_non_space = (*p != ' ');
    if (p_non_null && char_non_space) { ... }
This could simply be analyzed by the compiler so as not to store the named boolean variables in the first place, evaluating them inline in the if condition so that short-circuiting still occurs.
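
As an aside, C++17's if-with-initializer already gets partway there: the name is scoped to the if, and short-circuiting is preserved because the second operand stays inside the condition. A sketch:

    // C++17: p_non_null lives only in the if, and *p is still
    // evaluated only when p_non_null is true.
    if (bool p_non_null = (p != NULL); p_non_null && *p != ' ') { /* ... */ }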


Don't they just want a function with a clear name that returns a boolean value, something to mask the implementation details so it's not part of your cognitive load at that level of abstraction?


This is a good point. I think that if you really want something like these named bools but which keeps the same lazy evaluation behavior, you'd have to use preprocessor macros then.


I think that these bundled conditions are bad, and that having negative booleans is terrible. Using (and almost abusing) early returns on bad input almost always leads to easier-to-follow code.

    if (someLongNamedVarIsNotUnknown && someLongNamedMapCountIsZero)
        return false;
    else
        return true;
This is not the same, but I'd suspect there might be a bug in the code above.

    if (someLongNamedVarIsUnknown) {
      return false;  // Can't do much
    }
    
    if (someLongNamedMapCountIsZero) {
      return false;  // Nothing to do
    }
    
    // Do stuff
    return true;
The named booleans suggested here help add clarity, but more often than not they'll end up being single-use (const) variables that just paraphrase their definitions. In these cases I'd rather avoid introducing a variable, and rely on comments if I really need to explain something.


> The named booleans suggested here help add clarity, but more often than not they'll end up being single-use (const) variables that just paraphrase their definitions. In these cases I'd rather avoid introducing a variable, and rely on comments if I really need to explain something.

Interesting. With the same axioms I come to the opposite conclusion: that comments are to be avoided except when strictly necessary, and that giving names to concepts via intermediate const variables is one of the best weapons in that fight. I.e., making your code read as close as possible to what the comment would say (simply via naming) makes it very clear when the names no longer match the behavior, in contrast to comments, which have a tendency to bit-rot.


This is what I used to love about Obj-C - idiomatically written Obj-C code was always so much easier to read and understand without needing comments.


> The named booleans suggested here help add clarity, but more often than not they'll end up being single-use (const) variables that just paraphrase their definitions.

That's ok. The worst tradeoff in the "named booleans" pattern proposed in the blog post is that it misses the whole point of chaining predicate evaluations, which is leveraging short-circuiting to prevent further predicates from being evaluated.

Chaining logical comparisons and the way they short-circuit is especially important when a predicate has preconditions and/or is expensive and/or has side effects.

If your predicates are hard to read you don't fix that mess with "named booleans". You fix them by extracting them to a pure function like

   isNotUnknown(_someLongNamedVar) && isNotEmpty(_someLongNamedVar)


Nest the if statements then, perhaps in an else branch. Relying on short circuiting to avoid side effects or improving performance meaningfully is pretty shaky.


> Relying on short circuiting to avoid side effects or improving performance meaningfully is pretty shaky.

Says who? IME relying on short circuiting is useful and idiomatic in C and C++ code, e.g.:

    if (callback && callback->run()) { ... }
In your opinion would it be better to write:

    if (callback) {
      if (callback->run()) {
        ..
      }
    }
The situation is even more awkward if you try to avoid || short-circuiting, since you need to duplicate the body of the conditional.


I tend to agree with toxic here. Side effects inside conditionals should be avoided, in short circuit conditionals even more so.

In a case like this, my preference would be to give callback a dummy value and always run it.

I subscribe pretty heavily to the Carmack philosophy of always running the code as similarly as possible and making a decision at the end of a procedure.
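
Concretely, the dummy-value idea could look like this sketch (names hypothetical):

    #include <functional>
    #include <utility>

    struct Widget {
        // Harmless no-op default: never null, so no branch at the call site.
        std::function<void()> callback = []{};

        void setCallback(std::function<void()> cb) { callback = std::move(cb); }

        void update() {
            callback();  // always the same code path
        }
    };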


I think side effects are a red herring here: it's idiomatic to rely on short-circuiting when the RHS relies on the LHS being true to be safe to call at all.

In the above example, we check that the pointer is non-null before calling it. The "side effect" we are avoiding is a segfault.

In the code that splits the conditional into two, the callback is still run inside a conditional, just by itself.


> in short circuit conditionals even more so.

I'm curious, what leads you to believe that skipping unnecessary calls to functions with side effects is something to be avoided? Do you believe that unconditionally calling code with side effects, especially when you don't have to, is something to be desired?


I believe that running the same code path as much as possible per software cycle is to be desired. The code I'd call in the case that client code didn't specify a function wouldn't produce any side effects. It would be an empty function object/pointer.

I would almost always rather support default behavior through dummy data rather than null checks. I find that a lot of software bugs come from the unintended consequences of branching. Especially branches which create state that is consumed later.

In fact, I find that most null checks lead to bad software behavior, as they often hide bugs. Early-outing when null is detected is especially silly to me. The client has just tried to run a procedure which had to be abandoned due to a lack of data, and you're just going to quit silently?

If you're using dummy data for default scenarios and asserting not null rather than checking, you will eliminate most logical null checks in your code. Though I do concede that in the case where you must support null checks, such a short circuit as above is acceptable and in fact preferable to me over two if statements.


I subscribe to the idea that your code should state your intent as closely as possible.

I question the idea that you would ever want the very linchpin of your function hidden away as a second subclause of an if statement. A function that calls callbacks is probably mainly doing that, let it stand proud and clear in the code text.

If your function is calling callbacks left and right “on its way somewhere else”, I think a refactor is in order.

Maybe you will now present some second scenario with something other than callbacks. Maybe in that case you’ll have me convinced it’s an okay exception. For you see, I said only it’s “pretty shaky”, not a mortal sin.

With that said though, I think in 9/10 cases, it’s something you can solve with an early return.

    if (!callback)
        return;

    callback();


> In your opinion would it be better to write:

in mine, yes definitely 100% the second code is more readable than the first


> in mine, yes definitely 100% the second code is more readable than the first

If anything, adding a nesting level for each chained predicate makes code harder to read, and needlessly so.


It makes the code a tiny bit more verbose but much more readable and, most of all, easily extensible without having to worry about bracket positions


> It makes code a tiny bit more verbose (...)

The issue is not verbosity. It's the needless addition of nesting levels and independent expressions in a non-idiomatic way, which greatly increases the cognitive load of checking a single row of chained predicates. It's bad code.


for me the cognitive load of

    if(an expression) {
      if(another expression) {
      }
    }
is MUCH lower than

    if(an expression && another expression) {
    }
which always raises the questions of:

- did the author think about operator precedence

- did the author want to use && or &

- did the author think about short-circuiting (as evidenced in this very thread, a lot of people do not)


And not to forget, is it okay to omit parentheses, does && take precedence over any other symbol in the expression?


> Relying on short circuiting to avoid side effects or improving performance meaningfully is pretty shaky.

You've got it all backwards. Short-circuiting is not used to avoid side effects. Short-circuiting is used to avoid needless calls to any predicate, which can have and often does have side effects; and if it doesn't have them today, it might tomorrow.

If you don't call them when you don't have to, your code is in a better shape.

And let's not pretend that unconditionally running code with side effects when you don't have to is good practice.


I usually prefer to write more readable code, and then optimize later if necessary. If the short circuit becomes critical for performance then I would leave a comment, or as the parent suggests just order it with an early return.


> If the short circuit becomes critical for performance the

The point is not performance. The point is readability, and expressing things clearly and with the appropriate level of detail and leaving out nesting and scope. Safety and performance are important but secondary advantages.


True, functions with return values are so much better in such a case. I really miss if expressions and switch expressions in most languages. Most functional languages (and also c#) have them. Something like (let’s do JavaScript in this example):

  const value = if (a) {
    return true
  } else {
    return switch (b) {
      case 1: return false
      default: return undefined
    }
  }


Ada calls these "if expressions" and the switch variant "case expressions". It's the same syntax as the control-flow ones, just wrapped in parens.

    value := (if a then True
              else (
                 case b is
                    when 1 => False,
                    when others => Undefined
              ));


Interestingly Rust does this and I agree JavaScript with const feels only 80% there without this feature.


As an aside, for those wondering what the name for this is so they can see which other languages support it: this is a case of 'if' being an expression rather than a statement, which allows it to be composed within other expressions.


Exactly. Here is an overview of switch expressions for c#, for many cases you can replace if/else if/else with those. The pattern matching is very powerful too:

https://learn.microsoft.com/en-us/dotnet/csharp/language-ref...


"If/else expressions", as opposed to if/else merely doing control flow. Really surprising that they took so long to become mainstream, yet still more than a decade after I started using them I still occasionally miss the more "expression-like" look of the ternary in languages that don't offer both.


A predicate?


No, if, and match, and most things are expressions, and thus have a (typed) value.

In a typical programming language we can write something like x = 5; assigning 5 to the variable x. That 5 there is an expression, we can write x = (3 * 2) - 1; with the same effect, that's a more complicated expression. We can call a function here: x = f(3); or if our language has methods we can even call those: x = some.function(parameters);

In Rust I can write x = if it.should_be_nine() { 9 } else { 5 }; // because if is an expression

Since Rust's loops are also expressions, x = loop { let stuff = get_stuff(); if stuff.is_good() { break stuff } }; // ... once the stuff is good, break out of the loop and we'll assign that stuff to x


Understood, these things are expressions, but must they not evaluate to a boolean in order for if/switch/match to branch correctly?

My informal understanding of a predicate may need refining.


I think the thing you're missing is that in some languages we're allowed to use these constructs as expressions. Not that they take expressions which is common in modern non-toy languages, but that they themselves are expressions.

It might be you've just always used languages where you can do this, or that you assumed you can but actually you never tried in a language you use.

In C for example we can't write this:

  int x = if (1 == 2) { 3 } else { 4 };  // Not valid C syntax
C provides a special operator to achieve the same thing, the only one in the language with three operands, and thus often just called "the ternary operator" (although other ternary operators can technically exist):

  int x = (1 == 2) ? 3 : 4;  // Valid C syntax using ternary operator
But, how about this?

  int x = switch 3 { case 1: 4; case 2: 8; default: 0; };   // Can't do this either
There's no way to express that in C or C++ today. The closest is the "Immediately Invoked Lambda" from C++ which is illustrated nearby in this thread. Barry Revzin apparently proposed a way to do this in C++ 26 because C++ wants to have pattern matching and pattern matching kinda sucks without this feature.


Apparently C++ 26 might get statement expressions in some form, it sounds like Barry Revzin sketched something at Issaquah (the recent WG21 meeting) which can do this.


you can do this in c++ as well with a lambda (using std::optional to stand in for `undefined`):

  const auto value = [&]() -> std::optional<bool> {
    if (a) {
      return true;
    }
    switch (b) {
      case 1: return false;
      default: return std::nullopt;
    }
  }();


An immediately invoked lambda pattern, but check out how ugly that was compared to:

  let value = if a { true } else { match b { 1 => false, _ => undefined } };
And notice that because we're now inside a function, our relationship to the outside world has changed: how does your immediately invoked lambda do this?

  let value = if a { true } else { match b { 1 => false, 2 => return foo, _ => undefined }};
With expression syntax returning from the function we're in is fine† but in your lambda you can't do that, you need control flow logic outside the lambda to handle it.

† You might wonder about the inferred type of this expression since it might return. The type of the return expression in Rust is ! aka Never, which is an empty type. In the same way that N + 0 == N in integer arithmetic, any type plus an empty type is the original type again in type arithmetic, so the type of the expression isn't altered.


Sure, in a lot of languages this is possible, but usually much slower, even slower than named functions. Can the C++ compiler inline the lambda and remove the function call completely?

Still, the syntax is a bit confusing. In ML languages (F# and also Rust are somehow descendants of ML) you can just use if .. else if .. else expressions. They work like ternaries, but are easier to read. Nested ternaries are useful but get really hard to read very fast. (x = condition1 ? A : condition2 ? B : C;)


> Can the c++ compiler in-line the lambda and remove the function call completely?

LLVM and GCC can at least.


And indeed almost exactly like that in JavaScript - just substitute ()=> for the [&]()


You can’t directly call a function definition in js, so it will look like this:

  const value = (()=>{
    …
  })()


early return solves so many readability problems. i structure my code in favor of it.


Or just put parentheses around your operands defensively, even if they're not necessary by precedence rules, as some style guides demand... I went from hating it and finding it too verbose to accepting it and seeing the advantage.

if ((_someLongNamedVar != FooLongNameEnum::Unknown) && (_someLongNamedMap.count(_someLongNamedVar) == 0))

really not too bad? And easily enforceable and automatically checkable by your favourite tool..

Named vars for subexpressions are often helpful too.. but especially here, don't you lose short-circuiting?

(Not a very C++-specific topic, btw?)


The mistake in the article was a ")" slipped before "== 0". Exactly the same mistake can happen in your code. Your extra parentheses won't make it less likely.


Disagreed, I spotted that in a split second immediately by following that now ingrained rule.


It’s not clear to me how “add more parens” is necessarily an effective solution to avoiding a misplaced paren, as was the case here.


See above; sure, you should also structure/format appropriately, but then the mistake similarly sticks out, by the same argument that says shortening another expression to a name makes it easier to spot.


I'm a massive fan of using parentheses even when not strictly necessary. I find it helps me visually parse the code better than a long chain of boolean expressions. I don't bother with it when a term in the expression is simple, e.g.

  if (b1 && (b2 != b3))
rather than

  if ((b1) && (b2 != b3))


I think the discussion is silly.

99.9% when the condition is "too hard to read" it's because the condition is too hard to read, period, and the correct fix is to restructure the code to have readable branch conditions, not to throw parentheses on there to make a monster expression slightly less confusing.

It's like discussing the chemistry of air fresheners instead of taking the trash out more often.


I'm in the opposite camp: more names means more identifiers to keep track of while reading in case they might be referenced more than once, even if it's just a short but nonetheless distracting check "ok, that's where the scope ends".

There are certainly times when I take the name approach nonetheless, but more like a last resort, or when I expect full contact stepthrough with a debugger. If I deem it likely that I will get by with merely brackets and copious amounts of whitespace, that's what I'll do.


Lol no, since when is a simple expression of four operands "(a someop b) && (c someop d)" a too-hard-to-read condition? I could only accept at least 5 .. at least ;)


Do you think that any other fix that makes it readable is incorrect?


You can also sometimes combine the techniques in the article above to skip booleans entirely. Instead of passing in a `bool primeMember` that gets used in a calculation like

    deliveryTime = 3;
    if (primeMember) {
        deliveryTime = 1;
    }
you can pass in a `Membership memberType` value that contains the required information, so the above conditional turns into just the assignment

    deliveryTime = memberType.deliveryTime;
This is more amenable to extension and contraction: you can add or remove different classes of delivery times without changing the logic in which it's used.

You can even use private constructors and public static fields to create a list of "constants" containing the known member types, like `Membership.REGULAR` and `Membership.PRIME`, retaining the property that the code cannot use an invalid membership type.

This can also be used for more complicated conditionals like the one in the article, by having a factory method that copies the right "constant" composite value based on whatever logic is desired.
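
A minimal sketch of that idea, reusing the names above (details hypothetical):

    class Membership {
    public:
        static const Membership REGULAR;
        static const Membership PRIME;
        int deliveryTime;
    private:
        explicit Membership(int days) : deliveryTime(days) {}
    };
    // The only instances that can ever exist:
    const Membership Membership::REGULAR{3};
    const Membership Membership::PRIME{1};

    // usage: deliveryTime = memberType.deliveryTime;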


This assignment-then-conditional-reassignment is admirably compact and, in languages like C or Java, might well be the most elegant way to handle this.

On the downside, this breaks single-assignment, binding style variables. That is, now deliveryTime can’t be final/val/const.

I wish I had a better solution. Lisp did this so well with if/cond/etc as expressions.


Since nobody seems to have done the traditional "blame the programming language" yet, let me get it out of the way: this particular bug wouldn't have happened in a language which doesn't have implicit conversion between boolean and number. The compiler would reject the code both because the numeric result of ".count(...)" is being treated as a boolean by the "&&", and because the boolean result of the "&&" is being treated as a number by the "== 0".

One example of a language without that implicit conversion is Java. It's fairly annoying to have to put a "!= 0" whenever you want to convert a number to a boolean, but it does avoid a lot of programming mistakes within if conditions.
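
For reference, the article's bug has this shape (names shortened); both lines compile in C++, and only the first means what was intended:

    if (var != Enum::Unknown && map.count(var) == 0)   { }  // intended
    if ((var != Enum::Unknown && map.count(var)) == 0) { }  // the bug: a bool compared to 0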


That's the way to go.

But it is sad that in $current_year, we're still fighting boring, solved problems.


Meanwhile in Python land, this is not a common class of bugs even though this implicit boolification happens.


Lol, Python is the king of these types of errors. People choose the unsafe option because it's easy, even though we know types are one of the best tools engineers can have.


Static typing is orthogonal to this issue though. This is more about strong typing. Even a statically typed language could coerce values, e.g. the empty string, to a Boolean false at runtime. I agree that that’s undesirable though. Static and strong typing is the driest, least flexible, but also easiest to get correct way.


My only explanation for that may be that Python has more severe and commonly occurring issues?


I wrote a (len(foo == 0)) instead of (len(foo) == 0) in Python yesterday. I don't understand why you would say this.


… which is exactly what you don’t need to do in Python, because that is just `if foo`, thus sidestepping the issue


I've been bitten by it in Python. It isn't quite up there with PHP style weak typing combining int and string with + but feels similar, even if you can make some modulo 2 arithmetic argument that it should work like other mixed precision stuff.


If only there was a language that defines explicit conversions between booleans and numbers.


Don't they all have == and != though?


Sure, but in say Rust we can write for example ((v.is_empty() as i16) - 10) and Rust is OK with that, and you get either -9 if it was empty or -10 if it was not empty whereas if you didn't have that explicit cast, that's a bool not an integer, and the bool type doesn't have arithmetic so you can't subtract ten from it.


Why not use the ternary operator instead: if v.is_empty() { -9 } else { -10 }


Sure, although that's not a ternary operator, it's just an if expression.

In practice you tend to see such casts in code where we directly needed the integer. I think Clippy (Rust's linter) proposes them where you've written if some_bool { 1 } else { 0 } because well, that's what the as cast does anyway.


I find the if expression easier to read to be honest. With the integer cast example you need to mentally translate the booleans and do a subtraction in your head to figure out what integer values are involved, with the if expression it's directly visible.


Or you compile with -Wall -Werror, as you should do for serious projects.

I'm not saying C++ is a great language, it's not, but if you remove the safety and then shoot yourself in the foot, it's your fault.


> Or you compile with -Wall -Werror, as you should do for serious projects.

-Wall -Werror does not catch the error described in the blog.

It does get caught by a static analyzer though [0].

[0]: https://gcc.godbolt.org/z/nK7MdEcMj


clang-tidy is still probably the best static analyzer for c++; g++ needs to catch up, as so far its -fanalyzer is really just for c, not c++


I would add that adding lots of checks can cause clang-tidy to have serious performance problems.

The software I'm working on is safety-critical. I enabled every checker relevant for certification and tons of additional checkers related to usability and readability. clang-tidy now takes 3x the time it takes to compile :S I ended up splitting off a few checkers in particular into their own separate job.

Some checkers are also quite broken. bugprone-unchecked-optional-access is very broken. It caused hangs, it caused crashes, it caused out-of-memory issues. Fortunately, our use of std::optional can mostly be refactored away but doing so is a significant change in the codebase.


You may also need -Wconversion


Linters like clang-tidy help a lot with finding implicit bool conversions, among other common mistakes. You could even hook them into patchset-received to ensure linters are passing ;)


Is it possible to enable a warning in C++ for these cases?


Given that there is a multitude of C++ compilers: Depends on your compiler. But in general, most compilers have a warning for this sort of thing.


1. Not in GCC.

2. This would be caught by static analyzer

See Compiler Explorer example here [0]

[0]: https://gcc.godbolt.org/z/nK7MdEcMj


If you want, you can disallow implicit conversion between boolean and number using a linter.


Or with -Wimplicit.


that's only for c, not for c++


I imagine that you'd get a load of false positives due to textual inclusion of header files.


Compilers have no trouble ignoring header files outside of your project for the sake of warnings with -isystem instead of -I, it's not different for linters


> Compilers have no trouble ignoring header files outside of your project for the sake of warnings with -isystem instead of -I, it's not different for linters

In this [0] example I shared in other related comments, the static analyzer suppressed warnings in such headers.

> 126 warnings generated.

> Suppressed 124 warnings (124 in non-user code).

[0]: https://gcc.godbolt.org/z/nK7MdEcMj


Any linter that gets confused by includes would be rather useless, it couldn't even print the correct line numbers for any errors it finds.


real-world cpp lives together with static analysis, which obviously catches this


It's not really a new technique; it's pretty common in large-scale software development to break down complex logic like this. The possible loss in performance is usually more than worth the gain in readability, which makes the code much more desirable at scale.


Pretty sure that the compiler optimizes this and there is no performance loss.


You're wrong.

  if (cheapCheck() && expensiveCheck()) { ... }
is not equivalent to

  bool cheap = cheapCheck();
  bool expensive = expensiveCheck();
  if (cheap && expensive) { ... }
unless we're in a lazy pure functional language.


That depends. In some cases the compiler will be able to determine that `expensiveCheck()` has no side effects and make them equivalent. But you can't really rely on it.

And in any case the author misidentified the problem and solution. The problem is that C++ coerces bool to int. I'm 99% sure there's a warning for that that you can turn into an error.


Ah, the mythical sufficiently smart compiler! http://wiki.c2.com/?SufficientlySmartCompiler


That's a different thing - the "sufficiently smart compiler" is postulating that it doesn't matter if a language is not very high performance because somehow magically there will one day be a compiler that can just figure out how to optimise the code in every case.

I'm just saying that in this particular case, sometimes the compiler will be able to optimise it but you can't rely on it.

https://godbolt.org/z/WshcfWPEh


In theory, if this is the case (or in the cases from other comments about null pointer use and such), you could write it as:

   bool cheap = cheapCheck();
   bool expensive = cheap && expensiveCheck();
   if (expensive) { ... }


Can you prove this?


I think the idea is that if there's observable behavior in unconditionally calling the expensive function, the compiler can't skip it without altering the program behavior. Whereas if they're called as part of the if condition, the compiler can always safely skip the second function (and it's documented that it must do so, due to "short circuit" semantics) if the first function is false.

The ability of the compiler to skip the expensive function in both cases is a characteristic of a "lazy" language, i.e. you can assign an expensive function to a value, but the actual function isn't run until the variable is needed. Languages like C++ can't do this because it would alter the program's behavior: if cheapCheck() does return true, expensiveCheck() would suddenly be called after cheapCheck() (rather than before), which is an observably different behavior for the optimization to cause.


My mistake for reading HN too early in the morning. You and the other poster are right.


(Fun fact: I read meindnoch's comment right after I woke up and I, too, was about to reply to them with a godbolt listing to show that it would compile to the same thing, only to realize I completely forgot about short-circuit semantics. So you're not alone!)


It's part of the language [1] [2]

> Builtin operators && and || perform short-circuit evaluation (do not evaluate the second operand if the result is known after evaluating the first), but overloaded operators behave like regular function calls and always evaluate both

[1] https://en.cppreference.com/w/cpp/language/operator_logical [2] https://en.wikipedia.org/wiki/Short-circuit_evaluation


Note that (as cppreference will also tell you) this only applies to the built-in operator.

If the operator has been overloaded which is easy to write in C++, then too bad - the overload doesn't short circuit.

I think if you can't figure out a way to preserve the short-circuit feature then having a way to overload this operator in your language is stupid. It is, I would say, unsurprising to me that C++ did it anyway.

ETA: It feels like in, say, Rust you could pull this off as a trait LogicalAnd implemented on a type Foo, with a method that takes two parameters of type FnOnce() -> Foo, and then the compiler turns (some_complicated_thing && different_expression), where both of the sub-expressions are of type Foo, into something like:

  {
    let a = (|| some_complicated_thing);
    let b = (|| different_expression);
    LogicalAnd::logical_and(a, b)
  }
To deliver the expected short-circuiting behaviour in your implementation of the logical_and method, you just don't execute b unless you're going to care about the result.


It's easy enough to verify, just add a side effect to the second function. Just think of them as process functions returning success flags.


I do this, and sometimes also something else: a local enum as a return value, where you have to compare the enum to use its value (i.e. you can't just 0/1 it). I of course don't always do this; I do it when the semantics of the function value are a bit tricky, to protect against "oh, I'm pretty sure I know what returning true/false here means" bugs. It helps both the caller and the implementation: when you are writing the return statements for edge cases, it can be a help that you return not just 'true', but IREALLYINSISTTHISMEANTWECOULDPARSEITANDCONTINUE. Again, I don't do this for simple, obvious code.
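
A sketch of what I mean (names hypothetical):

  #include <string>

  // Callers can't 0/1 this; they must name the case they mean.
  enum class ParseResult { Failed, ParsedAndCanContinue };

  ParseResult tryParse(const std::string& input) {
    if (input.empty()) return ParseResult::Failed;
    return ParseResult::ParsedAndCanContinue;
  }

  void use(const std::string& s) {
    if (tryParse(s) == ParseResult::ParsedAndCanContinue) { /* ... */ }
    // if (tryParse(s)) { ... }  // does not compile: enum class is not bool
  }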


Personally, I prefer to just split long if statements into multiple lines.

    if (
        (_someLongNamedVar != FooLongNameEnum::Unknown) &&
        (_someLongNamedMap.count(_someLongNamedVar) == 0)
    ) {
        // do something
    }


right, but putting the && at the beginning of the next line makes it more obvious this is a continuation statement


For bonus alignment points and malleability, start with 'true' and place the closing parenthesis on a new line:

    if (true
        && condition1
        && condition2
        && condition3
    ) {
        ...
    }
If you ever need to change the set of conditions, the diff will cleanly add/remove whole lines.


Or borrow a page from the FP language folks, which gets most of the way there (prepending an item is still a -1/+2 diff) without the weird initial clause:

    if ( condition1
      && condition2
      && condition3
    ) {
        ...
    }
Obviously the best alternative is to switch to a prefix language and sidestep the issue entirely:

    (when
      (and
        (condition1)
        (condition2)
        (condition3))
      ...)


I actually like putting the && on its own line in between.


My take:

1. Avoid overly long and complex condition expressions - break them up for readability.

2. If a part of a long expression has obvious meaning, assign it to a temporary (probably saving a comment); if it doesn't, break a line after it. I don't like assigning variables with arbitrary names that simply duplicate the expression calculating them.

3. `true` and `false` are magic numbers. Calling `my_function(my_data, false)` is quite the hazard in my view: It makes the reader rather likely to fail to notice the exact semantics. I would expect, say,

    enum : bool { dont_save_copy = false, save_copy_to_file = true };
    my_function(my_data, dont_save_copy);
4. This is relevant to almost any language! Even functional ones, where you would be encouraged to use "let" or "with" instead of super-complicated unbroken expressions. If you're writing APL though... you need super-memory and a discipline of steel :-)


The original bug would have been prevented by not overusing parentheses. This:

    if (_someLongNamedVar != FooLongNameEnum::Unknown && _someLongNamedMap.count(_someLongNamedVar) == 0)
does exactly what it’s supposed to.


That is what happens when people zealously try to not be "too clever". They make everything as explicit as possible for "clarity" even when it obviously hinders it for anyone who cared to learn the most basic things about the language they use.


>the most basic things about the language they use

This is C++.

The most basic thing has been deprecated but is still supported for backwards compatibility.


While at it, also don't do if (condition) return true; else return false;

Boolean is a perfectly cromulent data type that you can return from a function directly.


I agree most of the time, but it can make sense in certain contexts. One common example is comparison methods:

   if (x1 < y1) return true;
   if (x2 < y2) return true;
   if (x3 < y3) return true;
   return false;
I wouldn't want to replace the last two lines with `return (x3 < y3)`, even though that's functionally the same thing.

In this contrived example, we could just return a single compound condition, but hopefully it's clear that there are cases where doing so would make more of a mess, too.


To my mind it only works if the meaning of a return value is "compatible" with the expression. E.g. if a function returns a status, but the expression is an integer comparison, the flow of logic skips an important conversion step.

  bool is_db_open() {
    return (db != NULL);
  }

  bool is_time_to_report() {
    if (n_errors >= limit) {
      return true;
    }
    return false;

    // a little confusing way
    return n_errors >= limit;
  }
Also, if there’s a chance of an additional condition, they can be easily be added and documented.

  if (n_errors >= limit) {
    return true; // too many errors
  }
  if (now() >= deadline) {
    return true; // periodic
  }
  return false;
For the same reason I often do this:

  if (x == 0) {
    return 0; // not `x`


Many of these seen in the wild are due to working around an extremely annoying and useless MSVC performance warning.


the problem is the length of the variable/enum names. if they were shorter, the problem would be much more obvious - as it is the code is unreadable.

the lengths of names of entities, in any programming language, should be related to their scope - small scope, small names. and as limiting scope is what you should be doing all the time, almost all names should be short.

also, unless you really know what you are doing, don't use leading underscores, or underscores in general, in c or c++.

here is something i wrote a while back, which admittedly does not much address the single-underscore issue: https://punchlet.wordpress.com/2009/12/01/letter-the-second/


This article seems to be targeted towards beginners. From the title, I was expecting it to cover something like https://github.com/joboccara/NamedType, which is a technique I would recommend.


Related, one technique I like to use even though it's more verbose is to replace booleans with enumerations. For instance, maybe you have something that could be selected or not. I generally think instead of having "true" be selected and "false" be not selected, it's better just to have ListItem::Selected and ListItem::Unselected. That way when you pass things to a function, instead of reading a lot of true/false things in random places, you can see "ListItem::Selected" instead. Small upfront cost for a lot of readability down the line.

Typescript is nice for this in that you can use strings but enforce that they're in some set.


I agree, but without pattern matching, converting "boolean-like" enums gets tedious. For example:

    ConnectionMode connectionMode;
    if (useHttpsCheckbox.selected == ListItem::Selected) {
       connectionMode = ConnectionMode::Https;
    } else if (useHttpsCheckbox.selected == ListItem::Unselected) {
       connectionMode = ConnectionMode::Http;
    } else {
       shouldNeverHappen();
    }
In the real world there are often even longer chains like this - I found that bool-like variables often mean slightly different things at different layers of abstraction.

And also, you could simplify that by ignoring the `else` entirely, but there are no guarantees that it won't break later. For example another contributor may add ConnectionMode::TryHttpsElseHttp, and compiler won't help us. Languages with pattern matching don't have this particular problem.


You can of course use a switch statement and the compiler will warn if the cases are not exhaustive for the enum.
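
E.g., with GCC/Clang's -Wswitch (implied by -Wall), a switch version of the snippet above warns as soon as an enumerator goes unhandled (a sketch):

    switch (useHttpsCheckbox.selected) {
      case ListItem::Selected:   connectionMode = ConnectionMode::Https; break;
      case ListItem::Unselected: connectionMode = ConnectionMode::Http;  break;
      // no default: adding a new ListItem enumerator later triggers a -Wswitch warning here
    }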


If one of the variable initializers preceding the `if` has side effects, then this is inferior to the specified lazy evaluation of the `if`. Any external function not annotated pure will be called to assign those variables, where it otherwise wouldn't have been had it been packed into the `if`.

This is actually something I think the C++ standard should address soon. There are cases when extern functions which aren't appropriately pure are still "optional." The example par excellence is debug logging, should one choose to not use macros.
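
A sketch of the debug-logging case (debugLog is hypothetical and not annotated pure):

    bool debugLog(const char* msg);  // hypothetical; has side effects

    void check(bool verbose) {
        bool logged = debugLog("pre-check");      // always called
        if (verbose && logged) { /* ... */ }

        if (verbose && debugLog("pre-check")) {   // called only when verbose
            /* ... */
        }
    }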


This is a good technique. Not sure I’d consider it particularly novel, though. It’s pretty much boilerplate for how coding is taught.

I tend not to work that way, but it's really a matter of personal style. I avoid it to reduce the verbosity of my code (which tends to be fairly verbose in other ways, especially comments). I will occasionally encounter the bug he mentions, but not particularly frequently.


I read comments about the need for lazy evaluation. In fact, all constructs that ensure lazy evaluation try to solve a not-really-existing problem, in a hard-to-review manner. What the heck is wrong with just explicit nested if statements? Not cool enough?

    if (p != nullptr) {
      if (*p == ' ') {
      } else {
      }
    }
Keep it simple.


All those bools in the article should be const.


> Effectively the if statement was inverted.

If this wording means to convey that the complete test in the wrong code version always gives the inverse of the correct version, then that statement is wrong. The truth tables, using 0 and 1 for false and true ("false" and "true" look too messy here):

The buggy code:

      0 1  <- second condition
    0 1 1
    1 1 0
    ^- first condition
The fixed code:

      0 1  <- second condition
    0 0 0
    1 1 0
    ^- first condition
i.e. the test as a whole is not inverted if the first subcondition is true.


As an alternative, you can also create private helper functions to evaluate conditions. You get the best of both worlds: function names are self-explanatory, and it supports lazy evaluation.
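
A sketch of that approach (names hypothetical), keeping both the self-explanatory names and the lazy evaluation:

    class ShoeOffer {
        int userFeetSize = 0, shoeSize = 0;
        int shoePrice = 0, userBudget = 0;

        bool feetFitInShoe() const   { return userFeetSize <= shoeSize; }
        bool priceFitsBudget() const { return shoePrice <= userBudget; }

    public:
        bool probablyOkayForUser() const {
            // priceFitsBudget() is evaluated only if feetFitInShoe() is true
            return feetFitInShoe() && priceFitsBudget();
        }
    };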


Isn't this partly why `enum class` was introduced, and is now the idiomatic way to express enums?


Alternative solution: just write a damn test. How did that inverted condition get past testing?


That would be a good addition, but I wouldn't call it an alternative. Good test coverage shouldn't be an excuse for unreadable style.


Excellent suggestion. I use this style myself in many places. Makes the code more legible.

As has been pointed out, there are cases where it's not efficient or correct, but in general it's very useful. (Been coding since 1965, not always C++ obviously).


The bug in question would not have happened if a reviewer had demanded splitting the complicated if. Yes, it would add a few lines depending on the overall control flow, but the code in question is hard to read and should be written in a less dense way.


Code Complete 2 is still relevant and advocates useful practices like these.


I use this technique a lot to make the code more readable and less error-prone, but I have never bothered checking whether this is trivially optimized or not. Do you know?


There may not be anything wrong with it in isolation, but cognitive load is increased when every identifier is effectively a sentence or paragraph.


But cognitive load also increases when you have no idea what the significance of a variable is.


Agree, but it is an optimization problem: maximize meaning while minimizing identifier length. Recognize that it is possible to have identifiers that are too long.


So the topic here really has nothing to do with .count, the STL, C++20, or even C++ itself. It could happen in almost every language.

A very confusing article.


Isn't this (1) the most trivial programming advice ever (give names to things), and (2) not equivalent because of short circuiting?


This could have also been prevented using Yoda Notation with 0 == _someLongNamedMap.count(...).

https://en.wikipedia.org/wiki/Yoda_conditions


Thanks for the reference to Yoda conditions. I hadn't heard of Yoda code until a few weeks ago when a Go linter complained about some code I wrote a few years ago. I realized I've been writing Yoda code in C/C++ for quite a while and that habit had continued (somewhat inconsistently) into Go programming. It's nice to skim through pages of code and see the value that will cause the computer to enter an if-block. The linter made me feel like Yoda code was somehow wrong. It's good to know that others think it can help in some situations.


First thing I thought of when I saw the original wrong code. I prefer the author's approach, though, as I do believe it is more clear, unless there is something performance critical about it that requires short-circuits for maximum efficiency.


Rust and Go also avoid C++ bugs.

Why are we even still talking about C++?


Neither Rust nor Go hits GitHub's Top 10 languages (though they are among the fastest growing): https://octoverse.github.com/2022/top-programming-languages.

There is a lot of code out there already written in C++ and a lot of people working on that code. Not to mention an active standards body.

Expect to be hearing about C++ for a long time to come.


A pity.


The post peaks with this example:

  bool usersHairColorMatchesThisMonthsAdsColour = _user.hair == HairColour::Red;
  bool userFeetSizeFitsInShoe = _user.feetSize <= _requestedShoe.size;
  bool shoePriceFitsInUserBudget = _requestedShoe.price <= BudgetHelpers::Calculator(_user);

  bool shoeIsProbablyOkayForUser = usersHairColorMatchesThisMonthsAdsColour && userFeetSizeFitsInShoe && shoePriceFitsInUserBudget;

  if(shoeIsProbablyOkayForUser)
which they compare with the "uglier":

  // this months ad campaign color is red 
  if(_user.hair == HairColor::Red && _user.feetSize <= _requestedShoe.size && _requestedShoe.price <= BudgetHelpers::Calculator(_user))   
I very much prefer the latter, I'm sorry (except it needs some newlines, of course). It performs better (that `BudgetHelpers::Calculator` doesn't look very cheap ;) but crucially it is, to me personally, more "readable". First, I find a comment easier to read than a variable name, particularly when you use the not-so-readable dromedaryCaseToCrambleWordsInLongSentences. Thus, a variable adds no value when it's only used once (arguably, it adds more noise, since one has to go twice through the same text). And it is often debatable whether a programmer's poor translation of code into English adds any value: in fact, I find `_user.feetSize <= _requestedShoe.size` to be way more succinct, precise and easier to parse than `userFeetSizeFitsInShoe`.

We really need to get over this school of thought that "the wordier your variables, the more readable", because it is simply not true. You don't have to go all the way to the other end and write code using Greek letters like in math. Meet somewhere in the middle, be aware of context, and use the tools of the language (e.g. lexical scoping) to your advantage.

Unlike what many of these bloggers want us to believe, you can't just boil "readability" down to some simple universal rules, and only experience makes the master. Taste, preference, purpose and context play a big part. As such, I find it useless to talk about "readable" code: readable for whom? A law text, a maths paper, a school maths book, a user manual, a tutorial, a novel, a HN comment... they may all be written in English and with great skill and consideration for their "readability" by their target audiences, yet manifest completely different styles even when discussing the same topics.

For me, I like my code the way I often enjoy good writing: succinct and precise. Wordy variables that add no value or abstraction are of little use to me.


I find that people want to find a name for some condition or computed result so they can box away a concept and use it without thinking about it anymore. That's great when you're writing software and you have lots of time to consider and remember your names, but not always so great when you revisit software and now you need to actually _know what those things are and whether they are logically correct_. Sometimes even the best name isn't enough to describe what it is you have in your hands right now, and instead of seeing it where it's used, you have to go find where it was originally defined, and you potentially have to repeat this process for several named objects at once, keeping them all in your head. IMO you need a better reason to name something than the example from the article, or you actually sacrifice clarity and readability.


I tend to prefer the method of defining clearly named variables, but I agree those names are ridiculous.

    hair_col_matches && size_fits && in_budget
is less than half the length and imho more readable because it doesn't turn each variable intoNeedlesslyWordyCamelCaseSentences.

> you can't just boil "readability" down to some simple universal rules

Yes, in the end blindly following rules doesn't turn out as well as understanding why those best practices exist in the first place.



