Hacker News new | past | comments | ask | show | jobs | submit login
Reducing Code Nesting (eflorenzano.com)
114 points by craigkerstiens on Jan 2, 2012 | hide | past | favorite | 113 comments



Yikes. Lots of return points have always felt like a code smell to me. I've found that it tends to create surprising and frequently hard-to-maintain code. I do use early returns somewhere, but I'm rarely happy about it.

For things like the first example, why not something like (in Ruby):

    # Returns a cached user object either by their user_id or by their username.
    def get_cached_user(user_id = nil, username = nil)
        user =  cache.get_user_by_id(user_id)        ||
                cache.get_user_by_username(username) ||
                db.get_user_by_id(user_id)           || 
                db.get_user_by_username(username)    ||
                raise ValueError('User not found')            
        cache.set_user(user, user.id, user.username)
        return user
    end
You just try all your getters, cheapest first, and when the getter returns a nil or false value, the next one will be tried. Once one is found, no more are tried. Then you just set the resulting value in your cache and return the value. One return statement, no ugly nested ifs, and your unnecessary statements never evaluate, which is what you want anyhow.


Sigh, I knew using that contrived get_cached_user function with all those arguments would detract from my point, which is why I tried to couch it in disclaimers.

My point still stands though: nesting is bad. The takeaway from your comment is that you can solve the issue sometimes by factoring out smaller more focused functions instead of returning early, as you've demonstrated.


I have always liked the Linux kernel coding style in this regard:

Now, some people will claim that having 8-character indentations makes the code move too far to the right, and makes it hard to read on a 80-character terminal screen. The answer to that is that if you need more than 3 levels of indentation, you're screwed anyway, and should fix your program.

Yes indentation != nested ifs, but in the end its about the same. I don't mind a bit of nesting, but I think Linus is onto something with >3 meaning: you've done something wrong, step away from the keyboard and think about what you've done.


> The answer to that is that if you need more than 3 levels of indentation, you're screwed anyway, and should fix your program.

That particular cliché doesn't work if the essential complexity of data/algorithms you're working with implies more than three levels of decision-making. This can easily occur with graph-related operations like subgraph matching and graph rewriting, for example.

Splitting out the inner levels of such algorithms into separate functions doesn't gain you anything (unless the particular nesting you've got represents a subgraph that is meaningful in its own right out of context, of course) so deep nesting within a single function may be making the best of a fundamentally awkward modelling problem.


>if you need more than 3 levels of indentation, you're screwed anyway, and should fix your program.

A single if statement in a class method would put the indentation at 3, meaning if you have something like this...

    class MyClass(object):
        ...
        def my_method(self, param):
            try:
                self.calculate(param)
            except MyException:
                if param == 100:
                    ...
                else:
                    ...
you're screwed? I don't see how you can set such a definitive measure of bad code.


Linux is written in C, so every "top-level" instruction is going to be one-level deep. In Python, the "top-level" is often the body of a method, which means it's two deep. So to apply Linus' advice to Python code, you can go four levels deep:

    class Foo(object):
        def my_method(self):
            try:
                if foo is bar:
                   self.quux()
            except:
                 pass
This seems acceptable to me. Factor out the try block if you want more conditionals and use polymorphism instead of conditionals. Rather than writing:

    class X(object):
        def quux(self):
            if self.foo is self.bar:
                self.quux_foo()
            else:
                self.quux_bar()
Write:

    class X(object):
       def quux(self):
           self.quux_bar()

    class FooIsBar(X):
        def quux(self):
            self.quux_foo()
No conditionals and your intent is encoded in your class hierarchy instead of inside some random method.


Thanks for the thorough reply. I think his argument fits for C, but the linked article uses three different OO languages that would require a deeper "top-level", so I don't think it's fair to copy/paste his quote without any explanation or context.


FooIsBar doesn't need to be a subclass of X, does it? The polymorphism comes from the interface (implements `quux`), so inheritance doesn't seem relevant.

Feel free to correct me if I'm wrong. Still trying to learn the difference between inheritance and "evil" inheritance :)


Python doesn't have traits/roles, so it's hard to give a good example. You should build classes by composing them from smaller pieces of classes. There are two good ways to do this; delegation and role-based composition. Delegation is where you pass the constructor of your class an instance of another class, and your "foo" method becomes an alias to that instance's foo method. This allows you to do the same interfaces/roles as the object you're "wrapping", but still lets you customize the behavior of each method call you choose to delegate. (The Law of Demeter protects you from other callers touching the non-delegated methods.)

Roles let you implement pieces of classes without being a full class yourself. A classic example is a "Comparable" role, that requires the consumer to implement a "cmp" function, and then provides lt, gt, and eq implemented in terms of "cmp". You mix this into your class that does cmp, and now it gets lt, gt, and eq for free. You also get the security of knowing that you're calling the right method; duck typing will treat a tree's bark and a dog's bark as the same actions, while roles will let you determine whether you have an object with a wooden outer layer, or an object that can talk like a dog.

Python's philosophy doesn't seem to push delegation or composition, so we use inheritance in the above example to stick with Python programmer's expectations. Just beware of invoking the wrong type of bark; duck typing is flexible, but it ain't safe.


In my experience, a lot of the lessons of kernel/driver code don't really carry over to the more general purpose app development using higher level languages.


Your get_cached_user function is very authentic and is a great example of the kind of hairy code that a programmer should be looking for in his own code. There is no need to backpedal on this choice of example; it's fine.

You should go one step further and conclude that nesting is a warning sign often indicative of bad code, and that just fixing the nesting is not the solution to the problem.


The "||" idiom is popular in many languages, but keep in mind that it skips not just NULL (nil) values! It also skips otherwise perfectly desirable values like FALSE, 0 (zero), or "" (empty string).

In SQL there is the handy COALESCE function [1] which does exactly what I want in this case: It simply returns the first non-NULL value of its arguments, even if it is FALSE, 0 or "".

    def get_cached_user(user_id = nil, username = nil)
        user = coalesce(
            cache.get_user_by_id(user_id),
            cache.get_user_by_username(username),
            db.get_user_by_id(user_id),
            db.get_user_by_username(username)
        )
        if user.nil? raise ValueError('User not found')            
        cache.set_user(user, user.id, user.username)
        return user
    end

I always wonder why none of the other programming languages provide a handy COALESCE function/operator out of the box. That way, they wouldn't encourage bug-provoking hacks such as abusing the "||" operator.

Of course you can always implement your own COALESCE function, but that won't provide the nice short-circuit ability. So this is only feasible in languages with macros (e.g. Lisp), or languages which are lazy-evaluated (e.g. Haskell).

All other languages should either provide a COALESCE function/operator, or a sane alternative to "||" which skips only NULL values and nothing else.

[1] http://www.postgresql.org/docs/9.1/static/functions-conditio...


> All other languages should either provide a COALESCE function/operator, or a sane alternative to "||" which skips only NULL values and nothing else.

In Ruby, only nil and false are false, so you can use || to catch 0s and empty strings. In Perl, 0s and empty strings are false, but there's the // operator (as well as the || or or operators) that catches the first defined value.

I'm not sure if this is something specific to the database framework that you're using - sorry if you know all that already.


> In Ruby, only nil and false are false, so you can use || to catch 0s and empty strings.

But there's still trouble with things like boolean configuration variables ...

> In Perl [...] there's the // operator

Thanks for mentioning that! [1] The // operator is indeed a clean operator, almost equivalent to COALESCE.

So we have Perl and SQL. Any other language with such an operator?

[1] http://perldoc.perl.org/perlop.html#C-style-Logical-Defined-...


C#'s ?? operator does exactly that.

`x ?? y` returns y iff x is null.

In daily practice, I've hardly ever used it. I use early returns all the time, though, and the code becomes very nice and readable.


That looks nicer, but you changed the code; The original code would not call cache.set_user when the data already was in the cache.

You would need something in-between like:

    # Returns a cached user object either by their user_id or by their username.
    def get_cached_user(user_id = nil, username = nil)
        user =  cache.get_user_by_id(user_id)        ||
                cache.get_user_by_username(username)
        if not user:
            user = db.get_user_by_id(user_id)        || 
                   db.get_user_by_username(username) ||
                   raise ValueError('User not found')            
            cache.set_user(user, user.id, user.username)
        return user
    end


I knew someone would catch that. :)

It's dependent on your infrastructure and demands, of course, but I probably wouldn't bother optimizing that unless cache sets were prohibitively expensive.


I really tend to like early returns and use them often.

I usually try to make them use the form:

return x if y

To me, this makes it easier to follow a particular flow when reading code.

As soon as I encounter a return that matches it's conditional, I don't have to care about the rest of the method anymore.

During debugging, this also quickly lets me test assumptions by putting some log statements right after the return that should be occuring.

The only time early returns irk me is when they're deeply nested, but as mentioned by TFA and others in this thread, having deeply nested code is a bit of a code smell in its own right.


Those are typically considered to be guard clauses, and I agree that they're one of the more acceptable forms of early returns, but only if they're in the first few lines of the function. Returns in the middle of a 20-line function usually give me the twitches.


I'd love to hear your experiences as to how it leads to surprising and frequently hard-to-maintain code.

I've found quite the opposite.

Furthermore, I've found that early returns allows one, even in the original post's example, to easily specify a distinct error message for each case.


Multiple return points are a signal to me that I'm usually (though not always) trying to encapsulate more functionality into a function than I should, or I'm using returns when it might be more semantically correct to use exceptions. I'm a big fan of the "each function does exactly one thing" rule of thumb, and when I start returning all over the place, it usually means that I'm starting to violate that rule. Except in the simple cases, multiple return statements usually means that I've got a function that's composed of multiple micro-functions, and can/should be refactored. Guard statements are an obvious exception.

Sometimes you can't avoid it, especially if you're working with duck typing functions (which are a whole 'nother discussion), and it's not a hard-and-fast rule that multiple returns are evil. There are a lot of legitimate reasons to use them - performance being the primary one - and I would absolutely take them over an 8-deep nested if structure. But, they're often a warning sign that I'm putting too many eggs in that one basket.


This is clearly a very beautiful way to do this.

But somehow I still havent internalized the fact that when assembly code is generated for such an OR statement it is done in a way such that the if any sub statement is evaluated to be true then all the other sub statements are not evaluated at all.This is true with ANDs as well.

Can you please give me a link that proves conclusively that gcc and clang do this?


It's guaranteed by the C and C++ standards, and for the ternary operator (x ? y : z) as well. http://en.wikipedia.org/wiki/Short-circuit_evaluation


hmm...well I almost got convinced that I should use above technique a lot until I read that.

Short-circuiting can lead to errors in branch prediction on modern processors, and dramatically reduce performance (a notable example is highly optimized ray with axis aligned box intersection code in ray tracing)[clarification needed]. Some compilers can detect such cases and emit faster code, but it is not always possible due to possible violations of the C standard. Highly optimized code should use other ways for doing this (like manual usage of assembly code).[citation needed]


Citation needed indeed. It's logically equivalent to an if/else and the compiler probably knows that.

That sounds like someone may be thinking about the conditional-move and family on some CPU instructions. That could certainly interact with the branch prediction, but there's nothing about the operators that necessitate it.

Sure, it may be that some compilers do generate different code and some runs faster than others, but that's just not the kind of thing you can tell without benchmarking those very specific conditions. If you're writing a ray tracer, you're probably are going to want to hand-tune the assembly but that's not the common case of C and C++ programming.


I don't think that comment is intended for the scenario listed above, it's more appropriate for SIMD code where it's often better to calculate both execution paths and then select (using vsel/vec_sel(a,b) on Altivec or the equivalent bitwise logical operations on SSE) the value using masks generated by comparison operators, if only because the penalties associated with performing a branch on a SIMD result are often significant.


"... if you need more than 3 levels of indentation, you're screwed anyway, and should fix your program." - Linus Torvalds, Linux kernel coding style (partly justifying why the Linux kernel style uses 8-space indentation)

http://www.kernel.org/doc/Documentation/CodingStyle


I was thinking of that quote too. And with Python-style OOP languages, I tend to increase that to 4 or 5... However, I never really ran into a problem with it until I started doing stuff in NodeJS and entered callback hell. The excellent Futures and Async libraries have helped reduce a lot of nested indentation, much more than getting rid of lambdas does. (And getting rid of lambdas has its own drawbacks.)


> Every added level of nesting is another piece of context that your brain has to keep track of.

I disagree entirely. Nesting manifests the conditional-evaluation context in the presentation of the code. Without nesting, the context shifts must be held purely in the mind of the programmer (instead of offloading that storage to the layout of the code.) In dynamic languages especially, I've also found that early returns lead to more test failing while refactoring (or more bugs if the code has low test coverage.)

The issue I have with named functions (as shown in the article) is that context for inner blocks must be passed through the outer blocks, which means your intermediate function signatures have some irrelevant (to the local scope) cruft. There are some rough corner-cases where nested code is far less complicated than juggling context batons.


With early returns, there are no context shifts. You're just getting rid of contexts as and when they are looked at. So there's no question of keeping a stack of context in mind.

What you say works well when you can physically see the structure in the code. With deep nesting in large functions, you can't - it's easy to get lost in it.

That said, I can understand what people say about multiple returns leading to bugs in refactoring - personally, I always prefer early and multiple returns to nested code, and I've developed the habit of checking the full function for any exit points whenever I refactor it.

To each his own, I guess.


There is actual evidence to back up his original statement, and even to say that nesting increases defects. It's called cyclomatic complexity, and can be tested for quite easily.

http://en.wikipedia.org/wiki/Cyclomatic_complexity

I wrote an article on a similar concept here:

http://www.jasonlotito.com/programming/blocks/

It covers one method I use to reduce the complexity of my functions.


It's a compromise. The end point is to improve readability and understand-ability of your code. Sometimes nesting play well. I think the author took a very simple example just for the sake of it. The reality is when you don't have alternatives, you end up with spaghetti code.


You can also place blocks of code in a list and recurse through the list until the user is found.

  let cacheUser user =
    cache.set(user.id,user)
    user

  let anonymousUser id =
    Some(new User())
 
  let sourceList = [(fun id -> cache.getUser(id),
                    (fun id -> db.getUser(id)),
                    anonymousUser] 

  let getCachedUser id =
   sourceList
   |> List.pick (fun source -> source id)
   |> cacheUser


BTW, in this case, you don't even need the first two lambda functions. You may just write

    let sourceList = [cache.getUser; db.getUser; anonymousUser]


That's one of the main things I like about functional programming: a lot of logicy code smells can be factored away into reusable generic functions like, in this case, List.pick.

Likewise, engineers who have at least played with FP tend to be sensitive to code smell. In the absence of other, firmer data, I'm willing to loan some trust to a programmer with FP experience.


I've done this sort of thing to satisfy DRY in general (this kind of nesting problem is quite repetitive), and to generally make the intention more obvious. It's not so pretty in certain other languages, but still worth it.

(Edit: I posted before you edited.)


F# always gets me in the editing mode because after I write it the first time I realize there is some built in function that makes the whole thing so much easier. I realized my match statement inside of a recursive function was just reimplementing List.pick


(I thought it was OCaml at first... then I realized you had `null` in there which isn't what you'd want.)

OCaml/F# are too convenient that way. There's a lot of library stuff that you won't always think to look for.


It's strange that many programmers in this topic feel that multiple return points are bad. It is very clear how nesting makes code logic more complex, but why many return points are a problem at all? They don't require you to do extra work while reading or writing the code, actually it is exactly the reverse IMHO, every return point remove some possible future state.

Also many times early return is an exact translation of the way we think in our natural languages: "if that is this way don't continue at all", "if this precondition is meet return that value", and so forth.


It is because it is unstructured -- in the Dijkstra 'structured programming' sense. The flow is no longer nested, so you cannot skim over -- mentally fold -- parts. It also means the effects of transformations are less 'limited'.

With structured code, if you put a statement after a block it will be executed (notwithstanding exceptions, but that is a slightly different matter...), no matter what is changed. But if you have returns, breaks, etc. that structural condition is lost.

If you have a function with a statement, or block, at the very end, to clear something up or something, but then someone puts an early return in, the clear-up will be missed. There is a certain error-proneness there.

The best use of the early-return pattern ('replace nested conditional with guard clauses' in the Refactoring book) is where that is the only thing going on in the function. So you are effectively reading/understanding the whole thing as an 'early-return function'.

(And software is very different from natural languages. When you look at software what you see is not language, but a machine. And it has a particular hierarchical structure.)


Mentally folding sections of code is easier with multiple returns- the code that satisfies the conditions established by the early return is just the rest of the function, rather than until the denesting point you have to look for.

RAII, GC, and/or try/finally solve the "error-proneness" of early returns, and in a language like C you can simulate that with a simple (good use of) goto.


Why couldn't he write the first example like this? Relying on short-circuiting "or", I find this far more readable and maintainable than his final solution:

    def get_cached_user(user_id=None, username=None):
        """
        Returns a cached user object either by their user_id or by their username.
        """
        user = (cache.get_user_by_id(user_id)
                or cache.get_user_by_username(username)
                or db.get_user_by_id(user_id)
                or db.get_user_by_username(username))
        if not user:
            raise ValueError('User not found'))
        cache.set_user(user, id=user.id, username=user.username)
        return user


That's another great way to reduce nesting--factor it out into smaller more-focused functions. I came up with an intentionally contrived example to specifically induce nesting, but you're right that's a better way to write it.


Note that the "or" operator has some important downsides. For reducing bugs, using a COALESCE-like operator is IMHO more feasible. (see also: http://news.ycombinator.com/item?id=3415522)


Considering the semantics of the method (getting a user object or identifier), I don't think that's an issue. And a falsy OR also allows for using good (and falsy) null objects instead of nulls in object-oriented languages, an ability which would be lost (and lead to more reliance on NULLs, a behavior which should not be promoted) through a stricter selective operator.

The inability to trivially use null objects in conditionals is, in fact, one of the things I dislike in Ruby (where only `false` and `nil` are falsy).


Can you exemplify that inability? What is it you can't trivially do?


An issue here is that you're going to reset your cache after every lookup instead of only on cache miss.


Hm? How come? `or` is short-circuiting in the sense that when we have `a() or b()`, b() is only called if a() returns something falsy. Where would the issue arise?

Sure, abusing or in this way might seem hackish, but I feel like it's a net win for readability.


I didn't realize I instinctively picked up the early return habit until I started doing iOS development, where it seems the convention encouraged by apple is nesting rather than early returning.

I prefer the early return style. I generally try to put all error handling / sanity checking code up top with early returns (ie, the preconditions of the function), so that the "meat" of the code at the bottom of the function can be more concise and easier to grok. However, this requires a reading style of "first read the bottom of the function to get the gist of it, and then read all of the pre conditions above it". Unfortunately I have to explain that to my coworkers, bu when I do, they seem to understand wha I'm going for.


Good old guard clauses. I use them for this exact purpose.

The admonition to use single returns from a function arose from the days when a) GOTO was the major flow-control primitive and b) Dijkstra and others were starting to build up a head of steam on the we-should-treat-programs-as-mathematical-proofs thing.

If you have a single return point out of a function, it's easier to prove various things about it.

But like a lot of findings, it broke loose of its moorings in the problems of that time and floated away to lead an exciting care-free life of its own.

Analysis has improved since then, but more to the point, single-return code often leads to carrying around a bunch of book-keeping variables whose only purpose it is to artificially prop up that nostrum. They do not relate to the problem domain: they add to the difficulty of understanding the code.

That's why I feel single-return is a principle that is well past its glory days. Return from a function where it makes sense.


I prefer early returns for precondition checking too. i.e. before anything interesting has occurred. Thinking about it though, I've just realized that the early-return problem is sort of a by-product of side effects. If your function simply takes input and produces an output, and you control where side effects are applied, then early returns are far less of an issue.


Returning early is, in my experience, a recipe for disaster.

Even in trivial 4 line functions I assign my return values to a variable and then return that at the end of the function.

This may just be my pedantry left over from when I first studied C at university, but more than once I've spent a while trying to figure out why a return value wasn't what I was returning (from some code that had been previously written by someone else) only to find that there was a return statement on the 3rd or 4th line of the function.

One could certainly argue that if a function were concise and readable and commented and all that stuff then this wouldn't be a problem, but that's not always the case and a little return statement sitting someone in the middle of the function can really throw you sometimes


I've noticed most coding style tips only make sense in concert. I'm assuming the reason you got bitten was that you were looking at a function which was several pages long (ie, you didn't have time to read the entire function before makin an edit). The early return thing is a strategy I definitely subscribe to, but I works best when you favor lots of small functions rather than few long unctions


returning early is a good practice for all the guards and sanity checks you have to perform before the main logic of the function kicks in. if you use it this way, your function will visually consist of a bunch of small blocks each with a return, followed by one larger chunk of code that is the "actual" function, and where you can enforce a single return at the end.


I concur. This is how I have been structuring function bodies for a while now, and coworkers tend to easily follow the logic of both the whole function body and the "actual" work.


I'm increasingly a fan of local named functions in JS, vs. pulling callbacks entirely out of scope. An example:

    function compile(filename, ready) {
      return fs.readFile(filename, make_fn)

      function make_fn(err, data) {
        if(err) return ready(err)

        ready(null, new Function(data))
      }
    }
Which neatly addresses the desire to retain closures, while avoiding unnecessary nesting.

Also, whenever possible, I like to nix callbacks by using Function#bind:

    res.on('data', accum.push.bind(accum))

    // vs:
    res.on('data', function(data) { accum.push(data) })


I tend to favor one of these for clarity:

    Array.prototype.push.bind(accum)
    [].push.bind(accum)
(i realize the latter is wasteful, but arrays are so cheap that you wouldn't notice unless in a hot loop)


> Also, whenever possible, I like to nix callbacks by using Function#bind:

There's one (and only one) non-DOM jquery method I enjoy for that sort of things, because it avoids repetition: $.proxy(Object, String)

    accum.push.bind(accum)
becomes

    $.proxy(accum, 'push')
I think it spells out the intent better than the bind dance.

Of course, things would still be better if javascript just bound method on instance prototype lookup.


I largely try and do the same - use return and continue early and often. Aside from the things that you mentioned, one more benefit is that doing this makes it very easy to step through code using a debugger, and see which condition is failing (rather than having to print out various conditionals, or add log statements in and recompile/rerun).


Another simple way to reduce nesting is to use "continue" rather than wrapping entire loops in if statements.

For instance, this:

  for (file in files) {
    if (isOK(file)) {
      ...  // nested two levels
    }
  }
becomes:

  for (file in files) {
    if (!isOK(file)) continue;
    ...  // nested only one level
  }


i vacillate between using continue/return to skip code and using indentation. It is shorter, but I find that it is not as skim-able -- (all?) other control flow is hinted at by indentation, but continue/break/return is not.


return/continue/break really jump out with keyword highlighting.


Does anyone else hate multiple returns?

A do/while condition would satisfy this situation. or while True: if False: break in python


It depends on the compactness of the code. If it's a large function, I try to avoid it, but 100% adherence to "one entry, one exit" kind of misses the spirit of the idiom. Something like:

    if (condition)
      return true;
    return false;
is no less readable and sometimes more intuitive than the equivalent

    bool x = false;
    if (condition)
      x = true;
    return x;


How about this:

   return condition;


I'm certain he was using True and False as arbitrary values. It's a shame his point was missed. Please try to focus on his actual argument.


I'm not so convinced that it's not a fundamental thing.

Maybe someone wants to post a 'better' example and we'll see where it goes?


How about:

    if (condition())
        return 10;
    return 20;
versus

    val = 20;
    if (condition())
         val = 10;
    return val;


Well:

    val = condition() ? 20 : 10;
    return val;


Did you invert the logic of the parent post on purpose?


Or even:

    return condition() ? 10 : 20;


Yeah, that's definitely a more reasonable way to implement the above example. I didn't put much thought into the specifics of the code, since the specifics have nothing to do with what point I was making. I was responding to a question about multiple returns; the simple example was supposed to illustrate my opinion on multiple returns, not how to return a boolean value given a boolean condition.


if `if` was an expression rather than a statement one could write

return if condition() 10 else 20

I am not sure this is more readable though.


There is an 'if' expression, it's just spelled '?' and else is ':'. Mentioned a few posts up.


Yes. I generally try to avoid them in my code, unless doing so makes the logic significantly more contorted and then I put a prominent comment way out on the right margin to draw attention to the inner return.

That first get_cached_user example probably doesn't need to be nested. I don't know about Python but I would expect an optimizing compiler to short circuit the redundant conditionals if it really mattered. If it really did matter for performance, the aesthetics become a distant second consideration.

I don't see anything wrong with the get_media_details example, other than perhaps the logic of the example itself. It looks readable to me, but a blank line after 9, 13, and 19 would help with readability. If it were to grow more complicated I might look at refactoring using some sort of object polymorphism.

To me the only thing worse than complex deeply-nested control structures is code that tries to hide that complexity for aesthetic reasons in control structures that are only superficially simpler.


I strongly hate multiple returns. OTOH, I wish python had curly braces.


Cyclomatic complexity has been used for decades as a measure of software quality. ( Less is better. ) Any discussion of "reducing code nesting", should include a mention of it.

http://en.wikipedia.org/wiki/Cyclomatic_complexity


Cyclomatic complexity (CC) is good as long as it doesn't become the end-all be-all measure of code. I've run into crazy types that use it as an abusive bludgeon.

Code that has low CC feels strange if you're not used to it, but there are some good reasons to like it. One of the things that I find appealing to it is that by putting things into named functions, you're better expressing your intent by giving it a name.


I couldn't agree more.

It more important to be aware of it than to live by it. We're all adults here. ( Well, most of us anyway. ) So, as long as we understand the tradeoffs, then we can decide for ourselves when the complexity is too great.


I like the article, but changing languages on me midway without warning me kind of threw me off.

Especially since I didn't notice that I was reading a different language (subconsciously, code is code) and then it kind of hit me in the face that python doesn't have curlys


I felt exactly the same way. I don't know Python, but it's similar enough to Ruby that I could follow the post.

I was able to grok it, but I think the points could have been made better by sticking to one language.


There is some empirical evidence that excessive nesting correlates with bugs: http://franktheblue.blogspot.com/2011/04/minimize-nesting-of...


The "Maybe" monad solves exactly this problem.


Every monad solves exactly this problem.


Sort of; it's Maybes being composed with <|> rather than with >>=. The code in Haskell is still extremely ugly, even though there's no indentation moving the code over to the right:

   get_user :: Cache -> Database -> Maybe Id -> Maybe Username -> Maybe User
   get_user cache db id name = 
       (id >>= get_user_by_id cache)                     <|>
       (name >>= get_user_by_username cache)             <|>
       ((id >>= get_user_by_id db) <|> 
        (name >>= get_user_by_username db)) >>= 
           \user-> cache_user cache user >> return user) <|>
       Nothing
So while I like the look of this code more than I like the look of the Python, it's still shit. The problem here is not that there is nesting, it's that the code isn't well-factored.

The first problem is that the function does too much. There should be two methods, one that accepts a username argument and another that accepts an id argument. The dispatching between the two types of invocation is one level of nesting that is completely unnecessary; get_user_by_id(42) and get_user(id=42) are both equally easy to read, but the latter is much harder to implement cleanly.

The next problem is that the caching logic is handled in this random data lookup function, instead of some place where we can reuse the common pattern of "check cache and return, otherwise fetch, cache, and return". So let's write that:

    def cached_lookup(cache, object, method, key):
        # already cached
        if cache.contains(key):
            return cache.get(key)
       
        # not cached; lookup in database and add result to cache
        value = object.method(key)
        if value is not None:
            cache.insert(key, value) # we are punting on calculating the key 
                                     # from the value here, but that is something
                                     # the cache could be taught to do with a 
                                     # decent metaobject protocol.
        return value
Now we can rewrite the get_user functions:

   def get_user_by_id(self, id):
       return cached_lookup(self.cache, self, 'get_user', {'id':id})

   def get_user_by_name(self, name):
       return cached_lookup(self.cache, self, 'get_user', {'username':name})
Now the functions are separated by concern. The get_user_by_* functions do one thing: compute a cache key and ask the cache for the object. The cache is then responsible for the fetching logic. That means if we want to change how caching works (perhaps to add eviction), we only need to change one thing in one place. If we made every get_foo_by_bar function handle caching, the code would quickly become unmaintainable.

So to summarize, nesting draws our attention to problems, but removing the nesting is not the solution to the problem. Neither are monads. The solution to the problem is to refactor.


Yeah, I'll agree that the functions weren't very good. I actually had a hard time coming up with examples that didn't require a bunch of extra explanation.



One thing I like about early returns/continues, as opposed to nesting, is that when you're collaborating with people the diffs are much easier to read and make sense of. You practically never re-indent code.

OK, with changes in nesting you could just use a whitespace-free diff, but then you can't just apply/commit it. If you apply it, you have to reindent before committing. (This is tricky in python.) Alternatively, if you get a full diff, you have to apply it and then extract a whitespace-free diff yourself in order to read the actual changes properly.


First time I write in HN, but I really wanted to give my take on this.

I have to say that I really feel that to get readability, over nesting too, you often should refactor a bit the code.

I would in fact write the get_cached_user method by using separate methods and a @cache decorator. Every single function is very readable by itself.

(I'm not fluent in python, pseudo-python follows... but you should get the idea)

    def cache(function_to_cache):
        '''
        A Decorator that caches a function result
        '''
        def wrapper(param):
            cache_key = function_to_cache.name + " on " + param
            if cache.contains(cache_key):
                return cache.get(cache_key)
            else
                value = function_to_cache(param)
                cache.set(cache_key, value)
                return value
        return wrapper
    
    @cache
    def get_cached_user_by_id(id):
        return db.get_user_by_id(id)
    
    @cache
    def get_cached_user_by_username(username):
        return db.get_user_by_username(username)
        
    
    def get_cached_user(user_id = None, username = None):
        user = None
        if user_id:
            user = get_cached_user_by_id(user_id)
        else if username:
            user = get_cached_user_by_username(username)
        
        if not user:
            raise ValueError('User not found')
        
        return user


I don't think I'd want to maintain this bit of code although I'd prefer it to the mess of nesting in the original implementation. Each separate way of caching gets its own function which is ... ok I guess. Its explicit, which is pythonic, but overly verbose for multiple methods and we have to draw the line somewhere.

I think decorators are complex enough that I can't immediately look at the thing and know what its doing in the same way as early returns. A closure that takes a function pointer with some introspection magic is just a lot of mental juggling. At least with your case the complexity doesn't compound and is pushed as low as possible (as per Code Complete).

A final note: functions can have static members just like classes in python. fuction_to_cache.name would access the .name member of the function if such a thing existed (it will be an AttributeError? in this case). You are looking for function.__name__ I believe.


I don't think that decorators are too complex to be used usually... In python it should be a known and understood pattern and therefore it should not scare at all.

(Moreover it should handle multiple args, and there are better and more popular implementation such as @memoize)

However my assumption in this specific case is that the project is not small and therefore that decorator and this "pattern" could be used somewhere else too. If this is not the case I agree with you that it would just add too much complexity and it would hence be better to not have the decorator at all. The get_cached_user_by_username and get_cached_user_by_id would just handle the cache hit/miss by themselves.

Still it would be much more readable, because the main point I would say is to separate a long or too nested method in more methods.

    def get_cached_user_by_id(id):
        user = cache.get_user_by_id(id)
        if not user:
            user = db.get_user_by_id(id)
            cache.set_user(user)
        return user
    
    def get_cached_user_by_username(username):
        user = cache.get_user_by_username(username)
        if not user:
            user = db.get_user_by_username(username)
            cache.set_user(user)
        return user
    
    def get_cached_user(user_id = None, username = None):
        user = None
        if user_id:
            user = get_cached_user_by_id(user_id)
        else if username:
            user = get_cached_user_by_username(username)
        
        if not user:
            raise ValueError('User not found')
        
        return user

(Thanks for the .__name__ tip... I have not use python for a while)


For Javascript, async.js[1] offers some good tools for different kinds of control flow with callbacks.

[1]: https://github.com/caolan/async


Depth of code nesting can be reduced using many syntactic tricks. You have shown the use of multiple return. This trick is generally considered as a hidden goto (go to the end of the function). 'GOTO considered harmful' is practically biblical law amongst many programmers, but it's worth remembering that he made that statement in the context of an argument with Donald Knuth. Knuth won: (http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pd...) Strict application of usual "good" coding conventions works well in most of the cases, but always results in poor code for the remaining cases. I think a relaxed application of coding conventions would be better, but in case of code review, this requires a lot of efforts to explain why the not compliant code is cleaner. Writing code compliant to coding convention is frustrating, but is often easier than to convince stupid bosses. The compliant way to reduce code nesting is to create small functions (with a single return), even if they are called only once.


I'm surprised Eric didn't mention the Zen of Python:

  >>> import this
  ...snip...
  Flat is better than nested.


I think the last Erlang example should be simplified further. It uses two different values of Resp to signal an error ("Error" and "{timestamp, Start, Error}"), which can be unified for more clarity:

    do_some_file_thing(File) ->
        Resp = case file:open(File, [raw, binary, read]) of
            {ok, Fd} ->
                {timestamp, now(), process_file_data(Fd)};
            Error ->
                {timestamp, now(), Error}
        end,
        case Resp of
            {timestamp, Start, {ok, Processed}} ->
                {ok, Start, now(), Processed};
            {timestamp, Start, Error} ->
                Error
        end.


There is something wrong about this.

Code is a tree, code is about nesting. If you do not like nesting, you do not like code.

Code is not 'text'. You do not read code top to bottom like text. Code has a structure, and you read that structure.

And if an example of 'improvement' doubles the line count, you have a pretty good indication you are doing something wrong.

What seems to have happened is a small piece of advice has been taken too far. The early-return shortcut is reasonable. It is indeed advocated by Fowler and Beck (who deserve some trust) -- they call it 'replace nested conditional with guard clauses'. But that is something very particular. It does not suggest removal of all nesting in general.


Just because code is a tree doesn't mean a deeper tree is better than a shallower one.


Regarding the JavaScript part: I often use small state machines to unroll deep nesting. You get a better state tracking, less nesting and easier error handling :-) You should give it a try....


Perl style tends to use early return and also has the benefit of being able to emphasise the control flow by using a return with a trailing conditional. I.e.

    return $user if $found;


Did anyone else thing that the fact that he needed to do

        cache.set_user(user, id=user.id, username=user.username)
twice in his improved solution a bit of a giant red flag?


It was a red flag, but the javascript "curry" solution is like a nuclear explosion of madness.

A pity, because the general idea and techniques in the article are fine, but the examples try so hard to be simple that they fail to illustrate the point: the actual "improved" versions of the code are much worse than the supposedly broken originals.


This issue comes up a lot between me and my coworker. He is very anti-nesting, and i prefer methods with single points of return. Obviously it is a balance to create readable code.


I wrote two articles with the same goal, using simple rules describing when you should simplify. I also cover cyclomatic complexity, which you seem to want to describe.

Regardless, the two rules I focus on:

* Idents are intents to modularize http://www.jasonlotito.com/programming/blocks/

* Comments indicate future refactoring http://www.jasonlotito.com/programming/comments/

They are both covered here:


Factor (the language) seems to encourage code with very low levels of nesting -- for most words, none at all.

http://factorcode.org


I actually found the second example to be quite nice. Even in my current sleepiness-fuzzed mental state, I was able to follow it easily. It was almost refreshing.

I guess that's one of the dangers of using contrived examples. Or maybe it's just proof that this is as much a matter of taste as anything.


Unfortunately the logic of this article is ass-backward. Multiple return paths involve jumps and make compiler jump and stack optimizations very difficult. What is easier to read and follow for a person does not always produce good instructions for a computer to follow.


The following also mentions something on the similar lines http://www.kernel.org/doc/Documentation/CodingStyle


Code Complete book has a very good section on reducing code complexity and gives various tips and tricks with the analysis of possible outcomes. Must read for developers, I think.


Goto


This is, in fact, a great solution in certain contexts:

http://wwwens.uqac.ca/~flemieux/PRO102/goto_Rubin.pdf


Agreed, goto can actually be a great way to adhere to DRY when dealing with a lot of similar error condition checks. Used in the right context, it's bad rap is undeserved.


> it's bad rap is undeserved.

Its bad rap is entirely deserved. Goto is a fine tool in a few cases, but it's an awful general-purpose tool when better control structures exist, and its usage in this case — common when structured programming got its start — makes code significantly worse.


This is why I've become very fond of a little advertised part of haskell: the where clause. Every function can have its own little library of subfunctions, without polluting the toplevel namespace.

Whenever code starts getting complicated, I break it out into a subfunction in the where clause, which means I give it a name, which makes my code more self-documenting, and incidentally keeps the indentation level sane. Breaking things out into lots of little functions like this often makes it apparent when the function in the where clause is more generic, and does belong at the toplevel, and then the code is already separated into a function that's easy to move out of the where clause (closures do mean additional parameters sometimes need to be added, but the compiler will make sure you get this right).

The other nice thing haskell brings to the table, which would be more helpful in the first example given, is the ability to write your own control flow functions. For the first example, which keeps trying different actions from a list until one succeeds, and returns its value, I would write a generic function to do that. Its type signature would be:

    firstM :: (Monad m) => [m (Maybe a)] -> m (Maybe a)
Of course, I don't need to write that function.. I can just paste the above type signature into Hayoo, and get directed to an existing implementation: http://hackage.haskell.org/packages/archive/darcs/latest/doc...

If I did need to write firstM, I'd feel special to have been the first to think up such a generic and useful function. So it's win-win-win all the way. :)

Anyway, the code to use it would look something like this:

    get_cached_user uid name = fromMaybe nouser <$> find
      where
        cache a = do
          r <- a
          cache_set_user uid name r
          return r
        nouser = error "user not found"
        find = firstM
          [ cache_get_user_by_id uid
          , cache_get_user_by_username name
          , cache $ db_get_user_by_id uid
          , cache $ db_get_user_by_username name
          ]
As another example of this refactoring of control flow, looking at the cache function above I realized I've written those three lines several times before. So I just added this to my personal library:

    observe :: (Monad m) => (a -> m b) -> m a -> m a
    observe observer a = do
      r <- a
      observer r
      return r
(I seem to be the first person to think of this function.. yay!)

With this, the "cache" function can be written as just

        cache = observe $ cache_set_user uid name




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: