I'm not saying the entire book taken literally is how everything must be done. I...

danShumway · on May 26, 2021

> But as we get closer to 100 lines I doubt that it's more understandable and readable to have it all in one big blob of code.

Well, but that's exactly what I'm pushing back against. I think the rule of 30 is often a mistake. I think if you're going out of your way to avoid long functions, then you are probably over-abstracting your code.

I don't necessarily know that I would inline a quicksort function, because that's genuinely something that I might want to use in multiple places. It's an already-existing, well-understood abstraction. But I would inline a dedicated custom sorting method that's only being used in one place. I would inline something like collision detection, nobody else should be calling that outside of a single update loop. In general, it's a code smell to me if I see a lot of helper functions that only exist to be called once. Those are prime candidates for inlining.

This is kind of a subtle argument. I would recommend http://number-none.com/blow/john_carmack_on_inlined_code.htm... as a starting point for why inlined code makes sense in some situations, although I no longer agree with literally everything in this article, and I think the underlying idea I'm getting at is a bit more general and foundational.

> Do you think that inlining the partition function would make this easier or harder to read?

Undoubtedly easier, although you should label that section with a comment and use a different variable name than `i`. Your secondary function is just a comment around inline logic, it's not doing anything else.[0]

But by separating it out, you've introduced the possibility for someone else in the same class or file to call that function without your knowledge. You've also introduced the possibility for that method to contain a bug that won't be visible unless you step through code. You've also created a function with an unlabeled side effect that's only visible by looking at the implementation, which I thought we were trying to avoid.

You've added a leaky abstraction to your code, a function that isn't just only called in one place, but should only be called in one place. It's a function that will produce unexpected results if anyone other than the `quickSort` method calls it, that lacks any error checking; it's not really a self-contained unit of code at all.

And for what benefit? Is the word `partition` really fully descriptive of what's going on in that method? Does it indicate that the method is going to manipulate part of the array? And is anyone ever going to need to debug or read a quicksort method without looking at the partition method? I think that's very unlikely.

----

Maybe you disagree with everything I'm saying above, but regardless, I don't think that Clean Code is actually advocating for the same ideas as I am:

> Abstract your code, but abstract your code when or shortly before you hit complexity barriers and after you have enough knowledge to make informed decisions about which abstractions will be helpful -- don't create a brand new interface every time you write a single function.

I don't think that claim is one that Martin would agree with. Or if it is, I don't think it's a statement he's giving actionable advice about inside of his book.

----

[0]: In a language like Javascript (or anything that supports inline functions), we might still use a function or a new context as a descriptive boundary, particularly if we didn't want `j` and `pivot` to leak:

  function quicksort(data, lowIndex, highIndex) {
    if (lowIndex >= highIndex) { return; }

    const pivotIndex = (function partition (data, lo, hi) {
      //etc...
    }(data, lo, hi));

    quickSort(data, lowIndex, pivotIndex - 1);
    quickSort(data, pivotIndex + 1, highIndex);
  }

But for something this trivially small, I suspect that a simple comment would be easier to read.

  function quicksort(data, lowIndex, highIndex) {
    if (lowIndex < highIndex) { return; }

    /* Partition */
    let pivot = data[hi];
    //etc...

    quicksort(data, lowIndex, partionIndex - 1);
    quicksort(data, partionIndex + 1, highIndex);
  }

Remember that your variable and function names can go out of date at the same speed as any of your comments. But the real benefit of inlining this partition function (besides readability, which I'll admit is a bit subjective), is that we've eliminated a potential source of bugs and gotten rid of a leaky abstraction that other functions might be tempted to call into.

TeMPOraL · on May 27, 2021

> Remember that your variable and function names can go out of date at the same speed as any of your comments.

A very good point, thank you for voicing it!

As the luck would have it, two days ago I was writing comments about this at work during code review - there was a case where a bunch of functions taking a "connection" object had it replaced with a "context" object (which encapsulated connection, and some other stuff), but the parameter naming wasn't updated. I was briefly confused by this when studying the code.

danShumway · on May 27, 2021

Ha :) This is something that's also been drilled into me mostly just because I've gotten bitten by it in jobs/projects. The most recent instance I ran into was a `findAllDependents` method turning into `findPartialDependentsList`, but the name never getting updated.

Led to a non-obvious bug because from a high level, the code all looked fine, and it was only digging into the dependents code that revealed that not everything was getting returned anymore.

tharkun__ · on May 28, 2021

Absolutely agree that all naming can go out of date. With at least the tools I use nowadays it's even easier for comments to go out of date that it was previously because of all the automatic folding away in the IDE.

But one of the best reminders that comments don't do sh+t was early on in my career when my co worker asked me a question on a line of code (and it was literally just the two of us working on that code base). I probably had a very weird look on my face. I simply pointed to the line above the one he asked about. He read the comment and said "thank you".

I guess my point is that all you can do is to incorporate the "extra information" as closely as possible to the actual code, so that it's less likely to just be ignored/not seen. Thus incorporating it into the variable and function aiming itself is the closest you will get and as per your example (and my own experience as well) it can still happen. Nothing but rigorous code review practices and co workers that care will help with this.

But I think we can all agree (or I hope so at least) that it's better to have your function called `findAllDependents` and be slightly out of date than to have it called `function137` with a big comment on top that explains in 5 lines that it finds the list of all dependents.

tharkun__ · on May 26, 2021

Glad you admitted subjectivity. I will too and I am on the other side of that subjectivity. For the quicksort example, that was the pseudo code from the Wikipedia article.

I personally think that the algorithm is easier to grasp conceptually if I just need to know 'it partitions the data and the runs quicksort on both of those partitions. Divide and conquer. Awesome'.

I don't care at that level of abstraction _how_ the partitioning works. In fact there are multiple different partition functions people have created that have various characteristics. The fact that this changes its parameters is geberally bad if you ask me but in this specific case of a general purpose and high performance sorting function totally acceptable for the sake of speed and memory use considerations. In other 'real world' scenarios of 'simple business software' I would totally forsake that speed and memory efficiency for better abstractions. This is also where Carmack is basically not a good example. His world is that of high performance graphics and game engine programming where he's literally the one dude that has it all in his head. I can totally see why he would have different from someone like me that has to go look at a different piece of code that I've never seen before every day multiple times.

You mention various problems with this code such as the in place nature and bad naming and such. Most of that is simply the copy from Wikipedia and yes I agree I would also rename these in real code. I do not agree however with the parts about 'someone else could call this now'. To stick with Clean Code's language of choice, the partition function would actually be a private method to the quicksort class. Thus nobody outside can call it but the algorithm itself, as a self contained unit is not just a blob of code.

Same with your inlining of collision detection and such. I don't think I would do that. I think it has value to know that the overall loop is something like

  do_X() 
  do_Y() 
  detect_collisions() 
  do_Z()

Overall "game loop" easily visible straight away. The collision detection function might be a private method to that class you're in though. Will depend on real world scenario I would say.

You also mention you could use a comment. Your comment only does half the job though. It only tells me where the partitioning starts, not where it ends. In this example it's sort of easy to see. As the code we are talking about gets larger it's not as easy any more. So you have to make sure to make a new comment above every 'section'. Problem is that this can be forgotten. Now I need to actually read and fully understand the code to figure out these boundaries. I can no longer just tell my editor to jump over something. I can no longer have the compiler ensure that the boundaries are set (it will ensure proper function definition and calls).

danShumway · on May 27, 2021

> The collision detection function might be a private method to that class you're in though.

Definitely making things private helps a lot, although its worth noting that classes often aren't maintained by only one person, and they often encapsulate multiple public methods and behaviors. It's still possible to clutter a class with private methods and to have other people working on that class that are calling them incorrectly. This is especially true for methods that mutate private state (at least, in my experience), because those state mutations and state assumptions are often not obvious and are undocumented unless you read the implementation code (and private methods tend to be less documented than public methods in my experience).

Writing in a more functional style (even inside of a class) can help mitigate that problem quite a bit since you get rid of a lot of the problematic hidden state, but I don't want to give the impression that if you make a method private that's always safe and it'll never get misused.

> You also mention you could use a comment. Your comment only does half the job though. It only tells me where the partitioning starts, not where it ends.

In this example, I felt like it was overkill to include a closing comment, since the whole thing is like 20 lines of code. But you could definitely add a closing comment here. If you use an editor that supports regions, they're pretty handy for collapsing logic as well. That's a bit language dependent though. If you're using something like C# -- C# has fantastic region support in Visual Studio. Other languages may vary.

Of course, people who don't use an IDE can't collapse your regions, but in my experience people who don't use an IDE also often hate jumping between function definitions since they need to manually find the files or grep for the function name, so I'm somewhat doubtful they'll be too upset in either case.

> I can no longer have the compiler ensure that the boundaries are set

You may already know this, but heads up that if you're worried about scope leaking and boundaries, check if your language of choice supports block expressions or an equivalent. Languages like Rust and Go can allow you to scope arbitrary blocks of code, C (when compiled with gcc) supports statement expressions, and many other languages like Javascript support anonymous/inline functions. Even if you are separating a lot of your code into different functions, it's still nice to be able to occasionally take advantage of those features. I often like to avoid the extra indentation in my code if I can help it, but that's just my own visual preference.