Hacker News
Linear code is more readable (separateconcerns.com)
412 points by dmarto 11 months ago | 357 comments



It’s a matter of style, and like cooking, either too much or too little salt will ruin a dish.

In this case I hope nobody is proposing a single 1000-line god function. Nor is a maximum of 5 lines per function going to read well. So where do we split things?

This requires judgment, and yes, good taste. Also iteration. Just because the first place you tried to carve an abstraction didn’t work well, doesn’t mean you give up on abstractions; after refactoring a few times you’ll get an API that makes sense, hopefully with classes that match the business domain clearly.

But at the same time, don’t be over-eager to abstract, or mortally offended by a few lines of duplication. Premature abstraction often ends up coupling code that should not have to evolve together.

As a stylistic device, extracting a function which will only be called in one place to abstract away a unit of work can really clean up an algorithm; especially if you can hide boilerplate or prevent mixing of infra and domain concerns like business logic and DB connection handling. But again I’d recommend using this judiciously, and avoiding breaking up steps that should really be at the same level of abstraction.


> In this case I hope nobody is proposing a single 1000-line god function. Nor is a maximum of 5 lines per function going to read well.

This is the key. Novice devs tend to write giant functions. Zealot devs who read books like Clean Code for the first time tend to split things into a million functions, each one a few lines long (pretty sure the book itself says no more than 5 lines for each function). I worked with a guy who extracted each and every boolean condition into a function because "it's easier to read", while never writing any comments because "comments are bad" (according to the book). I hate that book; it creates these zealots who mindlessly follow its bad advice.


Or, the fun one I run into is devs who write a mix of 1000 line functions and tiny little 5 line functions with no discernible pattern to which option is chosen when.

The truth is that what makes code readable is not really (directly!) about function size in the first place. It's about human perceptual processing and human working memory. Readable code is easily skimmable, and should strive to break the code up into well-defined contexts that allow the programmer to only have to carry a handful of pieces of information in their head at any given moment. Sometimes the best way to do that is a long, linear function. Sometimes it's a mess of small functions. Sometimes it's classes. Which option you choose ultimately needs to be responsive to the natural structure of the domain logic you're implementing.

And, frankly, I think that both versions do a pretty poor job of that, because, forget the style, the substance is a mess. They're both haphazardly newing up objects and mutating shit all over the place. This code reads to me like the end product of about four sprints' worth of rushing the code so you can get the ticket closed just in time for sprint review.

I mean, let's just think about this as if we were describing how things work in a real kitchen, since I think that's pretty much what the example is asking us to do, anyway: on what planet does a pizzeria create a new, disposable oven for every single pizza? What the heck does

  pizza.Ready = box.Close()
even mean? Now we've got a box containing a pizza that's storing information about the state of the object that contains it, for some reason? Demeter is off in a corner crying somewhere. What on earth is going on with that 'if order.kind == "Veg"' business? Why aren't we just listing the ingredients on the order and then iterating over that list, adding the items to the pizza? The logic for figuring out which ingredients go on the pizza never belonged in this routine in the first place; it's ready, aim, fire, not ready, fire, aim. Etc.
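The "list the ingredients and iterate" shape the parent describes might look something like this sketch (Python, all names invented; this is not the article's actual code):

```python
# Hypothetical: toppings are resolved from the order up front, so the
# assembly routine never needs to branch on order kind.
TOPPINGS_BY_KIND = {
    "Veg": ["tomato", "mozzarella", "mushrooms"],
    "Meat": ["tomato", "mozzarella", "pepperoni"],
}

def assemble_pizza(order):
    # "Ready, aim, fire": decide the ingredient list first...
    toppings = TOPPINGS_BY_KIND[order["kind"]]
    # ...then do the purely mechanical work of adding each one.
    pizza = {"toppings": []}
    for topping in toppings:
        pizza["toppings"].append(topping)
    return pizza
```

The point is that the "which ingredients" decision lives in one obvious place (here, a lookup table), separate from the assembly loop.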


This is in the article (seems to have been edited after your comment):

>I wasn’t sure if I wanted to mention this or not, but I ended up editing the post because there is something that bothers me with this function, and it is that business with the oven.

>[...]

>this code makes no sense: why would you create a whole new oven to make a pizza? In real life, you get an oven once, and then you bake a whole lot of pizzas with it, without going through the whole heating cycle.


The problem with any good idea: As soon as it becomes dogma, it doesn't matter how good the original idea was, it will turn itself bad.


And programmers are way worse than people in general when it comes to being dogmatic.


I think this is an experience thing. There are a lot of inexperienced developers, and inexperienced developers tend to become very attached to development philosophies, and attribute all problems to a lack of adherence to their pet philosophies.

This tends to wear off with exposure to the real world. Not only will you find undeniably good code that's written in flagrant disregard of the holiest of doctrines, you'll also find garbage written The Proper Way; more damning still, sometimes you'll discover it was written by your own hands.

I don't think it's a coincidence that many of the ideas with the most fervent and zealous followers have names that sound righteous: if it isn't clean code, it's pure functions, or more recently memory safety. Clearly nobody who is on the "side" of dirty, impure and unsafe code can be right?


> I think this is an experience thing.

Ahhhhh...yes, but...

There are two kinds of people when it comes to dogma: The faithful, and the preachers. The inexperienced developers are the faithful. They cling to the scripture without proof that it is actually necessary.

Insofar, I agree with your post.

But the faithful need someone to preach the faith, and those are usually not the inexperienced ones. Those are usually experienced developers. Their personal reasons to cling to the dogma are varied: Some may have started as faithful themselves, for some it's stubbornness, an unwillingness to change, maintaining a feeling of superiority, the fear of becoming obsolete, ...

So here we are in disagreement. The preacher is the product of experience and development over time. And in my book, the preachers of dogmas are more of a problem than the faithful who follow them. Because it's the preachers who write the scripture, the preachers who make up arguments why alternatives to the ideology are bad, and the preachers who seek to isolate their flock from the "evil" predations of alternatives.

> I don't think it's a coincidence many of the ideas that have the most fervent and zealous followers have names that sound righteous, if it isn't clean code it's pure functions or more recently memory safety.

Closing my answer on another point of agreement: it is absolutely not a coincidence that the wording of dogmas in programming sounds eerily similar to that of religious teachings ;-)


Well said. Following a set of rules is no substitute for experience, despite any pressure to do so.

Since every situation is unique, even if similar to others, I think it's always best that programmers rely on their judgement and experience when writing code instead of a set of axioms.

And besides, blindly embracing a set of rules is acting more like a robot, instead of a human being.


"Agile" is another of those words, used to whitewash stiff, heavy and dogmatic dev processes.


I prefer to read this in such a way that implies that programmers are not people. :p

I jokingly refer to our customers/users as "humans" at work.


Hard not to become dogmatic when your living depends on getting every comma exactly right and you're getting instant impartial feedback every time you get something not quite right :)


I disagree. Programming allows for so much freedom and flexibility in how you approach problems. I find that a large portion of the job is following your gut feeling and putting it into words.


I don't understand this widespread sentiment at all. If the dogma-idea is right, it's right. No matter how dogmatic the people who are defending it are.


Most of these ideas are tradeoffs. If a particular tradeoff is the right thing to do 80% of the time, then it is clearly a good idea. But if you understand it as dogma, you'll do the wrong thing 20% of the time.

Take the pizza example. Which is better, linear code or small functions? It's a series of tradeoffs. Once you get above a screen of code, or 10 ifs, functions become hard to read. Once the same logic has been written 3x, abstracting it is usually a win. Even if it is small. And there is a fuzzy area where it isn't obvious which is better, and debating it is probably a loss over writing it and moving on. Doubly so if you're defaulting to the same kind of decisions every time and so the style is consistent.

In a world full of pragmatic tradeoffs, dogmatism is rarely the right choice. (Unless you haven't learned the tradeoffs.)


My pithy description of software development is "a series of decisions in pursuit of a goal".

Dogma invites people to stop using their critical thinking skills.

One of my favorite examples of this:

Everyone would agree that having a newborn in a car means safety is paramount. Everyone also agrees that left turns are less safe than right turns. No one would agree that this implies you should never make left turns in a vehicle with a newborn.

But the above is how both security people and TDD proponents tend to act, as if there can be no risk assessment and critical thinking involved. We've all made right-hand turns when we really wanted to go left because there was just too much traffic, even without a newborn in the vehicle.


I was thinking about no-tradeoff truths. But you’re right on those tradeoff situations.


There are very few ideas that are absolutely right. I don’t consider “facts” as ideas, although these days even the facts have alternatives too.


It doesn't matter how good the underlying idea is, a dogma is always bad.

Because ideas are highly unlikely to be universally correct, no matter how good they are. Even if an idea is supported by all available evidence, it MUST subject itself to scrutiny and possible falsification, all the time.

A dogma flies in the face of that. It is, by definition "any belief held unquestioningly and with undefended certainty" (quote from wikipedia). Once people follow an idea dogmatically, they are very likely to apply that idea no matter if it makes sense or not. They stop following logic and start following scripture.

It is bad enough if this happens in science, where we really do have systems so far supported by all available evidence. But it becomes a lot worse in areas like software development, where we know there isn't "the one" true way of doing things.

And to head off one likely reply to this: Yes, I apply this logic also to the assumption that "dogmas are always bad". If someone could present proof showcasing a dogma that only has good outcomes, with no negatives attached, then I am willing to change my mind on this.


just because people interpret a fact incorrectly doesn’t mean the fact stops being a fact, lol wth is this sub


What's a sub?


They're likely a reddit refugee referring to this site as a subreddit.


Oh man, it's easy to spot someone who blindly follows Clean Code. I personally don't like it, but I am a fan of all of Martin's other books. It's just aggressively opinionated in a way that I can't get behind. I'm sure I'm not alone, but reading that book made me feel insane, since he described things as objectively good that I found awful.


I don't know if it's because of clean code or because he calls himself my uncle or what else, but he's always rubbed me the wrong way.

I always ask people to think about printed pages and they look at me as if I'm crazy... But it's like: if you have to pick up a reference book or something, carefully find the right section addressing your problem, and you want to read it, how many pages do you want to read at a sitting? For most, the answer is ideally 1, but you can read 2-3 and still not get annoyed, right? If it gets longer than 10, that's doable but not what you signed up for. Well, if I print code onto those pages, and just assume that most English lines are kind of filler which programming languages don't have, so no compression due to the sparseness of code, then you get ~40 LOC on a textbook page. Ideally you would solve a problem in 40 LOC, but if it took 120 LOC that is still perfectly readable; it's when it gets to 400+ish that something has really started to get confusing about the structure.

Same with diffs: a 400-line diff is still reviewable, but barely.

The printed page isn't the point; the point is that these are kind of objective numbers, and if I describe them in printed pages everybody seems to agree on them... But then a book like Clean Code comes around, and people want to have these tiny little scraps of an eighth of a page, bound together in a little flip book of half-index-card strips, each pointing at other strips, “bake the pizza (see strip 37)”, and nobody thinks about whether this is actually an informational presentation mode that anybody really wants to use. “It works better for review time because you encourage only reviewing one or two pages of flip book at a time.” Yeah, Bob, I see what you're saying, but, like, is this my “crazy uncle” now, who insists that the usual book is going away because with the advent of Wikipedia and infinite content feeds all knowledge and story will forever be stored in such flipbooks? Just because editors who don't care about the overall story anymore, because their attention spans are shot to hell, find it easier to review half an index card at a time? This is a good thing? Something feels off!

You get this same argument from people who believe in the layered server architecture. “Business logic needs to go in the business layer, database logic in the data access layer, presentation logic in the view layer, routing logic in the controller layer.” But you would never voluntarily read a book that was structured this way! “Matt saw Alice sitting there, a young girl of maybe 16, gorgeous in her melancholy and disaffected way, an old schoolmate of his. He waved. She beckoned. He said “Hi, how are you?” and she replied...” Right, the author gave you a data structure of adjectives to associate with Alice. You didn't have to flip to the Characters section looking for “ALICE_INTRODUCTION” and wade through all of the different ways she appears in the book to find “when Matt first sees her she is melancholy and gorgeous”, then flip all the way back to the story that you were reading, then flip to the Dialogue section looking for MATT_ALICE_INTRO_DIALOGUE. Hope you left a bookmark back in the Main Story section!

“Oh, but it is so easy to read the whole book if you can skim through the Main Plot part of the book without ever knowing anything about the characters or settings or repercussions or dialogue: 'Matt saw Alice (ALICE_INTRO), she beckoned, they talked (MATT_ALICE_INTRO_DIALOGUE), he walked to the diner...'” And if you complain about the big all-caps stuff, someone says “well, in a modern hypertext reader, those just become links and you never need to see them directly!” Except you do, because you have to maintain it... And it's like, I get it! You can probably compress most modern novels considerably if you remove all their descriptions and dialogue to appendices. It's not wrong!

But writing is so much slower in that format, and debugging is surprisingly so much slower in that format. The things that are faster are queries like “Did Alice ever mention her father to Matt in the recorded dialogue?” And then you make a refactoring change on the basis that Matt should not know anything about Alice's father, and it turns out it generated a plot hole, because somewhere in the Exposition layer the two were connected more obliquely: Alice wrote about it in a post-it on the fridge or something, that she was going to see her father who was ill.


The point of abstraction layers is more about responsibilities (and the abstractions).

For example, the DAL shouldn't know that a missing record is going to return a 404; instead it needs to be able to express "record not found" in its API. The business layer should also not know that a record not found is going to return a 404; it just needs to be able to express that a record was not found. The web layer needs to know that when "record not found" is expressed, we return a 404.

This is why I'm not a fan of ORMs without a DAL. Too many people will sprinkle the ORM code directly into a controller and call it a day, and then ORMs will come up with all of these unmaintainable ways to "re-use" queries and all the nasty performance knobs that come with that.

And I'm not saying the gap between the layers needs to be that thick. If a DAL wants to hand back ORM models directly, more power to them, just disconnect them from the DB before you do. If the web layer wants to use those same models as the api contract, more power to them, that can always be fixed when and if they diverge.

And it's not as if these layers themselves must be proper layers, that's what I meant when I said responsibilities. The web layer is responsible for web concerns (security, api contracts, etc). If you want to treat the controller as an orchestration mechanism that you get from 15 different Dependency Injected services, great. It doesn't need to be a physical layer, but it should be a logical layer and the layers below being able to express everything the layer(s) above need to know is an important part of that sort of design.
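A minimal Python sketch of that layering, with every name invented, just to show where the knowledge of the 404 lives:

```python
# Hypothetical three-layer error flow: only the web layer knows HTTP.
class RecordNotFound(Exception):
    """Raised by the DAL; knows nothing about HTTP status codes."""

def dal_get_user(user_id, db):
    # Data access layer: expresses "not found" in its own vocabulary.
    try:
        return db[user_id]
    except KeyError:
        raise RecordNotFound(user_id)

def service_get_user(user_id, db):
    # Business layer: may add rules, but still speaks "not found",
    # never status codes.
    return dal_get_user(user_id, db)

def web_get_user(user_id, db):
    # Web layer: the only place the domain error becomes a 404.
    try:
        return 200, service_get_user(user_id, db)
    except RecordNotFound:
        return 404, None
```

Whether these are physical layers or just responsibilities wired by DI, the shape is the same: each layer translates only at its own boundary.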


I was this dev early in my career. A sharp overreaction to a giant ball of mud architecture with no tests and minimal consistency. I read all those books looking for some better way and inflicted all those rules on people.

I don't regret the learning, but I do regret being dogmatic. It was interesting that no one around me knew better either way, or felt they could provide reasonable mentorship, so we went too far with it. These days I write the pizza function on the left, and use comments sparingly where they add context and reasoning.


I just inherited a code base written by somebody in that exact same situation. Saving a single file associated with a model in 27 steps instead of about 4.


Clean Code says "Functions should not be 100 lines long. Functions should hardly ever be 20 lines long".

I think both 100 and 20 are a bit low, but much better than 5. As I mentioned in a comment a few days ago, when I also corrected someone who misremembered a detail from the book, I am not a huge fan. But I also think it is mostly correct about most things, and not as terribly bad as some say. Listening to fans of the book is more annoying than actually reading the book.

(And that other comment when I corrected someone was about bad comments. Clean Code definitely does not say that you shall never comment anything.)


I went back to check and this is what the book says, verbatim:

"Every function in this program was just two, or three, or four lines long. Each was transparently obvious. Each told a story. And each led you to the next in a compelling order. That’s how short your functions should be!"

So I think it's fair to say the book advocates for functions 2-4 lines long.

And about comments, from the book:

"So when you find yourself in a position where you need to write a comment, think it through and see whether there isn’t some way to turn the tables and express yourself in code. Every time you express yourself in code, you should pat yourself on the back. Every time you write a comment, you should grimace and feel the failure of your ability of expression"

"Some comments are necessary or beneficial. We’ll look at a few that I consider worthy of the bits they consume. Keep in mind, however, that the only truly good comment is the comment you found a way not to write."

With opinionated sentences like these, it's not hard to see how one would read the book and adopt a "no comment" mindset.


It also completely misses the point of why comments are useful.

"Store user.age in age variable", is a useless comment which is indeed better expressed with clear code.

"Store user age in struct because when xyz() iterates over this it has no way to access the user object" is useful because it tells us why something is done, where it is used, and why the obvious solution isn't right.
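A tiny illustration of that contrast (the User type and xyz() here are hypothetical, invented for the example):

```python
from dataclasses import dataclass

@dataclass
class User:
    age: int

user = User(age=30)

# Useless comment: restates what the code already says.
age = user.age  # store user.age in age variable

# Useful comment: records a "why" the code itself cannot express.
# The (hypothetical) xyz() iterates over plain records and has no
# access to the User object, so the age must be copied in here.
record = {"age": user.age}
```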


I fought and lost this battle with one of our teams.

The tech lead insisted they use XML comments (Visual Studio) for everything.

  ///<Summary>
  ///Represents the User
  ///</Summary>
  public class User {
      ///<Summary>
      ///Users Age
      ///</Summary>
      public int Age {get;set;}

      ///<Summary>
      ///Users First Name
      ///</Summary>
      public string FirstName {get;set;}
  }

Ad nauseam.

Here's the thing: Swagger (.NET) can pick up the XML file generated from these comments, and it gives developers the ability to add more information to the OpenAPI spec file (Swagger generates a UI off of it).

So it has a legitimate use, but if you don't have anything to say beyond repeating what the damned code already says, it's harmful to the readability of the codebase.


A classic criticism of CC: https://qntm.org/clean


I love qntm's writing, be it criticism of a technical book or mind-blowing science fiction. I strongly recommend "There Is No Antimemetics Division".


I think that no sensible rule of thumb is possible unless we're specifying a language, because language "density" can vary so greatly.

I follow wildly different rules of thumb about everything from line of code count to how many methods per class to whether or not that functionality belongs in a class in the first place, depending on whether I'm writing Java or Python or F# or Racket.


It's not from Clean Code, but Refactoring.


I think I completely agree with that line. 5 is a nice goal to aim for. Sometimes you hit 1; some things really do take 20. 100 lines is almost always a bad idea (unless it's 100 really boring and obvious lines).

I haven't read the book, and I can see how people can go overboard and can turn good advice into a caricature of it, but short, well-named functions that focus on a single thing are generally better than long ones that do dozens of different things. Separate those concerns.


I literally just pushed a 101 line function to prod that is named "download_and_extract" that downloads some files from a place, extracts them, then has a lot of error checking and a couple of logging statements and hands off to a few smaller functions to move and re-arrange files. It is long but it is readable and doesn't really fit a more abstract way of doing things. But that's my style I guess.


Length of function is not even a metric, it is at worst a code smell.


> I worked with a guy who extracted each and every boolean condition to a function because "it's easier to read"

Obviously, readability is important, but I've also seen things like this so often in my career where it's used as an excuse for anything. Most recently, trying to stop a teammate turning nearly every class into a singleton for the sake of "simplicity" and "readability", which I thought was a real stretch.


> pretty sure the book itself says no more than 5 lines for each function

The book was written by a Java dev who was dipping his toe into Ruby.

Go code, covered everywhere in an obnoxious rash of error handling, will be bigger.


That's why I don't read books.


>In this case I hope nobody is proposing a single 1000-line god function.

Why not? Who said it's worse? What study settles the issue?

Sometimes a "1000-line god function" is just what the domain needs, and can be way more readable, with the logic and operations consolidated, than 20 fifty-line functions that you still have to read to understand the whole thing (and of which someone will then be tempted to reuse a few, adjusting them for 2-3 different needs not had by your original operation, and tying parts of the functions implementing your specific logic to use cases irrelevant to it).

And if it's a pure 1000-line function, it could even be 10,000 lines for all I care, and it would still be fine.


Yeah, when code gets spread out across too many classes and functions, it's like you're trying to navigate a maze without a map. You hit a breakpoint, and you're left scratching your head, trying to figure out what the heck each class is supposed to do. Names can be deceptive, and before you know it, the whole architecture feels like a jigsaw puzzle. It's a cognitive load, having to keep track of all these quirks. Maybe it was easier for the author to do it that way when they started from scratch, but after they finished, it's another story.


Okay, I don't care much about all of the unproven "software engineering" cargo cult rituals, but maybe 10,000 lines is pushing it a bit!


1000-10000 lines typically means the developer just doesn't know how to abstract. Don't go overboard with the function extraction, but also don't make me read every line of your code so I can find the one tiny part I want to change. Pseudo-functions, like the commented segments of code in the linked post, help, but it's not obvious which data those segments of logic depend on.


I think the only good use cases I have for 50+ line functions are finite state machines and renderers, whatever the form.

Do you have other examples of 50+ line functions where you thought it was best not to separate concerns?


Specialized parsers. Encoding and decoding tasks. Complex computation is often isolated to help with peer review. Pattern matching routines.

Also, constructors with many validation steps that are compiler constrained to their local scope. That seems common.


Aren't specialized lexer/parsers finite state machines? As are encoder/decoders?

OK, for complex computation: I left the world of mathematics 7 years ago, and I wasn't at the cutting edge on that, so I trust you. But to be clear, all your examples scream 'FSM' to me. If you have a pattern matching routine of 50+ lines that isn't a finite state machine, you're doing something wrong imho, and should consider changing abstraction (I'm not a big OO guy, but maybe use dynamic dispatch?)


Yeah, any routine that is not Turing complete is an FSM. Most things are FSMs, so it's not a useful distinction.


Anything consisting primarily of a switch statement with a great many cases.


Finite state machines then. I never used more than 3 cases unless I had to write one.


Also event dispatchers. I've written switch statements for various event systems that need to handle 60-100 events. You can easily get to hundreds of lines without things getting unreadable.
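For what it's worth, in languages without a switch statement the same shape often becomes a handler table; a minimal sketch with invented event names:

```python
# Hypothetical event handlers; a real dispatcher might register 60-100.
def on_connect(payload):
    return f"connected: {payload}"

def on_message(payload):
    return f"got message: {payload}"

HANDLERS = {
    "connect": on_connect,
    "message": on_message,
}

def dispatch(event, payload):
    # A dict lookup replaces the long switch; unknown events fail loudly.
    try:
        handler = HANDLERS[event]
    except KeyError:
        raise ValueError(f"unhandled event: {event}")
    return handler(payload)
```

Each case becomes a row in the table, so the "hundreds of lines" stay readable for the same reason a long switch does: one event, one entry.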


If we go with the cooking analogy: if you have to describe to someone how to cook a meal, and at one part of the meal you have to put the fond in, it is reasonable to explain how to make the fond in a separate section. The fond is its own thing, and it has one touching point with the food, therefore it is okay (or even beneficial) to move it out.

Also: cooking recipes are already very abstracted. When they say you need to lightly fry onions, they assume you already know a way to cut onions and a lightly-frying algorithm. If they inlined everything, it would become unreadable.

Code is very similar. If you want it strictly without abstractions it will be as low level as your language allows you, and that is definitely not readable code.

If you, for example, instead of using Python's "decode" method tried to do Unicode decoding yourself, it would become very hard to understand what your program is actually about. Now there are probably zero people who would do that, because the language provides a simple and well tested abstraction — but what makes that different from you creating your own simple and well tested abstraction and using that throughout the actual business logic of your code?

The hard part is creating abstractions that are so well chosen that nobody will have to ever touch them again.


To stay with the fond analogy: It gets interesting if the fond preparation involves deglazing a pan (mutable environment) with meat bits and juices left at the bottom (state/precondition). Two options:

- Linear code: The meat frying (state-producing) and deglazing (state-requiring) steps are below each other in the same recipe, so to verify that it works you can just linearly go through line by line. However if the recipe becomes long and a lot of stuff happens in between, it's no longer obvious. You'll have to use good comments ("// leave residue in the pan, we'll need it for the fond") because otherwise you might accidentally refactor in a way that violates the precondition (swaps/scrubs the pan).

- Modular code: You need to clearly describe the precondition on the fond preparation subroutine to have any chance to keep using it correctly. On one hand this forces documentation, on the other hand it's probably still easier to forget since the subroutine call ("Prepare the fond.") doesn't directly make the precondition obvious.

Either way has its advantages and drawbacks, and the right choice depends on the circumstances. This is assuming you only want to cook this specific meal and aren't writing a cookbook - otherwise you should definitely modularize to remove repetition.
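The modular option's precondition can at least be made to fail fast; a toy Python sketch where the cooking steps are stand-ins for stateful operations (all names invented):

```python
# The pan is a mutable environment; frying leaves state behind that a
# later step depends on.
def fry_meat(pan):
    pan["residue"] = True  # meat bits and juices left at the bottom

def prepare_fond(pan):
    """Precondition: the pan must still hold the frying residue."""
    if not pan.get("residue"):
        raise ValueError("precondition violated: pan was scrubbed")
    pan["residue"] = False  # deglazing consumes the residue
    return "fond"

pan = {}
fry_meat(pan)
# ...many unrelated steps could happen here; a refactor that scrubbed
# the pan would now fail loudly instead of silently ruining the fond.
fond = prepare_fond(pan)
```

The check doesn't make the call site ("Prepare the fond.") any more self-explanatory, but it turns a silently violated precondition into an immediate error.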


> But at the same time, don’t be over-eager to abstract, or mortally offended by a few lines of duplication. Premature abstraction often ends up coupling code that should not have to evolve together.

A relatively common piece of feedback from me to the team at work is usually to take a half step back and look at the larger problem domain and consider whether these things are necessarily the same, or coincidentally the same.

Just because the lines of code look similar right now doesn't mean they need to be that way or need to stay that way. Trying to mash together two disparate use cases because "the code's basically repeated" is often how you get abstractions that, especially over time, end up not actually abstracting anything.

As the various use cases get too divergent, the implementations either move much of the logic up to the caller (shallow abstractions, little value), or expose the differences via flags and end up with two very different implementations under the hood side-by-side (less clear than two independent implementations).


> In this case I hope nobody is proposing a single 1000-line god function.

I’ll take a well-structured 1000-line function over a bad spaghetti of hundreds of small functions any day.


Have you ever seen a well structured 1000 line function?

I'm sure they exist - maybe some sort of exceedingly complicated data transform or something. But in almost every situation I've seen, a 1000 line function has countless side effects, probably sets a few globals, takes loads of poorly named arguments, each of which is a nested data structure which it reaches deeply into and often has the same for loop copied and pasted 10 times with one character changed.

Often a 1000 line function is actually 5 or 6 20 line functions. I'm sure there are legitimate exceptions, but I've never seen them.


https://github.com/github/putty/blob/master/terminal.c#L3281

This function is 1830 lines long. It's reasonably well structured I think. Although the #if 0 are maybe not so good.


A lot of those if-/case-blocks are precisely where I'd put functions :)

If you changed a bunch of those to separate, pure (i.e. side-effect-free) functions it would if nothing else make unit testing a breeze, and then you'd be free to fix bugs in the logic without fear. As it is, if I had a bug in that huge function I'd be really worried about breaking some edge-condition or implied-state 500 lines up etc.


They can’t be side effect free, that’s the point. The switch statements are mutating the input.


There's a lot that can be, take this for instance (line 3336):

  if (c == '\033')
      term->print_state = 1;
  else if (c == (unsigned char)'\233')
      term->print_state = 2;
  else if (c == '[' && term->print_state == 1)
      term->print_state = 2;
  else if (c == '4' && term->print_state == 2)
      term->print_state = 3;
  else if (c == 'i' && term->print_state == 3)
      term->print_state = 4;
  else
      term->print_state = 0;
This could be turned into a pure function that takes c and print_state as input, and returns a new print_state which the outer function assigns. That's 12 lines turned into 1.

  term->print_state = newState(c, term->print_state);
(I am not a C developer, so the syntax could be wrong.) Just because the outer function is impure doesn't mean it can't in turn call pure functions.


It's a Mealy state machine, it should be encoded as a state machine!

Why do some folks absolutely insist that a series of manually written case statements in a single thousand-line function is the epitome of style, when there's a pure state machine model inside, wishing you would free it from the shackles of if else if else if else...?


I have some experience with state machines, and I absolutely hate them. It's so easy to get lost, you never know where a function will go to next.

There's a reason stacks are ubiquitous, they're much easier to fit in your head.


You can still have side-effects in your “big” function, based on the return values of pure functions.


I totally agree. And you can always write mutating “functions” (what are called “algorithms” in the C++ world), like C++’s `std::ranges::sort(range)`.


You can always use return values, and allow the caller to decide how to use them (e.g. caller may mutate other values).

Depending on use and language, this may be "expensive" (e.g. you could be allocating for and then copying some huge data structure only to pass it back, where it is simply copied over the top of the input), but this is where discretion comes in and decisions are made on what is appropriate (i.e. is performance critical, or are correctness and maintainability more important?)


Oh my God. The sneaky `else` at line 3400.

edit: another one at 3826 with a preprocessor define interleaved.


Well that's horrendous. Sorry, Simon. Each of those big "switch" statements should be broken out.


Why? That would create lots of small functions, each called only once from a single place in the code, and still extremely coupled to the invoking switch statement, often having multiple inputs and multiple outputs which the invoking statement then needs to write to the proper places, to which the function wouldn't have access.

Breaking out functions makes sense when you get either reuse or decoupling, but in this case you don't get any of these.


Because it would objectively and shockingly obviously make the code easier to understand.

You would be able to see what it does at a high level at a glance and then drill down into the functions and sub-functions to focus on a particular part of it.

With this version, to get the high level overview you have to scroll through multiple pages of code and find the comments that say what each section does.


So the problem is being able to drill down? Most decent IDEs allow you to collapse code blocks (i.e. brackets) if that's what you want to do, and the comments accomplish the same thing a function name would. Some editors support region comments that start out collapsed and can be named. I don't see why separate functions would be anyone's first instinct to reach for here.


I don't see why anyone's first instinct to reach for would be relying on some editor's ability to optionally collapse code blocks.

What happens if someone wants to use a different editor that doesn't have that feature or doesn't display it quite right anymore?

Lol.


I'm not OP, but yes, I have.

That's my personal opinion, and nothing more:

Something like a complex one-time financial/workflow/maintenance operation that includes calling dozens of different smaller functions, but is very well structured.

It does not make sense to separate it further into different functions, because execution is generally linear, and having to deal with a tree of calls, where some calls depend on the state of previous ones, becomes cumbersome and makes reading and making changes more complex.

Again, that's my personal feeling, and nothing more.


Going further, I'll take 1000-line shitty code over split-into-small-functions shitty code. In the long code, all I have to think about is the code. With the functions, I have to pay attention to what calls what, and also, because the code is shitty, surely the function names are too, adding two things to the confusion mix at the same time.


Even better: instead of an interrupt-driven state machine implemented as a switch statement that progresses linearly, do it using 20-25 small chained callbacks spread over a couple of files.

Bonus: Uncle Bob teaches us not to use comments.


This sounds like a nightmare. One that I've lived through.


It is easy to nod along when someone speaks about different styles. But there are also a few objective truths down there, and it makes sense to try to identify them.

For example, I have been at this for over three decades now, and there are some things that almost never fail. From the article, the kind of person who advocates for the more "testable" code, with a few more lines and more abstractions, is never the same person who can maintain that codebase a handful of years later.

That should tell us something. For what it's worth, I agree with the article that simpler is better, which often coincides with fewer lines of code. I personally wouldn't have chosen objects that look like "pizza.Sliced = box.SlicePizza()" but most of the time the structure is already in place and it is best to go along with it.

As to that 1000-line function, if it is in an imperative style it might well be the easiest form to read. Have you seen the Python source code? That language's success owes much to a simple interpreter with ginormous functions that anyone and their brother can read from top to bottom and dare to modify without having a brain the size of a planet.


> In this case I hope nobody is proposing a single 1000-line god function.

this made me feel a certain type of way. (don't ever look at video game source code, by the way; 1000 lines is quite short by some standards)

if a 1000-line long main is what makes sense then you should do that.

I find 1000-line long methods which are linear far easier to read than code which has every method call broken out into its own method. it's so bad I literally can't read JavaScript that is written in the contemporary style anymore. absolutely impenetrable for me.

it's true that I am not a "real" developer in that I don't work on code full-time, but I've written probably millions of lines of code in my 30-year career. I am not a novice.

if the solution calls for a 1000-line main method, then that's what I'm writing, "best practices" can go in the corner and cry. I'm writing what I need to solve the problem and nothing more.


My biggest pain is JavaScript developers who get too high on Java concepts, most often after using NestJS. Providers, Models, Services and what not.

I remember an import script I wrote in ExpressJS. It was like 50 lines. It did things like copy databases, clean up config, etc. There were hardly any layered ifs, just steps; I didn't see much use in breaking it up, and it was easy to read.

Another developer, who was smart but liked abstract concepts, overengineered the hell out of it, moving it into 20 places with a bunch of providers, and I could never find and make sense of it after that; it was very hard to tell what was going on. It was always such a pain to update.


The main reason I have a distaste for dependency injection is this: it promotes separating code into multiple places and over-abstracting things, making code hard to follow. Most of the time it is not worth the trade-off.

Doing module mocking for unit tests instead of dependency injection in runtime code is almost always a better idea in my opinion. Dependency injection was invented for languages that can't do module mocking.


> So where do we split things?

Cyclomatic complexity: https://en.wikipedia.org/wiki/Cyclomatic_complexity

Overhead: https://en.wikipedia.org/wiki/Overhead_(computing)

Some programming language implementations and operating systems have more overhead for function calls, green threads, threads, and processes.

If each function call creates a new scope, and it's not a stackless language implementation, there's probably a hashmap/dict/object allocated for each function call unless tail-call optimization (TCO) has occurred.

Though, function call overhead may be less important than readability and maintainability.

The compiler or interpreter can in some cases minimize e.g. function call overhead with a second pass or "peephole optimization".

Peephole optimization: https://en.wikipedia.org/wiki/Peephole_optimization

Code linting tools measure [McCabe,] Cyclomatic Complexity but not Algorithmic Complexity (or the overhead of O(1) lookup after data structure initialization).


> Also iteration. Just because the first place you tried to carve an abstraction didn’t work well, doesn’t mean you give up on abstractions;

C. Muratori calls this method "semantic compression": https://caseymuratori.com/blog_0015


What's described there is what I understand DRY ("don't repeat yourself") and the associated "rule of three" to mean.


As soon as conversations stray into lines of code etc. I think we've veered directly into Goodhart's Law.


Sometimes I use an anonymous scope instead of extracting a single-use function. This is especially nice when you would otherwise have many parameters/returns.


The example code is very simplistic, so of course that linear code is more readable, but the idea doesn't scale.

I think you have to consider things like reusability and unit-test-ability as well, and having all your code in a single function can make reasoning about it more difficult due to all the local variables in scope that you need to consider as possibly (maybe or maybe not) relevant to the block of code you’re reading.

That being said, when I look back on my younger, less experienced days, I often fell into the trap of over-refactoring perfectly fine linear code into something more modular, yet less maintainable due to all the jumping around. There is something to be said for leaving the code as you initially wrote it, because it is closer to how your mind was thinking at the time, and how a readers mind will also probably be interpreting the code as well. When you over-refactor, that can be lost.

So I guess in summary, this is one of those “programming is a craft” things, where experience helps you determine what is right in a situation.


> The example code is vey simplistic, so of course that linear code is more readable, but the idea doesn’t scale.

One of the best-reviewed functions I wrote at work is a 2000-line monster with 9 separate variable scopes (stages) written in a linear style. It had one purpose and one purpose only: it was supposed to convert some individual HTML pages used in one corner of our app on one platform into a carousel that faked the native feel of another platform. We only needed that in one place, and the whole process was incredibly specific to that platform and that corner of the app.

You could argue that every one of those 9 scopes could be a separate function, but then devs would be tempted to reuse them. Yet each step had subtle assumptions about what happened before. The moment we spent effort to make them distinct functions, we would have had to recheck our assumptions, generalize, and verify that the methods work on their own... for code that's barely ever needed elsewhere. We even had some code that was similar to some of the middle parts of the process... but it just slightly didn't fit here, and changing that code caused other aspects of our software to fail.

The method was not any less debuggable, it still had end to end tests, none of the intermediate steps leaked state outside of the function. In fact 2 other devs contributed fixes over time. It worked really well. Not to mention that it was fast to write.

Linear code scales well and solves problems. You don't always want that but it sure as hell makes life easier in more contexts than you'd expect.

Note. Initial reactions to the 2000 line monster were not positive. But, spend 5 minutes with the function, and yeah... You couldn't really find practical flaws, just fears that didn't really manifest once you had a couple tests for it.


I don't know if it is still like this, but the code for dpkg used to be like this, and it was amazing: if you ever needed to know in exactly what order various side effects of installing a package happened in, you could just scroll through the one function and it was obvious.

To this end, I'd say it is important to be working in a language that avoids messing up the logic with boilerplate, or to build some kind of mechanism (as dpkg did) to ease error handling and shove it out of the main flow; this is where the happy path shines: when it reads like a specification.


I don't think the fact that a function works well is a good enough reason to write a 2000 line function. Sometimes there are long pieces of code that implement complex algorithms that are difficult to break into smaller pieces of code, but those cases are limited to the few you mentioned.


Computers execute code in a linear fashion, why on earth would you "need a reason" to NOT abstract something? Just because abstraction is often the right thing to do doesn't make it the base case.

It's like saying you need a reason not to add 4000 random jumps in your assembly code just to make it more difficult to read...


Source code isn't written to be executed by computers, it's written to be read by other humans.

Source code tends to be very far removed from how computers execute anything, so I wouldn't use that as a justification for any sort of code style.


> Source code isn't written to be executed by computers, it's written to be read by other humans.

It is pronounced "documentation".


> that implement complex algorithms that are difficult to break into smaller pieces of code

My longest code is always image processing. It's usually too hard to break up for the sake of breaking up. There's nothing to reuse between the calls to filters/whatever.


The default should be reversed, don't break into smaller pieces unless there's a really good reason.


>I don't think the fact that a function works well is a good enough reason to write a 2000 line function.

The fact that it works well and reads well (when it does, as in the parent's case), is.

Aside from those factors what else would be against it? Dogma?


I guess all we know is there were 2K lines of code and the commenter thinks that was the right way to do it. It would be necessary to see the code to appropriately critique it.


Not just the commenter, but his team as well. It passed code review with flying colors, apparently. The moral of the story is that there always exceptions and developers should not be ideologically committed to one approach above all else.


we know more than that: "You could argue that every one of those 9 scopes could be a separate function, but then devs would be tempted to reuse them. Yet, each step had subtle assumptions about what happened before."

what we don't know is if it would have been possible to abstract those assumptions away so that functions could have been defined without them.


We do know that if we trust the poster, they said very clearly it could have been done but they didn't consider the value to outweigh the downsides.


Yes, I meant we don't know if it would have been possible to extract functions in such a way that they are actually safely reusable.


Even the contrived example in the post can be factored differently (and better imo). How do we know those 9 scopes are appropriate?


>The moment we would have spent effort to make them distinct functions we would have had to recheck our assumptions, generalize, verify that methods work on their own

Why? Why can't the functions say "to be used by <this other function>, makes assumptions based on that function, do not use externally"? Breaking out code into a function so that the place it came from is easier to maintain... does not mandate that the code broken out needs to be "general purpose".


Specifically, in that place, there was no need. And prematurely splitting it would have caused us to overthink and over generalize. Having a long, linear and tested function was a better choice.


I understand your point, but perhaps that would have simply been an opportunity to refine your approach to code design. If such a situation leads to excessive deliberation and overgeneralisation, your code base must be riddled with unnecessary overthinking and overgeneralisation.


Or maybe it was just a long, sequential algorithm where breaking it up wouldn't have been an improvement.


I have been programming for more than 30 years. Except for code generated explicitly to be only consumed by machine, I've never come across a function consisting of 2000 lines of code that should not have been broken up. Something is wrong there, and if you show me the code, I'll tell you what's wrong with it.


Glad you can see that without even looking at the code.


Some things you don't have to see to know whats going on. Function with 2000 lines of code? Have fun rationalising this.


I worked with an engineer that wrote the most clear and elegant linear code. It was remarkable, never seen anything like it since. I can't reproduce it but I do have an idea of what a well designed linear function looks like.. a story.


I was just thinking that if I _needed_ to refactor this I might structure the stages as chapters in a book. One might be able to write an inner class or some such that had a “table of contents” function that called each stage in sequence as a void function with data managed out of line, maybe via cleverly designed singleton structs. Then the code itself can be written in order with minimal boilerplate between stage boundaries.

I think I’ve worked with some Python that looked and worked this way. I can’t place the details but probably in a processor pipeline running over a particularly hairy data format. Consider ancient specifications written by engineers talking on the phone encapsulated in relatively “modern” but still vintage specifications, sometimes involving screen-scraping a green screen mainframe terminal, wrapped in XML and sent over the internet. Anyway, point is I couldn’t agree more about stories.


I will agree that it takes some skill, not that I am great at it. It's a different kind of skill than abstraction. Reading error handling in C code offered good insights for me to learn linearity better (C code that uses goto to jump to the end of a function for cleanup when an error occurs, for example).

However, if you screw up linear code, you screw up locally. If you write poor small functions, the rest of the team screws up because they barely ever read the contents of your functions that call other functions that call other functions. I've had way more problems with stuff being called slightly out of order, than with large functions.


That is true of well-designed nonlinear code as well. The code needs to tell a story or it will be a mess.


You don't have to write tests to prove that private methods work on their own. Just test the public behaviour.


At first I thought how horrible, but basically you have sort of 9 functions within the same scope, each having a docstring. So I guess not too different from splitting them up.

I read you have "end to end" tests.

One question though: Wouldn't each part benefit for having their own unit tests?


Maybe, maybe not. For our particular case it would have been mostly wasted effort.

I found that I like to write tests at the level of abstraction I want to keep an implementation stable. I'd be totally fine if someone went in and changed the implementation details of that long process if needed. We cared that stuff got cleaned up at the end of the process, that the output matched certain criteria, that certain user interaction was triggered and so on... In that case it made more sense to test all our expectations for a larger scope of code, rather than "fix" the implementation details.

Tests usually "fix" expectations so they don't change from build to build. Tests don't ensure correctness, they ensure stuff doesn't alter unexpectedly.


Tests effectively freeze requirements; you should test those things which should be preserved throughout any changes, and not test those things which should be open to change. In this case, it seems there are no real requirements for any of these 9 steps - perhaps the implementer could figure out how to achieve the same outcome by skipping a step or merging two steps, and the existence of unit tests for these 9 functions would somehow encode the idea that each of these 9 functions is inherently needed, which is not necessarily true.


>One question though: Wouldn't each part benefit for having their own unit tests?

Not necessarily better, especially since this allows for the case where individual unit tests pass fine, but the combined logic fails.


If the sub-functions could be reused and people would be tempted to change them, then that's what your tests are for. In fact, it's often tricky to test the sub-function logic without pulling it out, because to write the test you have to figure out how to trick the outer function into certain states. Follow the Beyoncé rule: if you like it, put a test on it. Otherwise it's on you if someone breaks it.


> You could argue that every one of those 9 scopes could be a separate function, but then devs would be tempted to reuse them.

Good thinking. Now they'll just add 50 flags and ten levels of nested ifs instead, which is much simpler.


2000 lines is like a small project. I can't imagine putting that all in one function.


>”but then devs would be tempted to reuse them”

Isn’t that the fucking point? Having a 2000 line function is a code smell so bad, I don’t care how well the function works. It’s an automatic review fail in my book. Abstractions, closures, scope, and most importantly - docs to make sure others use your functions the way you intended them. Jesus.


Some devs did find it a code smell... But each scope had a clear, short, high-level comment describing what it did, there were end-to-end tests for the method, and very little state flowed from scope to scope (some did) - because that's what scopes do... prevent variables from leaking.

My point is the code smell isn't always accurate, and there are times and even for 2000 line monsters other devs agreed that it was the best way to hide complexity away from the rest of the codebase in that case. If we ever needed to factor things out (we never did), we could spend some effort and do it.


Have you tried reading code instead of smelling it?


A code smell means you should look into it, not that it's wrong.

Some things are genuinely 2kloc-complex. Maybe not that many. Do check! But some are.


Definitely not that many. Even for me this was an outlier, but it made me more comfortable with functions most people would consider long.

I'd like to clarify this was not necessarily 2kloc-complex, this was just 2kloc-long-and-not-really-meant-to-be-reused. It was a fairly long but linear process that was out of the ordinary for the rest of the codebase. It could easily have been split (hell, I had 9 fairly separate stages), but calling any of the intermediate stages out of order or without the context of the rest of the execution flow... would have been a foot gun for someone else. And, as time showed, we never needed those stages for anything else.


Agreed. I’ve written plenty of software of all kinds and have never had to write a 2000-line-long method (although I have had the joy of refactoring such messes a time or two).

Just don’t do that. Your code doesn’t have to have abstractions out the wazoo, but if your class (or method) is getting bigger than 1000 lines, that’s a great sign that it’s doing too much and abstractions can be teased out. Your future self will thank you, as well as your team.


I like this from Sandi Metz:

> You can't create the right abstraction until you fully understand the code, but the existence of the wrong abstraction may prevent you from ever doing so. This suggests that you should not reach for abstractions, but instead, you should resist them until they absolutely insist upon being created.


At least in the mobile world, I find that this “no abstraction” approach is the default one, and it usually leads to huge objects which do everything from drawing views to making network requests to interacting with the file system. These kinds of classes are quite hard to work in, hard to test, and also keep snowballing to get bigger and bigger. Things usually end with unmaintainable code and a full rewrite.

I am not saying you need to create complex abstract hierarchies right off the bat. But usually, it’s pretty easy to tease out a couple of significant abstractions that are very obvious, and break down your classes by a factor of two or three. Just getting such low-hanging fruit will prevent you from ever having a 2000-line-long method.

And for the folks who are saying that they make sure to not add abstractions too early - are you disciplined enough to go back and add them later? I feel like if you’re the kind of engineer that busts out 2000 line methods, you’re also not going to refactor it as this method grows to 2500 or 3000 lines or beyond.

Probably most robust software you depend on is full of solid, quality abstractions. Learning to write code like this takes practice. The wrong abstraction might be wrong, but it’s one step closer on your journey to growing as an engineer. You won’t grow if you never try.


So where's the proof that the function'd code scales? As the complexity of the overall code grows, so would something that gets chopped into dozens of functions to the point of being unreadable.

Suddenly, you realize that the dozens of functions __need to be called in specific orders__, and they are each only ever used once. So really what you're doing is forcing someone to know the magic order these functions are composed in order for them to be of any use.


The truth is that either one can be done wrong.

Unfortunately organizing your code along the right lines of abstraction is something that just takes skill and can't easily be summarized in the form of "just always do this and your code will be better"

If you organize your code into units that are easy to recompose and remix, well, you get huge benefits when you want to recompose and remix things.

If you organize your code into units that can't be easily recomposed, then yes you've added complexity for no benefit. But why make units that can't be treated individually?

"As the complexity of the overall code grows, so would something that gets chopped into dozens of functions to the point of being unreadable."

So the answer to this is, "don't chop it into functions in a way that leaves it unreadable, instead chop it into functions in a way that leaves it more readable."

That may be unsatisfying, but it gets to the point that blindly applying rules is not always going to lead to better code. But it doesn't mean that an approach has no value.


There's an easier approach that will also aid you in telling you how to precisely chop up your function.

Simply don't chop up your function until you need a slice of it somewhere else. Then refactor out the bit you need. You'll find out exactly which bits need to be replaced with variables and exactly where the slice needs to happen.


This is the correct answer right here if you have a good enough team. It is still the way I want to work. Unfortunately, I find that there are too many developers who haven't learned that you should always be considering to "refactor as you go". I'm trying to teach by example, but it's an uphill battle.


Exactly. Start with the straight-ahead linear approach and factor out once it's unwieldy.

Same thing for copy pasta funcs -- the first copy is fine, the second one may be too, but after that consider extracting to a parameterized func (a permutation of the Go Proverb "A little copying is better than a little dependency.")


A single use function absolutely makes sense - you are effectively naming a block of code in some way, documenting it.


The API shouldn't be that. Expose something easy to use. That is the point of abstractions. It doesn't matter if there are a dozen methods called in order if those dozen methods are called by a helper method, beyond maybe some implementation details.

Really the question should always come up when there are more than, say, two ways to do things. If I can make a pizza from scratch, reheat a chilled pizza, create a pizza and chill it, reheat a half dozen pizzas, or make three pizzas of the same kind and chill them, suddenly the useful abstractions are probably something you can figure out between those helper methods.

Honestly that is the real fear of the left way of thinking. If you add a quantity, whether to cook and whether to chill parameters you end up with a hard API where certain combinations of parameters don't make sense.

Have a clean API and make the implementation as simple as is feasible. Reuse via functions when it makes sense but don't add them willy nilly.

Aka "it is a craft and you figure things out" as someone said in the comments here


I'm very dubious of anyone resorting to "readability" as a justification.

What you're doing by breaking things into functions is trying to prevent its eventual growth into a bug-infested behemoth. In my experience, nearly every case where an area of a code base has become unmaintainable - it generally originates in a large, stateful piece of code that started in this fashion.

Everyone who works in said area then usually has the option of either a) making it worse by adding another block to tweak its behaviour, or b) starting to split it up and hoping they don't break stuff.

I don't want to see the "how" every time I need to understand the "what". In fact, that is going to force me to parse extraneous detail, possibly for hundreds of lines, until I find the bit that actually needs to be changed.


> What you're doing by breaking things into functions is trying to prevent it's eventual growth into a bug infested behemoth

Not every piece of code grows into a bug-infested behemoth. A lot of code doesn't grow for years. We're biased to think that every piece of code needs to "scale", but the reality is that most of it doesn't.

Instead of trying to fix issues in advance you should build a culture where issues are identified and fixed as they come up.

This piece of code will be a pain to maintain when the team gets bigger? So fix it when it actually gets bigger. Create space for engineers to talk about their pains and give them time to address those. Don't assume you know all their future pains and fix them in advance.

> In my experience, nearly every case where an area of a code base has become unmaintainable - it generally originates in a large, stateful piece of code that started in this fashion

In my experience it gets even worse with tons of prematurely abstracted functions. Identifying and fixing large blocks of code that are hard to maintain is way easier than identifying and fixing premature abstractions. If you have to choose between the two (and you typically do), you should always choose large blocks of code.

The great thing about big blocks of code is that their flaws are so obvious. Which means they are easy to fix when the time comes. The skill every team desperately needs is identifying when the time comes, not writing code that scales from scratch (which is simply impossible).


> Suddenly, you realize that the dozens of functions __need to be called in specific orders__, and they are each only ever used once. So really what you're doing is forcing someone to know the magic order these functions are composed in order for them to be of any use.

That's where nested functions show their true utility. You get short linear logic because everything is in functions, but the functions are all local scope so you get to modify local scope with them, and because the functions are all named, it is easy to determine what is going on.


In a decent programming language you can nest functions, so all the little functions that make up some larger unit of the program are contained within (and can only be called within) that outer function. They serve less as functions to be called and more just as names attached to bits of code. And since they can't be called anywhere else, other people don't need to worry about them unless they're working on that specific part of the program.
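A minimal Python sketch of this idea (the order-processing names are hypothetical, not from the article): the helpers are defined inside the outer function, so they read top to bottom and cannot be called from anywhere else.

```python
def process_order(order):
    # Local helpers: visible only inside process_order, so they act
    # as names attached to bits of code rather than as public API.
    def validate(o):
        # Pure check, no outside state touched.
        return o["quantity"] > 0

    def total(o):
        return o["price"] * o["quantity"]

    if not validate(order):
        raise ValueError("invalid order")
    return {"id": order["id"], "total": total(order)}

result = process_order({"id": 1, "price": 2.5, "quantity": 4})
# result == {"id": 1, "total": 10.0}
```

Because `validate` and `total` never escape the enclosing scope, renaming or reordering them is a purely local change that nobody else can depend on.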


If you have dozens of functions that need to be called in specific orders, design and use a state machine and then use a dispatch function that orchestrates the state machine.
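A rough sketch of that idea in Python (all step and state names are made up for illustration): the transition table, not the callers, encodes the required order.

```python
# Each state maps to (step name, next state); the machine, not the
# caller, decides the sequence in which steps run.
STATES = {
    "new":      ("prepare", "prepared"),
    "prepared": ("bake",    "baked"),
    "baked":    ("box",     "boxed"),
}

def prepare(ctx): ctx["steps"].append("prepare")
def bake(ctx):    ctx["steps"].append("bake")
def box(ctx):     ctx["steps"].append("box")

HANDLERS = {"prepare": prepare, "bake": bake, "box": box}

def dispatch(ctx):
    # Walk the machine from "new" until a terminal state is reached.
    state = "new"
    while state in STATES:
        step, state = STATES[state]
        HANDLERS[step](ctx)
    return ctx

ctx = dispatch({"steps": []})
# ctx["steps"] == ["prepare", "bake", "box"]
```

Reordering the pipeline is then a one-line change to the table instead of a hunt through call sites.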


Dozens of functions need to be called in a specific order?

Oh my God.


> but the idea doesn’t scale.

You are wrong here.

> this is one of those “programming is a craft” things, where experience helps you determine what is right in a situation.

You are right here.

The key insight on why giant linear functions are often more readable (and desirable) is because they allow you to keep more concepts/relationships simultaneously together as a single chunk without context switching which seems to aid our comprehension. An extreme proponent is Arthur Whitney (inventor of the K language) who writes very terse (almost incomprehensible to others) code so as to accommodate as much as possible in a single screen.

Two examples from my own experience;

1) I found reading/understanding/debugging a very large Windows message handler function (i.e. a WndProc with a giant switch statement containing all the business logic) far easier than the same application rewritten in Visual C++ where the message handlers were broken out into separate functions.

2) The sample code for a microcontroller showed an ADC usage example in two different ways; One with everything in the same file and another where the code was distributed across files eg. main.c/config.c/interrupts.c/timer.c/etc. Even though the LOC was <200 i found the second example hard to understand simply because of the context switch involved.


> The key insight on why giant linear functions are often more readable (and desirable) is because they allow you to keep more concepts/relationships simultaneously together as a single chunk without context switching which seems to aid our comprehension.

The problem with giant linear functions is that those concepts get separated by sometimes thousands of lines. Separating out the high-level concepts vs the nitty-gritty details, putting the latter in functions that then get called to implement the high-level concepts, does in my experience in most cases a better job of keeping related things together.


See my other comment here : https://news.ycombinator.com/item?id=37522577

The issue is one of Policy vs. Mechanism - https://en.wikipedia.org/wiki/Separation_of_mechanism_and_po...

It is "Mechanism" which should be separated out and encapsulated while "Policy" (aka business logic) is what is better centralized as a linear (possibly large) function.


YEAH, but the moral that should be taken from that is not "it's always better to write huge, linear functions". Rather, "there are cases where huge, linear functions make sense because of the way the code needs to interact with things". Along the same lines, there are cases where breaking the code up into smaller functions, and calling them from the main function, makes more sense.

> Linear code is more readable

^ Wrong

> Linear code is sometimes more readable

^ Better


Not quite what i was trying to convey. Linear code actually has a lower burden on one's cognitive load and thus easier to comprehend. But of course it breaks down at certain sizes (variable) which is when it makes sense to partition it into smaller pieces and apply Abstraction etc.

See for example Cyclomatic Complexity - https://en.wikipedia.org/wiki/Cyclomatic_complexity


> Linear code actually has a lower burden on one's cognitive load and thus easier to comprehend. But of course it breaks down at certain sizes ...

I agree with this, but I think the combination of those two sentences winds up being "linear code is sometimes easier to comprehend, and sometimes not". The statement "linear code is easier to comprehend" is misleading. Your statement makes it seem like "at certain sizes" is the edge case; whereas, in my opinion, it's the only case that really matters. For a small enough block of code, "easier to comprehend" becomes a moot point.

> See for example Cyclomatic Complexity

I think that's only tangentially related. Cyclomatic Complexity deals with branching, which is somewhat orthogonal to refactoring out code to separate functions (though refactoring can make the branching easier to read, since it shows more in a smaller area).


> an extreme proponent is Arthur Whitney (inventor of the K language) who writes very terse (almost incomprehensible to others) code

But k has a small set of built-in commands and a built-in database; it was made for fast analysis of stock information, so with that you have everything you need and you use the same semantics. The only thing you need to know is the data structure and you can build whatever you need.

So in this way, it's very likely that, given two tables A + B and 'bunch of operations' X on A and 'bunch of operations Y' on B where Y depends on the result of X, and given the tasks to;

- create X' = X

- create XY' = X + Y

you would implement XY' from scratch without knowing X already exists, rather than figure out that X exists and reuse it.

The problem outside of k (or programs written in a similar style; it doesn't really matter what the programming language is) is that we have learned to use the second style from the article, and, more extreme, to separate everything out into layers. You cannot even reach the data model without going through a layer (or more) of abstractions, which makes it necessary not only to know the data model in detail but also to find the matching findXinAandApplyWithYToB(). Where X & Y & A & B are often ambiguous and badly named entities. And then there are of course badly designed databases, which are also quite the norm as far as we see, so data integrity is much lower, which means that if you change something without checking all the code that touches it, the data can become inconsistent.

I notice the same when working on systems built with stored procedures on MSSQL/Postgres; it is far quicker to oversee and (at least basically) understand the data model (even with 1000+ tables, which is rather normal for systems we work with) than it is to understand even a fraction of a, let's say, Go codebase. So when asked to do a task XY', you are usually just not searching for X'; you are simply reading the data used in X & Y and whipping up a procedure/query/whatever yourself. It's simply much faster as you have a restricted work surface, the model and SQL (I know, you can use almost any language in Postgres, but let's not here), and you can reason about them and the tasks at hand when you shut off the internet and just use your SQL workbench.


Nice post. If I understand you correctly, you are saying that K is specialized enough (my knowledge is only cursory here) that you can directly work with the data model easily rather than going through multiple layers of abstractions, and hence linear code is normal. In other languages it may not be so easy to do, and multiple layers of abstractions only make things harder to comprehend. True, IMO Abstraction should always follow Understanding of the Problem space and not some arbitrary dogma. What I find infuriating nowadays is "cargo culting", where people blindly follow something because they read it somewhere/listened to somebody without thinking through the motivations involved and whether it is applicable to their current problem. In other words, "they don't think" for themselves. Examples are "OO is bad" (it is not), Agile/Scrum processes will magically solve all your PM problems (hell, no!), using the latest library/framework/API/fad will magically make your system better (no!), etc. etc.


> If i understand you correctly; you are saying that K is specialized enough (my knowledge is only cursory here) that you can directly work with the data model easily rather than going through multiple layers of abstractions and hence linear code is normal.

Yes, it is often just easier to write the linear code than figure out if you can reuse anything because the space is small. I think a good 'feeling' for this is: do you need internet search/package managers/copilot etc. for something, or can you just write working code sitting on a desert island, quite possibly on paper? For instance, for C, asm (arm/68k/z80/8080 and older intel), k and some others I can write working code like that for non-toy applications in specific domains. And, at least for me, those languages lend themselves very well to this linear programming. Incidentally, but not related, this is for me also the most enjoyable way of programming; I kind of really hate using libraries. That's also because we work in heavily regulated areas where you cannot just add them; you have to inspect them and sign off on them and, of course, most of them are terrible...


Nice; you have found a niche work domain for yourself which you seem to enjoy.

May we all be so lucky :-)

PS: You might want to consider adding your contact info. to your profile.


I have seen many instances where people, just out of habit, factor out a lot of linear code that will never be reused into separate functions.

These pieces of code then often end up being private functions of a class. With state. Since they are private functions now, they are not really testable.

So now we got a lot of private functions that are only called once and typically modify side effect state. When these functions are grouped together with the caller, it is actually still a bit readable in simple cases.

But then after a while someone adds other functions in between the calling function and the factored out ones.

Now we have bits and pieces modifying different side effect state that no one knows if they are called from different places without getting a call graph or doing a search in the class file.

If you insist on making the code non-linear, I'd beg you to at least consider making these factored out private funcs inner funcs of the calling function if your language supports that. This makes it clear that these functions won't be called from anywhere else.

As with so many things in life, in a real codebase this is not an either/or, but an art of combining the two into something that stays readable and maintainable.


If the function was truly linear having a long function wouldn't be so bad. But it actually isn't, the example contains multiple branches!

Will people bother testing all of them? Or will they write a single test, pass in a pizza and just glance at it actually working? My guess is the latter, as testing multiple branches from outside is often tedious, vs testing smaller specialized functions.


> The example code is very simplistic, so of course that linear code is more readable, but the idea doesn’t scale.

...that's basically why common sense and taste in programming is still required, it's not a purely mechanical task. That's also why I'm not entirely a fan of automatic code formatting tools, they don't understand the concept of nuance.


Everyone saying "linear code doesn't scale" actually has it backwards - it's concise functions with a deeply nested call stack that really becomes a nightmare in large codebases. It's never obvious where new code should be added, the difficulty of understanding what the effects of your changes will be increases exponentially since you have to trace all the possible ways code can get called, you end up with duplicated subroutines, etc etc.

99% of the time, you haven't actually come up with a good abstraction, so just write some linear code. Prefer copy/pasting to dubious function semantics.


Another risk is if you add print_table() then someone else is going to find it and use it in their code, but also add a little flag to adjust the output for their use case.

12 months later you have:

  print_table(
    rows,
    headers = None,
    is_unicode = False,
    left_align = False,
    align = [],
    remove_emoji = None,
    max_width = 80,
    potato_mode = 7,
    _debug_frontend = not FLAGS.dont_debug,
    ellipsis_for = 0,
    no_print = False,
  )


I think we all know at least some functions like this in a code base. All it takes is for a newcomer to come across a complex function that they need to update some logic for, but also don't understand well enough to refactor, so they just add some parameters with default values and call it a day.

> no_print = False

love this


I just had to implement potato_mode in a report, what a rabbit hole that turned out to be.


To play devil's advocate - what's the issue with this?

Is print_table() + print_table_without_emoji() better than print_table(remove_emoji=False)?


The issue is that this approach is almost guaranteed to produce basically untestable code with a myriad of invalid/nonsensical/completely broken input combinations, and it's impossible to refactor, too, because you don't even know which parts of the parameter space are actually ever needed, or even how they are supposed to interact.

Whenever function semantics need to change, everything degrades further because of refactoring uncertainties (=> you end up with even more parameters).

This will also be extremely resistant to optimization because even finding the "happy path" is non-trivial.


Honestly? Both reek of SRP violation. Why should print_table specifically care about emoji?

if needed remove the emoji, then print. if performance/table size is an issue, working via streams/generators/etc. should be on the (heh) table anyway.

But if you have conceded to being in quick&dir^H^Hpragmatic-land anyway, IMO both can be ok depending on the context.


As a fresh dev, I’d like to know the answer to this as well. Abstract to function w multiple params, abstract to multiple functions, no abstraction and keep as switch statement.

`print_table() + print_table_without_emoji()`

vs

`print_table(remove_emoji=False)`

vs

`switch table_name: case emoji: print(table) case no_emoji: print(table no emoji)`


If callers typically know statically whether they want emoji or not (in other words, if the parameter would typically be a literal true or false at the call site), then the parameterless version is better. (Note that you can still factor out common code from the two functions into a separate private function if you like. So the parameterless version doesn’t necessarily imply code duplication.) If, on the other hand, callers tend to be parameterized themselves on whether to use emojis or not, then the parameterized version is better, because it avoids having to make a case distinction at the call site.
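A minimal Python sketch of the two factorings (all names and the crude emoji-stripping rule are hypothetical): the parameterless pair delegates to a shared private helper, so it implies no duplication.

```python
def _format_rows(rows, strip_emoji):
    # Shared private helper: both public variants delegate here.
    out = []
    for row in rows:
        text = " | ".join(str(cell) for cell in row)
        if strip_emoji:
            # Crude assumed rule: drop code points in the emoji planes.
            text = "".join(ch for ch in text if ord(ch) < 0x1F000)
        out.append(text)
    return out

# Variant A: two parameterless entry points, for call sites that
# know statically which behaviour they want.
def print_table(rows):
    return _format_rows(rows, strip_emoji=False)

def print_table_without_emoji(rows):
    return _format_rows(rows, strip_emoji=True)

# Variant B: one parameterized entry point, for callers that are
# themselves parameterized on the choice.
def print_table_opt(rows, remove_emoji=False):
    return _format_rows(rows, strip_emoji=remove_emoji)
```

With literal `True`/`False` at every call site, Variant A reads better; if callers thread the flag through from their own parameters, Variant B avoids a case distinction at each call site.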


It depends on the implementation, but in general you prefer the parameterless version because in theory a bug that shows up in print_table has less code it could be in (less surface area to debug).

Without understanding the implementation no one can truly say which is the better approach, but this idea of "surface area for bugs" is something that should be considered when approaching these types of decisions.


This example looks totally legit to me as long as it preserves backward compatibility and doesn't add unnecessary junk flags


That’s what tests are for. And if `print_table` is factored properly then they won’t want to add flags, they’ll make a new function out of the pieces of `print_table` that has distinct behavior of its own.


Well you're describing a readability problem. And you're essentially saying readability is what causes it not to scale.

If we consider the concepts orthogonally, meaning we don't consider the fact that readability can influence scalability, then "everyone" is fully correct. Linear code doesn't scale as well as modular code. The dichotomy is worth knowing and worth considering depending on the situation.

That being said, I STILL disagree with you. Small functions do not cause readability issues if those functions are PURE, meaning they don't touch state. That, and you don't inject logic into your code, so explicitly minimize all dependency injection and passing functions to other functions.

Form a pipeline of pure functions passing only data to other functions and it all becomes readable and scalable. You'll much more rarely hit an issue where you have to rewrite your logic because of a design flaw. More often than not, by composing pure functions your code becomes like Legos. Every refactoring becomes more like re-configuring and recomposing existing primitives.
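One way to sketch such a pipeline in Python (the step names and price table are made up for illustration): each step takes data in and returns new data out, never mutating shared state.

```python
def normalize(order):
    # Pure: returns a new dict, leaves the input untouched.
    return {**order, "kind": order["kind"].lower()}

def price(order):
    prices = {"margherita": 8, "hawaii": 9}  # assumed price table
    return {**order, "total": prices[order["kind"]] * order["qty"]}

def summarize(order):
    return f'{order["qty"]}x {order["kind"]}: {order["total"]}'

def run_pipeline(order, steps=(normalize, price, summarize)):
    # Fold the order through the steps; swapping or reordering a step
    # is a local change because no step touches outside state.
    result = order
    for step in steps:
        result = step(result)
    return result

summary = run_pipeline({"kind": "Margherita", "qty": 2})
# summary == "2x margherita: 16"
```

The composition point (`steps`) is the only place where ordering lives, which is what makes the pieces recomposable.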


I disagree. It's not the purity of the functions, its having to know the details of them. The details, which could have existed here, are now in two other places. If you need to figure out how a value is calculated, and you use a half dozen functions to come to that value, you now have a half dozen places you need to jump to within the codebase.

Small functions increase the chances of you having to do this. Larger ones decrease it, but can cause other issues.

Also, many small functions doesn't make code modular. Having well defined, focused interfaces (I don't mean in the OO sense) for people to use makes it modular. Small functions don't necessarily harm it, but if you're not really good at organizing things they definitely can obscure it.


I find how easy it is to name something is a pretty good indicator. If I'm struggling to name a function then it probably needs some more attention.


I think you’re right about side effects being the missing ingredient to this discussion, that is leading people to talk past each other. The pattern’s sometimes called “imperative shell, functional core”.

And I totally agree, this is how you write large code bases without making them unmaintainable.

Where to go “linear” vs “modular” is an important design choice, but it’s secondary to the design choice of where to embed state-altering features in your program tree.

I think people dislike modular code because they want to have all the “side-effects” visible in one function. Perhaps they’ve only worked in code bases where people have made poor choices in that regard.

But if you can guarantee and document things like purity, idempotency, etc, you can blissfully ignore implementation details most of the time (i.e. until performance becomes an issue), which is definitionally what allows a codebase to scale.


Yeah few people have seen the light. But you're right. The only downside is performance. But this is rare and sparse.


The example code would be less distracting if it at least attempted to stick to the pizza metaphor in a meaningful way and weren't subpar Go code.

`prepare` is a horrible name for a function. I would expect a seasoned Gopher to call it something like `NewPizzaFromOrder`.

I don't see any reason for putting `addToppings` in its own function. If you have to have it, I personally would have made it a method on Pizza something like `func (p *Pizza) WithToppings(topping ...Topping) *Pizza { /* ... */ }`. Real pizza is mutable, so the method mutates the receiver.

Why is a new oven instantiated every time you want to bake a pizza? You should start with an oven you already have, then do `oven.Preheat()`, and then call `oven.Bake(pizza)`. You can take this further by having `oven.Preheat()` return a newtype of Oven which exposes `.Bake()` so that you can't accidentally bake something without preheating the oven first. Maybe elsewhere `Baker` is an interface, and you have a `ToasterOven` implementation that does not require you to preheat before baking because it's just not as important.
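In Python terms, that "preheat returns a new type" idea might look like the loose sketch below (Go would enforce this at compile time via a distinct type; here the `bake` method simply doesn't exist on the cold oven, so misuse fails fast at runtime):

```python
class Oven:
    # A cold oven deliberately has no bake() method.
    def preheat(self, temp_c):
        return PreheatedOven(temp_c)

class PreheatedOven:
    def __init__(self, temp_c):
        self.temp_c = temp_c

    def bake(self, item):
        return f"baked {item} at {self.temp_c}C"

oven = Oven()
# oven.bake("pizza")  # AttributeError: cold ovens cannot bake
hot = oven.preheat(220)
dinner = hot.bake("pizza")
```

The transition itself (`preheat`) is the only way to obtain a bakeable oven, which encodes the required call order in the types rather than in documentation.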

Without changing the code, I'd also reorder the declarations to be more what you'd expect (so you don't have to jump up and down the page as you scan through functions that call each other).

IDK I have to leave now but there are just so, so many ways in which the code is already a deeply horrible example to even start picking apart the "which is more readable" debate.


John Carmack said much the same and I have been following it ever since. Of course linear code is easier to read; it follows the order of execution. It minimizes eye saccades.

Some code needs to be non-linear for reuse. Then execution is a graph. If your code does not exploit code reuse from a graph structure, do not bother introducing vertices where a single edge suffices.

http://number-none.com/blow/blog/programming/2014/09/26/carm...


Something Carmack calls out but the OP doesn't is that if you can break out logic with no side effects into its own function that's usually a good idea. I think the left side would have benefited from

  pizza.Toppings = get_pizza_toppings(order.kind)
in this case to keep the mutation of the pizza front and center in the main function here.


I actually sort of agree that linear code is more readable, but readability alone doesn't make for good code practice. So while good linear code is more readable, at least in my opinion, it's also a lot less maintainable and testable. I have a few decades of experience now, I even work a side gig as an external examiner for CS students, and the only real-world good practice I've seen hold up over the years is keeping functions small. I know, I know, I grade students on a lot of things I don't believe in. I'm not particularly fond of abstraction, or even avoiding code duplication at all costs and so on, but "as close to single purpose" functions as you can get: do that, and the future will thank you for it.

Because what is going to happen when the code in those examples runs in production over a decade is that each segment is going to change. If you're lucky the comments will be updated as that happens, but they more than likely won't. The unit test will also get more and more clunky as changes happen because it's big and unwieldy, and maybe someone is going to forget to alter the part of it that wasn't obviously tied to a change. The code will probably also become a lot less readable as time goes by, not by intent or even incompetence but mostly due to time pressure or other human things. So yes, it's more readable, and in a perfect world you probably wouldn't need to separate your concerns, but we live in a very imperfect world, and the smaller and less responsibility you give your functions, the easier it'll be to deal with that imperfection as time goes on.


Sure, it's less testable BUT in the specific case at hand it's all mutations that need to be performed in a specific sequence. IMO if you are taking an object through a specific set of states, you either split that and use types to mark the transitions (bakePizza takes a RawPizza and returns a BakedPizza, enforcing the order of calls at compile time) or you write one big function because it doesn't make sense to create a pizza and then not bake it before you box it.

I obviously prefer the former for readability, correctness, testability, etc. However, in most PLs changing the type of an object involves creating a new object and has a runtime cost. For hot code paths, it makes sense to mutate in place, but in that case it's better to keep it all in one linear function.


I recently started reading Sussman's Software Design for Flexibility and what you write is directly in line with that book https://mitpress.mit.edu/9780262045490/



Hard agree. And I used to belong to the other camp.

The basic tension here is between locality [0], on the one hand, and the desire to clearly show the high-level "table of contents" view on the other. Locality is more important for readable code. As the article notes, the TOC view can be made clear enough with section comments.

There is another, even more important, reason to prefer the linear code: It is much easier to navigate a codebase writ large when the "chunks" (functions / classes / whatever your language mandates) roughly correspond to business use-cases. Otherwise your search space gets too big, and you have to "reconstruct" the whole from the pieces yourself. The code's structure should do that for you.

If a bunch of "stuff" is all related to one thing (signup, or purchase, or whatever), let it be one thing in the code. It will be much easier to find and change things. Only break it down into sub-functions when re-use requires it. Don't do it solely for the sake of organization.

[0] https://htmx.org/essays/locality-of-behaviour/


I went the opposite direction: I used to be in the linear code camp, and now I'm in the "more functions" camp.

For me the biggest reason is state. The longer the function, the wider the scope of the local variables. Any code anywhere in the function can mutate any of the variables, and it's not immediately clear what the data flow is. More functions help scopes stay small, and data flow is more explicit.

A side benefit is that "more functions" helps keep indentation down.

At the same time, I don't like functions that are too small, otherwise it's hard to find out where any actual work gets done.


Some important context re: my style...

> Any code anywhere in the function can mutate any of the variables

Regardless of the language I'm using, I never mutate values. Counters in loops or some other hyper-local variables (for performance) might be the inconsequential exceptions to this rule.

> More functions help scopes stay small, and data flow is more explicit.

Just write your big function with local scope sections, if needed (another local exception to the rule above). Eg, in JS:

   let sectionReturnVal
   {
     // stuff that sets sectionReturnVal
   }
or even use an IIFE to return the value, and then you can use a const. "A function, you're cheating!" you might say, but my goal is not to avoid a particular language construct, but to maintain locality, and avoid unnecessary names and jumping around.

> A side benefit is that "more functions" helps keep indentation down.

This is important and I maintain it.

See "Align the happy path to the left" (https://medium.com/@matryer/line-of-sight-in-code-186dd7cdea...)

It is also worth noting that solving this problem with function extraction can often be a merely aesthetic improvement. That is, you will still need to hold the surrounding context (if not the state) in your head when reading the function to understand the whole picture, and the extraction makes that harder.

Using early returns correctly, by contrast, can actually alleviate working memory issues, since you can dismiss everything above as "handling validation and errors". That is, even though technically, no matter what you do, you are spidering down the branches of control flow and are therefore always in some very specific context, the code organization can affect how much attention you need to pay to that context.
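The guard-clause style described here might look like the following Python sketch (the order-shipping scenario and field names are hypothetical):

```python
def ship_order(order):
    # Guard clauses: each failure returns early, so by the time you
    # reach the bottom, everything above can be dismissed as
    # already-handled validation and errors.
    if order is None:
        return "error: no order"
    if not order.get("paid"):
        return "error: unpaid"
    if not order.get("address"):
        return "error: no address"
    # Happy path: unindented, at the bottom, easy to spot.
    return f'shipped to {order["address"]}'
```

Compared with nesting each check in an `else` branch, the happy path stays aligned to the left and the reader never has to track which conditions are still "open".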

> I don't like functions that are too small, otherwise it's hard to find out where any actual work gets done.

Precisely, just take this thinking to its logical conclusion. You can (mostly) have your cake and eat it too.


The better solution to this is to use nested functions that are immediately called, rather than top level functions. That lets you cordon off chunks of state while still keeping a linear order of definition and execution. And you don't have to worry about inadvertently increasing your API maintenance burden because people started to depend on those top level functions later.


Ha, I started writing my reply before seeing yours, and suggested almost the same thing.


> Only break it down into sub-functions when re-use requires it. Don't do it solely for the sake of organization

What about for testing? What about for reducing state you need to keep in mind? What about releasing resources? What about understanding the impact of a change? Etc.

Consider an end of day process with 10 non-reusable steps that must run in order and each step is 100 lines. Each step uses similar data to the step before it so variables are similar but not the same. You would really choose a 1000 line single function?


> What about for testing?

For "use-case" code like this with many steps, you are typically testing how things wire together, and so will either be injecting mocks to unit test, in which case it is not a problem, or wanting to integration or e2e test, in which case it is also not a problem.

If complex, purely logical computation is part of the larger function, and you can pull that part out into a pure function which can be easily unit tested without mocks, that is indeed a valid factoring which I support, and an exception to the general rule.

> What about for reducing state you need to keep in mind?

Typically not a problem because if the function corresponds to a business use-case, you and everybody else is already thinking about it as "one thing".

> What about releasing resources?

Not a problem I have ever once run into with backend programming in garbage collected languages. Obviously if you are in a different situation, YMMV.

> Consider an end of day process with 10 non-reusable steps that must run in order and each step is 100 lines.

I would use my judgement and might break it down. Again, I have never encountered such a situation in many years of programming.

You seem to be trying to find the (ime) rare exceptions as if those disprove the general rule. But in practice the "explode your holistic function unnecessarily into 10 parts" is a much more common error than taking "don't break it down" too far.


  let DebugFlags = {StepOne: false, StepTwo: false, StepThree: true};
  
  if (DebugFlags.StepOne) { ... }
  if (DebugFlags.StepTwo) { ... }
  if (DebugFlags.StepThree) { ... }
Your training in structured, DRY and OOP will recoil at this: More branches! Impossible. But your spec says "must run in order". It does this by design. Every resource can be tracked by reading it top to bottom, and the only way in which you can miss it is through a loop, which you can also aim to minimize usage of. The spec also says "uses similar data to the step before it". If variables are similar-not-same, enclose them in curly braces so that you get some scope guarding. The debug flags contain the information needed to generate whatever test data is necessary. They can alternately be organized as enumerated state instead of booleans: {All, TestOne, TestTwo, TestThree}.

Long, bespoke linear sequences can be hairy, but the tools to deal with them are present in current production languages without atomizing the code into tiny functions. Occasionally you can find a useful pattern that does call for a new function, and do a "harvest" on the code and get its size down. But you have to be patient with it before you have a good sense of where a new parameterized function gets the right effect, and where inlining and flagging an existing one will do better.


They both read linearly. In the version with smaller functions taken out, there's a table of contents at the top of the page and it summarizes the dataflow between the steps. It seems like an appealing read order, assuming you're going to read the whole thing.

For it to stay this readable, though, you'd need to move the functions around if you change the order of the steps. And that's fine if they're private functions, called only from the table of contents. Only, nothing forces you to keep them in order, or even to think about how it reads overall.

It often happens that functions start being reused in a way that can't be linearized anymore. Sometimes people give up and sort them alphabetically, or it's just random.


In my experience the more familiar that someone is with the code, the more they think pushing code into smaller functions is the correct path. They have already built up a mental model of the code at hand, so the cleanest implementation to them is one with very few lines.

But the next person to come along has to bounce back and forth, performing mental stack push/pop operations to create the same mental model which is much harder to do when you don't have any of the original context


Not if the code makes sense. If the code is well written with elegant abstractions, slim interfaces and decent documentation, you often don't need to bounce around that much. For example how often do you read source code of your language's standard library? I almost never do, I mostly just look at method signatures and maybe read some docs if it's a bit complex or new.

The whole point of interfaces is that you're not supposed to care how a method is implemented, only what it does, which is explained by a combination of context, naming and documentation. But a lot of devs don't understand (or care about) this, so they write code that doesn't make sense, and then it doesn't matter whether they made it linear or modular. They do things like make a service class where you have to call one method to get some data and then you have to call another to get some other data and then you have to call a third method to get some data that needs to be consolidated with the other two, and now what the hell is the point of your service? It exposes all the internal complexity to the outside.

You aren't supposed to force small methods, there's no point having 20 ~5-line functions that are all only called once and do super specific stuff and have to be called in the right order etc. That's not clean code, that's more like cargo cult programming. You are supposed to abstract things appropriately so that they make sense both to new and seasoned team members, are easy to reason about and hide complexity in places where the complexity makes sense.

This is not easy to do but it is possible.


>The whole point of interfaces is that you're not supposed to care how a method is implemented, only what it does which is explained by a combination of context, naming and documentation.

There are, however, cases where code is a better explanation of "what it does" than naming and documentation. Both naming and documentation are hard and can become out-of-sync. Code is less ambiguous than natural language.


Sure, I'm not saying nobody should ever read code. I'm just saying we should aspire to write code which does not need to be read to be understood at a high level. If you really need the low-level details they're there, I'm just saying I don't want to have to think about them if I can avoid it.

As an example I've never read the source code of any language's String implementation, but I've used them in many different languages.

I also don't like the "they can become out of sync" reasoning. That's like saying speed limits are pointless because people break them. If you change the code you update the name and comment. That's your job. I'm not saying you should document every class in your system like it's the Java standard library. A standard library doesn't change that much and its documentation is viewed by millions of devs so it makes sense to spend a lot of time documenting it.

That's the gold standard, it would be great if our code could be like that but it would be impractical given the frequency of change in most active development systems. So we find a middle ground, we focus our effort on the interfaces between subsystems. You section your codebase into subsystems so that the application's core can interact with the database without worrying about database details, or get some data from an API without worrying about whatever weird quirks the API has. You construct a subsystem around the API which handles all the API details so that your core can interact with the API without having to worry about auth or weird API quirks or the fact that the API entities have 50 properties and you only need 7 of them. You hide away all that stuff in a subsystem and then you design a nice and clean interface that the application core interacts with. If there are any implementation details that the consumer of the interface needs to know to use it, you document it.

Just try your best to make your subsystems usable without having to deep dive into them for implementation details every 5 minutes.


> The whole point of interfaces is that you're not supposed to care how a method is implemented

That's exactly how you end up with O(N^4) code. Your job is to care.


Yeah sure let's reduce my advice to a hyperbolic niche situation.

This has nothing to do with performance. I'm explaining general rules for designing maintainable systems - it is possible to follow them and write performant code at the same time. It is also possible to break them if necessary. Though it usually isn't a problem at all, you're just reducing it to absurdity in order to make your point.


BS. You wouldn't even get past your first hello world if you read all the code that underlies a one liner hello world.


So you read the machine code that your code spits out? How interesting...


That's not quite enough. The kernel and associated device drivers are also "interfaces" that a program binary invokes, so all the kernel code paths triggered by the program calling into the kernel should be inspected too.

That said pretty sure the GP hasn't had a deep dive into whether say the C library or Linux kernel has some funny O(n^4) stuff happening.


I disagree, but maybe there's a difference between people who read/think bottom-up and those who think top-down.

My son struggled in school despite easily being smart enough for it, and one of the many people we spoke to about his needs explained to him that schools tend to teach bottom-up, whereas he was very much a top-down learner. He first needs an overview before he dives into the details, whereas others first need to grasp the details before they can assemble an overview. And schools tend to teach to the second group.

It's possible we've got something similar with programmers here.


> But the next person to come along has to bounce back and forth

Only if they can't read code! Code is meant to be read as written, at least the first time; if you try to read it as executed, you are in the wrong.

If the previous developer wrote a function BakePizza, just assume that the pizza will be properly baked and move to the next line. If you start dwelling on details like oven temperature while trying to understand how to run the restaurant, you will not understand how the restaurant works, and you will forget the correct oven temperature.


This is why we need better tools like projectional code editors.

There should be an editor toggle to inline functions temporarily.

No more bouncing.


This blogger obviously wouldn’t get along with Sean Parent of Adobe. It’s old but I have everyone on my team watch this “no raw loops” presentation: https://youtu.be/W2tWOdzgXHA?si=4LKv1-sau60U63op in which he identifies reusable patterns hiding in code (“That’s a rotate!”) I myself was skeptical at first but have found over the years that breaking functions into pieces is the only way to maintain short functions that can be reasoned about in isolation, and as a side effect, surfaces reusable code. If you can’t write functions that easily fit on a page, I posit you don’t actually know what the function is supposed to be doing, and there’s probably a bug. (If you can’t hold the whole function in your head, how can you be sure there isn’t a bug?)
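Parent's examples are C++ (`std::rotate` and friends), but the idea — spot the named algorithm hiding inside a raw loop — translates. A hedged Go sketch, with a hand-rolled rotate since the standard `slices` package doesn't ship one:

```go
package main

import "fmt"

// rotateLeft moves the first k elements to the end, in place —
// the pattern Parent calls out as "that's a rotate!" when it
// hides inside an ad-hoc loop with index juggling.
func rotateLeft(s []int, k int) {
	k %= len(s)
	reverse(s[:k])
	reverse(s[k:])
	reverse(s) // classic three-reversal rotation
}

func reverse(s []int) {
	for i, j := 0, len(s)-1; i < j; i, j = i+1, j-1 {
		s[i], s[j] = s[j], s[i]
	}
}

func main() {
	s := []int{1, 2, 3, 4, 5}
	rotateLeft(s, 2)
	fmt.Println(s) // [3 4 5 1 2]
}
```

Once the loop has a name, the calling code reads as intent ("rotate these two spans") instead of mechanism.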


The code that is more easily unit testable, is the code I care about.

Neither example is easily tested.

Neither support injecting the dependencies, which make mocking really difficult.

On the left, you're testing one big method with a whole bunch of conditionals, which leaves you with a whole ton of tests for that one big method.

On the right, there is a bake() method and it does oven.New(), but where does oven come from? Is it some global somewhere?


Here is an idea for a unit test for this code.

Pass in an order, assert the pizza that comes out is correct.

The entire function is a unit which can fit on my phone screen and has no external dependencies other than possibly oven, which was discussed in the article, it should probably have been passed in, aka dependency injection.
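Something like this, in Go to match the article. The Order/Pizza types and the CreatePizza body below are stand-ins modeled on the blog's example, not its actual code:

```go
package main

import "fmt"

// Hypothetical stand-ins for the article's Order and Pizza types.
type Order struct {
	Size int
	Kind string
}

type Pizza struct {
	Baked, Boxed bool
	Toppings     string
}

// CreatePizza is the whole unit under test: order in, boxed and
// baked pizza out. prepare/bake/box stay implementation details.
func CreatePizza(o Order) Pizza {
	toppings := "Veg toppings"
	if o.Kind == "Meat" {
		toppings = "Meat toppings"
	}
	return Pizza{Baked: true, Boxed: true, Toppings: toppings}
}

func main() {
	got := CreatePizza(Order{Size: 26, Kind: "Meat"})
	fmt.Println(got.Baked, got.Boxed, got.Toppings)
}
```

The test asserts only on the input-to-output mapping, so refactoring the internals (linear or split) can't break it.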


Since the example was golang, I personally love uberFX to define modules and dependencies between modules. When you do it that way, unit tests become really easy.

It isn't necessary with golang to do this at all, but it really helps build consistent structure throughout the entire app, so I do it.

Speaking from personal experience: I built a small golang process that ran on around 25k worker machines. It had to be bug-free, because if it crashed and stopped running, it meant updating a whole lot of computers across multiple data centers, by hand.

We unit tested everything and the project worked out really well because of that.


Both functions take an order and return a pizza. This seem perfect for testing.

Why would you care where the oven comes from, if the function deliver perfectly baked pizza? A unit test should test against the public interface and not be coupled to implementation details, since that will hamper refactorings.

> which make mocking really difficult.

Mocking is an antipattern anyway, and should be avoided except for nondeterministic components like current time or stateful external services.


Apologies, but such mindset is the essence of the worst programming traits.

> The code that is more easily unit testable, is the code I care about.

Author argues that his code is more readable. Sounds like you're saying that being unit-testable is more important than being readable.

> Neither example is easily tested.

Only if you're a unit testing zealot. Integration/E2E testing is easy for both.

> Neither support injecting the dependencies, which make mocking really difficult.

Mocking is not a virtue. Also, if mocking is the sole reason you're using DI, you're doing it wrong.


Arguing for unit testing, is the worst?

Come on, look in the mirror.


Unit testing is overrated because most of these problems can be solved via correct-by-construction methods. Like, do you really need to check if this "kind" variable is equal to "Veg"? This could have easily been solved by using an enum. Similarly, global-or-not can be solved by using classes/structs that don't have any constructors, or something like that.

Functions should exist at the level of concepts:

1. arr | flat | map | collect as HashMap makes sense.

2. CreateFlattenMappedHashMapFromArr does not.
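Go (the article's language) has no real enums, but typed constants get most of the correct-by-construction benefit. A sketch with invented names:

```go
package main

import "fmt"

// Kind as a defined type instead of a raw string: the compiler
// rejects a misspelling like Kind("Vge") at the call sites that
// matter, and the switch below can be kept exhaustive by review.
type Kind int

const (
	Veg Kind = iota
	Meat
)

func toppingsFor(k Kind) string {
	switch k {
	case Meat:
		return "Meat toppings"
	default:
		return "Veg toppings"
	}
}

func main() {
	fmt.Println(toppingsFor(Meat))
}
```

There is now no string comparison to get subtly wrong, so there's nothing left for a `kind == "Veg"` unit test to catch.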


Agree. Furthermore if you are going to split it, you want to make sure no one boxes a raw pizza because they added a bad switch statement or something, as that would be clearly bad. So if you want a split version than you should use types to mark the transitions between states.

boxPizza should take a CookedPizza, and BakePizza should take a RawPizza and return a CookedPizza etc.
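A minimal Go sketch of that state-encoding idea (the type and field names are invented for illustration):

```go
package main

import "fmt"

// Distinct types per state: boxPizza can't accept a RawPizza,
// so "boxing a raw pizza" becomes a compile error rather than
// a bug introduced by a bad switch statement.
type RawPizza struct{ Toppings string }
type CookedPizza struct{ Toppings string }
type BoxedPizza struct{ Contents CookedPizza }

func bakePizza(p RawPizza) CookedPizza { return CookedPizza{Toppings: p.Toppings} }
func boxPizza(p CookedPizza) BoxedPizza { return BoxedPizza{Contents: p} }

func main() {
	boxed := boxPizza(bakePizza(RawPizza{Toppings: "Meat"}))
	fmt.Println(boxed.Contents.Toppings)
	// boxPizza(RawPizza{...}) // would not compile
}
```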


Heck no.

If you do some "real world pizza making" instead of toying, that function would be like at least 1k lines, including how you carefully shape the dough, how to handle exceptions when you tear some holes, and how you should observe and rotate in the oven by how much, how you should redo it if the roller blade just didn't cut through properly, so on and so forth. Of course it's better to have top-down overview like prepare -> bake -> box otherwise the readers will surely lose themselves in details without figuring out what is happening.

People in the game industry told me their horror story of helping designers with a Lua script that they were writing over the years. And it turned out the "Lua script" was a single file, with 100k+ lines, that barely had several functions in it. That would be SO linear.


This "level of abstraction" euphemism actually means "the level at which I'm not reading code anymore even and especially if I should". Of course, linear code is more readable! Linear everything is more readable. Have you ever seen a novel with "levels of abstraction" in it?

But nobody reads the code anymore. Why bother? You're not going to stay on a single project for long enough for the attention investment to pay off. So the common best practice at the moment is to pretend that you read the code without actually reading it. For this purpose, the green code is much much better.


> Linear everything is more readable. Have you ever seen a novel with "levels of abstraction" in it?

Not if you're in the business of writing novels. What happens if you decide to edit out a scene - do you re-read the entire book to double-check that the deleted scene wasn't referenced anywhere?


I realize this is tongue in cheek, but really: Read code!

If you ever start plateauing in your code skills, start digging into the code of your favorite open source project. Accept that things have been done in another way than you would for a reason and try to understand that reason.

Try joining advent of code[0], and make sure to spend half you time block on reading and understanding alternative solutions.

[0]: https://adventofcode.com/


Mixing different levels of abstraction makes the code harder to understand. Linear code is probably good because the examples in the body are simple. It's one thing to separate code into separate files, but it's another to break up code snippets in one file.


I agree. I really hate having to jump all over a file (or multiple files!) for something that could fit into a single page of linear code.


Someone smart said, "When you've lost something, and finally find it, don't put it there again. Instead put it the first place you looked."

I think that applies to code. When I read something I wrote, if I'm annoyed at how it reads, I try to refactor it to be what I wanted to read, and remember to do it that way in the future.

But sometimes what the reader wants is too much work for the writer, so I don't push that effort beyond what it's worth.


There’s a whole style of coding dedicated to that very notion. It’s called test driven development.


I don’t agree that’s what TDD does. You spend inordinate amounts of time thinking about how you want it to be, when you could just write it, find where and what about it you dislike, write it again and have actual good code. Also called WET. You spend less time with better results that way, and you gain what OP was talking about in the process.


I agree too. Another example: I find early returns in functions easier to read than “else” with one “return” at the end. Basically vertically linear code as opposed to unnecessary branches and too much indentation, keeping the code slimmer is healthier!
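For example, in Go, guard clauses keep the happy path flush-left instead of buried in else branches (a made-up function, just to show the shape):

```go
package main

import "fmt"

// discount returns the member discount for an order total.
// Each precondition bails out early, so the main result sits
// unindented at the bottom — vertically linear.
func discount(orderTotal float64, member bool) (float64, error) {
	if orderTotal < 0 {
		return 0, fmt.Errorf("negative total")
	}
	if !member {
		return 0, nil
	}
	return orderTotal / 10, nil // 10% member discount
}

func main() {
	d, _ := discount(100, true)
	fmt.Println(d)
}
```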


It's also a naming issue. A good name means I don't have to jump.


The linear version is hard to test. The split-function one is much more testable. There is also that thing called complexity, which increases with function length and has been shown to correlate with bug count.

The problem with the example is that it is both extremely artificial and shows a single use case. Even with the artificiality, one can easily imagine baking a calzone instead, which could reuse all the factored-out oven functions in the split version.

(The comment about pizza vs baked pizza is one about using typing to encode your logic, but is separate from the issue that your functions should do one thing.)


Perhaps we can try to do it in a proper functional language?

  (ns restaurant.pizza
    (:require [restaurant.oven :as oven]
              [restaurant.package :as pack]))

  (defn make-order [size sauce cheese kind]
    {:size size
     :sauce sauce
     :cheese cheese
     :kind kind})

  (def toppings-map
    {"Veg" "Veg toppings"
     "Meat" "Meat toppings"})

  (defn prepare [order]
    (assoc order :toppings (:kind order)))

  (defn bake [prepared-order]
    (oven/bake prepared-order :pizza))

  (defn box [baked-pizza]
    (pack/box baked-pizza :pizza))

  (defn pizza [order]
    (-> order
        prepare
        bake
        box))

  (comment 
    (def order (make-order 26 "Tomato" "Mozzarella" "Meat"))
    (pizza order))


It's short and overwhelmingly granular, but it serves for illustration. Large and complex codebases sliced up this way have no rival in terms of ease of testing and reasoning about the code.


In your experience, how do LISPs compare to ML family languages in typing and readability?


I don't necessarily have opinions on ML languages. Haven't used any of it to the extent of writing more or less complex production system.


I find Linear B more readable than Linear A, but I agree with the OP, if there were additional explanatory comments in Linear A code, then it would be probably more readable than Linear B.


This is not a constructive comment. I just want to say that I appreciate this joke, it's pretty funny.



When you grow older, and become lazier, you only create functions/methods when they need to be called more than once. Some languages, like C# and JavaScript, also allow you to define them locally (inside a method). When these are used to perform some checks, I usually just place them before they are used, and when they perform some operation, I usually place them just below where they are called. The latter usually involves async or parallel execution. I just realized that this helps to keep the code more linear. So, I think I have a strong preference for linear code.
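Go doesn't have named local functions, but a func literal assigned to a variable just before its single call site gives the same linear placement — a small sketch with invented names:

```go
package main

import "fmt"

// process sums the valid entries of a list.
func process(items []int) int {
	// "local function": defined right above where it's used,
	// so the reader never has to jump elsewhere in the file.
	valid := func(n int) bool { return n >= 0 }

	total := 0
	for _, n := range items {
		if valid(n) {
			total += n
		}
	}
	return total
}

func main() {
	fmt.Println(process([]int{1, -2, 3})) // 4
}
```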


this is anecdotal of course, but as someone who has never written a line of production Go code (but can tell at a visual glance this is in fact, Go), small functions (green) made sense to me as soon as I started reading it. The single function code (red) became hard to follow at some point. It felt like the function was doing 10 different things with a lot of branching and no particular single purpose. Maybe it's the Python background in me, but I am not seeing how the single function is better to read than small, self-contained functions.


It's not a Go thing. I've inherited a number of large linear functions of the author's favored style in several C-syntax family languages, they all become increasingly incomprehensible as they grow longer and older. For any advocate of that style, the only way to maintain them (and retain their supposed clarity) is to extract functions and then re-inline those functions after a comprehensive refactoring. Otherwise, they accrue so much cruft over time that their legibility is completely lost.

Your only other option is to freeze them and never make changes, that doesn't happen much in real-world code (though it probably should).


Literally extracting the functions and then re-inlining them makes no sense. Having that as a sort of mental model while you're working on the code does make sense.


It’s to enable refactoring when it grows large. Most effective way I have found for 1k SLOC or larger functions. I usually don’t re-inline because the result after refactoring is almost always clearer.

Trying to in-place refactor those things is an exercise in frustration. That’s part of why they grow so large, from observing their proponents in action. They don’t actually know what the functions do, only where to add a new path and repeat themselves.


I don't think the author's point is truly contrarian. The author is a proponent of clarity in the code, and of the readability.

The deal here is that both versions are fairly readable, written by someone with intent to make it clear what the code should be doing. As a result the two versions are just examples of two expression styles, while the focus is on showing how the transition between these styles could be done.

What's worth underscoring here is the cohesiveness of stretches of code, such that their execution could be summarized by a descriptive function name.

Often in grand god-functions the contexts are so intertwined and mixed that it is hard to see cohesiveness in stretches of code.

Thus, the refactoring is very much a tool to creating such cohesiveness and proper logical sequencing.

Scooping out the code into separate function or commenting it out is more of a style judgement. Though putting the code into a function with a descriptive name indeed enforces this sort of analysis.


The differences are magnified when arbitrary test coverage metrics are imposed. The first example is easier to write tests for because all the conditionals are easier to scan. The second, green, example requires the test writer to follow multiple stack frames to write the test unless there’s some mocking and spying framework at hand.


I have been working on a system for programming some specialty hardware on customer premises for a while, and most of it was written in a pseudo-language implemented by another backend programmer. Think a BASIC-like language implemented in a YAML file, with arbitrary Python inserts here and there.

Despite the code being not very visually attractive (long corridors of imperative statements reading and writing from SMBus addresses), I was always surprised how easy it was to maintain the code, and how quickly I could get back "in the zone" after not working on it for months.

There is something painfully trivial about old clunky languages that makes them somewhat easier to get back into. The cost in abstraction capabilities is obvious though. The only reason I can afford to write concise, linear, imperative code for this project is its narrow, specialized scope that most of modern programs cannot afford to limit themselves to anymore.


I think the fundamental problem is that, despite our wishes, there are programs which are inherently complex, and cannot be refactored into a simple, by-pieces testable, form. And if we try to do that anyway, all we end up with is just more fluff (mocking, I am mocking you) that hides the complexity.

The internal complexity doesn't necessarily come from complex abstractions. Take for example some implementation of a tax code, i.e. code calculating taxes. There is probably gonna be a lot of interdependencies, dealing with special cases. That's your typical "business logic". This code is not inherently complex because the primitives are complex, but because there is a lot of dependencies in the calculation. That fact in itself makes it difficult to unit test.

On the other end of the spectrum, we have something like a library of functions, for example, mathematical functions. The inner workings of how to calculate, say, a gamma function, can be very complex to understand, but the surface (API) of each of the function is very small, and that makes the library itself simple and easy to unit test.

We can make an analogy with books instead of programs. On one end, you have a novel, which despite being written in a plain language, has many interdependencies of the characters interacting. You cannot "unit test" a novel by reading a single chapter, you have to read it all. You can have a summary of the novel (like the top function in exhibit B in the OP's example), but the summary of the novel is not exactly the novel, you're not really testing the novel if you read just the summary.

On the other end, there are reference works like dictionary or encyclopedia. We can unit test these easily, since each entry should stand on its own (if you want to evaluate quality of a reference work, you can pick a few entries and test that, and it's gonna be pretty representative). They are not emergently complex like a novel is, despite the fact that entries might use specialized jargon and be harder to read.


> That fact in itself makes it difficult to unit test.

Verifying a tax code implementation is a good place to make use of property based testing.


I agree, and that's why I favor it to unit testing (although to be fair, they are pretty complementary, because each addresses different end of the spectrum). To properly unit test, you need to have a different implementation, which you can compare with, you cannot IMHO unit test under the same assumptions that the code makes.


I too find the code on the left/red (linear) more readable. However the version with all the functions is quite extreme. When I'm splitting my code into functions I decide if something should be its own function on the basis of: is this chunk of functionality required to be reusable? Am I repeating code, only slightly changed?

If the answer is yes, a function gets created. I never do what I assume authors did here, find the smallest logical units code can split into and generate a bajillion functions. I'm not paid by the line of code after all.

The same reason makes me like object-oriented programming (especially inheritance, abstract functions, operator overloading). IMO with a good IDE such code is much more succinct (within the constraints of the language) and more readable, but taking it to an extreme is a mistake.


I have to agree that the code on the left is far more readable (one function). I've worked with developers that have written code on the right (lots of functions), and it's always the worst to iterate on.

The problem in the second approach is the functions aren't clean abstractions, they often hide logic&state transformations that only make sense in the calling context. So the dear reader is forced to jump back and forth between the multiple functions to understand the entire process.

And just to throw a bit of shade, I encountered this type of programming more in webdev, and especially devops communities-- than with data scientists, ml, or data engineers. ;) And also when the director of eng wanted to get their feet wet every now and then.


I still prefer the one on the right. I'm able to skip entire sections of code, and assume what the function does. Only if I require details do I go deeper.

The comments are metadata, and where function names are tied into the code. One is going to stay up to date. The other isn't.


I feel like there's a happy medium between the two; the left can easily be made simpler by factoring out one or two functions, but the right went too far.

The prepare and addToppings functions should be one function; prepare effectively just fills in a struct and calls addToppings, so it's pointless to separate them.

The bake function simply prepares the oven for cooking, which the author mentioned should be a dependency with a method, and then factors 4 lines of code into a new function for no reason. The bake and bakePizza functions should be one function.

You can then keep the box function as is.

That would be both easier to maintain and easier to read.


> You can then keep the box function as is.

The box function is broken too. You box the pizza and then return the pizza...but the box is logically a wrapper for the pizza. `box(pizza)` should return a boxed pizza. A box with contents=[pizza]. Maybe some sauce and pepperoncini in there too.

Plus all these functions are impure. Which isn't always bad but if you can prevent things like boxing it before baking it, you should.

And what even... this entire example is just horrendous. You box the pizza and then slice the pizza? Ready = box.Close()? Can the Close() operation fail? And then the pizza is not ready? Why not throw an error, now the caller has to check if the pizza that got returned to them is even ready...? And that fact is even more hidden on the right side. Same for Sliced and Boxed.


This entire function is clearly a factory for a boxed pizza with toppings which is baked.

I'd argue the entire box.Close() method is slideware and wouldn't exist since it likely is just a return true. You can just as easily just say pizza.Ready = true. Reading this code afterwards I would think there was some stupid requirement somewhere for a pizza.Ready property so someone added it and would check a commit log to see if it can just be removed.

Decent catch there though; the box can also be a dependency that gets passed in.


I know it’s (usually, mostly) implied, but one of my dearest wishes for programming discourse is for people to say that something is more/less readable for them rather than declaring it a universal.


This post presents why object oriented programming is harder than it looks.

“I’m gonna return a pizza, because I want a pizza.”

When of course, what one really wants is a pizza in a box. And the oven objection is also kind of funny. It leads to a “but computers are so fast, why can’t they build me a new oven for each pizza?”

People think they want real-world analogies, which they hope will make code easier to reuse and maintain when what they really want are deep modules with clean interfaces, for which object orientation is not necessary in the least.


> When of course, what one really wants is a pizza in a box

That's what Boxed<Pizza> is for, but this is more costly than a Pizza directly.
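A possible Go rendering of that wrapper, assuming generics (the names are illustrative, not from the article):

```go
package main

import "fmt"

// A generic Boxed wrapper: the box is the thing you hand over,
// with the pizza as its contents rather than a mutated field on
// the pizza itself.
type Boxed[T any] struct {
	Contents T
	Closed   bool
}

type Pizza struct{ Toppings string }

// box wraps any item; the extra allocation is the "more costly"
// part compared to returning the Pizza directly.
func box[T any](item T) Boxed[T] {
	return Boxed[T]{Contents: item, Closed: true}
}

func main() {
	b := box(Pizza{Toppings: "Meat"})
	fmt.Println(b.Closed, b.Contents.Toppings)
}
```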


I agree that placing sequentially executed code in order of execution often improves readability over abstracted code (especially dynamic dispatch and static/dynamic traits). A similar article is at http://number-none.com/blow/john_carmack_on_inlined_code.htm..., but linear code has its own failure modes, if code is not factored into blocks with identifiable functionality and constrained/documented side effects (for example 500-line functions twiddling hardware registers and reading/writing global variables). Carmack later wrote an article in support of small-f functional programming and avoiding side effects and global state when practical (https://web.archive.org/web/20190123060017/http://gamasutra...., the article lost all line breaks during migration to gamedeveloper.com)

Another article that touches on this idea (among others) is https://loup-vaillant.fr/articles/source-of-readability which advocates that "code that is read together should be written together" (reading it made me confused until I realized it meant "placed together"), specifically "Consider inlining functions that are used only once".


How about this: most code has hard chunks or even sections that can be nearly impossible to figure out without a significant time investment. We can skip the intermediate steps and just move straight to a document that explains the architecture so that we may stop trying to jump through hoops to avoid writing non-comment documentation.

The amount of places I’ve worked at that don’t even have accessible DB schemas is mind-boggling.


Wrong thing is discussed in this post.

Code on the right isn't good because it's non-linear. It's good because it outlines the business process clearly, making it easy to get a grasp on it if you've never baked pizza before and aren't the author of the original piece.

It's possible to write non-linear code using, e.g., unnecessary events or too many levels of abstraction, and have the same issues for the completely opposite reason.


If you never have to write any tests, perhaps this is ok.


What do you mean never write any tests? The api should be the same. Order goes in, pizza comes out. The rest is implementation details that should not be exposed to a test.


you're right, the api for ordering a pizza will probably stay the same.

the cooking process won't though. stuffed crust? add some stuff in the middle. square? add some stuff in the middle. deep dish? add some stuff in the middle.

iterate a while and your "one golden test" is what falls down.


YAGNI

Refactor when those things are needed; right now the cooking process is "stick it in a warm oven for x minutes".

What are you testing there?

The oven was preheated? Put in an assert, that doesn't need a test.

That it stayed in for x minutes? Are you assuming the builtin sleep function is broken? Don't test library code; that's not your job.

That the oven actually preheated correctly? That was discussed in the article: the oven and its preheat method should be a dependency that gets passed in, so again it doesn't need to be tested here.

Also in your example you are testing whether an if condition was evaluated as true.

Give me an example of a stuffed crust pizza cooking process that has a unit test which cannot be checked by looking at the resulting pizza.


Those items are all testable through the createPizza method. There should be lots of tests! You've made up the one golden test scenario as a strawman. Every scenario you listed changes the expected output (the pizza). If you are testing internal methods, your tests are going to tell you you have broken something even when the pizzas created are 100% correct. So people won't clean up the code, because the tests break and they don't know if anything is actually broken.
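To make that concrete, here's a minimal Python sketch (hypothetical `create_pizza`, not the article's code) where every listed variation changes the returned pizza, so black-box tests on the output cover it without touching internals:

```python
def create_pizza(order):
    # prepare/bake/box are internal steps; tests never call them directly
    pizza = {"toppings": list(order["toppings"]), "baked": False, "boxed": False}
    if order["style"] == "stuffed crust":
        pizza["toppings"].append("cheese ring")  # hypothetical stuffed-crust rule
    pizza["baked"] = True   # bake step
    pizza["boxed"] = True   # box step
    return pizza

# Every behavioral change shows up in the output, so tests stay black-box:
plain = create_pizza({"style": "thin", "toppings": ["salami"]})
assert plain["baked"] and plain["boxed"] and plain["toppings"] == ["salami"]

stuffed = create_pizza({"style": "stuffed crust", "toppings": ["ham"]})
assert "cheese ring" in stuffed["toppings"]
```

Refactoring the internal steps then cannot break these tests unless the pizza itself changes.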


every single comment extrapolating a trivial example is a strawman


> The rest is implementation details that should not be exposed to a test.

But it's always implementation details all the way down!

If `prepare` is not worth testing, why would `createPizza` be worth testing? `createPizza` is someone else's implementation detail.


The goal of this unit of code is to create a pizza from an order. There are no observable side effects. To verify the correctness of this unit of code, the only requirement is to map all inputs to outputs. Any assertions other than that mapping are not requirements to test correctness.


Yes, and I can create an even larger 'unit' of code around it, declare that it has 'no observable side effects', and claim that `createPizza` is just an implementation detail of that, and therefore needs no testing.


Maybe? We don't know about the other code. Most code I work with would be sticking this output in a DB or email or something, so any larger piece can't be isolated and is no longer a single unit. It's about isolation from side effects, not how much code you can stick behind a single method call.


The core lesson here is that "readability" is personal, and any attempt to reason about "more" or "less" that doesn't translate that into more specific, measurable outcomes (like time to find a bug, or time to add a new feature) is a very good way to nerd-snipe a large number of people into creating a lot of hot air.


Linear code can increase the state that you need to hold in your head while reading through it. You typically can’t just start reading in the middle and understand what is going on, because you have to trace the evolution of the state up to that point.

Breaking the code up into smaller functions can reduce what you need to keep in your head, if the functions can be understood standalone just by their parameters, and if any side effects they may have on their parameters (in case of mutable objects) are straightforward enough to understand from their naming and/or comments (rather than from their implementation).

One purpose of functions is to separate interface from implementation. If for some part of the code an interface is easier to understand than the implementation, then that’s a clear case for making it a separate function.

The points of separation should therefore be the points where the least context is needed to understand the separated-out operation.
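A concrete illustration of that test (a Python sketch, my example rather than the parent's): the interface "median of a non-empty list" is easier to understand than the even/odd-length implementation, so extracting it is a clear win:

```python
def median(values):
    """Interface: the middle value of a non-empty list of numbers."""
    # Implementation needs sorting plus an even/odd case split --
    # more context than any caller should have to hold in their head.
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

assert median([3, 1, 2]) == 2
assert median([4, 1, 2, 3]) == 2.5
```

Callers only need the one-line interface; the case analysis stays behind the function boundary.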


Personally, I find linear code more readable when I have context. The pizza example reads better linearly because it is easy to figure out the context. But when I have no context (I enter a new code base), linear code is harder for me to reason about, because at that point I'm just trying to understand how all the pieces fit together, and having everything stuffed in one place makes figuring out the higher-level picture pretty difficult. I recently had to update a 1000+ line function with very specific business rules that I had no context for. I'm sure for the developer writing it at the time it was easier to put everything in one big function, but it was pretty hard to figure out everything that was going on in there. I had to refactor it into a few smaller functions in order for me and the team to figure out what was actually going on and how we could fit the new business requirements into it.


There is absolutely no doubt in my mind that the right variant is significantly better. I prefer it because of the smaller lexical scopes, because it's easier to test, and most importantly to me because it's easier to extend and to understand what the workflow intends to do. If I had to maintain code like this, I imagine in my day to day I'll likely only have to extend it by touching the `addToppings` function; the rest can stay the same. If I have someone new joining my team I can easily guide them to this `addToppings` function and ask them to add support for pineapple and ham; nobody needs to be overwhelmed by the entire system and their task would be done in no time. I do acknowledge the question is about readability, but I don't think it's possible to ignore testability, manageability and extensibility. I think the inlined approach simply does not strike a good balance of the aforementioned.


As someone who prefers the underlying approach of the non-linear right-side example, it's terribly written code (which is puzzling, since it was supposed to be the halo case).

The prepare function is the main issue: it both creates the pizza and adds toppings. If the pizza had been constructed at the top of createPizza, and then `addToppings`, `bake` and `box` were called, it'd be strictly clearer than it is now.

Now obviously this is all from a contrived example, but I think the underlying lesson is: bad linear code is less tedious to deal with than bad non-linear code. With bad overly long linear code, at least the whole mess is in front of you. With bad non-linear code stuff is hiding stage-left, there's side effects that names are hiding, you're at the mercy of tooling for navigation, etc.

Maybe if you know what you're making is doomed to be bad code (think convoluted business logic driven by the real world), prefer linear?


Much prefer the right-side version. It's still "linear" at the top level and cognitive load is greatly reduced into small, single-responsibility, bite-sized functions. Plus—and sure, this might be a premature optimization—but future devs will thank you when it comes time to implement the "bake a calzone" feature.


Thanks for the post.

I do not enjoy navigating and bouncing through 100s of files to work out how something works. Where algorithms are obfuscated and there's indirection everywhere.

I enjoy reading dense code where everything is clearly linear because I do not need to context switch.

But when you need to change something, you probably prefer the many function approach.


For many, many years, I've adopted the OP's choice of style, using what I call "FIRST/NEXT" comments to divide the function into paragraphs:

    // FIRST, Create the pizza object
    ...
    // NEXT, Add the toppings
    ...
    // NEXT, Heat the oven
    ...
By all means, move a "paragraph" into its own function if it's called more than once; but otherwise this provides a number of useful features:

* The FIRST/NEXT comments serve as useful headers, making it possible to navigate the function without reading the code in detail.

* I know that no one's going to call one of the blocks from outside.

* I can see at a glance what chunks of code go together.

I've often gone back and read code I wrote five, ten, twenty, thirty years ago using this method, and found it perfectly readable.


It really all boils down to cognitive load:

Can the average dev keep all of the variable states & side effects of the function in your head as they read through it? Great! Linear may be a good fit.

Or does one need to jump up and down in the function to /really/ understand it? Probably time to consider abstracting it.


The right side version has extracted functions from an arguably worrisome implementation on the left, then someone inlined it and left comments to explain the purpose of ?some lines? of upcoming code. The author wants to optimize readability; Carmack and others want to reduce complexity by eliminating local optima introduced by abstractions. Other people want to make a fashion style out of it. I'm thinking: how does the oven work? Does it mutate the parameter? Is it heating up at a constant pace? If the oven mutates the pizza, why not the Box methods? Also, if the inline person likes inline, why don't they inline Box and Oven? Because they are called from some other places? Why not inline those as well? So many questions to ask. I'm not sure this is a clear win for either style. Maybe we should ask ChatGPT :)


"The right side version has extracted functions from an arguably worrisome implementation on the left"

No amount of linearity and abstractions, or silly comments, can make bad code readable. I see this stuff at work all the time. In my team's defense, I deal with a lot of chemists and physicists who like to write their own algorithms.


What's not shown is the 10 other functions calling createPizza and bakePizza that can be tested by mocking that routine centrally.

In the basic case, the linear version is better until the code is duplicated. Adding constants and function aliases before the code has duplicated is generally a bad idea.


The straw man has been shot with silver bullets. Can we also linearise calls like box.PutIn(pizza)? What if it's a complex external API call that takes the pizza serialised to ProtoBuf and needs credentials that you'll retrieve from a configuration provider?


Sure, you can put A, B, C, D side by side. But next time if you need to find D, you have to navigate A > B > C > D, with no other recourse. Often, you don't care about A, B, or C. Only D. And the benefit of A, B, C, and D being close together becomes immaterial.

In real systems where things are spread out, A, B, C, and D can be very far apart indeed. And it's totally fine! What matters is that from the starting position, let's say X, I can 'navigate' to A, B, C, or D in an equal and speedy manner.

Plus human brains love to navigate things in a 'spatial' way like this. It's natural. Really when you think about it, the perceived loss here is not that big compared to the benefits.


I noticed that people have already contributed great insights on readability and testability aspects which were my first thoughts as well on reading this blog.

However, I do believe that there is no one right answer to this argument. And the right answer is with that team who in the end have to read, write and maintain that code. The metrics that I collect with my co-workers who work on same code base as me are

  * What is the cognitive load to grasp the code for members in the team?  
  * How easy is it to onboard a new member to this team?
  * Are we able to move fast and have confidence in the code changes we make?
In my opinion, metrics like these are usually the ones most of us care about in the end.


So that team knows all future hires? I work on code bases that were written 15 years ago by a plethora of people in different imperative styles, where business requirements changed a lot over that 15-year period.

There's a lot of spaghetti code.

We found there's a strong correlation between method/function size and bugs.

There's not a lot of confidence because there's implicit mutable state and side effects all over the place.


It is also a super powerful technique when using the closure of a function as a way to encapsulate logic and state. A good example is this implementation of a JSON parser in JS[1]. Attempting to lift the lexer functions or state out of the function would result in every function needing to be wrapped with a factory function. Parsers have always been tricky, and before I knew this technique I would have reached for a parser combinator/generator, but this is a very sensible way of doing it.

[1] https://lihautan.com/json-parser-with-javascript/
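As a sketch of the same technique (in Python rather than JS, with a hypothetical mini-parser, not code from the linked article): the cursor lives in the enclosing function's scope, so the helpers share it through the closure without any factory wrapping:

```python
def parse_number_list(text):
    """Parse a comma-separated list of integers, e.g. "1,22,3"."""
    i = 0  # cursor shared by all inner helpers via the closure

    def peek():
        return text[i] if i < len(text) else ""

    def advance():
        nonlocal i
        i += 1

    def parse_int():
        start = i
        while peek().isdigit():
            advance()
        return int(text[start:i])

    values = [parse_int()]
    while peek() == ",":
        advance()
        values.append(parse_int())
    return values

assert parse_number_list("1,22,3") == [1, 22, 3]
```

Lifting `peek`/`advance`/`parse_int` to module level would force `text` and `i` into a state object or factory, which is exactly the overhead the parent comment describes.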


I wonder if there is some programming language that supports combining both styles:

- A linear control flow

- Named blocks with explicit, named, typed parameters and return values

I understand that one can use anonymous functions, immediately called to simulate this style.
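As a sketch of that simulation (Python here; in JS it would be an IIFE), each "named block" is a local function with explicit parameters and a return value, defined and called immediately in linear order:

```python
def create_pizza(order):
    # "Named block" 1: prepare -- inputs and output are explicit in the signature.
    def prepare(order):
        return {"toppings": list(order["toppings"]), "baked": False}
    pizza = prepare(order)

    # "Named block" 2: bake -- by convention touches only what is passed in.
    # (Python inner functions can still read the enclosing scope, so this is
    # discipline rather than enforcement; a dedicated language feature could enforce it.)
    def bake(pizza):
        return {**pizza, "baked": True}
    pizza = bake(pizza)

    return pizza

assert create_pizza({"toppings": ["cheese"]}) == {"toppings": ["cheese"], "baked": True}
```

The control flow stays linear, but each block advertises its dependencies in its parameter list.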


If I understood you correctly, the ML family seems to come closest, especially Elm and F#'s use of |> syntax.


I am currently working in some "best practice" (according to its author) code with hardly an if statement.

And after half a year of having to always step into a method (or out of it) to continue reading or debugging, this resonates with me very much.


This paper[1] suggests that there is a 50/50 split amongst programmers in the way in which they trace programs:

> Given a straight-line program, we find half of our participants traced a program from the top-down line-by-line (linearly), and the other half start at the bottom and trace upward based on data dependencies (on-demand)

So it's possible that both viewpoints are correct in some sense and we should pursue languages which allow us to switch between the two viewpoints.

[1] https://arxiv.org/abs/2101.06305


Interesting read. It’s amazing more people don’t use runtime variable value annotation tools like Wallaby.js, or a debugger.

So much time spent mentally remembering what is in what variable based on the naming.

I often find myself adding “// e.g. foo, bar” to show example cases for some lines of code, like regexes for example. Wallaby.js is a godsend for this though.


The real issue is plain text, and files and folders.

File names, folder names/hierarchies, function names, class names are all _arbitrary_. You could randomize them all and your code would still run.

What is not arbitrary is: the call graph, and the data flow/dependency graph.

Every line/block of code could be wrapped in a function.

And classes... your class methods are just functions with an implicit parameter of an object of a certain type; and practically, not the entire object, just the parts of it that it actually uses in the function body.

So if you just focus on what your functions do, the boundaries and groupings of your code will become self-evident.


To add to this: the code you are writing and how you access it can be distinct. A static function with all parameters available is accessible but hard to use; adding a builder layer in front makes it usable without changing the logic.

CLI, RPC, Rest, are all different methods of interfacing with the underlying core function. I too-often see folks make a "microservice rpc server" rather than "function exposed via rpc microservice".


This was always my interpretation of "Flat is better than nested." from "The Zen of Python".

I often run into conflict with developers who believe in the single return statement. This is flatter but irks a lot of devs:

  if (!condition) {
    return
  }
  more code
  return


Two points

* functional style makes it easier to split stuff since you mostly transform immutable data

In the article, IO, state (side effects) and data transformation are mixed in both versions. That leads to unnecessary complexity, and it's worse if it hides in sub-functions. But separate them and the right version is better. (You can answer the question about idempotency easily now.)

* Comments over blocks of code are harder to keep in sync with the code below, they require more discipline from all team members

So in theory they are nice, but you will never have them unless you enforce them through proper code reviews.


I agree with the functional style part, but hard disagree with your second point.

Keeping comments over blocks of code in sync with the code below requires exactly the same amount of effort (arguably less) as keeping function names (and, ideally, documentation) in sync with the code inside.

Except that if the second one is not done, it is a lot more dangerous than the first, precisely because every time you define a function you are declaring an abstraction, and if the abstraction changes silently, that's where the real mess begins.

That's what ends up happening in my experience. (See also sibling other comments about the proliferation of flags in a function signature)

(Needless to say, I strongly resonated with the OP as I also love linear code with comments, but in the end it's also a matter of taste..)


_sigh_. OP got really hung up on the example, and it ruined the article.

There are certainly cases where linear makes sense, but the length of your ticker-tape memory is about the limit, and since that varies quite a lot from person to person, I like it best if I can choose the level of abstraction with which to read code. This has never been a problem.

Where indirection becomes a problem is inheritance and black-box operations (as you might find in rails-y frameworks). Django’s block model of extension for templates is so devious it should be considered felonious.


I can't believe that 55 years after Go To Statement Considered Harmful we are still operating at medieval level of "I recently found this piece of code and now I have opinions to share".

If coding for an employer, it's their business. But for use in collaborative public projects, I want a linter with experimentally measured effect on readability. Not stories from the field.


I wholeheartedly disagree. Linear functions like this promote laziness in variable naming (var a1, c_tfr, bvf, etc.). This also leads to buggy side effects such as having multiple nested if statements performing a plinko-machine determination of code branching. It’s horrid. It’s unmaintainable. It guarantees that someone will have to rewrite it after you're gone, because you will be gone.

This is the same as someone arguing for scrolls when books with table of contents and appendices are far superior.


Your rant is misplaced; it is the spirit rather than the letter of the thing that matters. Linear giant code is often easier to comprehend for structures like state machines where you can follow the business logic from one stage to another easily.

See my other comment here: https://news.ycombinator.com/item?id=37518275


DRY, SOLID: there’s a wealth of principles on why this isn’t correct. Here’s what Code Complete [0] has to say…

>” From time to time, a complex algorithm will lead to a longer routine, and in those circumstances, the routine should be allowed to grow organically up to 100-200 lines. (A line is a noncomment, nonblank line of source code.) Decades of evidence say that routines of such length are no more error prone than shorter routines. Let issues such as depth of nesting, number of variables, and other complexity-related considerations dictate the length of the routine rather than imposing a length restriction per se.

If you want to write routines longer than about 200 lines, be careful. None of the studies that reported decreased cost, decreased error rates, or both with larger routines distinguished among sizes larger than 200 lines, and you’re bound to run into an upper limit of understandability as you pass 200 lines of code.”

[0] https://books.google.co.in/books?id=LpVCAwAAQBAJ&pg=PA174


These are all just guidelines/heuristics and should not be treated like inviolable laws. Thus all advice should be adapted to the problem at hand in the service of Readability/Comprehensibility first.

Instead of repeating myself, i point you to my other comments in this thread for details.


While I do like linear code, the bigger problem is that everything is in scope. If you break it apart into separate functions you can clearly see the inputs and outputs.


Right, you can read both but it also requires more brain power to separate out when not in separate functions. It's also easier to unit test as the code grows larger.


this - this is the reason. Large functions accrue state and state begets bugs over time.


It's more readable to the CPU too.

Deeply nested code, especially with many functions that are called once, is really horrible to debug.

Extract functions when you see obvious repetition, not just to appease some dogmatic abstraction goal. Incidentally, this also helps the CPU (cache locality).

Along the same lines, I'd rather have a directory with several dozen source files than several dozen nested directories that may contain only one or two files each.


I don't think the issue is about which code is more readable, but whether it's efficient for the CPU or the computer. Modern compilers optimize more than we think in production builds.


I completely disagree with the article. The right hand side is far better. Not perfect; there are definitely a couple of things to improve, but it's better than just a big long meandering god function like on the left. It feels like the author is arguing to go back to the coding style of the 1980s.

Big advantage of the right-hand style: the various steps are laid out in a simple 5-line function. You immediately see what making a pizza involves. Want to know more about it (like whether baking involves the creation of a completely new oven), you can zoom in on the details, but you never have to look at details that are irrelevant to you, unlike on the left side, where you have to dig through a page of code to figure out which part is relevant to you.

Mind you, there are a lot of ways in which the right hand style could go wrong: if you don't separate your concerns, and have global or member variables manipulated by different functions in ways that are not immediately obvious, then superficially clean code could be hiding some terrible spaghetti. But at least the right-hand style punishes you for that and encourages you to do better (in fact, I'm currently refactoring a bit of code that did exactly that). The left hand side would allow terribly messy code with complex interactions between different parts of the code without making it obvious that those interactions are there, and will make it more intimidating to refactor them. Small functions are easier to test and easier to refactor.


I agree. The right hand side also makes it very easy to zoom in on the problem details. If the pizza is not properly boxed, I don't want to worry about whether the salami was sliced the right way. I can skip over that, and immediately zoom in on the boxing process.

Pretending that a comment header is the same as a function is a bit silly. We can navigate to functions, not to comments.


>> this code makes no sense: why would you create a whole new oven to make a pizza? in real life...

one can rationalize all sorts, but certain real-life metaphors don't have to map closely to the digital realm.

if my CreatePizza function relies on remote/dynamic code/features (realtime functionality), then it's simpler and possibly safer to re-create than re-use. Depends on the use-case.


I can imagine a new control statement with this type of syntax:

  code
  code
  uses (a, b, c, d) { // Step 5: Foo the bar
    code
    code
  }
  more code
  more code
It's a block that defines the variables it uses, with no other access to the outer scope. It would help break up a linear function into blocks with clearer dependencies.
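Lacking language support, that `uses` block can be approximated today with an immediately-invoked function whose parameter list plays the role of the `uses (...)` clause (a Python sketch; here the restriction is by convention only, since the body could still read outer names):

```python
a, b, c, d = 1, 2, 3, 4

# "Step 5: Foo the bar" -- the parameter list documents exactly which
# outer variables this block depends on, like the proposed `uses (a, b, c, d)`.
step5 = (lambda a, b, c, d: a * b + c * d)(a, b, c, d)

assert step5 == 14
```

The reader can see the block's dependencies at a glance without scanning its body, which is most of what the proposed syntax would buy.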


> I know this is a synthetic example but this kind of issue actually occurs in real code and sometimes causes performance issues. It is likely that this code should take the oven as a parameter. Providing it is the job of the caller.

The caller might not care about the oven, might not know the process needs an oven, or might not even have an oven.

The injection pattern could be used.


I'm only a novice at programming, but my usual rule is if the indentation gets too far in, then it's time to put it in its own function. When it becomes hard to follow, I will put it in its own function doing The Thing(tm). But I won't break down every small logic into its own function - it's just too much.


The problem with code styles is that most developers can only reason about information systems, where the only thing your code has to do is dispatch the right data to the right place.

As soon as you try and write a function that actually uses the data, you find out that every book has been written for CRUD application programmers.


I couldn't read the examples (on mobile), but in general I like more functions because it is easy to skip over details I don't care about. If I'm interested in the oven preheat (his example) I'll dig into it, but if not I'll skip over that part to the toppings function.


I write top-to-bottom code almost always, and prefer not to have separate functions if I don't need them. Here is a real example: I'm building a CRM https://www.youtube.com/watch?v=l4QjeBEkNLc


you can split a large linear function into smaller ones with this one simple trick:

  function myFunction() {
  
    /************************************
     * Subsection 1
    ************************************/
    // code

    /************************************
     * Subsection 2
    ************************************/
    // code

    /************************************
     * Subsection 3
    ************************************/
    // code
  }

Better than splitting into multiple function calls with a bunch of variable-to-parameter and return-value-to-variable renaming going on. It helps if your language allows you to limit some variable scopes, but usually I wouldn't bother.


Or you use the correct types and avoid these "problems" altogether.

It's not possible to bake in a cold oven. Why does your type allow it then? Why don't you encode the state directly?

  oven := ColdOven.heat()
  bakedPizza := oven.bake(pizza)
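A small Python sketch of that typestate idea (hypothetical classes; Python checks this at runtime rather than compile time, unlike a statically typed encoding): `bake` simply does not exist on `ColdOven`, so "bake in a cold oven" is unrepresentable:

```python
class HotOven:
    def bake(self, pizza):
        return pizza + " (baked)"

class ColdOven:
    def heat(self):
        return HotOven()
    # Deliberately no bake() here: a cold oven cannot bake anything.

oven = ColdOven().heat()
baked_pizza = oven.bake("margherita")
assert baked_pizza == "margherita (baked)"
```

Calling `ColdOven().bake(...)` raises `AttributeError` (a compile error in a statically typed language), so the invalid state never needs a runtime check inside `bake`.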


> Also, what happens if you pass a pizza to those functions twice? Are they idempotent or do you end up eating cinder?

That is surely about state and mutable data, not code structure. And factored code makes it _easier_ to write more stateless code.


Yeah, that's the most frustrating part of the debate here.

Arguing about the difference between:

  number := prepareZero
  addOne(number)
  addTwo(number)
  addThree(number)
  return number
and

  number = 0
  number += 1
  number += 2
  number += 3
  return number
When either of the following would be far better:

  addThree (addTwo (addOne prepareZero))
or

  return 0 + 1 + 2 + 3;


you call your blog separateconcerns.com and then advocate for the opposite. functions have their purpose. one of them is separating concerns as is done here. the code on the right is probably better than 99.9% of code out there.

there is even an academic book (Normalized Systems Theory) which claims that having more concerns in one function will inevitably cause an explosion of your codebase where you have to write a ton of code for a very small change. I doubt the validity of the proof they provide for this, but it's something to keep in mind, as I have not seen anyone more serious than me claim that it is wrong.


For me, if a function is bigger than one page and I have to scroll, it is probably too long. If its length is less than one page, I don't bother breaking it down into smaller functions unless it makes sense.


Drakon uses Silhouettes to show code linearly and abstracted at the same time:

https://drakon.tech/read/silhouette


Mutation everywhere, no thank you.

This approach requires one to keep the state in mind while manually "evaluating" the mutations along the way, forming a picture, or whatever one uses, in mind about the created artifact.


Correct.

Especially when there's mixing of in-place mutation and return-values.

The first function in the green:

  func createPizza(order) {
    pizza = prepare(order)
    bake(pizza)
    box(pizza)
    return pizza
  }

`prepare` transforms an order into a pizza, but `bake` and `box` mutate one in-place.
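One way to remove that mixing (a Python sketch, not the article's code) is to make every step take a value and return a value, so `bake` and `box` read the same way `prepare` does:

```python
def prepare(order):
    return {"order": order, "baked": False, "boxed": False}

def bake(pizza):
    return {**pizza, "baked": True}  # returns a new value, no in-place mutation

def box(pizza):
    return {**pizza, "boxed": True}

def create_pizza(order):
    # Every step is value-in, value-out, so the composition is uniform.
    return box(bake(prepare(order)))

pizza = create_pizza("margherita")
assert pizza == {"order": "margherita", "baked": True, "boxed": True}
```

With a consistent convention, the reader never has to guess which calls mutate their argument and which return a fresh value.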


Good luck finding the proper line to make a change, or finding a bug, in a single long linear function; it's always a mess, because there's never time to refactor. I would rather not work with such code.


Pro tip: Use an editor that doesn't allow you to quickly jump to the definition of a function. You will make your code more readable because you will prefer to write linear code.


The one on the right is more readable, but takes things too far seemingly to prove a point. For example, "addToppings" clearly doesn't need to be a separate method.


Those comments won't match the code in 6 months.

Edit to add: and in 6 months, instead of being one short page of code, it'll be 600 lines long and impossible to understand or modify safely


Do other people not consider comments in PRs anymore? Would someone ok a PR [for a function] with 600 lines? Probably not. Let's not be absurd here, but I fundamentally agree that it's too long already.

Yes, maybe it's the wrong abstraction or it won't match in 6 months because people will start calling Hats Pizzas and the logic will be different. Maybe I'll be dead tomorrow. I don't concern myself with what-ifs over unknowns.

I want correctness and some assurance of it. I don't think writing 50? tests to cover this thing is a good way to spend my time (or someone else's). I'll write the left hand side first. When I'm looking at writing tests, it will become something like the right hand side.

Yes linear code is more readable. That's something to consider, but it's not my primary consideration.


My code review tool shows me 3 lines of context around changes, which means the untouched comment won't even be seen unless I think to look for context. The tool also makes it hard to comment on lines that were not touched but should have been.

I've used many code review tools, and all suffered from that. (The only exception wasn't a tool: we printed out all the code on physical paper and then went into a meeting room to review for several hours. This was the best type of code review for finding things wrong, but it was also extremely expensive, so I've only done it once in my life.)


Each PR has 20 lines, and there are 30 such PRs over six months.


Apologies. I wasn't clear. I meant it's a stretch to approve a function that's 600 lines. It would be very hard to reason about. I have clarified in my original post. Thank you for the feedback +1 to you.


600 lines is relatively short for some of the bad code that I have seen...


It's so ridiculous though! We have to get rid of comments because some random in the future might commit some rubbish? It's their fault! not mine! I'm just documenting what works at the time of publication


Hahaha!

You are putting something in there that turns out to be disinformation.

I used to believe as you did, but then I started actually looking at old code.

I stored many examples, but I'll give you one to indicate just how bad most comments are:

  //Add 1 to x
  x := x + 2;

Now why the original author thought it was important to explain in English what the next line of code was going to do in the programming language I am not sure.

I am sure that someone figured out that it was a mistake and did the least they could possibly do to fix the mistake.

Something like 80% or more of the comments I've seen have been misleading, wrong, or useless.

If you do your own survey of someone else's old code so that your ego doesn't get in the way you will probably find something similar.

Although I did talk with people working at NASA and they had someone on their code reviews who was specifically looking at comments. If you are that disciplined then your code will not be that bad, probably.


Comments are meant to pass information to readers about rare cases that may not make much sense at first but have a reason to be.

This is to avoid the introduction of bugs by tempting someone to refactor or "fix" your code.

Properly written code, with good naming conventions for parameters, variables and functions, should be easy to understand without comments.

I should not need anyone to tell me the next few lines will

//Heat oven

That should be self explanatory by just looking at the code itself.
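To make the contrast concrete, here is a small Go sketch (the load-balancer constraint is invented for illustration): the first kind of comment restates the code and rots on the first edit; the second records a reason the code cannot express.

```go
package main

import "fmt"

// Redundant comment style - it restates the code and rots:
//
//	// Add 1 to x
//	x = x + 2

// Useful comment style - it records a "why" the code cannot express.
// (The 10s load-balancer timeout here is a made-up constraint.)
func retryDelayMs(attempt int) int {
	delay := 100 << attempt
	// Cap at 8s: the upstream load balancer drops connections
	// idle for 10s, so longer backoffs would just fail anyway.
	if delay > 8000 {
		delay = 8000
	}
	return delay
}

func main() {
	fmt.Println(retryDelayMs(3))  // 800
	fmt.Println(retryDelayMs(10)) // capped at 8000
}
```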


Design patterns that are used too soon can contribute to less readable code too. Like implementing the strategy pattern when a simple if else would be much more succinct.
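For example, with a hypothetical discount rule sketched in Go: the strategy version needs an interface plus a type per case to express what a two-branch if/else says in one place.

```go
package main

import "fmt"

// Strategy-pattern version: an interface plus a type per case.
type Discounter interface {
	Apply(price float64) float64
}

type NoDiscount struct{}

func (NoDiscount) Apply(p float64) float64 { return p }

type MemberDiscount struct{}

func (MemberDiscount) Apply(p float64) float64 { return p * 0.9 }

// The plain if/else says the same thing in one readable place.
func applyDiscount(price float64, member bool) float64 {
	if member {
		return price * 0.9
	}
	return price
}

func main() {
	var d Discounter = MemberDiscount{}
	fmt.Println(d.Apply(100))              // 90
	fmt.Println(applyDiscount(100, false)) // 100
}
```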


Besides this, I also hate navigating through modules. I would like to see a piece of code in a single file if it will not be used anywhere else, or if the thing is very abstract.


I wonder if modularization is not a form of parametrization.


I think it is, at least in some cases. I've found that Elm/React are like this with dynamic elements.

First, you code the component you want with hardcoded styles and data. Then, you extract that to a function. Finally, you pull out the hardcoded styles and data into parameters for that function. Now you have a reusable component.
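The same three steps sketched in Go rather than Elm/React (the `button` helper and its parameters are invented for illustration):

```go
package main

import "fmt"

// Step 1 was inline, hardcoded markup:
//
//	html := `<button class="primary">Save</button>`
//
// Steps 2 and 3: extract it to a function, then turn the hardcoded
// style and label into parameters - a reusable "component".
func button(class, label string) string {
	return fmt.Sprintf("<button class=%q>%s</button>", class, label)
}

func main() {
	fmt.Println(button("primary", "Save"))
	fmt.Println(button("danger", "Delete"))
}
```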


Prepare, addToppings and bake are meaningless functions that serve no purpose. Meanwhile heatOven, bakePizza and box do have very good reasons to exist.


I hate asking the question "is this function called from different places, or was it extracted only for aesthetic reasons?".


The function will/may get called from different places in the future. I am coding for the future.


Ah, the good old premature abstraction.

You're coding for a future that might not exist. You might be coding for the wrong future and you painted yourself into a corner.

Been there, done that.


But what if I'm coding for the correct future?

Maybe there's a way I can code for every possible future with minimal effort. I'm talking about a pattern that isn't a form of premature abstraction. Just a rule.

Your way of coding is coding for the most probable future, which is distinctly different from coding for every possible future.


>> The function will/may get called from different places in the future. I am coding for the future.

> Ah, the good old premature abstraction.

The function will get called from different places. Once from its caller, and a second time from its unit test.
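In Go terms (the names and numbers here are invented): even a helper with exactly one production caller gets its second call site from a `_test.go` file.

```go
package main

import "fmt"

// bakeMinutes has exactly one production caller, but extracting it
// gives a unit test a direct entry point - the "second place".
func bakeMinutes(size string) int {
	if size == "Large" {
		return 14
	}
	return 10
}

func main() {
	// Production call site #1.
	fmt.Println(bakeMinutes("Large")) // 14
}

// In a real project, call site #2 lives in a _test.go file:
//
//	func TestBakeMinutes(t *testing.T) {
//		if bakeMinutes("Large") != 14 {
//			t.Fatal("want 14")
//		}
//	}
```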


Functions have never been about simply being called from different places.

So your entire premise is wrong.


Would you hate it if it was really, really easy to answer?


I’d like to also see the Rx example of this code. In my experience it would be vastly less readable but probably half the length.


Left case has too much scope. I don’t know at a quick glance if a variable from line 1 is used on line 200


I don't like the side effects of the second addToppings. I'd much prefer

pizza.Toppings = getToppings(kind)


> pizza.Toppings = getToppings(kind)

That's a side effect.


pizza isn't, and can't be, opaquely modified by the topping adder this way.


Testability matters more than readability - please separate different parts into different functions!


End to end tests only, for you! They’ll find your bug in 30 minutes or you get your money back!


Very much depends on the naming and arch.

If the naming and architecture are good, then it reads like a book.


Yeah, when it’s trivial code like this and doesn’t go on for ten pages that might be true.


The code in the red is imperative, with implicit state all over the place. I would argue that it's not linear, you have to keep all those mutable values and state transitions in your head as you read what's happening from top to bottom.

It's not extensible, it's not composable, it's hard to test, and frankly, it's complicated and complex for no other reason than that a particular type of engineer thinks it's easier to read, because they feel all programming should be imperative.

Don't get me wrong, there's a time and a place for this style of programming. (Manual memory management, algorithmic design that maximizes speed or memory usage, or even taking a bunch of services and dictating the order in which they are supposed to be executed.)

For business logic, especially the type that's supposed to model "the world," this is terrible code.

How is the bottom, not better code?

    type Oven struct {
        Temp int
    }

    type Box struct {
        // Box properties
    }

    type Pizza struct {
        Base    string
        Sauce   string
        Cheese  string
        Toppings []string
        Baked   bool
        Boxed   bool
        Sliced  bool
        Ready   bool
    }

    type Order struct {
        Size string
        Sauce string
        Kind string
    }

    func preparePizza(order *Order) *Pizza {
        toppings := map[string][]string{
            "Veg":  []string{"Tomato", "Bell Pepper"},
            "Meat": []string{"Pepperoni", "Sausage"},
        }
        return &Pizza{
            Base:     order.Size,
            Sauce:    order.Sauce,
            Cheese:   "Mozzarella",
            Toppings: toppings[order.Kind],
        }
    }

    func bakePizza(oven *Oven, pizza *Pizza, cookingTemp int, checkOvenInterval int) *Pizza {
        // Simulate oven heating
        for oven.Temp < cookingTemp {
            time.Sleep(time.Duration(checkOvenInterval) * time.Millisecond)
            oven.Temp += 10 // Simulate oven heating
        }
        pizza.Baked = true
        return pizza
    }

    func boxPizza(pizza *Pizza, order *Order) *Pizza {
        _ = &Box{}          // a box would be fetched here; blank-assigned since it's otherwise unused
        pizza.Boxed = true  // Simulate putting pizza in box
        pizza.Sliced = true // Simulate slicing pizza
        pizza.Ready = true  // Simulate closing box
        return pizza
    }
    
    // I just need to really understand this imperative part
    // this is the meat and potatoes
    func createPizza(order *Order, oven *Oven, cookingTemp int, checkOvenInterval int) *Pizza {
        pizza := preparePizza(order)
        pizza = bakePizza(oven, pizza, cookingTemp, checkOvenInterval)
        pizza = boxPizza(pizza, order)
        return pizza
    }


Now in a Functional lang, that has good scoping and private functions, etc.

    defmodule Pizza do
      import Pizza.Pizza, only: [new: 0]
      import Pizza.Order, only: [new: 0]

      def create_pizza(order, oven_temp) do
        order
        |> prepare_pizza()
        |> bake_pizza(oven_temp)
        |> box_pizza()
      end

      defp prepare_pizza(%Pizza.Order{size: size, sauce: sauce, kind: kind}) do
        toppings = 
          case kind do
            "Veg" -> ["Tomato", "Bell Pepper"]
            "Meat" -> ["Pepperoni", "Sausage"]
            _ -> []
          end

        %Pizza.Pizza{base: size, sauce: sauce, toppings: toppings}
      end

      defp bake_pizza(%Pizza.Pizza{} = pizza, oven_temp) when oven_temp >= 400 do
        %Pizza.Pizza{pizza | baked: true}
      end

      defp bake_pizza(pizza, _oven_temp), do: pizza

      defp box_pizza(%Pizza.Pizza{boxed: _boxed, sliced: _sliced, ready: _ready} = pizza) do
        %Pizza.Pizza{pizza | boxed: true, sliced: true, ready: true}
      end
    end


Maybe more readable but less testable and less maintainable longer term.


Let's heat up the oven then check if the pizza is baked.


Seems like the key takeaway was adding comments.


Pretty lame contrived example; easy to make a case for inlining when your functions are 5 lines long. In any case, think I'll go on ignoring dogmatic coding advice.


Having poor taste is nothing to be confident about. The author sounds like he has never coded anything large and complex.


> Having poor taste is nothing to be confident about.

Huh, I can say the same about you, judging by your reaction to this post.


All of this "why your favorite best practice is wrong, actually" stuff gives me whiplash.



