Hacker News
Linear code is more readable (separateconcerns.com)
412 points by dmarto 11 months ago | 357 comments



It’s a matter of style, and like cooking, either too much or too little salt will ruin a dish.

In this case I hope nobody is proposing a single 1000-line god function. Nor is a maximum of 5 lines per function going to read well. So where do we split things?

This requires judgment, and yes, good taste. Also iteration. Just because the first place you tried to carve an abstraction didn’t work well, doesn’t mean you give up on abstractions; after refactoring a few times you’ll get an API that makes sense, hopefully with classes that match the business domain clearly.

But at the same time, don’t be over-eager to abstract, or mortally offended by a few lines of duplication. Premature abstraction often ends up coupling code that should not have to evolve together.

As a stylistic device, extracting a function which will only be called in one place to abstract away a unit of work can really clean up an algorithm; especially if you can hide boilerplate or prevent mixing of infra and domain concerns like business logic and DB connection handling. But again I’d recommend using this judiciously, and avoiding breaking up steps that should really be at the same level of abstraction.


> In this case I hope nobody is proposing a single 1000-line god function. Nor is a maximum of 5 lines per function going to read well.

This is the key. Novice devs tend to write giant functions. Zealot devs who read books like Clean Code for the first time tend to split things into a million functions, each one a few lines long (pretty sure the book itself says no more than 5 lines for each function). I worked with a guy who extracted each and every boolean condition into a function because "it's easier to read", while never writing any comments because "comments are bad" (according to the book). I hate that book; it creates these zealots who mindlessly follow its bad advice.


Or, the fun one I run into is devs who write a mix of 1000 line functions and tiny little 5 line functions with no discernible pattern to which option is chosen when.

The truth is that what makes code readable is not really (directly!) about function size in the first place. It's about human perceptual processing and human working memory. Readable code is easily skimmable, and should strive to break the code up into well-defined contexts that allow the programmer to only have to carry a handful of pieces of information in their head at any given moment. Sometimes the best way to do that is a long, linear function. Sometimes it's a mess of small functions. Sometimes it's classes. Which option you choose ultimately needs to be responsive to the natural structure of the domain logic you're implementing.

And, frankly, I think that both versions do a pretty poor job of that, because, forget the style, the substance is a mess. They're both haphazardly newing up objects and mutating shit all over the place. This code reads to me like the end product of about four sprints' worth of rushing the code so you can get the ticket closed just in time for sprint review.

I mean, let's just think about this as if we were describing how things work in a real kitchen, since I think that's pretty much what the example is asking us to do, anyway: on what planet does a pizzeria create a new, disposable oven for every single pizza? What the heck does

  pizza.Ready = box.Close()
even mean? Now we've got a box containing a pizza that's storing information about the state of the object that contains it, for some reason? Demeter is off in a corner crying somewhere. What on earth is going on with that 'if order.kind == "Veg"' business? Why aren't we just listing the ingredients on the order and then iterating over that list, adding the items to the pizza? The logic for figuring out which ingredients go on the pizza never belonged in this routine in the first place; it's ready, aim, fire, not ready, fire, aim. Etc.
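The "list the ingredients and iterate" shape the parent describes might look something like this sketch (Python, all names invented; this is not the article's actual code):

```python
# Hypothetical: toppings are resolved from the order up front, so the
# assembly routine never needs to branch on order kind.
TOPPINGS_BY_KIND = {
    "Veg": ["tomato", "mozzarella", "mushrooms"],
    "Meat": ["tomato", "mozzarella", "pepperoni"],
}

def assemble_pizza(order):
    # "Ready, aim, fire": decide the ingredient list first...
    toppings = TOPPINGS_BY_KIND[order["kind"]]
    # ...then do the purely mechanical work of adding each one.
    pizza = {"toppings": []}
    for topping in toppings:
        pizza["toppings"].append(topping)
    return pizza
```

The point is that the "which ingredients" decision lives in one obvious place (here, a lookup table), separate from the assembly loop.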


This is in the article (seems to have been edited after your comment):

>I wasn’t sure if I wanted to mention this or not, but I ended up editing the post because there is something that bothers me with this function, and it is that business with the oven.

>[...]

>this code makes no sense: why would you create a whole new oven to make a pizza? In real life, you get an oven once, and then you bake a whole lot of pizzas with it, without going through the whole heating cycle.


The problem with any good idea: As soon as it becomes dogma, it doesn't matter how good the original idea was, it will turn itself bad.


And programmers are way worse than people in general when it comes to being dogmatic.


I think this is an experience thing. There are a lot of inexperienced developers, and inexperienced developers tend to become very attached to development philosophies, and attribute all problems to a lack of adherence to their pet philosophies.

This tends to wear off with exposure to the real world. Not only will you find undeniably good code that's written in flagrant disregard of the holiest of doctrines, you'll also find garbage written The Proper Way; more damning still, sometimes you'll discover it was written by your own hands.

I don't think it's a coincidence that many of the ideas with the most fervent and zealous followers have names that sound righteous: if it isn't clean code, it's pure functions, or more recently memory safety. Clearly nobody who is on the "side" of dirty, impure and unsafe code can be right?


> I think this is an experience thing.

Ahhhhh...yes, but...

There are two kinds of people when it comes to dogma: The faithful, and the preachers. The inexperienced developers are the faithful. They cling to the scripture without proof that it is actually necessary.

Insofar, I agree with your post.

But the faithful need someone to preach the faith, and those are usually not the inexperienced ones. Those are usually experienced developers. Their personal reasons to cling to the dogma are varied: Some may have started as faithful themselves, for some it's stubbornness, an unwillingness to change, maintaining a feeling of superiority, the fear of becoming obsolete, ...

So here we are in disagreement. The preacher is the product of experience and development over time. And in my book, the preachers of dogmas are more of a problem than the faithful who follow them. Because it's the preachers who write the scripture, the preachers who make up arguments why alternatives to the ideology are bad, and the preachers who seek to isolate their flock from the "evil" predations of alternatives.

> I don't think it's a coincidence many of the ideas that have the most fervent and zealous followers have names that sound righteous, if it isn't clean code it's pure functions or more recently memory safety.

Closing my answer on another point of agreement: it is absolutely not a coincidence that the wording of dogmas in programming sounds eerily similar to that of religious teachings ;-)


Well said. Following a set of rules is no substitute for experience, despite any pressure to do so.

Since every situation is unique, even if similar to others, I think it's always best that programmers rely on their judgement and experience when writing code instead of a set of axioms.

And besides, blindly embracing a set of rules is acting more like a robot, instead of a human being.


"Agile" is another of those words, used to whitewash stiff, heavy and dogmatic dev processes.


I prefer to read this in such a way that implies that programmers are not people. :p

I jokingly refer to our customers/users as "humans" at work.


Hard not to become dogmatic when your living depends on getting every comma exactly right and you're getting instant impartial feedback every time you get something not quite right :)


I disagree. Programming allows for so much freedom and flexibility in how you approach problems. I find that a large portion of the job is following your gut feeling and putting it into words.


I don't understand this widespread sentiment at all. If the dogma-idea is right, it's right. No matter how dogmatic the people who are defending it are.


Most of these ideas are tradeoffs. If a particular tradeoff is the right thing to do 80% of the time, then it is clearly a good idea. But if you understand it as dogma, you'll do the wrong thing 20% of the time.

Take the pizza example. Which is better, linear code or small functions? It's a series of tradeoffs. Once you get above a screen of code, or 10 ifs, functions become hard to read. Once the same logic has been written 3x, abstracting it is usually a win. Even if it is small. And there is a fuzzy area where it isn't obvious which is better, and debating it is probably a loss over writing it and moving on. Doubly so if you're defaulting to the same kind of decisions every time and so the style is consistent.

In a world full of pragmatic tradeoffs, dogmatism is rarely the right choice. (Unless you haven't learned the tradeoffs.)


My pithy description of software development is "a series of decisions in pursuit of a goal".

Dogma invites people to stop using their critical thinking skills.

One of my favorite examples of this:

Everyone would agree that having a newborn in a car means safety is paramount. Everyone also agrees that left turns are less safe than right turns. No one would agree that this implies you should never make left turns in a vehicle with a newborn.

But the above is how both security people and TDD proponents tend to act, as if there can be no risk assessment and critical thinking involved. We've all made right-hand turns when we really wanted to go left because there was just too much traffic, even without a newborn in the vehicle.


I was thinking about no-tradeoff truths. But you’re right on those tradeoff situations.


There are very few ideas that are absolutely right. I don’t consider “facts” as ideas, although these days even the facts have alternatives too.


It doesn't matter how good the underlying idea is, a dogma is always bad.

Because ideas are highly unlikely to be universally correct, no matter how good they are. Even if an idea is supported by all available evidence, it MUST subject itself to scrutiny and possible falsification, all the time.

A dogma flies in the face of that. It is, by definition "any belief held unquestioningly and with undefended certainty" (quote from wikipedia). Once people follow an idea dogmatically, they are very likely to apply that idea no matter if it makes sense or not. They stop following logic and start following scripture.

It is bad enough if this happens in science, where we really do have systems so far supported by all available evidence. But it becomes a lot worse in areas like software development, where we know there isn't "the one" true way of doing things.

And to head off one likely reply to this: Yes, I apply this logic also to the assumption that "dogmas are always bad". If someone could present proof showcasing a dogma that only has good outcomes, with no negatives attached, then I am willing to change my mind on this.


just because people interpret a fact incorrectly doesn’t mean the fact stops being a fact, lol wth is this sub


What's a sub?


They're likely a reddit refugee referring to this site as a subreddit.


Oh man, it's easy to spot someone who blindly follows Clean Code. I personally don't like it, but I am a fan of all of Martin's other books. It's just aggressively opinionated in a way that I can't get behind. I'm sure I'm not alone, but reading that book made me feel insane, since he described things as objectively good that I found awful.


I don't know if it's because of clean code or because he calls himself my uncle or what else, but he's always rubbed me the wrong way.

I always ask people to think about printed pages and they look at me as if I'm crazy... But it's like: if you have to pick up a reference book or something, carefully find the right section addressing your problem, and you want to read it, how many pages do you want to read at a sitting? For most, the answer is ideally 1, but you can read 2-3 and still not get annoyed, right? If it gets longer than 10, that's doable but not what you signed up for. Well, if I print code onto those pages, and just assume that most English lines are kind of filler which programming languages don't have, so no compression due to the sparseness of code, then you get ~40 LOC on a textbook page. Ideally you would solve a problem in 40 LOC, but if it took 120 LOC that is still perfectly readable; it's when it gets to 400+ish that something has really started to get confusing about the structure.

Same with diffs: a 400-line diff is still reviewable, but barely.

The printed page isn't the point; the point is that these are kind of objective numbers, and if I describe them in printed pages everybody seems to agree on them... But then a book like Clean Code comes around, and people want to have these tiny little scraps of an eighth of a page, bound together in a little flip book of half-index-card strips, each pointing at other strips, “bake the pizza (see strip 37)”, and nobody thinks about whether this is actually an informational presentation mode that anybody really wants to use. “It works better for review time because you encourage only reviewing one or two pages of flip book at a time.” Yeah, Bob, I see what you're saying, but, like, is this my “crazy uncle” now, who insists that the usual book is going away because with the advent of Wikipedia and infinite content feeds all knowledge and story will forever be stored in such flipbooks? Just because editors who don't care about the overall story anymore, because their attention spans are shot to hell, find it easier to review half an index card at a time? This is a good thing? Something feels off!

You get this same argument from people who believe in the layered server architecture. “Business logic needs to go in the business layer, database logic in the data access layer, presentation logic in the view layer, routing logic in the controller layer.” But you would never voluntarily read a book that was structured this way! “Matt saw Alice sitting there, a young girl of maybe 16, gorgeous in her melancholy and disaffected way, an old schoolmate of his. He waved. She beckoned. He said “Hi, how are you?” and she replied...” Right, the author gave you a data structure of adjectives to associate with Alice. You didn't have to flip to the Characters section looking for “ALICE_INTRODUCTION” and wade through all of the different ways she appears in the book to find “when Matt first sees her she is melancholy and gorgeous”, then flip all the way back to the story that you were reading, then flip to the Dialogue section looking for MATT_ALICE_INTRO_DIALOGUE. Hope you left a bookmark back in the Main Story section!

“Oh, but it is so easy to read the whole book if you can skim through the Main Plot part of the book without ever knowing anything about the characters or settings or repercussions or dialogue: 'Matt saw Alice (ALICE_INTRO), she beckoned, they talked (MATT_ALICE_INTRO_DIALOGUE), he walked to the diner...'” And if you complain about the big all-caps stuff, someone says “well, in a modern hypertext reader, those just become links and you never need to see them directly!” Except you do, because you have to maintain it... And it's like, I get it! You can probably compress most modern novels considerably if you remove all their descriptions and dialogue to appendices. It's not wrong!

But writing is so much slower in that format, and debugging is surprisingly so much slower in that format. The things that are faster are queries like “Did Alice ever mention her father to Matt in the recorded dialogue?” And then you make a refactoring change on the basis that Matt should not know anything about Alice's father, and it turns out it generated a plot hole, because somewhere in the Exposition layer the two were connected more obliquely: Alice wrote about it in a post-it on the fridge or something, that she was going to see her father who was ill.


The point of abstraction layers is more about responsibilities (and the abstractions).

For example, the DAL shouldn't know that a missing record is going to return a 404; instead it needs to be able to express "record not found" in its API. The business layer should also not know that a record not found is going to return a 404; it just needs to be able to express that a record was not found. The web layer needs to know that when "record not found" is expressed, we return a 404.

This is why I'm not a fan of ORMs without a DAL. Too many people will sprinkle the ORM code directly into a controller and call it a day, and then ORMs will come up with all of these unmaintainable ways to "re-use" queries and all the nasty performance knobs that come with that.

And I'm not saying the gap between the layers needs to be that thick. If a DAL wants to hand back ORM models directly, more power to them, just disconnect them from the DB before you do. If the web layer wants to use those same models as the api contract, more power to them, that can always be fixed when and if they diverge.

And it's not as if these layers themselves must be proper layers, that's what I meant when I said responsibilities. The web layer is responsible for web concerns (security, api contracts, etc). If you want to treat the controller as an orchestration mechanism that you get from 15 different Dependency Injected services, great. It doesn't need to be a physical layer, but it should be a logical layer and the layers below being able to express everything the layer(s) above need to know is an important part of that sort of design.
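A minimal Python sketch of that layering, with every name invented, just to show where the knowledge of the 404 lives:

```python
# Hypothetical three-layer error flow: only the web layer knows HTTP.
class RecordNotFound(Exception):
    """Raised by the DAL; knows nothing about HTTP status codes."""

def dal_get_user(user_id, db):
    # Data access layer: expresses "not found" in its own vocabulary.
    try:
        return db[user_id]
    except KeyError:
        raise RecordNotFound(user_id)

def service_get_user(user_id, db):
    # Business layer: may add rules, but still speaks "not found",
    # never status codes.
    return dal_get_user(user_id, db)

def web_get_user(user_id, db):
    # Web layer: the only place the domain error becomes a 404.
    try:
        return 200, service_get_user(user_id, db)
    except RecordNotFound:
        return 404, None
```

Whether these are physical layers or just responsibilities wired by DI, the shape is the same: each layer translates only at its own boundary.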


I was this dev early in my career. A sharp overreaction to a giant ball of mud architecture with no tests and minimal consistency. I read all those books looking for some better way and inflicted all those rules on people.

I don't regret the learning, but I do regret being dogmatic. It was interesting that no one around me knew better either way, or felt they could provide reasonable mentorship, so we went too far with it. These days I write the pizza function on the left, and use comments sparingly where they add context and reasoning.


I just inherited a code base written by somebody in that exact same situation. Saving a single file associated with a model in 27 steps instead of about 4.


Clean Code says "Functions should not be 100 lines long. Functions should hardly ever be 20 lines long".

I think both 100 and 20 are a bit low, but much better than 5. As I mentioned in a comment a few days ago, when I also corrected someone who misremembered a detail from the book, I am not a huge fan. But I also think it is mostly correct about most things, and not as terribly bad as some say. Listening to fans of the book is more annoying than actually reading the book.

(And that other comment when I corrected someone was about bad comments. Clean Code definitely does not say that you shall never comment anything.)


I went back to check and this is what the book says, verbatim:

"Every function in this program was just two, or three, or four lines long. Each was transparently obvious. Each told a story. And each led you to the next in a compelling order. That’s how short your functions should be!"

So I think it's fair to say the book advocates for functions 2-4 lines long.

And about comments, from the book:

"So when you find yourself in a position where you need to write a comment, think it through and see whether there isn’t some way to turn the tables and express yourself in code. Every time you express yourself in code, you should pat yourself on the back. Every time you write a comment, you should grimace and feel the failure of your ability of expression"

"Some comments are necessary or beneficial. We’ll look at a few that I consider worthy of the bits they consume. Keep in mind, however, that the only truly good comment is the comment you found a way not to write."

With opinionated sentences like these, it's not hard to see how one would read the book and adopt a "no comment" mindset.


It also completely misses the point of why comments are useful.

"Store user.age in age variable", is a useless comment which is indeed better expressed with clear code.

"Store user age in struct because when xyz() iterates over this it has no way to access the user object" is useful because it tells us why something is done, where it is used, and why the obvious solution isn't right.
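A tiny illustration of that contrast (the User type and xyz() here are hypothetical, invented for the example):

```python
from dataclasses import dataclass

@dataclass
class User:
    age: int

user = User(age=30)

# Useless comment: restates what the code already says.
age = user.age  # store user.age in age variable

# Useful comment: records a "why" the code itself cannot express.
# The (hypothetical) xyz() iterates over plain records and has no
# access to the User object, so the age must be copied in here.
record = {"age": user.age}
```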


I fought and lost this battle with one of our teams.

The tech lead insisted they use XML comments (Visual Studio) for everything.

  ///<Summary>
  ///Represents the User
  ///</Summary>
  public class User {
      ///<Summary>
      ///Users Age
      ///</Summary>
      public int Age {get;set;}

      ///<Summary>
      ///Users First Name
      ///</Summary>
      public string FirstName {get;set;}
  }

Ad nauseam.

Here's the thing: Swagger (.NET) can pick up the XML file generated from these comments, and it gives developers the ability to add more information to the OpenAPI spec file (Swagger generates a UI off of it).

So it has a legitimate use, but if you don't have anything to say beyond repeating what the damned code already says, it's harmful to the readability of the codebase.


A classic criticism of CC: https://qntm.org/clean


I love qntm's writing, be it criticism of a technical book or mind-blowing science fiction. I strongly recommend "There Is No Antimemetics Division".


I think that no sensible rule of thumb is possible unless we're specifying a language, because language "density" can vary so greatly.

I follow wildly different rules of thumb about everything from line of code count to how many methods per class to whether or not that functionality belongs in a class in the first place, depending on whether I'm writing Java or Python or F# or Racket.


It's not from Clean Code, but Refactoring.


I think I completely agree with that line. 5 is a nice goal to aim for. Sometimes you hit 1; some things really do take 20. 100 lines is almost always a bad idea (unless it's 100 really boring and obvious lines).

I haven't read the book, and I can see how people can go overboard and can turn good advice into a caricature of it, but short, well-named functions that focus on a single thing are generally better than long ones that do dozens of different things. Separate those concerns.


I literally just pushed a 101 line function to prod that is named "download_and_extract" that downloads some files from a place, extracts them, then has a lot of error checking and a couple of logging statements and hands off to a few smaller functions to move and re-arrange files. It is long but it is readable and doesn't really fit a more abstract way of doing things. But that's my style I guess.


Length of function is not even a metric, it is at worst a code smell.


> I worked with a guy who extracted each and every boolean condition to a function because "it's easier to read"

Obviously, readability is important, but I've also seen things like this so often in my career where it's used as an excuse for anything. Most recently, trying to stop a teammate turning nearly every class into a singleton for the sake of "simplicity" and "readability", which I thought was a real stretch.


> pretty sure the book itself says no more than 5 lines for each function

The book was written by a Java dev who was dipping his toe into Ruby.

Go code, covered everywhere in an obnoxious rash of error handling, will be bigger.


That's why I don't read books.


>In this case I hope nobody is proposing a single 1000-line god function.

Why not? Who said it's worse? What study settles the issue?

Sometimes a "1000-line god function" is just what the domain needs, and can be way more readable, with the logic and operations consolidated, than 20 fifty-line functions that you still have to read to understand the whole thing (and of which someone will then be tempted to reuse a few, adjusting them for 2-3 different needs not had by your original operation, and tying parts of the functions implementing your specific logic to use cases irrelevant to it).

And if it's a pure 1000-line function, it could even be 10,000 lines for all I care, and it would still be fine.


Yeah, when code gets spread out across too many classes and functions, it's like you're trying to navigate a maze without a map. You hit a breakpoint, and you're left scratching your head, trying to figure out what the heck each class is supposed to do. Names can be deceptive, and before you know it, the whole architecture feels like a jigsaw puzzle. It's a cognitive load, having to keep track of all these quirks. Maybe it was easier for the author to do it that way when they started from scratch, but after they finished, it's another story.


Okay, I don't care much about all of the unproven "software engineering" cargo cult rituals, but maybe 10,000 lines is pushing it a bit!


1000-10000 lines typically means the developer just doesn't know how to abstract. Don't go overboard with the function extraction, but also don't make me read every line of your code so I can find the one tiny part I want to change. Pseudo-functions, like the commented segments of code in the linked post, help, but it's not obvious which data those segments of logic depend on.


I think the only good use cases I have for 50+ line functions are finite state machines and renderers, whatever the form.

Do you have other examples of 50+ line functions where you thought it was best not to separate concerns?


Specialized parsers. Encoding and decoding tasks. Complex computation is often isolated to help with peer review. Pattern matching routines.

Also, constructors with many validation steps that are compiler constrained to their local scope. That seems common.


Aren't specialized lexer/parsers finite state machines? As are encoder/decoders?

OK, for complex computation: I left the world of mathematics 7 years ago, and I wasn't at the cutting edge on that, so I trust you. But to be clear, all your examples scream 'FSM' to me. If you have a pattern matching routine of 50+ lines that isn't a finite state machine, you're doing something wrong imho, and should consider changing abstraction (I'm not a big OO guy, but maybe use dynamic dispatch?)


Yeah, any routine that is not Turing complete is an FSM. Most things are FSMs, so it's not a useful distinction.


Anything consisting primarily of a switch statement with a great many cases.


Finite state machines then. I never used more than 3 cases unless I had to write one.


Also event dispatchers. I've written switch statements for various event systems that need to handle 60-100 events. You can easily get to hundreds of lines without things getting unreadable.
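For what it's worth, in languages without a switch statement the same shape often becomes a handler table; a minimal sketch with invented event names:

```python
# Hypothetical event handlers; a real dispatcher might register 60-100.
def on_connect(payload):
    return f"connected: {payload}"

def on_message(payload):
    return f"got message: {payload}"

HANDLERS = {
    "connect": on_connect,
    "message": on_message,
}

def dispatch(event, payload):
    # A dict lookup replaces the long switch; unknown events fail loudly.
    try:
        handler = HANDLERS[event]
    except KeyError:
        raise ValueError(f"unhandled event: {event}")
    return handler(payload)
```

Each case becomes a row in the table, so the "hundreds of lines" stay readable for the same reason a long switch does: one event, one entry.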


If we go with the cooking analogy: if you have to describe to someone how to cook a meal, and at one part of the meal you have to put the fond in, it is reasonable to explain how to make the fond in a separate section. The fond is its own thing, and it has one touching point with the food, therefore it is okay (or even beneficial) to move it out.

Also: cooking recipes are already very abstracted. When they say you need to lightly fry onions, they assume you already know a way to cut onions and a lightly-frying algorithm. If they inlined everything, it would become unreadable.

Code is very similar. If you want it strictly without abstractions it will be as low level as your language allows you, and that is definitely not readable code.

If you, for example, instead of using Python's "decode" method tried to do Unicode decoding yourself, it would become very hard to understand what your program is actually about. Now there are probably zero people who would do that, because the language provides a simple and well tested abstraction — but what makes that different from you creating your own simple and well tested abstraction and using that throughout the actual business logic of your code?

The hard part is creating abstractions that are so well chosen that nobody will have to ever touch them again.


To stay with the fond analogy: It gets interesting if the fond preparation involves deglazing a pan (mutable environment) with meat bits and juices left at the bottom (state/precondition). Two options:

- Linear code: The meat frying (state-producing) and deglazing (state-requiring) steps are below each other in the same recipe, so to verify that it works you can just linearly go through line by line. However if the recipe becomes long and a lot of stuff happens in between, it's no longer obvious. You'll have to use good comments ("// leave residue in the pan, we'll need it for the fond") because otherwise you might accidentally refactor in a way that violates the precondition (swaps/scrubs the pan).

- Modular code: You need to clearly describe the precondition on the fond preparation subroutine to have any chance to keep using it correctly. On one hand this forces documentation, on the other hand it's probably still easier to forget since the subroutine call ("Prepare the fond.") doesn't directly make the precondition obvious.

Either way has its advantages and drawbacks, and the right choice depends on the circumstances. This is assuming you only want to cook this specific meal and aren't writing a cookbook - otherwise you should definitely modularize to remove repetition.
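The modular option's precondition can at least be made to fail fast; a toy Python sketch where the cooking steps are stand-ins for stateful operations (all names invented):

```python
# The pan is a mutable environment; frying leaves state behind that a
# later step depends on.
def fry_meat(pan):
    pan["residue"] = True  # meat bits and juices left at the bottom

def prepare_fond(pan):
    """Precondition: the pan must still hold the frying residue."""
    if not pan.get("residue"):
        raise ValueError("precondition violated: pan was scrubbed")
    pan["residue"] = False  # deglazing consumes the residue
    return "fond"

pan = {}
fry_meat(pan)
# ...many unrelated steps could happen here; a refactor that scrubbed
# the pan would now fail loudly instead of silently ruining the fond.
fond = prepare_fond(pan)
```

The check doesn't make the call site ("Prepare the fond.") any more self-explanatory, but it turns a silently violated precondition into an immediate error.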


> But at the same time, don’t be over-eager to abstract, or mortally offended by a few lines of duplication. Premature abstraction often ends up coupling code that should not have to evolve together.

A relatively common piece of feedback from me to the team at work is usually to take a half step back and look at the larger problem domain and consider whether these things are necessarily the same, or coincidentally the same.

Just because the lines of code look similar right now doesn't mean they need to be that way or need to stay that way. Trying to mash together two disparate use cases because "the code's basically repeated" is often how you get abstractions that, especially over time, end up not actually abstracting anything.

As the various use cases get too divergent, the implementations either move much of the logic up to the caller (shallow abstractions, little value), or expose the differences via flags and end up with two very different implementations under the hood side-by-side (less clear than two independent implementations).


> In this case I hope nobody is proposing a single 1000-line god function.

I’ll take a well-structured 1000-line function over a bad spaghetti of hundreds of small functions any day.


Have you ever seen a well structured 1000 line function?

I'm sure they exist - maybe some sort of exceedingly complicated data transform or something. But in almost every situation I've seen, a 1000 line function has countless side effects, probably sets a few globals, takes loads of poorly named arguments, each of which is a nested data structure which it reaches deeply into and often has the same for loop copied and pasted 10 times with one character changed.

Often a 1000 line function is actually 5 or 6 20 line functions. I'm sure there are legitimate exceptions, but I've never seen them.


https://github.com/github/putty/blob/master/terminal.c#L3281

This function is 1830 lines long. It's reasonably well structured I think. Although the #if 0 are maybe not so good.


A lot of those if-/case-blocks are precisely where I'd put functions :)

If you changed a bunch of those to separate, pure (i.e. side-effect-free) functions it would if nothing else make unit testing a breeze, and then you'd be free to fix bugs in the logic without fear. As it is, if I had a bug in that huge function I'd be really worried about breaking some edge-condition or implied-state 500 lines up etc.


They can’t be side effect free, that’s the point. The switch statements are mutating the input.


There's a lot that can be, take this for instance (line 3336):

  if (c == '\033')
      term->print_state = 1;
  else if (c == (unsigned char)'\233')
      term->print_state = 2;
  else if (c == '[' && term->print_state == 1)
      term->print_state = 2;
  else if (c == '4' && term->print_state == 2)
      term->print_state = 3;
  else if (c == 'i' && term->print_state == 3)
      term->print_state = 4;
  else
      term->print_state = 0;
This could be turned into a pure function that takes c and print_state as input, and returns a new print_state which the outer function assigns. That's 12 lines turned into 1.

  term->print_state = newState(c, term->print_state);
(I am not a C developer, so the syntax could be wrong.) Just because the outer function is impure doesn't mean it can't in turn call pure functions.


It's a Mealy state machine, it should be encoded as a state machine!

Why do some folks absolutely insist that a series of manually written case statements in a single thousand-line function is the epitome of style, when there's a pure state machine model inside, wishing you would free it from the shackles of if else if else if else...?


I have some experience with state machines, and I absolutely hate them. It's so easy to get lost, you never know where a function will go to next.

There's a reason stacks are ubiquitous, they're much easier to fit in your head.


You can still have side-effects in your “big” function, based on the return values of pure functions.


I totally agree. And you can always write mutating “functions” (what are called “algorithms” in the C++ world), like C++’s `std::ranges::sort(range)`.


You can always use return values, and allow the caller to decide how to use them (e.g. caller may mutate other values).

Depending on use and language, this may be "expensive" (e.g. you could be allocating for and then copying some huge data structure only to pass it back, where it is simply copied over the top of the input), but this is where discretion comes in and decisions are made on what is appropriate (i.e. is performance critical, or are correctness and maintainability more important?)


Oh my God. The sneaky `else` at line 3400.

edit: another one at 3826 with a preprocessor define interleaved.


Well that's horrendous. Sorry, Simon. Each of those big "switch" statements should be broken out.


Why? That would create lots of small functions, each called only once from a single place in the code, and still extremely coupled to the invoking switch statement, often having multiple inputs and multiple outputs which the invoking statement then needs to write to the proper places, to which the function wouldn't have access.

Breaking out functions makes sense when you get either reuse or decoupling, but in this case you don't get any of these.


Because it would objectively and shockingly obviously make the code easier to understand.

You would be able to see what it does at a high level at a glance and then drill down into the functions and sub-functions to focus on a particular part of it.

With this version, to get the high level overview you have to scroll through multiple pages of code and find the comments that say what each section does.


So the problem is being able to drill down? Most decent IDEs allow you to collapse code blocks (i.e. brackets) if that's what you want to do, and the comments accomplish the same thing a function name would. Some editors support region comments that start out collapsed and can be named. I don't see why separate functions would be anyone's first instinct to reach for here.


I don't see why anyone's first instinct to reach for would be relying on some editor's ability to optionally collapse code blocks.

What happens if someone wants to use a different editor that doesn't have that feature or doesn't display it quite right anymore?

Lol.


I'm not OP, but yes, I have.

That's my personal opinion, and nothing more:

Something like a complex one-time financial/workflow/maintenance operation that includes calling dozens of different smaller functions, but is very well structured.

It does not make sense to separate it further into different functions, because execution is generally linear, and having to deal with a tree of calls, where some calls depend on the state of previous ones, becomes cumbersome and makes reading and making changes more complex.

Again, that's my personal feeling, and nothing more.


Going further, I'll take 1000-line shitty code over split-into-small-functions shitty code. In the long code, all I have to think about is the code. With the functions, I have to pay attention to what calls what, and also, because the code is shitty, surely the function names are too, adding two things to the confusion mix at the same time.


Even better: instead of an interrupt-driven state machine implemented as a switch statement that progresses linearly, do it using 20-25 small chained callbacks spread over a couple of files.

Bonus: Uncle Bob teaches us not to use comments.


This sounds like a nightmare. One that I've lived through.


It is easy to nod along when someone speaks about different styles. But there are also a few objective truths down there, and it makes sense to try to identify them.

For example, I have been at this for over three decades now, and there are some things that almost never fail. From the article, the kind of person who advocates for the more "testable" code, with a few more lines and more abstractions, is never the same person who can maintain that codebase a handful of years later.

That should tell us something. For what it's worth, I agree with the article that simpler is better, which often coincides with fewer lines of code. I personally wouldn't have chosen objects that look like "pizza.Sliced = box.SlicePizza()" but most of the time the structure is already in place and it is best to go along with it.

As to that 1000-line function, if it is in an imperative style it might well be the easiest form to read. Have you seen the Python source code? That language's success owes much to a simple interpreter with ginormous functions that anyone and their brother can read from top to bottom and dare to modify without having a brain the size of a planet.


> In this case I hope nobody is proposing a single 1000-line god function.

this made me feel a certain type of way. (don't ever look at video game source code, by the way; 1000 lines is quite short by some standards)

if a 1000-line long main is what makes sense then you should do that.

I find 1000-line long methods which are linear far easier to read than code which has every method call broken out into its own method. it's so bad I literally can't read JavaScript that is written in the contemporary style anymore. absolutely impenetrable for me.

it's true that I am not a "real" developer in that I don't work on code full-time, but I've written probably millions of lines of code in my 30-year career. I am not a novice.

if the solution calls for a 1000-line main method, then that's what I'm writing, "best practices" can go in the corner and cry. I'm writing what I need to solve the problem and nothing more.


My biggest pain is JavaScript developers who get too high on Java concepts, most often after using NestJS. Providers, Models, Services and what not.

I remember an import script I wrote in ExpressJS. It was like 50 lines. It did things like copy databases, clean up config, etc. There were hardly any layered ifs, just steps; I didn't see much use in breaking it up, and it was easy to read.

Another developer, who was smart but liked abstract concepts, overengineered the hell out of it, moving it into 20 places with a bunch of providers, and I could never find and make sense of it after that; it was very hard to tell what was going on. It was always such a pain to update.


The main reason I have a distaste for dependency injection is this: it promotes separating code into multiple places and over-abstracting things, making code hard to follow. Most of the time it is not worth the trade-off.

Doing module mocking for unit tests instead of dependency injection in runtime code is almost always a better idea in my opinion. Dependency injection was invented for languages that can't do module mocking.


> So where do we split things?

Cyclomatic complexity: https://en.wikipedia.org/wiki/Cyclomatic_complexity

Overhead: https://en.wikipedia.org/wiki/Overhead_(computing)

Some programming language implementations and operating systems have more overhead for function calls, green threads, threads, and processes.

If each function call creates a new scope, and it's not a stackless language implementation, there's probably a hashmap/dict/object allocated for each function call unless tail-call optimization (TCO) has occurred.

Though, function call overhead may be less important than readability and maintainability.

The compiler or interpreter can in some cases minimize e.g. function call overhead with a second pass or "peephole optimization".

Peephole optimization: https://en.wikipedia.org/wiki/Peephole_optimization

Code linting tools measure [McCabe,] Cyclomatic Complexity but not Algorithmic Complexity (or the overhead of O(1) lookup after data structure initialization).


> Also iteration. Just because the first place you tried to carve an abstraction didn’t work well, doesn’t mean you give up on abstractions;

C. Muratori calls this method "semantic compression": https://caseymuratori.com/blog_0015


What's described there is what I understand DRY ("don't repeat yourself") and the associated "rule of three" to mean.


As soon as conversations stray into lines of code etc. I think we've veered directly into Goodhart's Law.


Sometimes I use an anonymous scope instead of extracting a single-use function. This is especially nice when you would otherwise have many parameters/returns.


The example code is very simplistic, so of course that linear code is more readable, but the idea doesn't scale.

I think you have to consider things like reusability and unit-test-ability as well, and having all your code in a single function can make reasoning about it more difficult due to all the local variables in scope that you need to consider as possibly (maybe or maybe not) relevant to the block of code you’re reading.

That being said, when I look back on my younger, less experienced days, I often fell into the trap of over-refactoring perfectly fine linear code into something more modular, yet less maintainable due to all the jumping around. There is something to be said for leaving the code as you initially wrote it, because it is closer to how your mind was thinking at the time, and how a readers mind will also probably be interpreting the code as well. When you over-refactor, that can be lost.

So I guess in summary, this is one of those “programming is a craft” things, where experience helps you determine what is right in a situation.


> The example code is vey simplistic, so of course that linear code is more readable, but the idea doesn’t scale.

One of the best-reviewed functions I wrote at work is a 2000-line monster with 9 separate variable scopes (stages) written in a linear style. It had one purpose and one purpose only: it was supposed to convert some individual HTML pages used in one corner of our app on one platform into a carousel that faked the native feel of another platform. We only needed that in one place, and the whole process was incredibly specific to that platform and that corner of the app.

You could argue that every one of those 9 scopes could be a separate function, but then devs would be tempted to reuse them. Yet each step had subtle assumptions about what happened before. The moment we spent effort to make them distinct functions, we would have had to recheck our assumptions, generalize, and verify that the methods work on their own... for code that's barely ever needed elsewhere. We even had some code that was similar to some of the middle parts of the process... but it just slightly didn't fit here, and changing that code caused other aspects of our software to fail.

The method was not any less debuggable, it still had end to end tests, none of the intermediate steps leaked state outside of the function. In fact 2 other devs contributed fixes over time. It worked really well. Not to mention that it was fast to write.

Linear code scales well and solves problems. You don't always want that but it sure as hell makes life easier in more contexts than you'd expect.

Note. Initial reactions to the 2000 line monster were not positive. But, spend 5 minutes with the function, and yeah... You couldn't really find practical flaws, just fears that didn't really manifest once you had a couple tests for it.


I don't know if it is still like this, but the code for dpkg used to be like this, and it was amazing: if you ever needed to know in exactly what order various side effects of installing a package happened in, you could just scroll through the one function and it was obvious.

To this end, I'd say it is important to be working in a language that avoids messing up the logic with boilerplate, or to build some kind of mechanism (as dpkg did) to ease error handling and shove it out of the main flow; this is where the happy path shines: when it reads like a specification.


I don't think the fact that a function works well is a good enough reason to write a 2000 line function. Sometimes there are long pieces of code that implement complex algorithms that are difficult to break into smaller pieces of code, but those cases are limited to the few you mentioned.


Computers execute code in a linear fashion, why on earth would you "need a reason" to NOT abstract something? Just because abstraction is often the right thing to do doesn't make it the base case.

It's like saying you need a reason not to add 4000 random jumps in your assembly code just to make it more difficult to read...


Source code isn't written to be executed by computers, it's written to be read by other humans.

Source code tends to be very far removed from how computers execute anything, so I wouldn't use that as a justification for any sort of code style.


> Source code isn't written to be executed by computers, it's written to be read by other humans.

It is pronounced "documentation".


> that implement complex algorithms that are difficult to break into smaller pieces of code

My longest code is always image processing. It's usually too hard to break up for the sake of breaking up. There's nothing to reuse between the calls to filters/whatever.


The default should be reversed, don't break into smaller pieces unless there's a really good reason.


>I don't think the fact that a function works well is a good enough reason to write a 2000 line function.

The fact that it works well and reads well (when it does, as in the parent's case), is.

Aside from those factors what else would be against it? Dogma?


I guess all we know is there were 2K lines of code and the commenter thinks that was the right way to do it. It would be necessary to see the code to appropriately critique it.


Not just the commenter, but his team as well. It passed code review with flying colors, apparently. The moral of the story is that there always exceptions and developers should not be ideologically committed to one approach above all else.


we know more than that: "You could argue that every one of those 9 scopes could be a separate function, but then devs would be tempted to reuse them. Yet, each step had subtle assumptions about what happened before."

what we don't know is if it would have been possible to abstract those assumptions away so that functions could have been defined without them.


We do know that if we trust the poster, they said very clearly it could have been done but they didn't consider the value to outweigh the downsides.


Yes, I meant we don't know if it would have been possible to extract functions in such a way that they are actually safely reusable.


Even the contrived example in the post can be factored differently (and better imo). How do we know those 9 scopes are appropriate?


>The moment we would have spent effort to make them distinct functions we would have had to recheck our assumptions, generalize, verify that methods work on their own

Why? Why can't the functions say "to be used by <this other function>, makes assumptions based on that function, do not use externally"? Breaking out code into a function so that the place it came from is easier to maintain... does not mandate that the code broken out needs to be "general purpose".


Specifically, in that place, there was no need. And prematurely splitting it would have caused us to overthink and over generalize. Having a long, linear and tested function was a better choice.


I understand your point, but perhaps that would have simply been an opportunity to refine your approach to code design. If such a situation leads to excessive deliberation and overgeneralisation, your code base must be riddled with unnecessary overthinking and overgeneralisation.


Or maybe it was just a long, sequential algorithm where breaking it up wouldn't have been an improvement.


I have been programming for more than 30 years. Except for code generated explicitly to be only consumed by machine, I've never come across a function consisting of 2000 lines of code that should not have been broken up. Something is wrong there, and if you show me the code, I'll tell you what's wrong with it.


Glad you can see that without even looking at the code.


Some things you don't have to see to know whats going on. Function with 2000 lines of code? Have fun rationalising this.


I worked with an engineer that wrote the most clear and elegant linear code. It was remarkable, never seen anything like it since. I can't reproduce it but I do have an idea of what a well designed linear function looks like.. a story.


I was just thinking that if I _needed_ to refactor this I might structure the stages as chapters in a book. One might be able to write an inner class or some such that had a “table of contents” function that called each stage in sequence as a void function with data managed out of line, maybe via cleverly designed singleton structs. Then the code itself can be written in order with minimal boilerplate between stage boundaries.

I think I’ve worked with some Python that looked and worked this way. I can’t place the details but probably in a processor pipeline running over a particularly hairy data format. Consider ancient specifications written by engineers talking on the phone encapsulated in relatively “modern” but still vintage specifications, sometimes involving screen-scraping a green screen mainframe terminal, wrapped in XML and sent over the internet. Anyway, point is I couldn’t agree more about stories.


I will agree that it takes some skill, not that I am great at it. It's a different kind of skill than abstraction. Reading error handling in C code offered good insights for me to learn linearity better (C code that uses goto to jump to the end of a function for cleanup when an error occurs, for example).

However, if you screw up linear code, you screw up locally. If you write poor small functions, the rest of the team screws up because they barely ever read the contents of your functions that call other functions that call other functions. I've had way more problems with stuff being called slightly out of order, than with large functions.


That is true of well-designed nonlinear code as well. The code needs to tell a story or it will be a mess.


You don't have to write tests to prove that private methods work on their own. Just test the public behaviour.


At first I thought how horrible, but basically you have sort of 9 functions within the same scope, each having a docstring. So I guess not too different from splitting them up.

I read you have "end to end" tests.

One question though: Wouldn't each part benefit for having their own unit tests?


Maybe, maybe not. For our particular case it would have been mostly wasted effort.

I found that I like to write tests at the level of abstraction I want to keep an implementation stable. I'd be totally fine if someone went in and changed the implementation details of that long process if needed. We cared that stuff got cleaned up at the end of the process, that the output matched certain criteria, that certain user interaction was triggered and so on... In that case it made more sense to test all our expectations for a larger scope of code, rather than "fix" the implementation details.

Tests usually "fix" expectations so they don't change from build to build. Tests don't ensure correctness, they ensure stuff doesn't alter unexpectedly.


Tests effectively freeze requirements; you should test those things which should be preserved throughout any changes, and not test those things which should be open to change. In this case, it seems there are no real requirements for any of these 9 steps - perhaps the implementer could figure out how to achieve the same outcome by skipping a step or merging two steps, and the existence of unit tests for these 9 functions would somehow encode the idea that each of these 9 functions is inherently needed, which is not necessarily true.


>One question though: Wouldn't each part benefit for having their own unit tests?

Not necessarily better, especially since this allows for the case where individual unit tests pass fine, but the combined logic fails.


If the sub-functions could be reused and people would be tempted to change them, then that's what your tests are for. In fact, it's often tricky to test the sub-function logic without pulling it out, because to write the test you have to figure out how to trick the outer function into certain states. Follow the Beyoncé rule: if you like it, put a test on it. Otherwise it's on you if someone breaks it.


> You could argue that every one of those 9 scopes could be a separate function, but then devs would be tempted to reuse them.

Good thinking. Now they'll just add 50 flags and ten levels of nested ifs instead, which is much simpler.


2000 lines is like a small project. I can't imagine putting that all in one function.


>”but then devs would be tempted to reuse them”

Isn’t that the fucking point? Having a 2000 line function is a code smell so bad, I don’t care how well the function works. It’s an automatic review fail in my book. Abstractions, closures, scope, and most importantly - docs to make sure others use your functions the way you intended them. Jesus.


Some devs did find it a code smell... But each scope had a clear, short, high-level comment describing what it did, there were end-to-end tests for the method, and very little state flowed from scope to scope (some did) - because that's what scopes do... prevent variables from leaking.

My point is the code smell isn't always accurate, and there are times and even for 2000 line monsters other devs agreed that it was the best way to hide complexity away from the rest of the codebase in that case. If we ever needed to factor things out (we never did), we could spend some effort and do it.


Have you tried reading code instead of smelling it?


A code smell means you should look into it, not that it's wrong.

Some things are genuinely 2kloc-complex. Maybe not that many. Do check! But some are.


Definitely not that many. Even for me this was an outlier, but it made me more comfortable with functions most people would consider long.

I'd like to clarify this was not necessarily 2kloc-complex, this was just 2kloc-long-and-not-really-meant-to-be-reused. It was a fairly long but linear process that was out of the ordinary for the rest of the codebase. It could easily have been split (hell, I had 9 fairly separate stages), but calling any of the intermediate stages out of order or without the context of the rest of the execution flow... would have been a foot gun for someone else. And, as time showed, we never needed those stages for anything else.


Agreed. I’ve written plenty of software of all kinds and have never had to write a 2000-line-long method (although I have had the joy of refactoring such messes a time or two).

Just don’t do that. Your code doesn’t have to have abstractions out the wazoo, but if your class (or method) is getting bigger than 1000 lines, that’s a great sign that it’s doing too much and abstractions can be teased out. Your future self will thank you, as well as your team.


I like this from Sandi Metz:

> You can't create the right abstraction until you fully understand the code, but the existence of the wrong abstraction may prevent you from ever doing so. This suggests that you should not reach for abstractions, but instead, you should resist them until they absolutely insist upon being created.


At least in the mobile world, I find that this “no abstraction” approach is the default one, and it usually leads to huge objects which do everything from drawing views to making network requests to interacting with the file system. These kinds of classes are quite hard to work in, hard to test, and also keep snowballing to get bigger and bigger. Things usually end with unmaintainable code and a full rewrite.

I am not saying you need to create complex abstract hierarchies right off the bat. But usually, it’s pretty easy to tease out a couple of significant abstractions that are very obvious, and break down your classes by a factor of two or three. Just getting such low-hanging fruit will prevent you from ever having a 2000-line-long method.

And for the folks who are saying that they make sure to not add abstractions too early - are you disciplined enough to go back and add them later? I feel like if you’re the kind of engineer that busts out 2000 line methods, you’re also not going to refactor it as this method grows to 2500 or 3000 lines or beyond.

Probably most robust software you depend on is full of solid, quality abstractions. Learning to write code like this takes practice. The wrong abstraction might be wrong, but it’s one step closer on your journey to growing as an engineer. You won’t grow if you never try.


So where's the proof that the function'd code scales? As the complexity of the overall code grows, so would something that gets chopped into dozens of functions to the point of being unreadable.

Suddenly, you realize that the dozens of functions __need to be called in specific orders__, and they are each only ever used once. So really what you're doing is forcing someone to know the magic order these functions are composed in order for them to be of any use.


The truth is that either one can be done wrong.

Unfortunately organizing your code along the right lines of abstraction is something that just takes skill and can't easily be summarized in the form of "just always do this and your code will be better"

If you organize your code into units that are easy to recompose and remix, well, you get huge benefits when you want to recompose and remix things.

If you organize your code into units that can't be easily recomposed, then yes you've added complexity for no benefit. But why make units that can't be treated individually?

"As the complexity of the overall code grows, so would something that gets chopped into dozens of functions to the point of being unreadable."

So the answer to this is, "don't chop it into functions in a way that leaves it unreadable, instead chop it into functions in a way that leaves it more readable."

That may be unsatisfying, but it gets to the point that blindly applying rules is not always going to lead to better code. But it doesn't mean that an approach has no value.


There's an easier approach that will also aid you in telling you how to precisely chop up your function.

Simply don't chop up your function until you need a slice of it somewhere else. Then refactor out the bit you need. You'll find out exactly which bits need to be replaced with variables and exactly where the slice needs to happen.


This is the correct answer right here if you have a good enough team. It is still the way I want to work. Unfortunately, I find that there are too many developers who haven't learned that you should always be considering to "refactor as you go". I'm trying to teach by example, but it's an uphill battle.


Exactly. Start with the straight-ahead linear approach and factor out once it's unwieldy.

Same thing for copy pasta funcs -- the first copy is fine, the second one may be too, but after that consider extracting to a parameterized func (a permutation of the Go Proverb "A little copying is better than a little dependency.")


A single use function absolutely makes sense - you are effectively naming a block of code in some way, documenting it.


The API shouldn't be that. Expose something easy to use. That is the point of abstractions. It doesn't matter if there are a dozen methods called in order if those dozen methods are called by a helper method, beyond maybe some implementation details.

Really the question should always come up when there are more than, say, two ways to do things. If I can make a pizza from scratch, reheat a chilled pizza, create a pizza and chill it, reheat a half dozen pizzas, or make three pizzas of the same kind and chill them, suddenly the useful abstractions are probably something you can figure out between those helper methods.

Honestly that is the real fear of the left way of thinking. If you add a quantity, whether to cook and whether to chill parameters you end up with a hard API where certain combinations of parameters don't make sense.

Have a clean API and make the implementation as simple as is feasible. Reuse via functions when it makes sense but don't add them willy nilly.

Aka "it is a craft and you figure things out" as someone said in the comments here


I'm very dubious of anyone resorting to "readability" as a justification.

What you're doing by breaking things into functions is trying to prevent its eventual growth into a bug-infested behemoth. In my experience, nearly every case where an area of a code base has become unmaintainable - it generally originates in a large, stateful piece of code that started in this fashion.

Everyone who works in said area then usually has the option of either a) making it worse by adding another block to tweak its behaviour, or b) starting to split it up and hoping they don't break stuff.

I don't want to see the "how" every time I need to understand the "what". In fact, that is going to force me to parse extraneous detail, possibly for hundreds of lines, until I find the bit that actually needs to be changed.


> What you're doing by breaking things into functions is trying to prevent it's eventual growth into a bug infested behemoth

Not every piece of code grows into a bug-infested behemoth. A lot of code doesn't grow for years. We're biased to think that every piece of code needs to "scale", but the reality is that most of it doesn't.

Instead of trying to fix issues in advance you should build a culture where issues are identified and fixed as they come up.

This piece of code will be a pain to maintain when the team gets bigger? So fix it when it actually gets bigger. Create space for engineers to talk about their pains and give them time to address those. Don't assume you know all their future pains and fix them in advance.

> In my experience, nearly every case where an area of a code base has become unmaintainable - it generally originates in a large, stateful piece of code that started in this fashion

In my experience it gets even worse with tons of prematurely abstracted functions. Identifying and fixing large blocks of code that are hard to maintain is way easier than identifying and fixing premature abstractions. If you have to choose between the two (and you typically do), you should always choose large blocks of code.

The great thing about big blocks of code is that their flaws are so obvious. Which means they are easy to fix when the time comes. The skill every team desperately needs is identifying when the time comes, not writing code that scales from scratch (which is simply impossible).


> Suddenly, you realize that the dozens of functions __need to be called in specific orders__, and they are each only ever used once. So really what you're doing is forcing someone to know the magic order these functions are composed in order for them to be of any use.

That's where nested functions show their true utility. You get short linear logic because everything is in functions, but the functions are all local scope so you get to modify local scope with them, and because the functions are all named, it is easy to determine what is going on.


In a decent programming language you can nest functions, so all the little functions that make up some larger unit of the program are contained within (and can only be called within) that outer function. They serve less as functions to be called and more just as names attached to bits of code. And since they can't be called anywhere else, other people don't need to worry about them unless they're working on that specific part of the program.
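A minimal Python sketch of this idea (the order-processing names are hypothetical, not from the article): the helpers are defined inside the outer function, so they read top to bottom and cannot be called from anywhere else.

```python
def process_order(order):
    # Local helpers: visible only inside process_order, so they act
    # as names attached to bits of code rather than as public API.
    def validate(o):
        # Pure check, no outside state touched.
        return o["quantity"] > 0

    def total(o):
        return o["price"] * o["quantity"]

    if not validate(order):
        raise ValueError("invalid order")
    return {"id": order["id"], "total": total(order)}

result = process_order({"id": 1, "price": 2.5, "quantity": 4})
# result == {"id": 1, "total": 10.0}
```

Because `validate` and `total` never escape the enclosing scope, renaming or reordering them is a purely local change that nobody else can depend on.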


If you have dozens of functions that need to be called in specific orders, design and use a state machine and then use a dispatch function that orchestrates the state machine.
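A rough sketch of that idea in Python (all step and state names are made up for illustration): the transition table, not the callers, encodes the required order.

```python
# Each state maps to (step name, next state); the machine, not the
# caller, decides the sequence in which steps run.
STATES = {
    "new":      ("prepare", "prepared"),
    "prepared": ("bake",    "baked"),
    "baked":    ("box",     "boxed"),
}

def prepare(ctx): ctx["steps"].append("prepare")
def bake(ctx):    ctx["steps"].append("bake")
def box(ctx):     ctx["steps"].append("box")

HANDLERS = {"prepare": prepare, "bake": bake, "box": box}

def dispatch(ctx):
    # Walk the machine from "new" until a terminal state is reached.
    state = "new"
    while state in STATES:
        step, state = STATES[state]
        HANDLERS[step](ctx)
    return ctx

ctx = dispatch({"steps": []})
# ctx["steps"] == ["prepare", "bake", "box"]
```

Reordering the pipeline is then a one-line change to the table instead of a hunt through call sites.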


Dozens of functions need to be called in a specific order?

Oh my God.


> but the idea doesn’t scale.

You are wrong here.

> this is one of those “programming is a craft” things, where experience helps you determine what is right in a situation.

You are right here.

The key insight on why giant linear functions are often more readable (and desirable) is because they allow you to keep more concepts/relationships simultaneously together as a single chunk without context switching which seems to aid our comprehension. An extreme proponent is Arthur Whitney (inventor of the K language) who writes very terse (almost incomprehensible to others) code so as to accommodate as much as possible in a single screen.

Two examples from my own experience;

1) I found reading/understanding/debugging a very large Windows message handler function (i.e. a WndProc with a giant switch statement containing all the business logic) far easier than the same application rewritten in Visual C++ where the message handlers were broken out into separate functions.

2) The sample code for a microcontroller showed an ADC usage example in two different ways; One with everything in the same file and another where the code was distributed across files eg. main.c/config.c/interrupts.c/timer.c/etc. Even though the LOC was <200 i found the second example hard to understand simply because of the context switch involved.


> The key insight on why giant linear functions are often more readable (and desirable) is because they allow you to keep more concepts/relationships simultaneously together as a single chunk without context switching which seems to aid our comprehension.

The problem with giant linear functions is that those concepts get separated by sometimes thousands of lines. Separating out the high-level concepts vs the nitty-gritty details, putting the latter in functions that then get called to implement the high-level concepts, does in my experience in most cases a better job of keeping related things together.


See my other comment here : https://news.ycombinator.com/item?id=37522577

The issue is one of Policy vs. Mechanism - https://en.wikipedia.org/wiki/Separation_of_mechanism_and_po...

It is "Mechanism" which should be separated out and encapsulated while "Policy" (aka business logic) is what is better centralized as a linear (possibly large) function.


YEAH, but the moral that should be taken from that is not "it's always better to write huge, linear functions". Rather, "there are cases where huge, linear functions make sense because of the way the code needs to interact with things". Along the same lines, there are cases where breaking the code up into smaller functions, and calling them from the main function, makes more sense.

> Linear code is more readable

^ Wrong

> Linear code is sometimes more readable

^ Better


Not quite what i was trying to convey. Linear code actually has a lower burden on one's cognitive load and thus easier to comprehend. But of course it breaks down at certain sizes (variable) which is when it makes sense to partition it into smaller pieces and apply Abstraction etc.

See for example Cyclomatic Complexity - https://en.wikipedia.org/wiki/Cyclomatic_complexity


> Linear code actually has a lower burden on one's cognitive load and thus easier to comprehend. But of course it breaks down at certain sizes ...

I agree with this, but I think the combination of those two sentences winds up being "linear code is sometimes easier to comprehend, and sometimes not". The statement "linear code is easier to comprehend" is misleading. Your statement makes it seem like "at certain sizes" is the edge case; whereas, in my opinion, it's the only case that really matters. For a small enough block of code, "easier to comprehend" becomes a moot point.

> See for example Cyclomatic Complexity

I think that's only tangentially related. Cyclomatic Complexity deals with branching, which is somewhat orthogonal to refactoring out code to separate functions (though refactoring can make the branching easier to read, since it shows more in a smaller area).


> an extreme proponent is Arthur Whitney (inventor of the K language) who writes very terse (almost incomprehensible to others) code

But k has a small set of built-in commands and a built-in database; it was made for fast analysis of stock information, so with that you have everything you need and you use the same semantics. The only thing you need to know is the data structure and you can build whatever you need.

So in this way, it's very likely that, given two tables A + B and 'bunch of operations' X on A and 'bunch of operations Y' on B where Y depends on the result of X, and given the tasks to;

- create X' = X

- create XY' = X + Y

you would implement XY' from scratch without knowing X already exists, rather than figure out that X exists and reuse it.

The problem outside of k (or programs written in a similar style; it doesn't really matter what the programming language is) is that we have learned to use the second style from the article, and, more extreme, to separate everything out into layers. You cannot even reach the data model without going through a layer (or more) of abstractions, which makes it necessary not only to know the data model in detail but also to find the matching findXinAandApplyWithYToB(). Where X & Y & A & B are often ambiguous and badly named entities. And then there are of course badly designed databases, which are also quite the norm as far as we see, so data integrity is much lower, which means that if you change something without checking all the code that touches it, the data can become inconsistent.

I notice the same when working on systems built with stored procedures on MSSQL/Postgres; it is far quicker to oversee and (at least basically) understand the data model (even with 1000+ tables, which is rather normal for systems we work with) than it is to understand even a fraction of a, let's say, Go codebase. So when asked to do a task XY', you are usually just not searching for X'; you are simply reading the data used in X & Y and whipping up a procedure/query/whatever yourself. It's simply much faster as you have a restricted work surface, the model and SQL (I know, you can use almost any language in Postgres, but let's not here), and you can reason about them and the tasks at hand when you shut off the internet and just use your SQL workbench.


Nice post. If I understand you correctly, you are saying that K is specialized enough (my knowledge is only cursory here) that you can directly work with the data model easily rather than going through multiple layers of abstractions, and hence linear code is normal. In other languages it may not be so easy to do, and multiple layers of abstractions only make things harder to comprehend. True, IMO Abstraction should always follow Understanding of the Problem space and not some arbitrary dogma. What I find infuriating nowadays is "cargo culting", where people blindly follow something because they read it somewhere/listened to somebody without thinking through the motivations involved and whether it is applicable to their current problem. In other words, "they don't think" for themselves. Examples are "OO is bad" (it is not), Agile/Scrum processes will magically solve all your PM problems (hell, no!), using the latest library/framework/API/fad will magically make your system better (no!), etc. etc.


> If i understand you correctly; you are saying that K is specialized enough (my knowledge is only cursory here) that you can directly work with the data model easily rather than going through multiple layers of abstractions and hence linear code is normal.

Yes, it is often just easier to write the linear code than figure out if you can reuse anything because the space is small. I think a good 'feeling' for this is: do you need internet search/package managers/copilot etc. for something, or can you just write working code sitting on a desert island, quite possibly on paper? For instance, for C, asm (arm/68k/z80/8080 and older intel), k and some others I can write working code like that for non-toy applications in specific domains. And, at least for me, those languages lend themselves very well to this linear programming. Incidentally, but not related, this is for me also the most enjoyable way of programming; I kind of really hate using libraries. That's also because we work in heavily regulated areas where you cannot just add them; you have to inspect them and sign off on them and, of course, most of them are terrible...


Nice; you have found a niche work domain for yourself which you seem to enjoy.

May we all be so lucky :-)

PS: You might want to consider adding your contact info. to your profile.


I have seen many instances where people, just out of habit, factor out a lot of linear code that will never be reused into separate functions.

These pieces of code then often end up being private functions of a class. With state. Since they are private functions now, they are not really testable.

So now we got a lot of private functions that are only called once and typically modify side effect state. When these functions are grouped together with the caller, it is actually still a bit readable in simple cases.

But then after a while someone adds other functions in between the calling function and the factored out ones.

Now we have bits and pieces modifying different side effect state that no one knows if they are called from different places without getting a call graph or doing a search in the class file.

If you insist on making the code non-linear, I'd beg you to at least consider making these factored out private funcs inner funcs of the calling function if your language supports that. This makes it clear that these functions won't be called from anywhere else.

As with so many things in life, in a real codebase this is not an either/or, but an art of combining the two into something that stays readable and maintainable.


If the function was truly linear having a long function wouldn't be so bad. But it actually isn't, the example contains multiple branches!

Will people bother testing all of them? Or will they write a single test, pass in a pizza and just glance at it actually working? My guess is the latter, as testing multiple branches from outside is often tedious, vs testing smaller specialized functions.


> The example code is very simplistic, so of course that linear code is more readable, but the idea doesn’t scale.

...that's basically why common sense and taste in programming is still required, it's not a purely mechanical task. That's also why I'm not entirely a fan of automatic code formatting tools, they don't understand the concept of nuance.


Everyone saying "linear code doesn't scale" actually has it backwards - it's concise functions with a deeply nested call stack that really becomes a nightmare in large codebases. It's never obvious where new code should be added, the difficulty of understanding what the effects of your changes will be increases exponentially since you have to trace all the possible ways code can get called, you end up with duplicated subroutines, etc etc.

99% of the time, you haven't actually come up with a good abstraction, so just write some linear code. Prefer copy/pasting to dubious function semantics.


Another risk is if you add print_table() then someone else is going to find it and use it in their code, but also add a little flag to adjust the output for their use case.

12 months later you have:

  print_table(
    rows,
    headers = None,
    is_unicode = False,
    left_align = False,
    align = [],
    remove_emoji = None,
    max_width = 80,
    potato_mode = 7,
    _debug_frontend = not FLAGS.dont_debug,
    ellipsis_for = 0,
    no_print = False,
  )


I think we all know at least some functions like this in a code base. All it takes is for a newcomer to come across a complex function that they need to update some logic for, but also don't understand well enough to refactor, so they just add some parameters with default values and call it a day.

> no_print = False

love this


I just had to implement potato_mode in a report, what a rabbit hole that turned out to be.


To play devil's advocate - what's the issue with this?

Is print_table() + print_table_without_emoji() better than print_table(remove_emoji=False)?


The issue is that this approach is almost guaranteed to produce basically untestable code with a myriad of invalid/nonsensical/completely broken input combinations, and it's impossible to refactor, too, because you don't even know which parts of the parameter space are actually ever needed, or even how they are supposed to interact.

Whenever function semantics need to change, everything degrades further because of refactoring uncertainties (=> you end up with even more parameters).

This will also be extremely resistant to optimization because even finding the "happy path" is non-trivial.


Honestly? Both reek of SRP violation. Why should print_table specifically care about emoji?

if needed remove the emoji, then print. if performance/table size is an issue, working via streams/generators/etc. should be on the (heh) table anyway.

But if you have conceded to being in quick&dir^H^Hpragmatic-land anyway, IMO both can be ok depending on the context.


As a fresh dev, I’d like to know the answer to this as well. Abstract to function w multiple params, abstract to multiple functions, no abstraction and keep as switch statement.

`print_table() + print_table_without_emoji()`

vs

`print_table(remove_emoji=False)`

vs

`switch table_name: case emoji: print(table) case no_emoji: print(table no emoji)`


If callers typically know statically whether they want emoji or not (in other words, if the parameter would typically be a literal true or false at the call site), then the parameterless version is better. (Note that you can still factor out common code from the two functions into a separate private function if you like. So the parameterless version doesn’t necessarily imply code duplication.) If, on the other hand, callers tend to be parameterized themselves on whether to use emojis or not, then the parameterized version is better, because it avoids having to make a case distinction at the call site.
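A minimal Python sketch of the two factorings (all names and the crude emoji-stripping rule are hypothetical): the parameterless pair delegates to a shared private helper, so it implies no duplication.

```python
def _format_rows(rows, strip_emoji):
    # Shared private helper: both public variants delegate here.
    out = []
    for row in rows:
        text = " | ".join(str(cell) for cell in row)
        if strip_emoji:
            # Crude assumed rule: drop code points in the emoji planes.
            text = "".join(ch for ch in text if ord(ch) < 0x1F000)
        out.append(text)
    return out

# Variant A: two parameterless entry points, for call sites that
# know statically which behaviour they want.
def print_table(rows):
    return _format_rows(rows, strip_emoji=False)

def print_table_without_emoji(rows):
    return _format_rows(rows, strip_emoji=True)

# Variant B: one parameterized entry point, for callers that are
# themselves parameterized on the choice.
def print_table_opt(rows, remove_emoji=False):
    return _format_rows(rows, strip_emoji=remove_emoji)
```

With literal `True`/`False` at every call site, Variant A reads better; if callers thread the flag through from their own parameters, Variant B avoids a case distinction at each call site.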


It depends on the implementation, but in general you prefer the parameterless version because in theory a bug that shows up in print_table has less code it could be in (less surface area to debug).

Without understanding the implementation no one can truly say which is the better approach, but this idea of "surface area for bugs" is something that should be considered when approaching these types of decisions.


This example looks totally legit to me as long as it preserves backward compatibility and doesn't add unnecessary junk flags


That’s what tests are for. And if `print_table` is factored properly then they won’t want to add flags, they’ll make a new function out of the pieces of `print_table` that has distinct behavior of its own.


Well you're describing a readability problem. And you're essentially saying readability is what causes it not to scale.

If we consider the concepts orthogonally, meaning we don't consider the fact that readability can influence scalability, then "everyone" is fully correct. Linear code doesn't scale as well as modular code. The dichotomy is worth knowing and worth considering depending on the situation.

That being said, I STILL disagree with you. Small functions do not cause readability issues if those functions are PURE, meaning they don't touch state. That, and you don't inject logic into your code, so explicitly minimize all dependency injection and passing functions to other functions.

Form a pipeline of pure functions passing only data to other functions and it all becomes readable and scalable. You'll much more rarely hit an issue where you have to rewrite your logic because of a design flaw. More often than not, by composing pure functions your code becomes like Legos. Every refactoring becomes more like re-configuring and recomposing existing primitives.
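One way to sketch such a pipeline in Python (the step names and price table are made up for illustration): each step takes data in and returns new data out, never mutating shared state.

```python
def normalize(order):
    # Pure: returns a new dict, leaves the input untouched.
    return {**order, "kind": order["kind"].lower()}

def price(order):
    prices = {"margherita": 8, "hawaii": 9}  # assumed price table
    return {**order, "total": prices[order["kind"]] * order["qty"]}

def summarize(order):
    return f'{order["qty"]}x {order["kind"]}: {order["total"]}'

def run_pipeline(order, steps=(normalize, price, summarize)):
    # Fold the order through the steps; swapping or reordering a step
    # is a local change because no step touches outside state.
    result = order
    for step in steps:
        result = step(result)
    return result

summary = run_pipeline({"kind": "Margherita", "qty": 2})
# summary == "2x margherita: 16"
```

The composition point (`steps`) is the only place where ordering lives, which is what makes the pieces recomposable.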


I disagree. It's not the purity of the functions, its having to know the details of them. The details, which could have existed here, are now in two other places. If you need to figure out how a value is calculated, and you use a half dozen functions to come to that value, you now have a half dozen places you need to jump to within the codebase.

Small functions increase the chances of you having to do this. Larger ones decrease it, but can cause other issues.

Also, many small functions doesn't make code modular. Having well defined, focused interfaces (I don't mean in the OO sense) for people to use makes it modular. Small functions don't necessarily harm it, but if you're not really good at organizing things they definitely can obscure it.


I find how easy it is to name something is a pretty good indicator. If I'm struggling to name a function then it probably needs some more attention.


I think you’re right about side effects being the missing ingredient to this discussion, that is leading people to talk past each other. The pattern’s sometimes called “imperative shell, functional core”.

And I totally agree, this is how you write large code bases without making them unmaintainable.

Where to go “linear” vs “modular” is an important design choice, but it’s secondary to the design choice of where to embed state-altering features in your program tree.

I think people dislike modular code because they want to have all the “side-effects” visible in one function. Perhaps they’ve only worked in code bases where people have made poor choices in that regard.

But if you can guarantee and document things like purity, idempotency, etc, you can blissfully ignore implementation details most of the time (i.e. until performance becomes an issue), which is definitionally what allows a codebase to scale.


Yeah few people have seen the light. But you're right. The only downside is performance. But this is rare and sparse.


The example code would be less distracting if it at least attempted to stick to the pizza metaphor in a meaningful way and weren't subpar Go code.

`prepare` is a horrible name for a function. I would expect a seasoned Gopher to call it something like `NewPizzaFromOrder`.

I don't see any reason for putting `addToppings` in its own function. If you have to have it, I personally would have made it a method on Pizza something like `func (p *Pizza) WithToppings(topping ...Topping) *Pizza { /* ... */ }`. Real pizza is mutable, so the method mutates the receiver.

Why is a new oven instantiated every time you want to bake a pizza? You should start with an oven you already have, then do `oven.Preheat()`, and then call `oven.Bake(pizza)`. You can take this further by having `oven.Preheat()` return a newtype of Oven which exposes `.Bake()` so that you can't accidentally bake something without preheating the oven first. Maybe elsewhere `Baker` is an interface, and you have a `ToasterOven` implementation that does not require you to preheat before baking because it's just not as important.
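In Python terms, that "preheat returns a new type" idea might look like the loose sketch below (Go would enforce this at compile time via a distinct type; here the `bake` method simply doesn't exist on the cold oven, so misuse fails fast at runtime):

```python
class Oven:
    # A cold oven deliberately has no bake() method.
    def preheat(self, temp_c):
        return PreheatedOven(temp_c)

class PreheatedOven:
    def __init__(self, temp_c):
        self.temp_c = temp_c

    def bake(self, item):
        return f"baked {item} at {self.temp_c}C"

oven = Oven()
# oven.bake("pizza")  # AttributeError: cold ovens cannot bake
hot = oven.preheat(220)
dinner = hot.bake("pizza")
```

The transition itself (`preheat`) is the only way to obtain a bakeable oven, which encodes the required call order in the types rather than in documentation.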

Without changing the code, I'd also reorder the declarations to be more what you'd expect (so you don't have to jump up and down the page as you scan through functions that call each other).

IDK I have to leave now but there are just so, so many ways in which the code is already a deeply horrible example to even start picking apart the "which is more readable" debate.


John Carmack said much the same and I have been following it ever since. Of course linear code is easier to read; it follows the order of execution. It minimizes eye saccades.

Some code needs to be non-linear for reuse. Then execution is a graph. If your code does not exploit code reuse from a graph structure, do not bother introducing vertices where a single edge suffices.

http://number-none.com/blow/blog/programming/2014/09/26/carm...


Something Carmack calls out but the OP doesn't is that if you can break out logic with no side effects into its own function that's usually a good idea. I think the left side would have benefited from

  pizza.Toppings = get_pizza_toppings(order.kind)
in this case to keep the mutation of the pizza front and center in the main function here.


I actually sort of agree that linear code is more readable, but readability alone doesn't make for good code practice. So while good linear code is more readable, at least in my opinion, it's also a lot less maintainable and testable. I have a few decades of experience now, I even work a side gig as an external examiner for CS students, and the only real-world good practice I've seen hold up over the years is keeping functions small. I know, I know, I grade students on a lot of things I don't believe in. I'm not particularly fond of abstraction, or even avoiding code duplication at all costs and so on, but "as close to single purpose" functions as you can get: do that, and the future will thank you for it.

Because what is going to happen when the code in those examples runs in production over a decade is that each segment is going to change. If you're lucky the comments will be updated as that happens, but they more than likely won't. The unit test will also get more and more clunky as changes happen because it's big and unwieldy, and maybe someone is going to forget to alter the part of it that wasn't obviously tied to a change. The code will probably also become a lot less readable as time goes by, not by intent or even incompetence but mostly due to time pressure or other human things. So yes, it's more readable, and in a perfect world you probably wouldn't need to separate your concerns, but we live in a very imperfect world, and the smaller and less responsibility you give your functions, the easier it'll be to deal with that imperfection as time goes on.


Sure, it's less testable BUT in the specific case at hand it's all mutations that need to be performed in a specific sequence. IMO if you are taking an object through a specific set of states, you either split that and use types to mark the transitions (bakePizza takes a RawPizza and returns a BakedPizza, enforcing the order of calls at compile time) or you write one big function because it doesn't make sense to create a pizza and then not bake it before you box it.

I obviously prefer the former for readability, correctness, testability, etc. However, in most PLs changing the type of an object involves creating a new object and has a runtime cost. For hot code paths, it makes sense to mutate in place, but in that case it's better to keep it all in one linear function.


I recently started reading Sussman's Software Design for Flexibility and what you write is directly in line with that book https://mitpress.mit.edu/9780262045490/



Hard agree. And I used to belong to the other camp.

The basic tension here is between locality [0], on the one hand, and the desire to clearly show the high-level "table of contents" view on the other. Locality is more important for readable code. As the article notes, the TOC view can be made clear enough with section comments.

There is another, even more important, reason to prefer the linear code: It is much easier to navigate a codebase writ large when the "chunks" (functions / classes / whatever your language mandates) roughly correspond to business use-cases. Otherwise your search space gets too big, and you have to "reconstruct" the whole from the pieces yourself. The code's structure should do that for you.

If a bunch of "stuff" is all related to one thing (signup, or purchase, or whatever), let it be one thing in the code. It will be much easier to find and change things. Only break it down into sub-functions when re-use requires it. Don't do it solely for the sake of organization.

[0] https://htmx.org/essays/locality-of-behaviour/


I went the opposite direction: I used to be in the linear code camp, and now I'm in the "more functions" camp.

For me the biggest reason is state. The longer the function, the wider the scope of the local variables. Any code anywhere in the function can mutate any of the variables, and it's not immediately clear what the data flow is. More functions help scopes stay small, and data flow is more explicit.

A side benefit is that "more functions" helps keep indentation down.

At the same time, I don't like functions that are too small, otherwise it's hard to find out where any actual work gets done.


Some important context re: my style...

> Any code anywhere in the function can mutate any of the variables

Regardless of the language I'm using, I never mutate values. Counters in loops or some other hyper-local variables (for performance) might be the inconsequential exceptions to this rule.

> More functions help scopes stay small, and data flow is more explicit.

Just write your big function with local scope sections, if needed (another local exception to the rule above). Eg, in JS:

   let sectionReturnVal
   {
     // stuff that sets sectionReturnVal
   }
or even use an IIFE to return the value, and then you can use a const. "A function, you're cheating!" you might say, but my goal is not to avoid a particular language construct, but to maintain locality, and avoid unnecessary names and jumping around.

> A side benefit is that "more functions" helps keep indentation down.

This is important and I maintain it.

See "Align the happy path to the left" (https://medium.com/@matryer/line-of-sight-in-code-186dd7cdea...)

It is also worth noting that solving this problem with function extraction can often be a merely aesthetic improvement. That is, you will still need to hold the surrounding context (if not the state) in your head when reading the function to understand the whole picture, and the extraction makes that harder.

Using early returns correctly, by contrast, can actually alleviate working memory issues, since you can dismiss everything above as "handling validation and errors". That is, even though technically, no matter what you do, you are spidering down the branches of control flow and are therefore always in some very specific context, the code organization can affect how much attention you need to pay to that context.
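The guard-clause style described here might look like the following Python sketch (the order-shipping scenario and field names are hypothetical):

```python
def ship_order(order):
    # Guard clauses: each failure returns early, so by the time you
    # reach the bottom, everything above can be dismissed as
    # already-handled validation and errors.
    if order is None:
        return "error: no order"
    if not order.get("paid"):
        return "error: unpaid"
    if not order.get("address"):
        return "error: no address"
    # Happy path: unindented, at the bottom, easy to spot.
    return f'shipped to {order["address"]}'
```

Compared with nesting each check in an `else` branch, the happy path stays aligned to the left and the reader never has to track which conditions are still "open".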

> I don't like functions that are too small, otherwise it's hard to find out where any actual work gets done.

Precisely, just take this thinking to its logical conclusion. You can (mostly) have your cake and eat it too.


The better solution to this is to use nested functions that are immediately called, rather than top level functions. That lets you cordon off chunks of state while still keeping a linear order of definition and execution. And you don't have to worry about inadvertently increasing your API maintenance burden because people started to depend on those top level functions later.


Ha, I started writing my reply before seeing yours, and suggested almost the same thing.


> Only break it down into sub-functions when re-use requires it. Don't do it solely for the sake of organization

What about for testing? What about for reducing state you need to keep in mind? What about releasing resources? What about understanding the impact of a change? Etc.

Consider an end of day process with 10 non-reusable steps that must run in order and each step is 100 lines. Each step uses similar data to the step before it so variables are similar but not the same. You would really choose a 1000 line single function?


> What about for testing?

For "use-case" code like this with many steps, you are typically testing how things wire together, and so will either be injecting mocks to unit test, in which case it is not a problem, or wanting to integration or e2e test, in which case it is also not a problem.

If complex, purely logical computation is part of the larger function, and you can pull that part out into a pure function which can be easily unit tested without mocks, that is indeed a valid factoring which I support, and an exception to the general rule.

> What about for reducing state you need to keep in mind?

Typically not a problem because if the function corresponds to a business use-case, you and everybody else is already thinking about it as "one thing".

> What about releasing resources?

Not a problem I have ever once run into with backend programming in garbage collected languages. Obviously if you are in a different situation, YMMV.

> Consider an end of day process with 10 non-reusable steps that must run in order and each step is 100 lines.

I would use my judgement and might break it down. Again, I have never encountered such a situation in many years of programming.

You seem to be trying to find the (ime) rare exceptions as if those disprove the general rule. But in practice the "explode your holistic function unnecessarily into 10 parts" is a much more common error than taking "don't break it down" too far.


  let DebugFlags = {StepOne: false, StepTwo: false, StepThree: true};
  
  if (DebugFlags.StepOne) { ... }
  if (DebugFlags.StepTwo) { ... }
  if (DebugFlags.StepThree) { ... }
Your training in structured, DRY and OOP will recoil at this: More branches! Impossible. But your spec says "must run in order". It does this by design. Every resource can be tracked by reading it top to bottom, and the only way in which you can miss it is through a loop, which you can also aim to minimize usage of. The spec also says "uses similar data to the step before it". If variables are similar-not-same, enclose them in curly braces so that you get some scope guarding. The debug flags contain the information needed to generate whatever test data is necessary. They can alternately be organized as enumerated state instead of booleans: {All, TestOne, TestTwo, TestThree}.

Long, bespoke linear sequences can be hairy, but the tools to deal with them are present in current production languages without atomizing the code into tiny functions. Occasionally you can find a useful pattern that does call for a new function, and do a "harvest" on the code and get its size down. But you have to be patient with it before you have a good sense of where a new parameterized function gets the right effect, and where inlining and flagging an existing one will do better.


They both read linearly. In the version with smaller functions taken out, there's a table of contents at the top of the page and it summarizes the dataflow between the steps. It seems like an appealing read order, assuming you're going to read the whole thing.

For it to stay this readable, though, you'd need to move the functions around if you change the order of the steps. And that's fine if they're private functions, called only from the table of contents. Only, nothing forces you to keep them in order, or even to think about how it reads overall.

It often happens that functions start being reused in a way that can't be linearized anymore. Sometimes people give up and sort them alphabetically, or it's just random.


In my experience the more familiar that someone is with the code, the more they think pushing code into smaller functions is the correct path. They have already built up a mental model of the code at hand, so the cleanest implementation to them is one with very few lines.

But the next person to come along has to bounce back and forth, performing mental stack push/pop operations to create the same mental model which is much harder to do when you don't have any of the original context


Not if the code makes sense. If the code is well written with elegant abstractions, slim interfaces and decent documentation, you often don't need to bounce around that much. For example how often do you read source code of your language's standard library? I almost never do, I mostly just look at method signatures and maybe read some docs if it's a bit complex or new.

The whole point of interfaces is that you're not supposed to care how a method is implemented, only what it does, which is explained by a combination of context, naming and documentation. But a lot of devs don't understand (or care about) this, so they write code that doesn't make sense, and then it doesn't matter whether they made it linear or modular. They do things like make a service class where you have to call one method to get some data and then you have to call another to get some other data and then you have to call a third method to get some data that needs to be consolidated with the other two, and now what the hell is the point of your service? It exposes all the internal complexity to the outside.

You aren't supposed to force small methods, there's no point having 20 ~5-line functions that are all only called once and do super specific stuff and have to be called in the right order etc. That's not clean code, that's more like cargo cult programming. You are supposed to abstract things appropriately so that they make sense both to new and seasoned team members, are easy to reason about and hide complexity in places where the complexity makes sense.

This is not easy to do but it is possible.


>The whole point of interfaces is that you're not supposed to care how a method is implemented, only what it does which is explained by a combination of context, naming and documentation.

There are, however, cases where code is a better explanation of "what it does" than naming and documentation. Both naming and documentation are hard and can become out-of-sync. Code is less ambiguous than natural language.


Sure, I'm not saying nobody should ever read code. I'm just saying we should aspire to write code which does not need to be read to be understood at a high level. If you really need the low-level details they're there, I'm just saying I don't want to have to think about them if I can avoid it.

As an example I've never read the source code of any language's String implementation, but I've used them in many different languages.

I also don't like the "they can become out of sync" reasoning. That's like saying speed limits are pointless because people break them. If you change the code you update the name and comment. That's your job. I'm not saying you should document every class in your system like it's the Java standard library. A standard library doesn't change that much and its documentation is viewed by millions of devs so it makes sense to spend a lot of time documenting it.

That's the gold standard, it would be great if our code could be like that but it would be impractical given the frequency of change in most active development systems. So we find a middle ground, we focus our effort on the interfaces between subsystems. You section your codebase into subsystems so that the application's core can interact with the database without worrying about database details, or get some data from an API without worrying about whatever weird quirks the API has. You construct a subsystem around the API which handles all the API details so that your core can interact with the API without having to worry about auth or weird API quirks or the fact that the API entities have 50 properties and you only need 7 of them. You hide away all that stuff in a subsystem and then you design a nice and clean interface that the application core interacts with. If there are any implementation details that the consumer of the interface needs to know to use it, you document it.

Just try your best to make your subsystems usable without having to deep dive into them for implementation details every 5 minutes.


> The whole point of interfaces is that you're not supposed to care how a method is implemented

That's exactly how you end up with O(N^4) code. Your job is to care.


Yeah sure let's reduce my advice to a hyperbolic niche situation.

This has nothing to do with performance. I'm explaining general rules for designing maintainable systems - it is possible to follow them and write performant code at the same time. It is also possible to break them if necessary. Though it usually isn't a problem at all, you're just reducing it to absurdity in order to make your point.


BS. You wouldn't even get past your first hello world if you read all the code that underlies a one liner hello world.


So you read the machine code that your code spits out? How interesting...


That's not quite enough. The kernel and associated device drivers are also "interfaces" that a program binary invokes, so all the kernel code paths triggered by the program calling into the kernel should be inspected too.

That said pretty sure the GP hasn't had a deep dive into whether say the C library or Linux kernel has some funny O(n^4) stuff happening.


I disagree, but maybe there's a difference between people who read/think bottom-up and those who think top-down.

My son struggled in school despite easily being smart enough for it, and one of the many people we spoke to about his needs explained to him that schools tend to teach bottom-up, whereas he was very much a top-down learner. He first needs an overview before he dives into the details, whereas others first need to grasp the details before they can assemble an overview. And schools tend to teach to the second group.

It's possible we've got something similar with programmers here.


> But the next person to come along has to bounce back and forth

Only if they can't read code! Code is meant to be read as written, at least the first time; if you try to read it as executed, you are in the wrong.

If the previous developer wrote a function BakePizza, just assume that the pizza will be properly baked and move to the next line. If you start dwelling on details like oven temperature while trying to understand how to run the restaurant, you will not understand how the restaurant works, and you will forget the correct oven temperature.


This is why we need better tools like projectional code editors.

There should be an editor toggle to inline functions temporarily.

No more bouncing.


This blogger obviously wouldn’t get along with Sean Parent of Adobe. It’s old but I have everyone on my team watch this “no raw loops” presentation: https://youtu.be/W2tWOdzgXHA?si=4LKv1-sau60U63op in which he identifies reusable patterns hiding in code (“That’s a rotate!”) I myself was skeptical at first but have found over the years that breaking functions into pieces is the only way to maintain short functions that can be reasoned about in isolation, and as a side effect, surfaces reusable code. If you can’t write functions that easily fit on a page, I posit you don’t actually know what the function is supposed to be doing, and there’s probably a bug. (If you can’t hold the whole function in your head, how can you be sure there isn’t a bug?)
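Parent's examples are C++ (`std::rotate` and friends), but the idea — spot the named algorithm hiding inside a raw loop — translates. A hedged Go sketch, with a hand-rolled rotate since the standard `slices` package doesn't ship one:

```go
package main

import "fmt"

// rotateLeft moves the first k elements to the end, in place —
// the pattern Parent calls out as "that's a rotate!" when it
// hides inside an ad-hoc loop with index juggling.
func rotateLeft(s []int, k int) {
	k %= len(s)
	reverse(s[:k])
	reverse(s[k:])
	reverse(s) // classic three-reversal rotation
}

func reverse(s []int) {
	for i, j := 0, len(s)-1; i < j; i, j = i+1, j-1 {
		s[i], s[j] = s[j], s[i]
	}
}

func main() {
	s := []int{1, 2, 3, 4, 5}
	rotateLeft(s, 2)
	fmt.Println(s) // [3 4 5 1 2]
}
```

Once the loop has a name, the calling code reads as intent ("rotate these two spans") instead of mechanism.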


The code that is more easily unit testable, is the code I care about.

Neither example is easily tested.

Neither support injecting the dependencies, which make mocking really difficult.

On the left, you're testing one big method with a whole bunch of conditionals, which leaves you with a whole ton of tests for that one big method.

On the right, there is a bake() method and it does oven.New(), but where does oven come from? Is it some global somewhere?


Here is an idea for a unit test for this code.

Pass in an order, assert the pizza that comes out is correct.

The entire function is a unit which can fit on my phone screen and has no external dependencies other than possibly oven, which was discussed in the article, it should probably have been passed in, aka dependency injection.
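Something like this, in Go to match the article. The Order/Pizza types and the CreatePizza body below are stand-ins modeled on the blog's example, not its actual code:

```go
package main

import "fmt"

// Hypothetical stand-ins for the article's Order and Pizza types.
type Order struct {
	Size int
	Kind string
}

type Pizza struct {
	Baked, Boxed bool
	Toppings     string
}

// CreatePizza is the whole unit under test: order in, boxed and
// baked pizza out. prepare/bake/box stay implementation details.
func CreatePizza(o Order) Pizza {
	toppings := "Veg toppings"
	if o.Kind == "Meat" {
		toppings = "Meat toppings"
	}
	return Pizza{Baked: true, Boxed: true, Toppings: toppings}
}

func main() {
	got := CreatePizza(Order{Size: 26, Kind: "Meat"})
	fmt.Println(got.Baked, got.Boxed, got.Toppings)
}
```

The test asserts only on the input-to-output mapping, so refactoring the internals (linear or split) can't break it.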


Since the example was golang, I personally love uberFX to define modules and dependencies between modules. When you do it that way, unit tests become really easy.

It isn't necessary with golang to do this at all, but it really helps build consistent structure throughout the entire app, so I do it.

Speaking from personal experience: I built a small golang process that ran on around 25k worker machines. It had to be bug-free, because if it crashed and stopped running, it meant updating a whole lot of computers across multiple data centers, by hand.

We unit tested everything and the project worked out really well because of that.


Both functions take an order and return a pizza. This seem perfect for testing.

Why would you care where the oven comes from, if the function deliver perfectly baked pizza? A unit test should test against the public interface and not be coupled to implementation details, since that will hamper refactorings.

> which make mocking really difficult.

Mocking is an antipattern anyway, and should be avoided except for nondeterministic components like current time or stateful external services.


Apologies, but such mindset is the essence of the worst programming traits.

> The code that is more easily unit testable, is the code I care about.

Author argues that his code is more readable. Sounds like you're saying that being unit-testable is more important than being readable.

> Neither example is easily tested.

Only if you're a unit testing zealot. Integration/E2E testing is easy for both.

> Neither support injecting the dependencies, which make mocking really difficult.

Mocking is not a virtue. Also, if mocking is the sole reason you're using DI, you're doing it wrong.


Arguing for unit testing, is the worst?

Come on, look in the mirror.


Unit testing is overrated because most of these problems can be solved via correct-by-construction methods. Like, do you really need to check if this "kind" variable is equal to "Veg"? This could have easily been solved by using an enum. Similarly, global-or-not can be solved by using classes/structs that don't have any constructors, or something like that.

Functions should exist at the level of concepts:

1. arr | flat | map | collect as HashMap makes sense.

2. CreateFlattenMappedHashMapFromArr does not.
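Go (the article's language) has no real enums, but typed constants get most of the correct-by-construction benefit. A sketch with invented names:

```go
package main

import "fmt"

// Kind as a defined type instead of a raw string: the compiler
// rejects a misspelling like Kind("Vge") at the call sites that
// matter, and the switch below can be kept exhaustive by review.
type Kind int

const (
	Veg Kind = iota
	Meat
)

func toppingsFor(k Kind) string {
	switch k {
	case Meat:
		return "Meat toppings"
	default:
		return "Veg toppings"
	}
}

func main() {
	fmt.Println(toppingsFor(Meat))
}
```

There is now no string comparison to get subtly wrong, so there's nothing left for a `kind == "Veg"` unit test to catch.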


Agree. Furthermore if you are going to split it, you want to make sure no one boxes a raw pizza because they added a bad switch statement or something, as that would be clearly bad. So if you want a split version than you should use types to mark the transitions between states.

boxPizza should take a CookedPizza, and BakePizza should take a RawPizza and return a CookedPizza etc.
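A minimal Go sketch of that state-encoding idea (the type and field names are invented for illustration):

```go
package main

import "fmt"

// Distinct types per state: boxPizza can't accept a RawPizza,
// so "boxing a raw pizza" becomes a compile error rather than
// a bug introduced by a bad switch statement.
type RawPizza struct{ Toppings string }
type CookedPizza struct{ Toppings string }
type BoxedPizza struct{ Contents CookedPizza }

func bakePizza(p RawPizza) CookedPizza { return CookedPizza{Toppings: p.Toppings} }
func boxPizza(p CookedPizza) BoxedPizza { return BoxedPizza{Contents: p} }

func main() {
	boxed := boxPizza(bakePizza(RawPizza{Toppings: "Meat"}))
	fmt.Println(boxed.Contents.Toppings)
	// boxPizza(RawPizza{...}) // would not compile
}
```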


Heck no.

If you do some "real world pizza making" instead of toying, that function would be like at least 1k lines, including how you carefully shape the dough, how to handle exceptions when you tear some holes, and how you should observe and rotate in the oven by how much, how you should redo it if the roller blade just didn't cut through properly, so on and so forth. Of course it's better to have top-down overview like prepare -> bake -> box otherwise the readers will surely lose themselves in details without figuring out what is happening.

People in the game industry told me their horror story of helping designers with a Lua script that they were writing over the years. And it turned out the "Lua script" was a single file, with 100k+ lines, that barely had several functions in it. That would be SO linear.


This "level of abstraction" euphemism actually means "the level at which I'm not reading code anymore even and especially if I should". Of course, linear code is more readable! Linear everything is more readable. Have you ever seen a novel with "levels of abstraction" in it?

But nobody reads the code anymore. Why bother? You're not going to stay on a single project for long enough for the attention investment to pay off. So the common best practice at the moment is to pretend that you read the code without actually reading it. For this purpose, the green code is much much better.


> Linear everything is more readable. Have you ever seen a novel with "levels of abstraction" in it?

Not if you're in the business of writing novels. What happens if you decide to edit out a scene - do you re-read the entire book to double-check that the deleted scene wasn't referenced anywhere?


I realize this is tongue in cheek, but really: Read code!

If you ever start plateauing in your code skills, start digging into the code of your favorite open source project. Accept that things have been done in another way than you would for a reason and try to understand that reason.

Try joining advent of code[0], and make sure to spend half you time block on reading and understanding alternative solutions.

[0]: https://adventofcode.com/


Mixing different levels of abstraction makes the code harder to understand. Linear code is probably good because the examples in the body are simple. It's one thing to separate code into separate files, but it's another to break up code snippets in one file.


I agree. I really hate having to jump all over a file (or multiple files!) for something that could fit into a single page of linear code.


Someone smart said, "When you've lost something, and finally find it, don't put it there again. Instead put it the first place you looked."

I think that applies to code. When I read something I wrote, if I'm annoyed at how it reads, I try to refactor it to be what I wanted to read, and remember to do it that way in the future.

But sometimes what the reader wants is too much work for the writer, so I don't push that effort beyond what it's worth.


There’s a whole style of coding dedicated to that very notion. It’s called test driven development.


I don’t agree that’s what TDD does. You spend inordinate amounts of time thinking about how you want it to be, when you could just write it, find where and what about it you dislike, write it again and have actual good code. Also called WET. You spend less time with better results that way, and you gain what OP was talking about in the process.


I agree too. Another example: I find early returns in functions easier to read than “else” with one “return” at the end. Basically vertically linear code as opposed to unnecessary branches and too much indentation, keeping the code slimmer is healthier!
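For example, in Go, guard clauses keep the happy path flush-left instead of buried in else branches (a made-up function, just to show the shape):

```go
package main

import "fmt"

// discount returns the member discount for an order total.
// Each precondition bails out early, so the main result sits
// unindented at the bottom — vertically linear.
func discount(orderTotal float64, member bool) (float64, error) {
	if orderTotal < 0 {
		return 0, fmt.Errorf("negative total")
	}
	if !member {
		return 0, nil
	}
	return orderTotal / 10, nil // 10% member discount
}

func main() {
	d, _ := discount(100, true)
	fmt.Println(d)
}
```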


It's also a naming issue. A good name means I don't have to jump.


The linear version is hard to test. The split-function one is much more testable. There is also that thing called complexity, which increases with function length and has been shown to correlate with bug count.

The problem with the example is that it is both extremely artificial and shows a single use case. Even with the artificiality, one can easily imagine baking a calzone instead, which could reuse all the factored-out oven functions in the split version.

(The comment about pizza vs baked pizza is one about using typing to encode your logic, but is separate from the issue that your functions should do one thing.)


Perhaps we can try to do it in a proper functional language?

  (ns restaurant.pizza
    (:require [restaurant.oven :as oven]
              [restaurant.package :as pack]))

  (defn make-order [size sauce cheese kind]
    {:size size
     :sauce sauce
     :cheese cheese
     :kind kind})

  (def toppings-map
    {"Veg" "Veg toppings"
     "Meat" "Meat toppings"})

  (defn prepare [order]
    (assoc order :toppings (:kind order)))

  (defn bake [prepared-order]
    (oven/bake prepared-order :pizza))

  (defn box [baked-pizza]
    (pack/box baked-pizza :pizza))

  (defn pizza [order]
    (-> order
        prepare
        bake
        box))

  (comment 
    (def order (make-order 26 "Tomato" "Mozzarella" "Meat"))
    (pizza order))


It's short and overwhelmingly granular, but it serves for illustration. Large and complex codebases sliced up this way have no rival in terms of ease of testing and reasoning about the code.


In your experience, how do LISPs compare to ML family languages in typing and readability?


I don't necessarily have opinions on ML languages. Haven't used any of it to the extent of writing more or less complex production system.


I find Linear B more readable than Linear A, but I agree with the OP, if there were additional explanatory comments in Linear A code, then it would be probably more readable than Linear B.


This is not a constructive comment. I just want to say that I appreciate this joke, it's pretty funny.



When you grow older, and become lazier, you only create functions/methods when they need to be called more than once. Some languages, like C# and JavaScript, also allow you to define them locally (inside a method). When these are used to perform some checks, I usually just place them before they are used, and when they perform some operation, I usually place them just below where they are called. The latter usually involves async or parallel execution. I just realized that this helps to keep the code more linear. So, I think I have a strong preference for linear code.
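Go doesn't have named local functions, but a func literal assigned to a variable just before its single call site gives the same linear placement — a small sketch with invented names:

```go
package main

import "fmt"

// process sums the valid entries of a list.
func process(items []int) int {
	// "local function": defined right above where it's used,
	// so the reader never has to jump elsewhere in the file.
	valid := func(n int) bool { return n >= 0 }

	total := 0
	for _, n := range items {
		if valid(n) {
			total += n
		}
	}
	return total
}

func main() {
	fmt.Println(process([]int{1, -2, 3})) // 4
}
```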


this is anecdotal of course, but as someone who has never written a line of production Go code (but can tell at a visual glance this is in fact, Go), small functions (green) made sense to me as soon as I started reading it. The single function code (red) became hard to follow at some point. It felt like the function was doing 10 different things with a lot of branching and no particular single purpose. Maybe it's the Python background in me, but I am not seeing how the single function is better to read than small, self-contained functions.


It's not a Go thing. I've inherited a number of large linear functions of the author's favored style in several C-syntax family languages, they all become increasingly incomprehensible as they grow longer and older. For any advocate of that style, the only way to maintain them (and retain their supposed clarity) is to extract functions and then re-inline those functions after a comprehensive refactoring. Otherwise, they accrue so much cruft over time that their legibility is completely lost.

Your only other option is to freeze them and never make changes, that doesn't happen much in real-world code (though it probably should).


Literally extracting the functions and then re-inlining them makes no sense. Having that as a sort of mental model while you're working on the code does make sense.


It’s to enable refactoring when it grows large. Most effective way I have found for 1k SLOC or larger functions. I usually don’t re-inline because the result after refactoring is almost always clearer.

Trying to in-place refactor those things is an exercise in frustration. That’s part of why they grow so large, from observing their proponents in action. They don’t actually know what the functions do, only where to add a new path and repeat themselves.


I don't think the author's point is truly contrarian. The author is a proponent of clarity in the code, and of the readability.

The deal here is that both versions are fairly readable, written by someone with intent to make it clear what the code should be doing. As a result the two versions are just examples of two expression styles, while the focus is on showing how the transition between these styles could be done.

What's worth underscoring here is the cohesiveness of stretches of code, such that their execution could be summarized by a descriptive function name.

Often in grand god-functions the contexts are so intertwined and mixed that it is hard to see cohesiveness in stretches of code.

Thus, the refactoring is very much a tool to creating such cohesiveness and proper logical sequencing.

Scooping out the code into separate function or commenting it out is more of a style judgement. Though putting the code into a function with a descriptive name indeed enforces this sort of analysis.


The differences are magnified when arbitrary test coverage metrics are imposed. The first example is easier to write tests for because all the conditionals are easier to scan. The second, green, example requires the test writer to follow multiple stack frames to write the test unless there’s some mocking and spying framework at hand.


I have been working on a system for programming some specialty hardware on customer premises for a while, and most of it was written in a pseudo-language implemented by another backend programmer. Think a BASIC-like language implemented in a YAML file, with arbitrary Python inserts here and there.

Despite the code being not very visually attractive (long corridors of imperative statements reading and writing from SMBus addresses), I was always surprised how easy it was to maintain the code, and how quickly I could get back "in the zone" after not working on it for months.

There is something painfully trivial about old clunky languages that makes them somewhat easier to get back into. The cost in abstraction capabilities is obvious though. The only reason I can afford to write concise, linear, imperative code for this project is its narrow, specialized scope that most of modern programs cannot afford to limit themselves to anymore.


I think the fundamental problem is that, despite our wishes, there are programs which are inherently complex, and cannot be refactored into a simple, by-pieces testable, form. And if we try to do that anyway, all we end up with is just more fluff (mocking, I am mocking you) that hides the complexity.

The internal complexity doesn't necessarily come from complex abstractions. Take for example some implementation of a tax code, i.e. code calculating taxes. There is probably gonna be a lot of interdependencies, dealing with special cases. That's your typical "business logic". This code is not inherently complex because the primitives are complex, but because there is a lot of dependencies in the calculation. That fact in itself makes it difficult to unit test.

On the other end of the spectrum, we have something like a library of functions, for example, mathematical functions. The inner workings of how to calculate, say, a gamma function, can be very complex to understand, but the surface (API) of each of the function is very small, and that makes the library itself simple and easy to unit test.

We can make an analogy with books instead of programs. On one end, you have a novel, which despite being written in a plain language, has many interdependencies of the characters interacting. You cannot "unit test" a novel by reading a single chapter, you have to read it all. You can have a summary of the novel (like the top function in exhibit B in the OP's example), but the summary of the novel is not exactly the novel, you're not really testing the novel if you read just the summary.

On the other end, there are reference works like dictionary or encyclopedia. We can unit test these easily, since each entry should stand on its own (if you want to evaluate quality of a reference work, you can pick a few entries and test that, and it's gonna be pretty representative). They are not emergently complex like a novel is, despite the fact that entries might use specialized jargon and be harder to read.


> That fact in itself makes it difficult to unit test.

Verifying a tax code implementation is a good place to make use of property based testing.


I agree, and that's why I favor it to unit testing (although to be fair, they are pretty complementary, because each addresses different end of the spectrum). To properly unit test, you need to have a different implementation, which you can compare with, you cannot IMHO unit test under the same assumptions that the code makes.


I too find the code on the left/red (linear) more readable. However the version with all the functions is quite extreme. When I'm splitting my code into functions I decide if something should be its own function on the basis of: is this chunk of functionality required to be reusable? Am I repeating code, only slightly changed?

If the answer is yes, a function gets created. I never do what I assume authors did here, find the smallest logical units code can split into and generate a bajillion functions. I'm not paid by the line of code after all.

The same reason makes me like object-oriented programming (especially inheritance, abstract functions, operator overloading). IMO with a good IDE such code is much more succinct (within the constraints of the language) and more readable, but taking it to an extreme is a mistake.


I have to agree that the code on the left is far more readable (one function). I've worked with developers that have written code on the right (lots of functions), and it's always the worst to iterate on.

The problem in the second approach is the functions aren't clean abstractions, they often hide logic&state transformations that only make sense in the calling context. So the dear reader is forced to jump back and forth between the multiple functions to understand the entire process.

And just to throw a bit of shade, I encountered this type of programming more in webdev, and especially devops communities-- than with data scientists, ml, or data engineers. ;) And also when the director of eng wanted to get their feet wet every now and then.


I still prefer the one on the right. I'm able to skip entire sections of code, and assume what the function does. Only if I require details do I go deeper.

The comments are metadata, and where function names are tied into the code. One is going to stay up to date. The other isn't.


I feel like there's a happy medium between the two; the left can easily be made simpler by factoring out one or two functions, but the right went too far.

The prepare and addToppings functions should be one function; prepare effectively just fills in a struct and calls addToppings, so it's pointless to separate them.

The bake function simply prepares the oven for cooking, which the author mentioned should be a dependency with a method, and then factors 4 lines of code into a new function for no reason. The bake and bakePizza functions should be one function.

You can then keep the box function as is.

That would be both easier to maintain and easier to read.


> You can then keep the box function as is.

The box function is broken too. You box the pizza and then return the pizza...but the box is logically a wrapper for the pizza. `box(pizza)` should return a boxed pizza. A box with contents=[pizza]. Maybe some sauce and pepperoncini in there too.

Plus all these functions are impure. Which isn't always bad but if you can prevent things like boxing it before baking it, you should.

And what even... this entire example is just horrendous. You box the pizza and then slice the pizza? Ready = box.Close()? Can the Close() operation fail? And then the pizza is not ready? Why not throw an error, now the caller has to check if the pizza that got returned to them is even ready...? And that fact is even more hidden on the right side. Same for Sliced and Boxed.


This entire function is clearly a factory for a boxed pizza with toppings which is baked.

I'd argue the entire box.Close() method is slideware and wouldn't exist since it likely is just a return true. You can just as easily just say pizza.Ready = true. Reading this code afterwards I would think there was some stupid requirement somewhere for a pizza.Ready property so someone added it and would check a commit log to see if it can just be removed.

Decent catch there though; the box can also be a dependency that gets passed in.


I know it’s (usually, mostly) implied, but one of my dearest wishes for programming discourse is for people to say that something is more/less readable for them rather than declaring it a universal.


This post presents why object oriented programming is harder than it looks.

“I’m gonna return a pizza, because I want a pizza.”

When of course, what one really wants is a pizza in a box. And the oven objection is also kind of funny. It leads to a “but computers are so fast, why can’t they build me a new oven for each pizza?”

People think they want real-world analogies, which they hope will make code easier to reuse and maintain when what they really want are deep modules with clean interfaces, for which object orientation is not necessary in the least.


> When of course, what one really wants is a pizza in a box

That's what Boxed<Pizza> is for, but this is more costly than a Pizza directly.
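A possible Go rendering of that wrapper, assuming generics (the names are illustrative, not from the article):

```go
package main

import "fmt"

// A generic Boxed wrapper: the box is the thing you hand over,
// with the pizza as its contents rather than a mutated field on
// the pizza itself.
type Boxed[T any] struct {
	Contents T
	Closed   bool
}

type Pizza struct{ Toppings string }

// box wraps any item; the extra allocation is the "more costly"
// part compared to returning the Pizza directly.
func box[T any](item T) Boxed[T] {
	return Boxed[T]{Contents: item, Closed: true}
}

func main() {
	b := box(Pizza{Toppings: "Meat"})
	fmt.Println(b.Closed, b.Contents.Toppings)
}
```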


I agree that placing sequentially executed code in order of execution often improves readability over abstracted code (especially dynamic dispatch and static/dynamic traits). A similar article is at http://number-none.com/blow/john_carmack_on_inlined_code.htm..., but linear code has its own failure modes, if code is not factored into blocks with identifiable functionality and constrained/documented side effects (for example 500-line functions twiddling hardware registers and reading/writing global variables). Carmack later wrote an article in support of small-f functional programming and avoiding side effects and global state when practical (https://web.archive.org/web/20190123060017/http://gamasutra...., the article lost all line breaks during migration to gamedeveloper.com)

Another article that touches on this idea (among others) is https://loup-vaillant.fr/articles/source-of-readability which advocates that "code that is read together should be written together" (reading it made me confused until I realized it meant "placed together"), specifically "Consider inlining functions that are used only once".


How about this: most code has hard chunks or even sections that can be nearly impossible to figure out without a significant time investment. We can skip the intermediate steps and just move straight to a document that explains the architecture so that we may stop trying to jump through hoops to avoid writing non-comment documentation.

The amount of places I’ve worked at that don’t even have accessible DB schemas is mind-boggling.


Wrong thing is discussed in this post.

Code on the right isn't good because it's non-linear. It's good because it outlines the business process clearly, making it easy to get a grasp on it if you've never baked pizza before and aren't the author of the original piece.

It's possible to write non-linear code using, e.g., unnecessary events or too many levels of abstraction, and have the same issues for the completely opposite reason.


If you never have to write any tests, perhaps this is ok.


What do you mean never write any tests? The api should be the same. Order goes in, pizza comes out. The rest is implementation details that should not be exposed to a test.


you're right, the api for ordering a pizza will probably stay the same.

the cooking process won't though. stuffed crust? add some stuff in the middle. square? add some stuff in the middle. deep dish? add some stuff in the middle.

iterate a while and your "one golden test" is what falls down.


YAGNI

Refactor when those things are needed; right now the cooking process is "stick it in a warm oven for x minutes".

What are you testing there?

The oven was preheated? Put in an assert, that doesn't need a test.

That it stayed in for x minutes? Are you assuming the builtin sleep function is broken? Don't test library code; that's not your job.

That the oven actually preheated correctly? That was discussed in the article: the oven and its preheat method should be a dependency that gets passed in, so again it doesn't need to be tested here.

Also in your example you are testing whether an if condition was evaluated as true.

Give me an example of a stuffed crust pizza cooking process that has a unit test which cannot be checked by looking at the resulting pizza.


Those items are all testable through the createPizza method. There should be lots of tests! You've made up the one golden test scenario as a strawman. Every scenario you listed changes the expected output (the pizza). If you are testing internal methods, your tests are going to tell you you have broken something even when the pizzas created are 100% correct. So people won't clean up the code, because the tests break and they don't know if anything is actually broken.
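To make that concrete, here's a minimal Python sketch (hypothetical `create_pizza`, not the article's code) where every listed variation changes the returned pizza, so black-box tests on the output cover it without touching internals:

```python
def create_pizza(order):
    # prepare/bake/box are internal steps; tests never call them directly
    pizza = {"toppings": list(order["toppings"]), "baked": False, "boxed": False}
    if order["style"] == "stuffed crust":
        pizza["toppings"].append("cheese ring")  # hypothetical stuffed-crust rule
    pizza["baked"] = True   # bake step
    pizza["boxed"] = True   # box step
    return pizza

# Every behavioral change shows up in the output, so tests stay black-box:
plain = create_pizza({"style": "thin", "toppings": ["salami"]})
assert plain["baked"] and plain["boxed"] and plain["toppings"] == ["salami"]

stuffed = create_pizza({"style": "stuffed crust", "toppings": ["ham"]})
assert "cheese ring" in stuffed["toppings"]
```

Refactoring the internal steps then cannot break these tests unless the pizza itself changes.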


every single comment extrapolating a trivial example is a strawman


> The rest is implementation details that should not be exposed to a test.

But it's always implementation details all the way down!

If `prepare` is not worth testing, why would `createPizza` be worth testing? `createPizza` is someone else's implementation detail.


The goal of this unit of code is to create a pizza from an order. There are no observable side effects. To verify the correctness of this unit of code, the only requirement is to map all inputs to outputs. Any assertions other than that mapping are not requirements to test correctness.


Yes, and I can create an even larger 'unit' of code around it, declare that it has 'no observable side effects', and claim that `createPizza` is just an implementation detail of that, and therefore needs no testing.


Maybe? We don't know about the other code. Most code I work with would be sticking this output in a DB or email or something, so any larger piece can't be isolated and is no longer a single unit. It's about isolation from side effects, not how much code you can stick behind a single method call.


The core lesson here is that "readability" is personal, and any attempt to reason about "more" or "less" that doesn't translate that into more specific, measurable outcomes (like time to find a bug, or time to add a new feature) is a very good way to nerd-snipe a large number of people into creating a lot of hot air.


Linear code can increase the state that you need to hold in your head while reading through it. You typically can’t just start reading in the middle and understand what is going on, because you have to trace the evolution of the state up to that point.

Breaking the code up into smaller functions can reduce what you need to keep in your head, if the functions can be understood standalone just by their parameters, and if any side effects they may have on their parameters (in case of mutable objects) are straightforward enough to understand from their naming and/or comments (rather than from their implementation).

One purpose of functions is to separate interface from implementation. If for some part of the code an interface is easier to understand than the implementation, then that’s a clear case for making it a separate function.

The points of separation should therefore be the points where the least context is needed to understand the separated-out operation.
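A concrete illustration of that test (a Python sketch, my example rather than the parent's): the interface "median of a non-empty list" is easier to understand than the even/odd-length implementation, so extracting it is a clear win:

```python
def median(values):
    """Interface: the middle value of a non-empty list of numbers."""
    # Implementation needs sorting plus an even/odd case split --
    # more context than any caller should have to hold in their head.
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

assert median([3, 1, 2]) == 2
assert median([4, 1, 2, 3]) == 2.5
```

Callers only need the one-line interface; the case analysis stays behind the function boundary.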


Personally, I find linear code more readable when I have context. The pizza example reads better linearly because it is easy to figure out the context. But when I have no context (I enter a new code base), linear code is harder for me to reason about, because at that point I'm just trying to understand how all the pieces fit together, and having everything stuffed in one place makes figuring out the higher-level picture pretty difficult. I recently had to update a 1000+ line function with very specific business rules that I had no context for. I'm sure for the developer writing it at the time it was easier to put everything in one big function, but it was pretty hard to figure out everything that was going on in there. I had to refactor it into a few smaller functions in order for me and the team to figure out what was actually going on and how we could fit the new business requirements into it.


There is absolutely no doubt in my mind that the right variant is significantly better. I prefer it because of the smaller lexical scopes, because it's easier to test, and most importantly to me because it's easier to extend and to understand what the workflow intends to do. If I had to maintain code like this, I imagine in my day to day I'll likely only have to extend it by touching the `addToppings` function; the rest can stay the same. If I have someone new joining my team I can easily guide them to this `addToppings` function and ask them to add support for pineapple and ham; nobody needs to be overwhelmed by the entire system and their task would be done in no time. I do acknowledge the question is about readability, but I don't think it's possible to ignore testability, manageability and extensibility. I think the inlined approach simply does not strike a good balance of the aforementioned.


As someone who prefers the underlying approach of the non-linear right-side example, it's terribly written code (which is puzzling, since it was supposed to be the halo case).

The prepare function is the main issue: it both creates the pizza and adds toppings. If the pizza had been constructed at the top of createPizza, and then `addToppings`, `bake` and `box` were called, it'd be strictly clearer than it is now.

Now obviously this is all from a contrived example, but I think the underlying lesson is: bad linear code is less tedious to deal with than bad non-linear code. With bad overly long linear code, at least the whole mess is in front of you. With bad non-linear code stuff is hiding stage-left, there's side effects that names are hiding, you're at the mercy of tooling for navigation, etc.

Maybe if you know what you're making is doomed to be bad code (think convoluted business logic driven by the real world), prefer linear?


Much prefer the right-side version. It's still "linear" at the top level and cognitive load is greatly reduced into small, single-responsibility, bite-sized functions. Plus—and sure, this might be a premature optimization—but future devs will thank you when it comes time to implement the "bake a calzone" feature.


Thanks for the post.

I do not enjoy navigating and bouncing through 100s of files to work out how something works. Where algorithms are obfuscated and there's indirection everywhere.

I enjoy reading dense code where everything is clearly linear because I do not need to context switch.

But when you need to change something, you probably prefer the many function approach.


For many, many years, I've adopted the OP's choice of style, using what I call "FIRST/NEXT" comments to divide the function into paragraphs:

    // FIRST, Create the pizza object
    ...
    // NEXT, Add the toppings
    ...
    // NEXT, Heat the oven
    ...
By all means, move a "paragraph" into its own function if it's called more than once; but otherwise this provides a number of useful features:

* The FIRST/NEXT comments serve as useful headers, making it possible to navigate the function without reading the code in detail.

* I know that no one's going to call one of the blocks from outside.

* I can see at a glance what chunks of code go together.

I've often gone back and read code I wrote five, ten, twenty, thirty years ago using this method, and found it perfectly readable.


It really all boils down to cognitive load:

Can the average dev keep all of the variable states & side effects of the function in your head as they read through it? Great! Linear may be a good fit.

Or does one need to jump up and down in the function to /really/ understand it? Probably time to consider abstracting it.


The right side version has extracted functions from an arguably worrisome implementation on the left, then someone inlined it and left comments to explain the purpose of ?some lines? of upcoming code. The author wants to optimize readability; Carmack and others want to reduce complexity by eliminating local optima introduced by abstractions. Other people want to make a fashion style out of it. I'm thinking: how does the oven work? Does it mutate the parameter? Is it heating up at a constant pace? If the oven mutates the pizza, why not the Box methods? Also, if the inline person likes inline, why don't they inline Box and Oven? Because they are called from some other places? Why not inline those as well? So many questions to ask. I'm not sure this is a clear win for either style. Maybe we should ask ChatGPT :)


"The right side version has extracted functions from an arguably worrisome implementation on the left"

No amount of linearity and abstractions, or silly comments, can make bad code readable. I see this stuff at work all the time. In my team's defense, I deal with a lot of chemists and physicists who like to write their own algorithms.


What's not shown is the 10 other functions calling createPizza and bakePizza that can be tested by mocking that routine centrally.

In the basic case, the linear version is better until the code is duplicated. Adding constants and function aliases before the code has duplicated is generally a bad idea.


The straw man has been shot with silver bullets. Can we also linearise calls like box.PutIn(pizza)? What if it's a complex external API call that takes the pizza serialised to ProtoBuf and needs credentials that you'll retrieve from a configuration provider?


Sure, you can put A, B, C, D side by side. But next time if you need to find D, you have to navigate A > B > C > D, with no other recourse. Often, you don't care about A, B, or C. Only D. And the benefit of A, B, C, and D being close together becomes immaterial.

In real systems where things are spread out, A, B, C, and D can be very far apart indeed. And it's totally fine! What matters is that from the starting position, let's say X, I can 'navigate' to A, B, C, or D in an equal and speedy manner.

Plus human brains love to navigate things in a 'spatial' way like this. It's natural. Really when you think about it, the perceived loss here is not that big compared to the benefits.


I noticed that people have already contributed great insights on readability and testability aspects which were my first thoughts as well on reading this blog.

However, I do believe that there is no one right answer to this argument. And the right answer is with that team who in the end have to read, write and maintain that code. The metrics that I collect with my co-workers who work on same code base as me are

  * What is the cognitive load to grasp the code for members in the team?  
  * How easy is it to onboard a new member to this team?
  * Are we able to move fast and have confidence in the code changes we make?
In my opinion, metrics like these are usually the ones most of us care about in the end.


So that team knows all future hires? I work on code bases that were written 15 years ago by a plethora of people in different imperative styles, where business requirements changed a lot over that 15-year period.

There's a lot of spaghetti code.

We found there's a strong correlation between method/function size and bugs.

There's not a lot of confidence because there's implicit mutable state and side effects all over the place.


It is also a super powerful technique when using the closure of a function as a way to encapsulate logic and state. A good example is this implementation of a JSON parser in JS[1]. Attempting to lift the lexer functions or state out of the function would result in every function needing to be wrapped with a factory function. Parsers have always been tricky, and before I knew this technique I would have reached for a parser combinator/generator, but this is a very sensible way of doing it.

[1] https://lihautan.com/json-parser-with-javascript/
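As a sketch of the same technique (in Python rather than JS, with a hypothetical mini-parser, not code from the linked article): the cursor lives in the enclosing function's scope, so the helpers share it through the closure without any factory wrapping:

```python
def parse_number_list(text):
    """Parse a comma-separated list of integers, e.g. "1,22,3"."""
    i = 0  # cursor shared by all inner helpers via the closure

    def peek():
        return text[i] if i < len(text) else ""

    def advance():
        nonlocal i
        i += 1

    def parse_int():
        start = i
        while peek().isdigit():
            advance()
        return int(text[start:i])

    values = [parse_int()]
    while peek() == ",":
        advance()
        values.append(parse_int())
    return values

assert parse_number_list("1,22,3") == [1, 22, 3]
```

Lifting `peek`/`advance`/`parse_int` to module level would force `text` and `i` into a state object or factory, which is exactly the overhead the parent comment describes.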


I wonder if there is some programming language that supports combining both styles:

- A linear control flow

- Named blocks with explicit, named, typed parameters and return values

I understand that one can use anonymous functions, immediately called to simulate this style.
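As a sketch of that simulation (Python here; in JS it would be an IIFE), each "named block" is a local function with explicit parameters and a return value, defined and called immediately in linear order:

```python
def create_pizza(order):
    # "Named block" 1: prepare -- inputs and output are explicit in the signature.
    def prepare(order):
        return {"toppings": list(order["toppings"]), "baked": False}
    pizza = prepare(order)

    # "Named block" 2: bake -- by convention touches only what is passed in.
    # (Python inner functions can still read the enclosing scope, so this is
    # discipline rather than enforcement; a dedicated language feature could enforce it.)
    def bake(pizza):
        return {**pizza, "baked": True}
    pizza = bake(pizza)

    return pizza

assert create_pizza({"toppings": ["cheese"]}) == {"toppings": ["cheese"], "baked": True}
```

The control flow stays linear, but each block advertises its dependencies in its parameter list.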


If I understood you correctly, the ML family seems to come closest, especially Elm and F#'s use of |> syntax.


I am currently working in some "best practice" (according to its author) code with hardly an if statement.

And after half a year of having to always step into a method (or out of it) to continue reading or debugging, this resonates with me very much.


This paper[1] suggests that there is a 50/50 split amongst programmers in the way in which they trace programs:

> Given a straight-line program, we find half of our participants traced a program from the top-down line-by-line (linearly), and the other half start at the bottom and trace upward based on data dependencies (on-demand)

So it's possible that both viewpoints are correct in some sense and we should pursue languages which allow us to switch between the two viewpoints.

[1] https://arxiv.org/abs/2101.06305


Interesting read. It’s amazing more people don’t use runtime variable value annotation tools like Wallaby.js, or a debugger.

So much time spent mentally remembering what is in what variable based on the naming.

I often find myself adding “// e.g. foo, bar” to show example cases for some lines of code, like regexes for example. Wallaby.js is a godsend for this though.


The real issue is plain text, and files and folders.

File names, folder names/hierarchies, function names, class names are all _arbitrary_. You could randomize them all and your code would still run.

What is not arbitrary is: the call graph, and the data flow/dependency graph.

Every line/block of code could be wrapped in a function.

And classes... your class methods are just functions with an implicit parameter of an object of a certain type; and practically, not the entire object, just the parts of it that it actually uses in the function body.

So if you just focus on what your functions do, the boundaries and groupings of your code will become self-evident.


To add to this: the code you are writing and how you access it can be distinct. A static function with all parameters available is accessible but hard to use; adding a builder layer in front makes it usable without changing the logic.

CLI, RPC, Rest, are all different methods of interfacing with the underlying core function. I too-often see folks make a "microservice rpc server" rather than "function exposed via rpc microservice".


This was always my interpretation of "Flat is better than nested." from "The Zen of Python".

I often run into conflict with developers who believe in the single return statement. This is flatter but irks a lot of devs:

  if (!condition) {
    return
  }
  more code
  return


Two points

* functional style makes it easier to split stuff since you mostly transform immutable data

In the article, IO, state (side effects) and data transformation are mixed in both versions. That leads to unnecessary complexity, and it's worse if it hides in sub-functions. But separate them and the right version is better. (You can answer the question about idempotency easily now.)

* Comments over blocks of code are harder to keep in sync with the code below, they require more discipline from all team members

So in theory they are nice, but you will never have them unless you enforce them through proper code reviews.


I agree with the functional style part, but hard disagree with your second point.

Keeping comments over blocks of code in sync with the code below requires exactly the same amount of effort (arguably less) as keeping function names (and, ideally, documentation) in sync with the code inside.

Except that if the second one is not done, it is a lot more dangerous than the first, precisely because every time you define a function you are declaring an abstraction, and if the abstraction changes silently, that's where the real mess begins.

That's what ends up happening in my experience. (See also sibling other comments about the proliferation of flags in a function signature)

(Needless to say, I strongly resonated with the OP as I also love linear code with comments, but in the end it's also a matter of taste..)


_sigh_. OP got really hung up on the example, and it ruined the article.

There are certainly cases where linear makes sense, but the length of your ticker-tape memory is about the limit, and since that varies quite a lot from person to person, I like it best if I can choose the level of abstraction with which to read code. This has never been a problem.

Where indirection becomes a problem is inheritance and black-box operations (as you might find in rails-y frameworks). Django’s block model of extension for templates is so devious it should be considered felonious.


I can't believe that 55 years after Go To Statement Considered Harmful we are still operating at medieval level of "I recently found this piece of code and now I have opinions to share".

If coding for an employer, it's their business. But for use in collaborative public projects, I want a linter with experimentally measured effect on readability. Not stories from the field.


I wholeheartedly disagree. Linear functions like this promote laziness in variable naming (var a1, c_tfr, bvf, etc.). This also leads to buggy side effects such as having multiple nested if statements performing a plinko-machine determination of code branching. It’s horrid. It’s unmaintainable. It guarantees that someone will have to rewrite it after you're gone, because you will be gone.

This is the same as someone arguing for scrolls when books with table of contents and appendices are far superior.


Your rant is misplaced; it is the spirit rather than the letter of the thing that matters. Linear giant code is often easier to comprehend for structures like state machines where you can follow the business logic from one stage to another easily.

See my other comment here: https://news.ycombinator.com/item?id=37518275


DRY, SOLID: there’s a wealth of principles on why this isn’t correct. Here’s what Code Complete [0] has to say…

>” From time to time, a complex algorithm will lead to a longer routine, and in those circumstances, the routine should be allowed to grow organically up to 100-200 lines. (A line is a noncomment, nonblank line of source code.) Decades of evidence say that routines of such length are no more error prone than shorter routines. Let issues such as depth of nesting, number of variables, and other complexity-related considerations dictate the length of the routine rather than imposing a length restriction per se.

If you want to write routines longer than about 200 lines, be careful. None of the studies that reported decreased cost, decreased error rates, or both with larger routines distinguished among sizes larger than 200 lines, and you’re bound to run into an upper limit of understandability as you pass 200 lines of code.”

[0] https://books.google.co.in/books?id=LpVCAwAAQBAJ&pg=PA174


These are all just guidelines/heuristics and should not be treated like inviolable laws. Thus all advice should be adapted to the problem at hand in the service of Readability/Comprehensibility first.

Instead of repeating myself, i point you to my other comments in this thread for details.


While I do like linear code, the bigger problem is that everything is in scope. If you break it apart into separate functions you can clearly see the inputs and outputs.


Right, you can read both but it also requires more brain power to separate out when not in separate functions. It's also easier to unit test as the code grows larger.


this - this is the reason. Large functions accrue state and state begets bugs over time.


It's more readable to the CPU too.

Deeply nested code, especially with many functions that are called once, is really horrible to debug.

Extract functions when you see obvious repetition, not just to appease some dogmatic abstraction goal. Incidentally, this also helps the CPU (cache locality).

Along the same lines, I'd rather have a directory with several dozen source files than several dozen nested directories that may contain only one or two files each.


I don't think the issue is about which code is more readable, but whether it's efficient for the CPU or the computer. Modern compilers optimize more than we think in production builds.


I completely disagree with the article. The right hand side is far better. Not perfect; there are definitely a couple of things to improve, but it's better than just a big long meandering god function like on the left. It feels like the author is arguing to go back to the coding style of the 1980s.

Big advantage of the right-hand style: the various steps are laid out in a simple 5-line function. You immediately see what making a pizza involves. Want to know more about it (like whether baking involves the creation of a completely new oven), you can zoom in on the details, but you never have to look at details that are irrelevant to you, unlike on the left side, where you have to dig through a page of code to figure out which part is relevant to you.

Mind you, there are a lot of ways in which the right hand style could go wrong: if you don't separate your concerns, and have global or member variables manipulated by different functions in ways that are not immediately obvious, then superficially clean code could be hiding some terrible spaghetti. But at least the right-hand style punishes you for that and encourages you to do better (in fact, I'm currently refactoring a bit of code that did exactly that). The left hand side would allow terribly messy code with complex interactions between different parts of the code without making it obvious that those interactions are there, and will make it more intimidating to refactor them. Small functions are easier to test and easier to refactor.


I agree. The right hand side also makes it very easy to zoom in on the problem details. If the pizza is not properly boxed, I don't want to worry about whether the salami was sliced the right way. I can skip over that, and immediately zoom in on the boxing process.

Pretending that a comment header is the same as a function is a bit silly. We can navigate to functions, not to comments.


>> this code makes no sense: why would you create a whole new oven to make a pizza? in real life...

one can rationalize all sorts, but certain real-life metaphors don't have to map closely to the digital realm.

if my CreatePizza function relies on remote/dynamic code/features (realtime functionality), then it's simpler and possibly safer to re-create than re-use. Depends on the use-case.


I can imagine a new control statement with this type of syntax:

  code
  code
  uses (a, b, c, d) { // Step 5: Foo the bar
    code
    code
  }
  more code
  more code
It's a block that defines the variables it uses, with no other access to the outer scope. It would help break up a linear function into blocks with clearer dependencies.
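Lacking language support, that `uses` block can be approximated today with an immediately-invoked function whose parameter list plays the role of the `uses (...)` clause (a Python sketch; here the restriction is by convention only, since the body could still read outer names):

```python
a, b, c, d = 1, 2, 3, 4

# "Step 5: Foo the bar" -- the parameter list documents exactly which
# outer variables this block depends on, like the proposed `uses (a, b, c, d)`.
step5 = (lambda a, b, c, d: a * b + c * d)(a, b, c, d)

assert step5 == 14
```

The reader can see the block's dependencies at a glance without scanning its body, which is most of what the proposed syntax would buy.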


> I know this is a synthetic example but this kind of issue actually occurs in real code and sometimes causes performance issues. It is likely that this code should take the oven as a parameter. Providing it is the job of the caller.

The caller might not care about the oven, might not know the process needs an oven, or might not even have an oven.

The injection pattern could be used.


I'm only a novice at programming, but my usual rule is if the indentation gets too far in, then it's time to put it in its own function. When it becomes hard to follow, I will put it in its own function doing The Thing(tm). But I won't break down every small logic into its own function - it's just too much.


The problem with code styles is that most developers can only reason about information systems, where the only thing your code has to do is dispatch the right data to the right place.

As soon as you try and write a function that actually uses the data, you find out that every book has been written for CRUD application programmers.


I couldn't read the examples (on mobile), but in general I like more functions because it is easy to skip over details I don't care about. If I'm interested in the oven preheat (his example) I'll dig into it, but if not I'll skip over that part to the toppings function.


I write top-to-bottom code almost always, and prefer not to have separate functions if I don't need them. Here is a real example: I'm building a CRM https://www.youtube.com/watch?v=l4QjeBEkNLc


you can split a large linear function into smaller ones with this one simple trick:

  function myFunction() {
  
    /************************************
     * Subsection 1
    ************************************/
    // code

    /************************************
     * Subsection 2
    ************************************/
    // code

    /************************************
     * Subsection 3
    ************************************/
    // code
  }

Better than splitting into multiple function calls with a bunch of variable-to-parameter and return-value-to-variable renaming going on. It helps if your language allows you to limit some variable scopes, but usually I wouldn't bother.


Or you use the correct types and avoid these "problems" altogether.

It's not possible to bake in a cold oven. Why does your type allow it then? Why don't you encode the state directly?

  oven := ColdOven.heat()
  bakedPizza := oven.bake(pizza)
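A small Python sketch of that typestate idea (hypothetical classes; Python checks this at runtime rather than compile time, unlike a statically typed encoding): `bake` simply does not exist on `ColdOven`, so "bake in a cold oven" is unrepresentable:

```python
class HotOven:
    def bake(self, pizza):
        return pizza + " (baked)"

class ColdOven:
    def heat(self):
        return HotOven()
    # Deliberately no bake() here: a cold oven cannot bake anything.

oven = ColdOven().heat()
baked_pizza = oven.bake("margherita")
assert baked_pizza == "margherita (baked)"
```

Calling `ColdOven().bake(...)` raises `AttributeError` (a compile error in a statically typed language), so the invalid state never needs a runtime check inside `bake`.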


> Also, what happens if you pass a pizza to those functions twice? Are they idempotent or do you end up eating cinder?

That is surely about state and mutable data, not code structure. And factored code makes it _easier_ to write more stateless code.


Yeah, that's the most frustrating part of the debate here.

Arguing about the difference between:

  number := prepareZero
  addOne(number)
  addTwo(number)
  addThree(number)
  return number
and

  number = 0
  number += 1
  number += 2
  number += 3
  return number
When either of the following would be far better:

  addThree (addTwo (addOne prepareZero))
or

  return 0 + 1 + 2 + 3;


you call your blog separateconcerns.com and then advocate for the opposite. functions have their purpose. one of them is separating concerns as is done here. the code on the right is probably better than 99.9% of code out there.

there is even an academic book (Normalized Systems Theory) which claims that having more concerns in one function will inevitably cause an explosion of your codebase where you have to write a ton of code for a very small change. I doubt the validity of the proof they provide for this, but it's something to keep in mind, as I have not seen anyone more serious than me claim that it is wrong.


For me, if a function is bigger than one page and I have to scroll, it is probably too long. If its length is less than one page, I don't bother breaking it down into smaller functions unless it makes sense.


Drakon uses Silhouettes to show code linearly and abstracted at the same time:

https://drakon.tech/read/silhouette


Mutation everywhere, no thank you.

This approach requires one to keep the state in mind while manually "evaluating" the mutations along the way, forming a picture, or whatever one uses, in mind about the created artifact.


Correct.

Especially when there's mixing of in-place mutation and return-values.

The first function in the green:

  func createPizza(order) {
    pizza = prepare(order)
    bake(pizza)
    box(pizza)
    return pizza
  }

`prepare` transforms an order into a pizza, but `bake` and `box` mutate one in-place.
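One way to remove that mixing (a Python sketch, not the article's code) is to make every step take a value and return a value, so `bake` and `box` read the same way `prepare` does:

```python
def prepare(order):
    return {"order": order, "baked": False, "boxed": False}

def bake(pizza):
    return {**pizza, "baked": True}  # returns a new value, no in-place mutation

def box(pizza):
    return {**pizza, "boxed": True}

def create_pizza(order):
    # Every step is value-in, value-out, so the composition is uniform.
    return box(bake(prepare(order)))

pizza = create_pizza("margherita")
assert pizza == {"order": "margherita", "baked": True, "boxed": True}
```

With a consistent convention, the reader never has to guess which calls mutate their argument and which return a fresh value.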


Good luck finding the proper line to make a change, or finding a bug, in a single long linear function; it's always a mess, because there's never time to refactor. I would rather not work with such code.


Pro tip: Use an editor that doesn't allow you to quickly jump to the definition of a function. You will make your code more readable because you will prefer to write linear code.


The one on the right is more readable, but takes things too far seemingly to prove a point. For example, "addToppings" clearly doesn't need to be a separate method.


Those comments won't match the code in 6 months.

Edit to add: and in 6 months, instead of being one short page of code, it'll be 600 lines long and impossible to understand or modify safely


Do other people not consider comments in PRs anymore? Would someone ok a PR [for a function] with 600 lines? Probably not. Let's not be absurd here, but I fundamentally agree that it's too long already.

Yes, maybe it's the wrong abstraction or it won't match in 6 months because people will start calling Hats Pizzas and the logic will be different. Maybe I'll be dead tomorrow. I don't concern myself with what-ifs over unknowns.

I want correctness and some assurance of it. I don't think writing 50? tests to cover this thing is a good way to spend my time (or someone else's). I'll write the left hand side first. When I'm looking at writing tests, it will become something like the right hand side.

Yes linear code is more readable. That's something to consider, but it's not my primary consideration.


My code review tool shows me 3 lines of context around changes, which means the untouched comment won't even be seen unless I think to look for context. The tool also makes it hard to comment on lines that were not touched but should have been.

I've used many code review tools, and all suffered from that. (The only exception wasn't a tool: we printed out all the code on physical paper and then went into a meeting room to review for several hours. This was the best type of code review for finding things wrong, but it was also extremely expensive, so I've only done it once in my life.)


Each PR has 20 lines, and there are 30 such PRs over six months.


Apologies. I wasn't clear. I meant it's a stretch to approve a function that's 600 lines. It would be very hard to reason about. I have clarified in my original post. Thank you for the feedback +1 to you.


600 lines is relatively short for some of the bad code that I have seen...


It's so ridiculous though! We have to get rid of comments because some random in the future might commit some rubbish? It's their fault! not mine! I'm just documenting what works at the time of publication


Hahaha!

You are putting something in there that turns out to be disinformation.

I used to believe as you did, but then I started actually looking at old code.

I stored many examples, but I'll give you one to indicate just how bad most comments are:

  //Add 1 to x
  x := x + 2;

Now why the original author thought it was important to explain in English what the next line of code was going to do in the programming language I am not sure.

I am sure that someone figured out that it was a mistake and did the least they could possibly do to fix the mistake.

Something like 80% or more of the comments I've seen have been misleading, wrong, or useless.

If you do your own survey of someone else's old code so that your ego doesn't get in the way you will probably find something similar.

Although I did talk with people working at NASA and they had someone on their code reviews who was specifically looking at comments. If you are that disciplined then your code will not be that bad, probably.


Comments are meant to pass information to readers about rare cases that may not make much sense at first but have a reason to be.

This is to avoid the introduction of bugs by tempting someone to refactor or "fix" your code.

Properly written code, with good naming conventions for parameters, variables and functions, should be easy to understand without comments.

I should not need anyone to tell me the next few lines will

//Heat oven

That should be self explanatory by just looking at the code itself.
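To make the contrast concrete, here is a small Go sketch (the load-balancer constraint is invented for illustration): the first kind of comment restates the code and rots on the first edit; the second records a reason the code cannot express.

```go
package main

import "fmt"

// Redundant comment style - it restates the code and rots:
//
//	// Add 1 to x
//	x = x + 2

// Useful comment style - it records a "why" the code cannot express.
// (The 10s load-balancer timeout here is a made-up constraint.)
func retryDelayMs(attempt int) int {
	delay := 100 << attempt
	// Cap at 8s: the upstream load balancer drops connections
	// idle for 10s, so longer backoffs would just fail anyway.
	if delay > 8000 {
		delay = 8000
	}
	return delay
}

func main() {
	fmt.Println(retryDelayMs(3))  // 800
	fmt.Println(retryDelayMs(10)) // capped at 8000
}
```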


Design patterns that are used too soon can contribute to less readable code too. Like implementing the strategy pattern when a simple if else would be much more succinct.
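For example, with a hypothetical discount rule sketched in Go: the strategy version needs an interface plus a type per case to express what a two-branch if/else says in one place.

```go
package main

import "fmt"

// Strategy-pattern version: an interface plus a type per case.
type Discounter interface {
	Apply(price float64) float64
}

type NoDiscount struct{}

func (NoDiscount) Apply(p float64) float64 { return p }

type MemberDiscount struct{}

func (MemberDiscount) Apply(p float64) float64 { return p * 0.9 }

// The plain if/else says the same thing in one readable place.
func applyDiscount(price float64, member bool) float64 {
	if member {
		return price * 0.9
	}
	return price
}

func main() {
	var d Discounter = MemberDiscount{}
	fmt.Println(d.Apply(100))              // 90
	fmt.Println(applyDiscount(100, false)) // 100
}
```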


Besides this, I also hate navigating through modules. I would like to see a piece of code in a single file if it will not be used anywhere else, or if the thing is very abstract.


I wonder if modularization is not a form of parametrization.


I think it is, at least in some cases. I've found that Elm/React are like this with dynamic elements.

First, you code the component you want with hardcoded styles and data. Then, you extract that to a function. Finally, you pull out the hardcoded styles and data into parameters for that function. Now you have a reusable component.
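The same three steps sketched in Go rather than Elm/React (the `button` helper and its parameters are invented for illustration):

```go
package main

import "fmt"

// Step 1 was inline, hardcoded markup:
//
//	html := `<button class="primary">Save</button>`
//
// Steps 2 and 3: extract it to a function, then turn the hardcoded
// style and label into parameters - a reusable "component".
func button(class, label string) string {
	return fmt.Sprintf("<button class=%q>%s</button>", class, label)
}

func main() {
	fmt.Println(button("primary", "Save"))
	fmt.Println(button("danger", "Delete"))
}
```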


Prepare, addToppings and bake are meaningless functions that serve no purpose. Meanwhile heatOven, bakePizza and box do have very good reasons to exist.


I hate asking the question "is this function called from different places, or was it extracted only for aesthetic reasons?".


The function will/may get called from different places in the future. I am coding for the future.


Ah, the good old premature abstraction.

You're coding for a future that might not exist. You might be coding for the wrong future and you painted yourself into a corner.

Been there, done that.


But what if I'm coding for the correct future?

Maybe there's a way I can code for every possible future with minimal effort. I'm talking about a pattern that isn't a form of premature abstraction. Just a rule.

Your way of coding is coding for the most probable future, which is distinctly different from coding for every possible future.


>> The function will/may get called from different places in the future. I am coding for the future.

> Ah, the good old premature abstraction.

The function will get called from different places. Once from its caller, and a second time from its unit test.
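In Go terms (the names and numbers here are invented): even a helper with exactly one production caller gets its second call site from a `_test.go` file.

```go
package main

import "fmt"

// bakeMinutes has exactly one production caller, but extracting it
// gives a unit test a direct entry point - the "second place".
func bakeMinutes(size string) int {
	if size == "Large" {
		return 14
	}
	return 10
}

func main() {
	// Production call site #1.
	fmt.Println(bakeMinutes("Large")) // 14
}

// In a real project, call site #2 lives in a _test.go file:
//
//	func TestBakeMinutes(t *testing.T) {
//		if bakeMinutes("Large") != 14 {
//			t.Fatal("want 14")
//		}
//	}
```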


Functions have never been about simply being called from different places.

So your entire premise is wrong.


Would you hate it if it was really, really easy to answer?


I’d like to also see the Rx example of this code. In my experience it would be vastly less readable but probably half the length.


Left case has too much scope. I don’t know at a quick glance if a variable from line 1 is used on line 200


I don't like the side effects of the second addToppings. I'd much prefer

pizza.Toppings = getToppings(kind)


> pizza.Toppings = getToppings(kind)

That's a side effect.


pizza isn't, and can't be, opaquely modified by the topping adder this way.


Testability matters more than readability - please separate different parts into different functions!


End to end tests only, for you! They’ll find your bug in 30 minutes or you get your money back!


Very much depends on the naming and arch.

If the naming and architecture are good, then it reads like a book.


Yeah, when it’s trivial code like this and doesn’t go on for ten pages that might be true.


The code in the red is imperative, with implicit state all over the place. I would argue that it's not linear, you have to keep all those mutable values and state transitions in your head as you read what's happening from top to bottom.

It's not extensible, it's not composable, it's hard to test, and frankly, it's complicated and complex for no other reason than that a particular type of engineer thinks it's easier to read, because they feel all programming should be imperative.

Don't get me wrong, there's a time and a place for this style of programming. (Manual memory management, algorithmic design that maximizes speed or memory usage, or even taking a bunch of services and dictating the order in which they are supposed to be executed.)

For business logic, especially the type that's supposed to model "the world," this is terrible code.

How is the bottom, not better code?

    type Oven struct {
        Temp int
    }

    type Box struct {
        // Box properties
    }

    type Pizza struct {
        Base    string
        Sauce   string
        Cheese  string
        Toppings []string
        Baked   bool
        Boxed   bool
        Sliced  bool
        Ready   bool
    }

    type Order struct {
        Size string
        Sauce string
        Kind string
    }

    func preparePizza(order *Order) *Pizza {
        toppings := map[string][]string{
            "Veg":  []string{"Tomato", "Bell Pepper"},
            "Meat": []string{"Pepperoni", "Sausage"},
        }
        return &Pizza{
            Base:     order.Size,
            Sauce:    order.Sauce,
            Cheese:   "Mozzarella",
            Toppings: toppings[order.Kind],
        }
    }

    func bakePizza(oven *Oven, pizza *Pizza, cookingTemp int, checkOvenInterval int) *Pizza {
        // Simulate oven heating
        for oven.Temp < cookingTemp {
            time.Sleep(time.Duration(checkOvenInterval) * time.Millisecond)
            oven.Temp += 10 // Simulate oven heating
        }
        pizza.Baked = true
        return pizza
    }

    func boxPizza(pizza *Pizza, order *Order) *Pizza {
        _ = &Box{}          // a box would be fetched here; blank-assigned since it's otherwise unused
        pizza.Boxed = true  // Simulate putting pizza in box
        pizza.Sliced = true // Simulate slicing pizza
        pizza.Ready = true  // Simulate closing box
        return pizza
    }
    
    // I just need to really understand this imperative part
    // this is the meat and potatoes
    func createPizza(order *Order, oven *Oven, cookingTemp int, checkOvenInterval int) *Pizza {
        pizza := preparePizza(order)
        pizza = bakePizza(oven, pizza, cookingTemp, checkOvenInterval)
        pizza = boxPizza(pizza, order)
        return pizza
    }


Now in a Functional lang, that has good scoping and private functions, etc.

    defmodule Pizza do
      import Pizza.Pizza, only: [new: 0]
      import Pizza.Order, only: [new: 0]

      def create_pizza(order, oven_temp) do
        order
        |> prepare_pizza()
        |> bake_pizza(oven_temp)
        |> box_pizza()
      end

      defp prepare_pizza(%Pizza.Order{size: size, sauce: sauce, kind: kind}) do
        toppings = 
          case kind do
            "Veg" -> ["Tomato", "Bell Pepper"]
            "Meat" -> ["Pepperoni", "Sausage"]
            _ -> []
          end

        %Pizza.Pizza{base: size, sauce: sauce, toppings: toppings}
      end

      defp bake_pizza(%Pizza.Pizza{} = pizza, oven_temp) when oven_temp >= 400 do
        %Pizza.Pizza{pizza | baked: true}
      end

      defp bake_pizza(pizza, _oven_temp), do: pizza

      defp box_pizza(%Pizza.Pizza{boxed: _boxed, sliced: _sliced, ready: _ready} = pizza) do
        %Pizza.Pizza{pizza | boxed: true, sliced: true, ready: true}
      end
    end


Maybe more readable but less testable and less maintainable longer term.


Let's heat up the oven then check if the pizza is baked.


Seems like the key takeaway was adding comments.


Pretty lame contrived example; easy to make a case for inlining when your functions are 5 lines long. In any case, think I'll go on ignoring dogmatic coding advice.


Having poor taste is nothing to be confident about. The author sounds like he has never coded anything large and complex.


> Having poor taste is nothing to be confident about.

Huh, I can say the same about you, judging by your reaction to this post.


All of this "why your favorite best practice is wrong, actually" stuff gives me whiplash.



