“What if it changes?” (chriskiehl.com)
431 points by goostavos on May 22, 2022 | 293 comments



"What if it changes?" is a reasonable question to ask. But every time you do you are walking a tightrope. My rule of thumb is that we look at what is in use TODAY, and then write a decent abstraction around that. If something is used once, ignore any abstractions. If it's used twice, just copy it, it's better. If it's used 3 or more times, look at writing an abstraction that suits us TODAY not for the future. Bonus points if the abstraction allows us to extend easily in the future, but nothing should be justified with a "what if".

The reason a lot of Java or C# code is written with all these abstractions is that it aids unit testing. But I've come to love just doing integration testing. I still use unit testing to test complex logic, but things like "does this struct mapper work correctly" are ignored; we'll find out from our integration tests. If our integration tests pass, we've fulfilled our part of the contract, and that's all we care about. Focus on writing them and making them fast and easy to run. It's virtually no different from unit testing, just 10x easier to maintain.


> If something is used once, ignore any abstractions. If it's used twice, just copy it, it's better. If it's used 3 or more times, look at writing an abstraction...

That is a good rule of thumb, and I often follow it too. But it does take some discernment to recognize cases where something would benefit from an abstraction or some common code, even if it is only used twice.

I used to work for a company that imported airspace data from the FAA (the US Federal Aviation Administration) and other sources. The FAA has two main kinds of airspace: Class Airspace and Special Use Airspace.

The data files that describe these are rather complex, but about 90% of the format is common between the two. In particular, the geographical data is the same, and that's what takes the most code to process.

I noticed that each of these importers was about 3000 lines of C++ code and close to 1000 lines of protobuf (protocol buffer) definitions. As you may guess, about 90% of the code and protobufs were the same between the two.

It seemed clear that one was written first, and then copied and pasted and edited here and there to make the second. So when a bug had to be fixed, it had to be fixed both places.

There wasn't any good path toward refactoring this code to reduce the duplication. Most of the C++ code referenced the protobufs directly, and even if most of the data in one had the same names as in the other, you couldn't just interchange or combine them.

When I asked the author about this code duplication, they cited the same principle of "copy for two, refactor for three" that you and I approve of.

But this was a case where it was spectacularly misapplied.


I think your example illustrates why it's so important to choose the right way to generalize/share code depending on the circumstances. I've found that when there's a 90% overlap between 2-3 use cases, many people tend to go with "one common code path for all that's shared and then inject the 10% difference in via components/callbacks/config vars". This works reasonably well when the flow of execution is the same and what changes is just the specifics of some of those steps. But if the differences are also in which steps even happen, then in my experience this approach couples the whole thing too tightly and makes it harder to reason about what actually happens in a given configuration.

What I like to do instead is break the shared code paths into a palette of smaller subfunctions/subcomponents and then have each use case have its own high level code path that picks and chooses from these subfunctions: One does ABCDE, another does ACDEX. It makes it supremely easy to reason about what each of them actually does, because they read almost like a recipe. It becomes a sequence of high level steps, some of which are used by several use cases, while others are unique. I've found this way of generalizing is almost "cost free" because it doesn't really couple things at a high level, and it's the kind of readability refactor that you'd often want to do anyway even if the code wasn't being shared.
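
Something like this, to sketch it in Python (the step names and the two use cases are made up):

    # Shared palette of small steps (names and data shape are hypothetical)
    def load_order(order_id):
        return {"id": order_id, "total": 100.0}

    def validate(order):
        assert order["total"] >= 0
        return order

    def apply_discount(order):
        return {**order, "total": order["total"] * 0.9}

    def charge(order):
        print(f"charging {order['total']:.2f}")
        return order

    def send_receipt(order):
        print(f"receipt for order {order['id']}")

    def export_to_erp(order):
        print(f"exporting order {order['id']} to the ERP")

    # Each use case gets its own short, recipe-like code path
    def checkout(order_id):            # does A B C D E
        order = load_order(order_id)
        order = validate(order)
        order = apply_discount(order)
        order = charge(order)
        send_receipt(order)

    def backoffice_import(order_id):   # does A C D X - no validation, no receipt
        order = load_order(order_id)
        order = apply_discount(order)
        order = charge(order)
        export_to_erp(order)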


> ...break the shared code paths into a palette of smaller subfunctions/subcomponents and then have each use case have its own high level code path that picks and chooses from these subfunctions: One does ABCDE, another does ACDEX. It makes it supremely easy to reason about what each of them actually does, because they read almost like a recipe. It becomes a sequence of high level steps, some of which are used by several use cases, while others are unique.

Isn't this just the Command pattern? - https://en.wikipedia.org/wiki/Command_pattern


I love this. Refactor the first time. Remix the rest of the times.


Do you know if there’s a name for this pattern? I admire it all the time in Peter Norvig’s code. It leads to very approachable code.


I don't know if there is an official name, but in my head I call it "helpers/components/mixins are better than frameworks." Or, "if one happens to want to write a framework, one ought to try hard to refactor it 'inside-out' to a set of composable components."

The most important (though not only) issue with frameworks is that you typically can't compose/mix more than one together - every framework is "exclusive" and takes control of the code flow. Whereas "components" can usually be easily mixed with each other, and leave control of the code flow to the programmer.


I generally think of this as the same principle of "prefer composition over inheritance". Leave the top-level free to compose the behaviour it requires rather than inheriting the framework's behaviour, for exactly the reasons you describe.


This is frameworks vs libraries. In the first case the framework is calling the code with config and hooks to change behaviour. In the second case there are common library functions called from completely separate “application” code.


I don't know an official name for it. It seems like it's almost too basic - "subdivide into helper functions" - to make it into the Gang of Four or other design pattern collections. But in my head I'm calling it the "Recipe Pattern"


It sounds like a version of the strategy pattern to me.

https://en.wikipedia.org/wiki/Strategy_pattern


> and it's the kind of readability refactor that you'd often want to do anyway even if the code wasn't being shared.

Couldn't disagree more tbh. Some of the worst code I've ever had to work with has been over-abstracted "recipe" code where I'm trying to discern complex processes based off two-word descriptions of them in function names.

Doing this too much is a great way to turn a readable 100 line algorithm into a 250 line clusterfuck spread across 16 files.


> Doing this too much

ok, so you're talking about overdoing it. It's still a good approach when done right.


Not really, unless "done right" is for like a 2000 line function or something.

If code is running once in order, there's no reason to break it up into functions and take it out of execution order. That's just stupid.


Martin Fowler, in his book "Refactoring", outlines circumstances where you can leave bad code alone.

Basically if it works and you don't have to touch it to change it, leave it alone.


I think you've completely missed the point.


Oh god that reminds me. Our company did this but for a whole project.

It was back when a bunch of social networks released app platforms after Facebook's success. When hi5 released their platform, rather than refactoring our codebase to work on multiple social networks... someone ended up just copying the whole fucking thing and doing a global rename of Facebook to Hi5.

For the 3rd social network I refactored our Facebook codebase to work with as many as we wanted. But we never reined in Hi5, because it had diverged dramatically since the copy. So we basically had two completely separate codebases: one that handled hi5, and one that had been refactored to be able to handle everything else (facebook, bebo, myspace, etc)


No bets on which one is buggier. Or which one's bugs (and also their fixes) break more networks.


Hi5 was less buggy because new features were just never ported to it - it was deemed not worth the effort.


I also got this heuristic from Martin Crawford. However I believe it applies to snippets (<100 lines of code at the very most) only, for the reason you gave. But even then, it sometimes happens that you find a bug in a 4 line snippet that you know was duplicated once, and have to hope you can find it through grep or commit history. So while being careful not to over-engineer and apply KISS/YAGNI ('you ain't gonna need it'), one-time duplication can be a pain.


I cannot edit my comment anymore, but I realized Crawford is the Martin of 'Forest Garden' fame. I was obviously meaning Martin Fowler, from the 'Refactoring' book.

Maybe we'll have 'Forest Software' in some time. 'A code forest is an ecosystem where the population of bugs, slugs, trees and weed balance themselves, requiring very little input from the engineer'.


> There wasn't any good path toward refactoring this code to reduce the duplication. Most of the C++ code referenced the protobufs directly, and even if most of the data in one had the same names as in the other, you couldn't just interchange or combine them.

That makes it sound like the problem is more of a spaghetti mess than duplication.

But I think the advice to copy something when you need two versions is supposed to be applied to specific functions or blocks or types. Not entire files. Then it wouldn't have duplicated the geographical code.

It's also important to have a good answer to how you'll identify duplicated bugs. I'm not sure how to best handle that.


If I had to guess: they probably referenced the protobufs directly because there are always two, and "You have to tell it which one!".


What if the FAA updates the code for the coordinates of one and not the other? Then your abstraction is moot.


Of course not, abstraction works even better there! Every point that differs will have either a conditional or an abstract part to be implemented by child classes. So the abstraction lets you know at a glance what the key points to look for are.


> If something is used once, ignore any abstractions.

This is terrible advice. According to this, a classic program that loads data from a file, processes it, then writes the results to another file should be a single giant main() that mixes input parsing, computation and output formatting. Assuming file formats don't change, all of those would be used only once. CS 101 style. :D

The primary reason for building abstractions is not removing redundancy (DRY) nor allowing big changes, but making things simpler to reason about.

It is way simpler to analyze a program that separates input parsing from processing from output formatting. Such separation is valuable even if you don't plan to ever change the data formats. Flexibility is just added bonus.

If the implementation complexity (the "how") is a lot higher than the interface (the "what") then hiding such complexity behind an abstraction is likely a good idea, regardless of the number of uses or different implementations.


Nah, I’ll take your 50 line main() every day over those 10 files with 10 lines of boilerplate and one line of working code each. But at the end of the day you just need to roll with the style of the org you’re working with.

I drop in to Java shops from time to time, and am more than happy to port my simple class structures that make sense and do things into the 18 level hierarchies described in the article. I just assume there is somebody there that is really invested in all those interfaces, adapters, and impls and I’m not here to start silly fights with them. The code will still work no matter how many pieces it’s cut into and how many unnecessary redirections you add so no worries.

But for my own stuff I like to keep things compact and readable.


Where did I write it would be 50 lines of code only? And where do you get the 10:1 boilerplate to real code ratio? Maybe just use a more expressive language if you can't build proper abstractions and need a lot of boilerplate?

And why go so extreme? A main() calling into 3 functions like load, process and save will still be plenty better than a single blob of IO mixed with computation and would still contain no boilerplate.
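
Roughly this shape, sketched in Python (the file names and the filtering rule are invented):

    def load(path):
        # input parsing only: turn the file into plain Python data
        with open(path) as f:
            return [line.rstrip("\n").split(";") for line in f]

    def process(rows):
        # pure computation: no IO, easy to reason about and test
        return [row for row in rows if row and row[0] != ""]

    def save(rows, path):
        # output formatting only
        with open(path, "w") as f:
            f.writelines(";".join(row) + "\n" for row in rows)

    def main():
        save(process(load("input.csv")), "output.csv")

    if __name__ == "__main__":
        main()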

> I drop in to Java shops from time to time, and am more than happy to port my simple class structures that make sense and do things into the 18 level hierarchies described in the article.

I certainly agree with that, but that has nothing to do with abstraction. Abstraction and indirection are different things. Those terrible FizzBuzz Enterprise like hierarchies are typically a mixture of insufficient abstraction and far too much indirection. Abstraction reduces complexity, while indirection increases it. AbstractFactoryOfFactoryOfProblems is indirection, not abstraction, contrary to what the name suggests.


> And why go so extreme?

I wouldn’t. I’d break it up the same as you, with those 3 functions. After we’d shown we were going to be doing lots of similar things. But given the choice between too complicated and too simple, that’s the direction I’d lean.

Apologies if that wasn’t clear in context.


> And why go so extreme? A main() calling into 3 functions like load, process and save will still be plenty better than a single blob of IO mixed with computation and would still contain no boilerplate.

Sure. But a main() ordered into loading, processing, and saving would be similar amounts of better, despite not using the abstraction of functions.


Code style / formatting is a secondary thing. If someone made the effort of splitting it into 3 well-organized pieces, and denoted those pieces somehow (by comments?), that also counts as an abstraction to me, even though it is not my preferred code style.


If you consider organization in general to be abstraction, then I think that might cause some overselling of abstraction and miscommunication with others.

Unless I'm the one using words weirdly here.


It is not just reordering the lines of code.

In order to organize code that way, you need to establish e.g. some data structures to represent input and output that are generic enough that they don't depend on the actual input/output formatting. There you have the abstraction.

The key thing is to be able to understand the processing code without the need to constantly think the data came from CSV delimited by semicolons. ;)
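
For example, a made-up record type for the semicolon-delimited CSV above:

    from dataclasses import dataclass

    @dataclass
    class Measurement:
        station: str
        value: float

    def parse_line(line):
        # the only place that knows about semicolons
        station, value = line.split(";")
        return Measurement(station, float(value))

    def average(measurements):
        # processing code sees Measurement objects, not CSV details
        return sum(m.value for m in measurements) / len(measurements)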


>> If something is used once, ignore any abstractions.

>

> This is terrible advice. According to this, a classic program that loads data from a file, processes it, then writes the results to another file should be a single giant main() that mixes input parsing, computation and output formatting. Assuming file formats don't change, all of those would be used only once.

I broadly agree with you, but devil's advocate time: not all abstractions are at the same level.

Writing a static function `slurp()` that takes in a filename and returns the file contents isn't an abstraction in the same sense as having a `FILE *` type that the caller cannot look into which functions like `fprintf()` and `fscanf()` use to operate on files.

I think opaque datatypes (like `FILE`) are "more abstract" than static functions defined in the same file you are currently reading.

IOW, "Abstraction" is not a binary condition, it is a spectrum from full transparency to full opacity.

Static functions in C would be full transparency (no abstraction at all).

Opaque datatypes in C would be full opacity (no visibility into the datatype's fields unless you have the sources, which you may not have).

C++ classes would be something in-between (the private fields are visible to the human reading the header).


I agree, and that's why I said that good abstractions are those which have a good implementation complexity vs interface complexity ratio. The file abstraction is a perfect example of this - a simple concept you can explain in 5 minutes to a kid, but implementations are often several thousand lines of code long.

Also, the simpler the interface, usually the more contexts it can be used in. So those abstractions with nice interfaces naturally tend to be more reusable. But I argue this is the consequence, not the primary reason. You probably won't end up with good abstractions by mercilessly applying DRY.


> This is terrible advice. According to this, a classic program that loads data from a file, processes it, then writes the results to another file should be a single giant main() that mixes input parsing, computation and output formatting. Assuming file formats don't change, all of those would be used only once. CS 101 style. :D

Yes, if the program will be written and tested exactly once, with no change requests to come later, it's perfectly fine to write it as one big main().

It all depends on what the stakeholders need; clear communication with them is the real trick.


Well, what if the program suddenly crashes and gives you a stacktrace pointing to main()? Assuming you were not the original author of the code, you'd have to read most of the code to understand it.

If the main was split into well defined, separate pieces, at least you could quickly rule out quite a lot of complexity. If it crashed in parsing, you wouldn't need to understand the processing logic, etc.

Sure it is easy to read one blob of code, if it is only 100 lines of code. But it is a different story if it is 10000 lines and now you have to figure out which of the 100 variables are responsible for keeping the state of the input parser and which are responsible for "business logic".


But writing it would be harder, no?

I mean if it's only like 50 short lines, that would be okay-ish, but in this case why do it in C and not use perl or awk? (I suppose you want fast text processing, so I won't suggest python). If the processing is hard, then you will need debugging (which is better in segregated functions) and to prototype a bit (unless I'm the only one who does that?).


I think the specific example mentioned might be subjective, but I agree with your point.

In my mind, the common emphasis on the DRY/WET thing with abstractions leads many people to miss the point of abstractions. They're not about eliminating repetition or removing work, they're about making the work a better fit for the problem. Code elimination is a common byproduct of abstractions, but occasionally the opposite may happen too.

I see an abstraction as being composed of a model and a transformation. The villain isn't premature abstractions, it's abstractions where the model is no better (or worse!) for the problem than what's being abstracted over.


I could not agree more with this.

I would add, though, that in my experience you can often identify parts of a design that are more likely to change than others (for example, due to "known unknowns").

I’ve used microservices to solve this problem in the past. Write a service that does what you know today, and rewrite it tomorrow when you know more. The first step helps you identify the interfaces, the second step lets you improve the logic.

In my experience this approach gives you a good trade off between minimal abstraction and maximum flexibility.

(Of course lots of people pooh-pooh microservices as adding a bunch of complexity, but that hasn’t been my experience at all - quite the opposite in fact)


Microservices is just OOP/dependency-injection, but with RPCs instead of function calls.

The same criticisms for microservices (claims that it adds complexity, or too many pieces) are also seen for OOP.

Curiously, while folks sometimes complain about breaking up a system into smaller microservices or smaller classes, nobody ever complains about being asked to break up an essay into paragraphs.


I don't think the paragraph metaphor works well since written works are often read front to back, and the organizational hierarchy isn't so important on such a linear medium. There are books that buck the trends and IMO you don't really notice the weirdness once you get going. E.g. books with long sentences that take up the whole paragraph, or paragraphs that take up the whole page, or both at the same time. Some books don't have paragraphs at all, and some books don't have chapters.

Splitting material into individual books makes a little more sense as a metaphor, especially if it's not a linear series of books. You can't just split a mega-book into chunks. Each book needs to be somewhat freestanding. Between books, there is an additional purchasing decision introduced. The end of one book must convince you to go buy the next book, which must have an interesting cover and introduction so that you actually buy it. It might need to recap material in a previous book or duplicate material that occurs elsewhere non-linearly.

A new book has an expected cost and length. We expect to pay 5-20 dollars for a few hundred pages of paperback to read for many hours. We wouldn't want to pay cents for a few pages at a time every 5 minutes. (or if we did, it would require significantly different distribution like ereaders with micropayments or advertising). Some books are produced as serials and come with tradeoffs like a proliferation of chapters and a story that keeps on going.

Anyway, it's a very long way to say that some splitting is merely style, some splitting has deeper implications, the splits can be too big or too small, and some things might not need splits at all.


I'd like to argue against [quote].

[author] uses the [simile] to argue the [argument].

The obvious flaw in the [argument] is of course [counterargument].

[quote]: Curiously, while folks sometimes complain about breaking up a system into smaller microservices or smaller classes, nobody ever complains about being asked to break up an essay into paragraphs.

[author]: Mr_P

[simile]: microservices or smaller classes are like paragraphs in an essay.

[argument]: since no one complains about breaking up an essay into paragraphs, no one should complain about breaking up a system into smaller microservices or classes.

[counterargument]: breaking up a system in smaller microservices or classes is not at all like breaking up an essay into paragraphs, which I think this comment has demonstrated.


> Curiously, while folks sometimes complain about breaking up a system into smaller microservices or smaller classes, nobody ever complains about being asked to break up an essay into paragraphs.

There are orders of magnitude different amounts of work in each of these cases. (I’m not saying it’s a lot of work but it’s still significantly more in some of those cases relative to the others.)


Perhaps "break up your book into chapters" is a better metaphor for microservices. Breaking a chapter into paragraphs makes me think more of OO design or functional decomposition.


It's breaking up into whole books. Each is stored, distributed, addressed and built separately. You have to become an expert at making the implied overhead efficient, because it will dominate everything you do.


> Curiously, while folks sometimes complain about breaking up a system into smaller microservices or smaller classes, nobody ever complains about being asked to break up an essay into paragraphs.

They would if each paragraph of that essay lived at a different domain/url.


Even if each paragraph was its own file. It's just a bad metaphor.


A microservice contains many classes. Those classes are organized into packages and so many of them are necessarily "public." The microservice boundary is a new kind of grouping, where even this collection of packages and public classes presents only one small interface to the rest of the architecture. AFAIK this is not a common or natural pattern in OOP, and normal visibility rules don't support or encourage it.


My favorite books are the ones where you read a paragraph and realize, after the fact, that it's just 1 sentence.


  > If something is used once, ignore any abstractions. If it's used twice, just copy it, it's better. If it's used 3 or more times, look at writing an abstraction

I refactor for the second time. I don't like chasing bugs in multiple places.

My rule of thumb is that there are only three quantities in the software development industry: 0, 1 and infinity. If I have more than 1 of something, I support (a reasonable approximation of) infinite quantities of that something.


Agreed, except avoid the term "abstraction". When one starts to talk about abstractions, one stops thinking.

The right word is "generalization", and that's what you are actually doing: you start with a down-to-earth, "solve the problem you've got!" approach, and then when something similar comes up you generalize your first solution.

Perhaps part of the problem is that in OO, inheritance usually promotes the opposite: you have a base class and then you specialize it. So the base class has to be "abstract" from day one, especially if you are a true follower of the Open-Closed Principle. I don't know about others, but for me abstractions are not divine revelations. I can only build an abstraction from a collection of cases that exhibit similarities. Abstracting from one real case and imaginary cases is more like "fabulation" than "abstraction".

The opposite cult is "plan to throw one away", except more than just one. Not very eco-friendly, some might say; it does not look good at all when you are used to spending days writing abstractions, writing implementations, debugging them, and testing them. That's a hassle, but at least once you are done, you can comfort yourself with the idea that you can just extend it... Hopefully. Provided the new feature (that your salesman just sold without asking if you could do it, pretending they thought your product did that already) is "compatible" with your design.

The one thing people may not know is how much faster, smaller and better the simpler design is. Simple is not that easy in unexpected ways. In my experience, "future proofing" and other habitual ways of doing things can be deeply embedded in your brain. You have to hunt them down. Simplifying feels to me like playing Tetris: a new simplification idea falls down, which removes two lines, and then you can remove one more line with the next simplification, etc.


Java in particular is missing certain language features necessary for easily changing code functionality. This leads to abstractions getting written in to the code so that they can be added if needed later.

A specific example is getters and setters for class variables. If another class directly accesses a variable, you have to change both classes to replace direct access with methods that do additional work. In other languages (Python specifically), you can change the callee so that direct access gets delegated to specific functions, and the caller doesn't have to care about that refactor.
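
A small Python sketch of what I mean (class and attribute names invented): a plain attribute can later be routed through a property without touching any caller.

    # Before: a plain attribute; callers write account.balance directly
    class Account:
        def __init__(self, balance):
            self.balance = balance

    # After: the callee routes the same attribute through a property,
    # so callers keep writing account.balance and never notice the refactor
    class Account:
        def __init__(self, balance):
            self._balance = balance

        @property
        def balance(self):
            return self._balance

        @balance.setter
        def balance(self, value):
            if value < 0:
                raise ValueError("balance cannot be negative")
            self._balance = value

    acct = Account(10)
    acct.balance = 5      # unchanged caller code still works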


Getter and setters are unnecessary. The thing that most people are trying to avoid by using these is mutating state. However a getter or setter does nothing to prevent this. A simple `const` keyword goes so much farther than adding useless indirection everywhere.

Edit: I suppose it may be argued that you need to set some other state when you set a member variable. If that's the case, then it's no longer a getter or a setter and the function should be treated differently.


Getters and setters are much more useful when accessing or setting the element should require some other function calls. Caching, memoization, and event-logging are examples where you might want this to happen.

You can say that's not a getter/setter, but then your definition is just different than the people you're responding to.


Caching, memoization, and event-logging can be handled by wrapper objects that implement the interface so the base object doesn't need to contain all these layers of outside concerns. Let each class focus on its single area of use.

    interface Store { String query(); }

    // these all have the query() method
    class MySQL implements Store {
        public String query() { return "rows from MySQL"; }
    }

    class Cache implements Store {
        private final Store inner;
        Cache(Store inner) { this.inner = inner; }
        // a real cache would check itself before delegating
        public String query() { return inner.query(); }
    }

    class Logger implements Store {
        private final Store inner;
        Logger(Store inner) { this.inner = inner; }
        public String query() { System.out.println("query()"); return inner.query(); }
    }

    class App {
        Store db = new Logger(new Cache(new MySQL()));
    }


However Getters/Setters are often the worst place to implement cross-cutting concerns, like caching, memoization and logging.

Of course, in more limited languages/environments they're probably the only tool you have, so there's that.


Getters and setters are not just for keeping state immutable. They allow an API to control _how_ state changes. The most obvious example is maintaining thread-safety in multi-threaded environments.

I get they can be cumbersome, but using them really matters especially as a project grows... an API that has a simple single client today may have many different (and concurrent!) ones tomorrow. The pain of using S&Gs now saves refactoring later.


The number of getters and setters I've written that never got changed into anything more than read/change variable has to be _hundreds_ of times more than the ones that ever did anything else.

At what point is it cheaper to just refactor into getters/setters later when needed? That point _has_ to be miles behind me.


True.

Another problem (from a class/library-consumer point of view) is having getters/setters suddenly becoming more expensive to call, blocking, or even having side effects after an update.

It often only affects the runtime behavior of the code.

Changing the interface, however, will give me a hint that something else has changed.


OOP languages shouldn't need getters and setters because there shouldn't be even a concept of variable access and mutation, just all method calls - that's what OOP is all about, after all, not just putting variables into bags and staying in a procedural mindset.


Smalltalk-style OO anyway: All You Can Do Is Send A Message.

That isn't the only type of OO. Look at CLOS in Common Lisp for a counterexample: https://wiki.c2.com/?HowObjectOrientedIsClos


That will just make everything more convoluted and less flexible. When you send a message over websockets you want a Datatype for each message type. It's not going to have any complicated method calls. You just insert the data or retrieve it on the other side. Since the framework expects you to define setters and getters you do it reluctantly.


I think the concern is: It's currently a getter/setter but might change later.

Maybe for debugging you want to log a callstack every time the field gets accessed, for example.

Or when you set the field, you should invalidate some cached value that uses it.
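
For example, a quick Python sketch of the second case (names invented), where writing a field invalidates a cached value derived from it:

    class Rectangle:
        def __init__(self, width, height):
            self._width = width
            self._height = height
            self._area = None            # cached derived value

        @property
        def width(self):
            return self._width

        @width.setter
        def width(self, value):
            self._width = value
            self._area = None            # invalidate the cache on write

        @property
        def area(self):
            if self._area is None:
                self._area = self._width * self._height
            return self._area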


That's a design choice though -- if you're structuring your code to avoid mutable state, you're not going to have setters. And if you're structuring your code such that you're telling objects what to do, rather than pulling data out of them and acting on them remotely, then you're not necessarily going to have getters either.


To be fair, the Open-Closed principle is basically an article of faith in Java (along with the rest of SOLID).


The getter/setter nonsense is 99% compliance with specific frameworks like Hibernate or, shudder, JSF, but it caught on and now nobody wants to be seen without using ugly getters and setters, which would be perfectly fine if the language natively supported them.


> If something is used once, ignore any abstractions. If it's used twice, just copy it, it's better.

That is just as bad a general rule as "What if it ever changes, we need to abstract over it!". As always: it depends. If the abstraction to build is very simple, like making a magic number a named variable which is threaded through some function calls, at the same time making things more readable, then I will rather do that than copy it and introduce the chance of bugs in the future by only updating one place. If the abstraction requires me to introduce 2 new design patterns to the code, which are only used in this one case ... well, yes, I would rather make a new function or object or class or what have you. Or I would think about my overall design and try to find a better one.

Generally, if one finds oneself in a situation, where one seems to be nudged towards duplicating anything, one should think about the general approach to the problem and whether the approach and the design of the solution are good. One should ask oneself: Why is it, that I cannot reuse part of my program? Why do I have to make a copy, to implement this feature? What is the changing aspect inside the copy? These questions will often lead to a better design, which might avoid further abstraction for the feature in question and might reflect the reality better or even in a simpler way.

This is similar in a way to starting to program inside configuration files (only possible in some formats). Generally it should not be done and a declarative description of the configuration should be found, on top of which a program can make appropriate decisions.


I agree that counting the number of times you repeat yourself is not the right metric to determine whether or not to introduce an abstraction. Abstraction is not compression. But I don't think it depends on how simple any abstraction would be either. Simplicity does play a role for pragmatic reasons of course but it's not the key question in this case.

The key question is whether there is a functional dependency or just a similarity between some lines of code. If there is a functional dependency, it should be modeled as such the first time it is repeated. If there is only coincidental similarity then introducing a dependency is simply incorrect, regardless of how often any code happens to get repeated.
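
A tiny made-up example of that distinction:

    # Coincidental similarity: these happen to be the same today, but there is no
    # rule saying they must stay in sync, so sharing one constant would be wrong.
    STANDARD_DISCOUNT = 0.15
    REFERRAL_BONUS = 0.15

    # Functional dependency: gross price is *defined* in terms of the tax rate,
    # so the relationship should be modelled the first time it appears.
    TAX_RATE = 0.21

    def gross(net):
        return net * (1 + TAX_RATE)

    def tax_amount(net):
        return net * TAX_RATE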


I agree! Maybe one could say: Not all repetitions are of equal nature in terms of what causes them, and to understand the cause is important.


> If something is used once, ignore any abstractions. If it's used twice, just copy it, it's better. If it's used 3 or more times, look at writing an abstraction...

As others have said, this is a good rule of thumb in many cases because finding good abstractions is hard and so we often achieve code re-use through bad abstractions.

But really good abstractions add clarity to the code.

And thus, a good abstraction may be worth using when there are only two instances of something, or even just one.

If an abstraction causes a loss of clarity, developers should try to think if they can structure it better.

EDIT: This comment below talks about good example of how a good abstraction adds clarity, while a bad abstraction takes it away: https://news.ycombinator.com/item?id=31476408


When I'm asked "what if it changes?", I usually answer with something like "we'll solve it when, and if, it happens". I'm a fan of solving the task at hand, not more, not less. If I know for sure that we're going to add feature X in a future version, sure I'll prepare my code for its addition in advance. But if I don't know for certain whether something will happen, I act as if it won't. It's fine to refactor your code as the problem it solves evolves. You can't predict the future, and if you try, you'll have to be able to deal with mispredictions too.


It takes me twice as long to get my integration tests to work if I don't have unit tests making sure the parts work along the way.

If you write an integration test, and it fails, what's broken?


> It takes me twice as long to get my integration tests to work if I don't have unit tests making sure the parts work along the way.

That's a valid concern, but if your unit-tests are only for making sure the part you just wrote works as expected, then just have a test case up for that specific part, and change it when you move to the next part.

The value of unit-tests is supposed to be regression testing: when you change something that breaks a different unit in a different part of the stack.

> If you write an integration test, and it fails, what's broken?

Well, I debug it the same way I debug any bug. After all, most bug reports are from a full execution in the field; I am probably already set up to debug the full application anyway[1].

[1] Once a bug reproduction is set up in a fairly automated way.


you know... you debug and find out


The thing about unit tests is that the better they are, the less you have to debug.

A very good one, with effective use of matchers, you don't even have to read the test to know what you did wrong. You just know from the error message what you broke.


> The thing about unit tests is that the better they are, the less you have to debug.

>

> A very good one, with effective use of matchers, you don't even have to read the test to know what you did wrong. You just know from the error message what you broke.

Agree, and agreed. The counterpoint is that unit-tests take time to write and time to maintain - you have to balance that time spent against the time that you would spend debugging an integration test.


Integration tests take far more time to maintain. Major functionality changes can affect all of your tests. With unit tests they may invalidate a few, but that’s okay because they were cheap to begin with.

If your unit tests are hard, you need to refactor your code.


> Integration tests take far more time to maintain.

So? You're going to have them anyway or else you can't deploy.

> Major functionality changes can affect all of your tests. With unit tests they may invalidate a few, but that’s okay because they were cheap to begin with.


Orders of magnitude matter, and if you have a testing pyramid instead of a testing ice cream cone, there can be up to two orders of magnitude difference between the number of unit tests and integration tests.

If you start with unit tests, then the integration tests are just verifying the plumbing. That only changes when the architecture changes, which is hopefully a lot less than how often the requirements change substantially.


Not to mention most unit tests are utterly useless in reality and test things we know to be true (1 + 1 -level nonsense), not real edge cases.

The logic that usually gets ignored in unit tests is the logic that actually needs to be tested, but it gets skipped because it is too difficult and might involve a few trips to the database, which makes it tricky (in some scenarios you need valid data to get a valid test result, but you cannot just go grab a copy of production data to run some tests).

And then there is the problem of testing-related code, packages and artifacts being deployed to production, which is really gross in my mind and bloats everything further.

A team I've worked on has resorted to building actual endpoints to trigger test code that lives alongside other normal code (basically not a testing framework), so that they could trigger tests and "prove the system works" by testing against production data at runtime.


Your message is just a collection of ad-hoc points with no structure, context or justification for any of them.


The "message" is a response to the last paragraph.

>The reason a lot of Java or C# code is written with all these abstractions is because it aids unit testing.

That is the justification for talking about testing. Code is being ripped apart to make it easier to test, while the tests that are used as a justification for ripping apart the code are low quality as 99% of the work in unit testing is thinking of and setting up the test case, not the actual test code.


"Copy code if it's used twice" is terrible advice. You are creating a landmine for future maintainers of your code (often yourself, of course). Someone will inevitably change only one of the two versions at some point in the future and then you're going to have to rely on tests to catch the issue - except that your tests will probably also reflect the duplication and you'll also forget to change the 2nd test.

The only possible justification for duplicating code would be that creating an appropriate abstraction is harder. Given that there are generally economies of testing when you factor out common code, that's usually just not true.

"Duplication is evil" is a more reliable mantra.


It's a rule of thumb, not a hard rule. If I have something I need to use in a separate project, I'm copying it. I'm not going to write a library just so I can import it into 2 different projects.

Yes it means stuff needs to be changed in 2 places. Yes it means someone can change one and not the other. But it also means that each thing can be maintained on its own without worrying about another. In the early stages you don't know how much can truly be reused and whether you're just cornering yourself. I've had scenarios in the past where we've written an abstraction around some common code and then the third application we want to use it in just does not fit the model we initially thought of. Could we change the library? Yes, obviously. But are we going to face the same issue on the 4th project to use this library? Probably. It's a large maintenance load. At some point you end up making breaking changes to the library and you're committed to either maintaining multiple major versions, or maintaining an abstraction that is supposed to work for every scenario, which can be a huge time sink.

There are tradeoffs to be made. I'd rather lose the maintenance burden of a library when consumers have vastly different needs and just take the hit of having to do a Sourcegraph search for usages of some code. This search would need to be done to find all consumers of the code anyway if it was a library. So the end result is rarely different in my experience.


Imo, the correct rule is "copy if it is more likely to be modified separately" and "create one method if it likely has to change at the same time".


Excellent advice! I wish more programmers paid attention to this rather than just the 2-3 rule. The 2-3 rule tends to create unintentional tight coupling between things that becomes an iron bar that is even more evil to rectify.


>Someone will inevitably change only one of the two versions at some point in the future and then you're going to have to rely on tests to catch the issue

It works both ways

Someone modifies code that's used in both places and breaks the other thing


It is important to carefully look into the functional context where that abstraction is used.

If you are looking, for example, into System Integration, Data Integration, ETL and so on, not using a canonical format from the beginning will get you into the kind of almost exponential growth in mappings between sources and targets.

https://www.bmc.com/blogs/canonical-data-model/

https://www.enterpriseintegrationpatterns.com/CanonicalDataM...
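
A rough sketch of the idea in Python (formats and field names invented): each source and each target maps only to/from the canonical shape, instead of every source mapping directly to every target.

    # Canonical shape everything maps to/from
    CANONICAL_FIELDS = ("customer_id", "amount", "currency")

    def from_crm(record):
        # one mapping per source, into the canonical shape
        return {"customer_id": record["CustId"], "amount": record["Total"], "currency": "USD"}

    def from_webshop(record):
        return {"customer_id": record["user"], "amount": record["price"], "currency": record["ccy"]}

    def to_billing(canonical):
        # one mapping per target, out of the canonical shape
        return {"acct": canonical["customer_id"], "due": canonical["amount"]}

    def to_warehouse(canonical):
        return [canonical[f] for f in CANONICAL_FIELDS]

    # 2 sources + 2 targets = 4 mappings here, instead of 2 x 2 direct ones
    row = to_billing(from_crm({"CustId": 42, "Total": 99.5}))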


I think the test pyramid still has legs. Write both.

I do agree a lot of abstractions in C#/Java seem to be testing implementation stuff leaking into the abstraction layer. A lot of inversion of control in these languages seems to exist purely to allow unit testing, which is kind of crazy.

Personally I prefer the "write everything in as functional a style as possible, then you'll need less IoC/DI". This can be done in C# and Java too, especially the modern versions.
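
A small sketch in Python of what I mean (names invented); the same shape works in C# or Java: pass the dependency in as an ordinary argument, and the test needs no IoC container or mocking framework.

    def price_with_tax(net, tax_rate_lookup, country):
        # its only "dependency" is an ordinary callable argument
        return net * (1 + tax_rate_lookup(country))

    # production wiring
    def tax_rate_from_config(country):
        return {"SE": 0.25, "DE": 0.19}.get(country, 0.0)

    total = price_with_tax(100.0, tax_rate_from_config, "SE")

    # test wiring: just pass a lambda, no container or mock framework needed
    assert price_with_tax(100.0, lambda _: 0.25, "XX") == 125.0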


I have a general rule:

Once is an incident. Deal with it.

Twice is a co-incident. Deal with it. But keep an eye out for it...

Third time? Ok, this needs properly sorting out.


Mr Bond, they have a saying in Chicago: 'Once is happenstance. Twice is coincidence. The third time it's enemy action' - Goldfinger


I got handed a little online customer service chat application at a previous job. It had been written by someone I'd put at a similar skill level to mine, but with different personality traits. One of his personality traits was to code to the spec and not consider "what if it changes".

This online chat had two functionalities: chat with a worker, and leave a message for a worker with suggestions as to what to look at in response. There was no connection between these two functionalities specified, and so my friend had written it without connection. It was difficult, without doing a full rewrite, to get state information from one part of the application to another (this was written in jQuery).

Anyway, 6 months+ down the line it got respecified: now it needed shared state between the two parts of the application, which meant either a significant rewrite or hacks, so hacks were chosen. Ugly hacks, but they worked. (I think ugly hacks were definitely the correct choice here, because the chat application was almost completely scrapped a year later in favour of a bot.)

After I was done I asked, "But why write it like that? It was specified that no state was needed between the two parts." "Yeah, but it should be obvious that is going to change; they would keep wanting to add functionality to it and probably share state across the two communication channels."

tldr: there are some potential changes that seem more likely than others, and the architecture should take those potential changes into consideration.


Wow, I always believed this but was too scared to admit it because it's not fashionable to think this way.


Reminds me of FizzBuzzEnterpriseEdition. https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...

You never know when you might need to change the implementation of how the "Fuzz" string is returned, so you need a FuzzStringReturner.

And you never know when you might need multiple different ways of returning "Fuzz", so you need a FuzzStringReturnerFactory.

And for SOLID it's important to separate concerns, so you want your FuzzStringReturnerFactory separate from your FuzzStringPrinterFactory.

And that barely scratches the surface of what you need!


Yip, dealing with a code base right now that's like this. Devs do this for job security. Making the thing incomprehensible to others guarantees you can't be fired, right? It also guarantees a promotion right up to team lead or dev manager.

Another note: it's not what's the best SOLID design, it's what the original dev thinks is the best SOLID design. SOLID in itself is loose enough that you can have designs that vary massively but are still technically SOLID.


A good dev should comment their readable code so well that anyone can drop in and figure it out.

Unfortunately that good dev is no longer employed at this company...


FizzBuzzEnterprise is just "closed to modification" / "open to extension" taken to the extreme on a trivial problem.


Except that sometimes it’s hard to tell when the problem you’re working on is trivial. I saw this pattern used as a source of data for a drop down list of office locations. It should have been “select id, name from office_locations”. It really was that simple. But it was an “enterprise app”, so instead we had 5 classes and so much more.


It would be easier if people were encouraged to find the simplest solutions in their education, rather than being indoctrinated with somebody's long list of Patterns that will somehow make your code Good if you do them enough.


Maybe we just work at completely different kinds of places, but I’ve worked with a ton of people who don’t know what patterns are and very few who over-apply them


Conversely, we have a single React component for displaying locations, and a whole plethora of different parameters to send into the backend call with a variety of different effects. Having a factory or interface pattern here would have been really nice.


Sure, if that’s the use case. But I think the point of this thread is that in many cases we add functionality despite the lack of immediate necessity, and this causes problems.


From the test suite:

    this.doFizzBuzz(TestConstants.INT_14, TestConstants._1_2_FIZZ_4_BUZZ_FIZZ_7_8_FIZZ_BUZZ_11_FIZZ_13_14);
This is hilarious


I haven't seen anyone else point out: one of the ways you can often identify really experienced programmers is by them being able to pretty accurately separate things-that-probably-will-change from things-that-probably-won't.

Finding a good way to balance YAGNI with this-will-definitely-change-in-a-few-months-because-it-always-does is incredibly hard, and I've really appreciated working with engineers who make that prediction correctly.


I'd say it goes even beyond that into being able to separate the hypothetical future problems that will be a minor irritation from the ones that will wreck your whole month.

As you get to learn more failure modes for software, you start to realize you can't plug all the holes in the dyke, not if you had absolute control of every hand on your team. You can stop four problems or you can hedge against twenty. The problem is that hedging is way harder to explain to people. It's how we ended up with unsatisfying concepts like 'code smells'. It's not evil, it might not even be that bad, but... don't do it anyway.


I think there is also very low-hanging fruit that doesn't take more effort but makes something way more future-proof. Experienced engineers can identify it, and most of that low-hanging fruit seems to be DB design related.


> Finding a good way to balance YAGNI with this-will-definitely-change-in-a-few-months-because-it-always-does is incredibly hard, and I've really appreciated working with engineers who make that prediction correctly.

Is this a generalizable skill? To me, it feels like changes are more often driven by shifting business requirements and ephemeral victories of internal political battles rather than sound technical reasons.


Agreed. I think being able to predict what is likely to change is heavily dependent on knowing the business domain well since that is going to be the driver of change. I think it is a specific application of curiosity. It's also a matter of being forward thinking in general. I think both are skills one can foster and can apply more generally.


That's the point, though. What makes someone senior is being good at everything that isn't writing code. Such as being in tune with the business and how its demands behave over time.


Since OP works at Amazon he might be familiar with the mental model of "one and two way doors". A one way door is a decision that is impossible or very difficult to undo or change. A two way door is easy to change. The idea is to spend most of your energy on one way door decisions and little on two way doors. This acts as a remedy to things like bike shedding. If something is easy to change, just go ahead and do it!

The relevance here is that we can apply this concept in reverse. If we make something easy to change, it is close to a two way door. Hence we reduce the time we need to spend on its design/consideration/etc.

Personally I like to write code that is more on the flexible side to increase my optionality. I can then iterate faster, throwing things at the wall and changing my mind as needed. Of course this flexibility doesn't come for free. Overengineering and the cost of carrying it are real, so apply your best judgement.


"One and two-way doors" is a nice way to phrase it. I use a similar heuristic to figure out where to spend more effort designing things upfront. For most web apps, the one-way door is usually going to be the database schema - data migrations are trickier to do once you have to deal with real data.

The other big class of "one-way" decisions are with regards to code that live in environments you don't control, e.g. mobile and desktop apps.

One tip I would offer is when building something new, you should try and delay making one-way decisions as long as you can, until you have a clearer picture of how things should work.


This is really good advice. Flexibility is leverage. It’s not a “what if” question, but deliberately preparing for exploration and fast iteration.


This is probably a bit of an aside to the implied problem at hand but.. considering this from the React side of my company's codebase, I wish this question was at least kept softly in mind when designing components. I frequently encounter components that were very clearly created as "one-shot"s that are then unfortunately extended by piling on more props and conditional behavior by the next developers who need something like the current component but ever so slightly different.

Often the solution initially would have been to separate out the presentation side of the component from the behavioral wrapper that chooses what data needs to be shown / what actions are performed by interactions. By the time the component arrives at my lap (because I too need something same same but different), however, it has become a monstrosity that can take a long time to disentangle via ADD (anger driven development).

I think asking oneself a simple question such as "how would someone make this search box work with a different data source?" would probably result in components that are decomposed into simpler, smaller parts that allow for much easier reuse and adaptation.

On the flip side, I'm also of the belief that the second developer to touch a component is necessarily better equipped to answer that question, so the onus should probably be on them to make the proper generalizing changes.

(I'm still trying to figure out how to write a document that expresses this idea more concretely to my coworkers because it often isn't quite this simple..)

¯\_(ツ)_/¯


> On the flip side, I'm also of the belief that the second developer to touch a component is necessarily better equipped to answer that question, so the onus should probably be on them to make the proper generalizing changes.

This.

Given a long enough timeline, pretty much all abstractions fail. People are too timid to replace them when they do, and given enough churn, the code gets out of hand.


I think there are some quick wins that don't take too much dev time but make components reusable, or at least make it easy to make them reusable in the future.

- Like you've already said, prefer splitting components into purely presentational components("dumb components") and components with some logics ("smart components")

- If you are using design components, make sure to pass on those props. e.g. if you are using Chakra, pass BoxProps to <Box>.

- try to split out logic into hooks. This can be very specific to the current use case.

These aren't hard things to do but lets you quickly create a generic component by only changing the smart component/hooks or only reuse the dumb components with another specific hook, if the use case is a bit different.


Yes! This is such a great summary of the pains of front-end, although I'm sure these pains transcend any one arm of software development. What laypeople may see as a simple search box is probably a monstrosity of spaghetti code that was written by a single person under a surprise deadline and interfaces with 3 different APIs (recent suggestions, search-on-type quick results, full search...). And don't forget the many different states (focus, active, disabled, loading) that you have to consider and build styling around. Everything is tightly coupled and semantically horrific because that's what the designers and managers demand for this year's particular flavor of the product's design, under whatever deadline has been sprung on those involved in actually building the thing.

I think the interactive nature of these things and years of 'training' that the average computer user has endured to expect how a search box behaves tends to hide the hidden complexity lurking in UI elements everywhere. Another great example is a <select> element.


Most abstractions will accept one layer of ugly hacks for situations they were never meant to deal with. I'd recommend waiting until the second layer of hacks starts to form, then refactoring with what you've learned, since that second layer of hacks is when things start to really fall apart.


The second developer that touches it might be new to the company, or up against a deadline; you never know.


Right - the former case is one that’s unavoidable and, in the case the first developer didn’t get it quite right, should ideally be covered by code review, but doing a good code review is IMO pretty hard, and giving that sort of feedback requires that the reviewer also understands the problems at hand (which in our case is frequently not the case as our development teams are small and we’ll often have a backend focused dev reviewing another backend focused dev’s frontend work).

And in the latter case, at least in my organization, we don’t often have hard “must meet” deadlines, but instead have more self-imposed deadlines that become externalized and concrete when these “deadlines” (aka ship date estimates) are communicated org wide.

So all I can figure to do is give zoom talks, write documentation that’s shared to all developers, and try to encourage coworkers to take the opportunity to push soft deadlines back if needed to pay down tech debt.


Good code rarely needs to change because it's complete. It's meant to be built on top of, rather than modified for every new consumer. Think standard libraries. There is no reason for the linked list module to ever change unless it's for bug fixes or performance improvements.

Business logic needs to change all the time, because businesses are always changing. This is why we separate it out cleanly, so it can change easily.

Know what type of code you're writing so you can plan and design appropriately.


Meh. Good code is a weasel term. Code can be easy to extend. Easy to change. Easy to throw away. Sometimes tradeoffs are made between those options.

This is like saying a good piece of furniture is easy to add to. I mean, maybe?


To my mind, one of the important aspects is that code must be easy to understand, so as to write it correctly.


Every virtue is easy to argue for in isolation. All else being equal, who wouldn't prefer their code to be easy to read? Or easy to extend, performant, simple, small, secure, packed with features, well tested, and so on.

The trick is that writing code that's easy to understand often takes more time. Making code performant will often make it harder to read, and harder to change. Adding lots of features will make your code harder to change.

It's easy to point at the virtues of good code. And it's easy to pick out some personal favorites. But the difference between an intermediate and expert programmer is knowing how (and when) to trade those values off against each other. When you're prototyping, write messy code that you can change easily. When you're writing a web browser, agility doesn't matter as much as security and performance. If lives depend on your code (eg in medical, rocketry, etc) then testing becomes a lot more important. Working with a lot of junior engineers? Try to write code they can understand and maintain more easily. And so on.

It's a fine thing to have a personal style when programming. But the mark of excellence is whether you can adapt your style to suit the actual problem you're trying to solve.


I like the idea of thinking about various code components as furniture!

In practice you can get by with a wild collection of non-matching furniture. It won't be aesthetic, but it will work.

Code projects are quite similar in that regard. On the other hand, sometimes a piece will be such a poor fit that it starts breaking things. Does that also apply to furniture?

Perhaps if you put the bed too close to the closet you can no longer open its door all the way...


There are also furniture pieces that are made to connect to each other. Some are obvious like peg boards, but things like sectional couches, too.

Then there are the myriad cabinet doors. And folks like heavy built-in cabinets, but Ikea ones are just fine. They typically have better hardware. But nobody wants to be the equivalent of an installer in software, it seems.

Since you said beds, consider getting the wrong size bed for a room. Or mismatched head and foot boards.


Hard disagree. I wish the world worked that way, but it doesn't. So much happens in the time between the product inception and delivery. Initial proposal of value, prototypes, customer feedback and requests, known performance issues to work around, making hooks for non-devs to execute important overrides... the list goes on, and contracts have dates that have to be met. Stuff can't be rewritten to address every turn. This is why Agile (however abused it is) largely beat Waterfall. Software should serve people and yield to their needs. Not the other way around.


I don't know what you're disagreeing with, but I suppose you're welcome to open up a PR to change the standard library for your specific use case so you can meet a contract date. I imagine you'll be quite handily laughed at.


Many companies I worked at had forks of standard libraries internally


I've seen this often as well: typically some library will get flagged by programs like Snyk, giving you a "high" score. The way Snyk scores packages is completely asinine. It favors libraries that are constantly being updated over, say, a library that is feature complete and in maintenance mode. One way around this is to literally pull all the source code and paste it into your repo.


Same here. And every time that was a mistake and the fork sucks.


Standard libraries was one example you cited for a broad claim about "good code". I'm not arguing about standard libraries.


I have replaced pieces of C++ standard library quite a few times.

std::list calls malloc/free for every node instead of allocating them in batches, which is expensive. It is also doubly linked; for some applications a singly linked list is a better fit.

std::vector<bool> lacks a data() method, which makes serializing/deserializing it prohibitively expensive.

Even something as simple as std::min / std::max for float and double types isn't using the optimal implementation on AMD64, which would be a single instruction like _mm_min_ss / _mm_max_sd.


Was going to swoop in to say this!

I would add that sometimes leaving oneself room to expand/respond to changes is just what your code needs. Expansion points. Whatever you want to call it.

The maxim I use is, "avoid being overly specific." If you have polymorphism in your language the less you say about the type of a variable the more places it can be used and the more ways it can be composed with other functions. This requires a style of design that pushes side effects to the edges of the program (consequently where they're the easiest to change).

With this style of programming, responding to change is straightforward to reason about. No need for complicated indirection between objects and tracing behaviors through v-tables. If you are using OOP, keep your data and behaviors separate.
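
To make the maxim concrete, here is a small Java-flavored sketch (class, method, and number choices are made up for illustration): the less specific version says nothing about the concrete collection or element type, so it composes with more call sites, and it returns a value instead of printing, pushing the side effect out to the caller.

    import java.util.List;

    public class Totals {
        // Overly specific version (kept as a comment for contrast): it would only
        // accept ArrayList<Integer> and would print the result, baking in a side effect.
        //   static void printTotal(ArrayList<Integer> xs) { System.out.println(...); }

        // Less specific: accepts any Iterable of any Number and returns a value,
        // leaving the side effect (printing, logging, etc.) to the caller.
        static double total(Iterable<? extends Number> xs) {
            double sum = 0;
            for (Number x : xs) sum += x.doubleValue();
            return sum;
        }

        public static void main(String[] args) {
            System.out.println(total(List.of(1, 2, 3)));   // works with List<Integer>
            System.out.println(total(List.of(1.5, 2.5)));  // ...and with List<Double>
        }
    }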

The stuff that solidifies rarely changes. The stuff built on it changes a lot. And as time goes on you'll find that refactors will start pushing more upper layers down where they will eventually solidify.


Pointing to a list makes the problem look too easy, because it’s such a clearly defined abstraction. Of course a linked list mostly doesn’t need to change—it’s an easy problem.

Give me a definition of good code that applies to:

- the standard library

- a gui toolkit

- the kernel

- the apps built on top of vendor provided gui toolkits that change

- a web application backend

- a web application front-end

- a database

- more things I’m forgetting


Corollary of that is that MVP for library code is very different than MVP for business code.

MVP for business code is a great way to get the tool in front of the users and get traction, request for more work. Once you release your library, desire for changes basically drops to 0.

It's working. If it's clunky, the clunkiness just gets wrapped into a utility class somewhere deep in the belly of your client application, with about 1 commit per year to change the copyright notice.

Similarly, your corporate leverage falls to 0. You made a library to save people time; congrats, you did it. Every update you ask people to do that doesn't bring a new feature they need reduces your value. Good luck justifying the ROI of a cosmetic change.


If you make good code and build a bunch of things on top of it, but find out it wasn’t matched to a use case and you need to move to a new architecture, that good code is going to be a pain to remove or change. “Good” shouldn’t be conflated with “everywhere”. A lot of bad code is everywhere because fixing it seems too painful.

The linked list module may have a poor interface or an awful bug with a workaround. But you can’t change it until the next version of the language in 3 years, because everyone relies on it. It’s not good, just a dependency. It needs to change and can’t. I want devs to embrace that changing heavily relied-upon things can be good even if it’s painful.


If you switch from int to float, you don’t “refactor” or “extend” the int type to be a float. You build a new system and switch over.

If you switch from mergesort to quicksort you don’t “refactor” or “extend” mergesort. You write quicksort and replace the calls to mergesort with calls to quicksort.


> Good code rarely needs to change because it's complete.

Maybe. But on the other hand 99% of code is not good, and thus is more likely to need to change. And that is a reality that we need to deal with unfortunately.


I don't think people think like this enough, which is why you will get bespoke double-entry accounting code mixed up with all the other business logic in almost every application that needs it.

The idea of making an abstract, extensible double-entry accounting module its own thing is rare. It's probably Conway's law at play: no one is responsible for doing so, the teams were set up with their own goals, and everyone shares that code.


good code is code that gets you promoted, increases your net worth, and helps you retire faster

depending on your work environment, good code may actually be bad code.


Hear, hear! I'd say most of the time it's the bad code.


Most any code can be library code given enough growth in a business.


> Developers from certain languages [Java]...

I am mostly a C++ developer but I have been on some Java projects recently, and I am a bit shocked by the "what if it changes?" culture. Lots of abstraction, loose coupling, design patterns, etc... It looks like a cult of the Gang of Four.

Of course, it is not exclusive to Java, and these patterns are here for a reason, but I have a feeling that Java developers tend to overdo it.


Java is actually a great language. I think the Spring culture ruins it. As you say most of the abstractions are out of control.


A language is nothing but the code it engenders. Bad code, bad language.

A language that evolves "is" the new code that is written in it, with deep legacy substrata written to previous versions. You can have a good top level but an embarrassing legacy. We should always strive to make our legacy embarrassing, because that marks improvement.

If ten-year-old code in your language is not an embarrassment, your language is stagnating.


Modern Java is actually quite great, with many cool features: records, switch expressions, pattern matching, etc.


Yeah, but how many devs are utilizing anything beyond Java 8?


I think a large part of the difference is that interfaces and polymorphism are effectively free in Java, whereas virtual methods in C++ come at a cost.


Virtual dispatch always has a cost. It's "free" in Java in the sense that it's always done at the VM level, so you might as well just use it; even final methods are just an agreement with the compiler, the VM doesn't care, it will dynamically look up the method in the class hierarchy like God intended. C++ makes it painful and obvious what you're getting yourself into.

The JVM is of course very clever and is, I'm sure, doing tons of shenanigans to reduce the cost, but that's not free: that's someone else investing tons of time and effort and complexity to reduce the cost of a fundamentally expensive operation.


> even final methods are just an agreement with the compiler, the VM doesn't care, it will dynamically look up the method

You’re right about the “final” keyword being a placebo, but you got the rest exactly backwards.

The JVM is ridiculously aggressive in optimizing for throughput over latency: It assumes that everything is final and compiles the code with that assumption until proven otherwise. If it sees a method getting overridden, it will go back and recompile all the callers and everything that was incorrectly inlined.

A lot of Java code depends on this. For example if you only load one of several plugins at runtime, there’s no overhead vs implementing that plugin’s feature in the main code base.


The single-plugin case is kind of optimistic, wishful thinking. Sure, this case happens. Sometimes.

But in real code you often have plenty of things like iterators or lambdas, and you'll have many types of those. So the calls to them will be megamorphic, and no JVM magic can do anything about it.

In the C++ world you'd use templates, or in Rust you'd use traits, which are essentially zero cost, guaranteed.


>While in C++ world you'd use templates or in Rust you'd use traits, which are essentially zero cost, guaranteed.

Templates create more code

If the code becomes too large to fit in the cache, it becomes very slow


Somehow I never noticed it happening in practice. In all the cases where cache was the problem, it was caused by data, not code. CPUs prefetch the code into cache quite well.


Are you talking about type erasure and generics? If so I agree, but that’s unrelated to devirtualization


They are kind of related, in that Java's implementation of generics does not help with devirtualization, while C++ templates / Rust traits do help by not needing virtual calls in the first place.

Consider the pre-Java 1.5 sort method:

Collections.sort(List list, Comparator comparator);

If you load more than one Comparator type, then the calls to comparator are megamorphic and devirtualization won't happen unless the whole sort is inlined.

In languages like C++, you'd make it a template and the compiler would always know the target type, so no need for virtual.
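
To make that concrete, here is a small hedged Java sketch (class and comparator names are hypothetical): once several distinct Comparator implementations flow through the same sort call, the comparator.compare() call site inside the sort sees multiple receiver types, and the JIT typically stops inlining it and falls back to virtual dispatch.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    public class MegamorphicSketch {
        // Three distinct Comparator implementations (each lambda is its own class).
        static final Comparator<Integer> ASC  = (a, b) -> Integer.compare(a, b);
        static final Comparator<Integer> DESC = (a, b) -> Integer.compare(b, a);
        static final Comparator<Integer> ABS  = (a, b) -> Integer.compare(Math.abs(a), Math.abs(b));

        static void sortWith(List<Integer> data, Comparator<Integer> cmp) {
            // The compare() call inside the sort sees every comparator type that
            // flows through here; with several receiver types HotSpot generally
            // gives up on inlining it.
            Collections.sort(data, cmp);
        }

        public static void main(String[] args) {
            List<Integer> data = new ArrayList<>(List.of(3, -1, 2));
            sortWith(data, ASC);
            sortWith(data, DESC);
            sortWith(data, ABS);
            System.out.println(data);
        }
    }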


Default virtual was among the dumber design mistakes in Java, but it has lots of competition.


Why? The JVM has complete knowledge over the entire code base at runtime. It knows which methods require virtual function calls and which ones are just regular function calls. If nothing extends the class, then there will be no virtual functions in the entire class. If something extends the class but it does not override any methods then there again will be no virtual functions. If a class overrides a single method, only that method is going to be a virtual function.


See my reply to the GP. Default virtual is the only thing that makes sense given how the JVM works.


You understand the JVM was designed at the same time as the language? It could work any way they liked. And does.


Sure, but if you have the same preferences (throughput over latency), you’ll find that there are no performance benefits to be gained from non-virtual functions in any JITed language. The “final” keyword is just there for documentation.


Virtual or not isn't about performance, it is about system architecture. Virtual is structurally about implementation. Exactly to the degree that the public interface matches the inheritance interface, the abstraction is a failure.

At least, if you are being object-oriented, which Java tries to force on you. Of course, you are free to violate that expectation, and sometimes must since Java offers no other means of organization; so if you do, more power to you.


Free except for you've written 3x as much code and it's 10x harder to understand.


Eh, it's a bit boilerplaty, but much of that stuff is typically done through an IDE.

Don't know about harder to understand, the entire point is to remove confusing implementation details from callers.


I'd rather see how it's implemented.

In my experience enterprisy abstractions are a lot of motion without any progress. They impede change and stymie understanding.

The cynical part of me thinks that is the whole point.


The issue is that when you are trying to understand/modify someone else's code, it always comes down to confusing implementation details rather than the abstract architecture on top of them.


Oh yeah, I've heard this so many times - the JVM can optimize all that dynamism out. Except in cases when it can't or just won't.

The reality is, it is very far from free. Most Java developers are simply not aware of the real cost. Then they are surprised when the code gets a 5x speedup and needs 20x less memory after being rewritten in Rust or C++.


You don't have the option of telling Java whether to use "that dynamism". All Java calls are virtual calls; in most simple cases the JIT optimizes it out (e.g. a single implementer of an interface), sometimes it can't.

This isn't just speculation about what the JVM does, you can examine the machine code generated by the JIT to verify whether this optimization happens.

Do you have real-world examples of this 5x speedup from this decade? 20x memory I can sort of see because the GC overhead can be pretty nasty in some edge cases, but I'd expect closer to 1.5-2x speedup from C++ if the Java code is anywhere near well written.


> Do you have real-world examples of this 5x speedup from this decade?

Does https://github.com/pkolaczk/latte count? Or the Optional cost described on my blog here: https://pkolaczk.github.io/overhead-of-optional/?

(I have more such examples, but many I can't share).

> you can examine the machine code generated by the JIT to verify whether this optimization happens

And what do I do with that knowledge if it turns out the optimization didn't happen?


> Does https://github.com/pkolaczk/latte count?

This is just Rust code. Where is the equivalent Java code?

Like the 50x-100x memory consumption is highly suspicious. If Java uses 50x the memory compared to C++, how come I can allocate a long[SINT_MAX-10] on a machine with 32 GB of RAM? Shouldn't the process require on the order of 0.8 TB of RAM if this statement is correct? Or can C++ allocate a 2 bn array of longs in 40 MB of RAM? If so let me know, I would be very interested in using this novel compression technology.

> Or the Optional cost described on my blog here: https://pkolaczk.github.io/overhead-of-optional/?

Why aren't you using OptionalLong[1]? You shouldn't use Optional<Long>, that's never a good choice. At any rate, nobody should be claiming Java optionals are free; they're a high level abstraction and absolutely do not belong in hot codepaths.

In general it's fairly easy to construct benchmarks that favor any particular language, which is why you constantly see these blog posts about how high level interpreted languages (JS, PHP, Haskell) are faster than C++.

You can easily construct "comparisons" that make JVM languages look superior to C++ as well, just carelessly allocate throwaway objects of different sizes and lifetimes (like you can in Java), oh no, why is C++ slowing down? Surely there's no heap fragmentation! That's a bad faith benchmark though. It doesn't really demonstrate anything other than that C++ following Java idioms isn't very good.

> And what do I do with that knowledge if it turns out the optimization didn't happen?

The way the JIT works is by aggressively overassuming, and then recompiling with more generalized interpretations of the code when assumptions turn out to be false. But the wider problem of compilers occasionally generating suboptimal instructions isn't something that is Java specific.

[1] https://docs.oracle.com/en/java/javase/12/docs/api/java.base...


> At any rate, nobody should be claiming Java optionals are are free, they're a high level abstraction and absolutely do not belong in hot codepaths.

1. In this particular case you might be lucky, because someone provided a hand-coded, specialized workaround. But that was not the purpose of that benchmark. And in bigger code bases you often are not that lucky or don't have time to roll your own, so you must rely on generic optimizations. Sure, you may get Java quite close to C by forgetting OOP and implementing everything on ints or longs in a giant array. But that defeats the purpose of using Java, and it would make it lower-level and less productive than C.

2. One of the commenters on Reddit actually tried OptionalLong, and it did not help. See the comments section; there should be a link somewhere.

3. I can use this high-level abstraction in C++ at negligible cost in hot paths.

> This is just Rust code. Where is the equivalent Java code?

You probably won't find exactly equivalent code for software bigger than a tiny microbenchmark. The closest you can get are other tools built for a similar purpose, e.g. cassandra-stress or nosqlbench. I can assure you that the majority of CPU consumption in those benchmarking tools comes from the database driver, not the tool itself. And comparing tools using a well-optimized, battle-tested Java driver with a similar tool using a C++ or Rust driver can already tell you something about the performance of those drivers. Generally I found that all of the C++ drivers and the Rust driver for Cassandra are significantly more efficient than the Java one. Fortunately, outside of the area of benchmarking, that might not matter at all, because in many cases it is the server that is the bottleneck. Actually all those drivers have excellent performance and have been optimized far more than typical application code out there.

> Like the 50x-100x memory consumption is highly suspicious.

This isn't a linear scaling factor. It applies to this particular case only. And the reasons this number is so huge are: 1. the Rust tool runs in a fraction of the memory that is needed even for the JVM alone; Rust has a very tiny runtime. 2. the Java tools are configured for speed, so they don't even specify -Xmx and just let GC autotuning configure that. And I guess the GC overshoots by a large margin, because it often ends up at levels of ~1 GB. So it could likely be tuned down, but at the expense of speed.


On the other hand, Java projects are better tested and more portable in my experience.


Better tested, maybe, I don't know but I believe you.

More portable, yes, but it is complicated. Java runs on a VM, so it gets portability from there: if your platform runs the VM, it will run the project. However, as a user, I still had issues with using the right VM at the right version; OpenJDK and Oracle JDK are not completely interchangeable. The same goes for messing with the classpath and libraries. Not so different from C actually, but at the JVM level instead of the platform level, the advantage being that it is easier to switch JVMs than to switch platforms.


More portable... to what? Are we talking about desktop applications here? I can't remember the last Java program I used on the desktop.


Minecraft is written in Java.


It relies on native libraries like LWJGL, so it does still have portability issues.

And most game consoles can’t run Java so they had to totally rewrite it there.


Most game consoles are amd64 or ARM. Running java there is trivial.


Minecraft for 3DS, Xbox 360, etc already exists and is a rewrite in C++. It’s older than current game consoles.

But they don’t have JREs even if they have the right kind of CPU in them.



Not sure how I missed seeing this before today, but thank you very much for bringing it up. That is a particularly good article. It has a lot of food for thought.


I love tef's writing - and now he takes pictures of crows full time! :)


That's great!

Thanks!


I think the most important mind shift is from "let's make this extendable by plugins/scriptable so we can modify it while it's live" to "if requirements change, let's just change the source code and redeploy".

I also disagree with the SOLID principles. KISS is more important than adding extra code and sacrificing performance to allow extension without touching the original source files. Unless your goal is explicitly that.

You're trying to write the simplest, most straightforward encoding of the solution. If you can avoid duplication and make the code read well, you're golden.


> "I think the most important mind shift is from "let's make this extendable by plugins/scriptable so we can modify it while it's live" to "if requirements change, let's just change the source code and redeploy"."

People who develop web apps can go ahead and have this epiphany, but this obviously doesn't work where you don't have complete control over "redeploy", which is a large class of software.


Sure, deployment can be tricky. However, the big question then is: if you make it extendable by plugins or scriptable, is the process you will use for testing, managing and deploying those plugins or scripts any easier?

I have seen cases where you go to pretty much the same effort as the "core" deployment, so the deployment benefits of the configurable option were fake but the complexity costs were real.


> if requirements change, let's just change the source code and redeploy

The intractable problem ends up being "fuck, half our code base implicitly depended on the current behavior and now we can't change it without half our tests failing."

This is why 10,000 ft abstractions can sometimes be nice because now all your business logic exists in a fantasy world of interfaces you control.


I think "don't make everything plugin-able" doesn't mean "Let's hard glue everything". And if change a single file can break half of unit tests. I think your test is bad at best. (Or there is only integration tests because you already hot glued them?)


Agree, this is why I like microservices, because they enable exactly this, and make it easy.

Sort out your abstractions at a high level (auth, storage, routing, messaging, etc) and then write small programs to implement the logic. If one of them is wrong - redeploy.


My only criticism of this piece, is that it's so dry and well articulated, some might not realize it's satire.

There is some conversation here about the number of instances something has to happen before you should abstract it, which is a handy rule of thumb. You should also consider the tradeoff of complexity: sometimes even if you have 5, 10, 15 snippets of code that are almost the same, you still don't need to abstract them, because the differences are not complex to manage, but an abstraction would be.


There’s a lot of value to being able to command-click a function or method call and jump to one single definition. This substantially reduces friction when reading/understanding code or change sets.

One of the best things about dynamic languages like Typescript is that in these languages you can avoid interface/implementation duality while still being able to mock or test code by using test-only runtime mutation of class instances or modules.


This has been doable in static languages for a long time now, as well.


You are correct, and for some reference, Mockito 1.0 came out in 2014.

@Mock, @InjectMocks, and the ability to wire up an entire dependency tree has been something Typescripters have still not perfected like crusty Java devs.
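
For anyone who hasn't used it, a minimal Mockito-style sketch (Greeter/GreetingService are hypothetical names, and it assumes mockito-core is on the classpath): the mock is generated at runtime, so there's no hand-written interface/impl duplication just for testability.

    import static org.mockito.Mockito.*;

    class GreetingService {
        String greetingFor(String name) { return "Hello, " + name; }
    }

    class Greeter {
        private final GreetingService service;
        Greeter(GreetingService service) { this.service = service; }
        String greet(String name) { return service.greetingFor(name) + "!"; }
    }

    public class GreeterTest {
        public static void main(String[] args) {
            // Mock the concrete class directly; stub only the method under test.
            GreetingService mockService = mock(GreetingService.class);
            when(mockService.greetingFor("world")).thenReturn("Hi, world");

            Greeter greeter = new Greeter(mockService);
            System.out.println(greeter.greet("world")); // prints "Hi, world!"
            verify(mockService).greetingFor("world");
        }
    }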


Depends what you mean by "static". Swapping out a method or class implementation at runtime is pretty inherently a dynamic task.


Apologies, I assumed "static typed" was a fairly safe assumption.


Can you put that second sentence into simple english for those imposters such as myself?


They seem to be referring to "duck typing". The typing principle exemplified by the statement: If it looks like a duck, walks like a duck, and quacks like a duck, then it's a duck. Does it matter that it's a waddling man in a duck costume saying "quack"? Nope, still a duck. You don't need an explicit interface and an explicit declaration that you're implementing it, or to implement it fully, you just need to implement the operations relevant to the use of the object.

You have a procedure or something that you want to test and it takes, as a parameter, a logging service? You don't want to instantiate a full logging service and set up the database, because that's heavyweight for a test and irrelevant for the particular test? Fine, you throw together a quick and dirty logger that answers the method `log` and pass that in instead. No need to know what the precise interface has to be, or implement or stub out all its other capabilities. You know it has a `log` method because the procedure under test uses it, so that's what you give your quick and dirty mock logger. No more, no less.


In Typescript or other dynamic languages, here's what you need to do to mock a single method on an instance of a class:

    const instance = new MyClass()
    instance.myMethod = () => getMockResult()
Usually test frameworks provide a helper to do this kind of thing for you, but it's easily accomplished with no magic, using built-in language features.

In nominally-typed languages with static type systems, it's more difficult to alter the behavior of specific instances of classes at runtime. Instead, convention is to declare an interface that specifies some set of methods, and then declare a class that implements the interface. Instead of using a concrete type in business logic, you use the interface type instead. Then, in your tests you can use a different class that also implements the interface:

    interface MyInterface {
      myMethod(): ResultType
    }
    
    class MyClass implements MyInterface {
      myMethod() {
        return new ResultType()
      }
    }
    
    class MyInterfaceMock implements MyInterface {
      private myMethodImpl: () => ResultType
      constructor(myMethodImpl: () => ResultType) {
        this.myMethodImpl = myMethodImpl
      }

      myMethod() {
        return this.myMethodImpl()
      }
    }

    const instance: MyInterface = new MyInterfaceMock(() => getMockResult())
The "duality" I'm referring to is that in this pattern, every type needs to be defined twice, first as an interface, and then as an implementation.

As other commenters said, there are tools like Mockito that eliminate the need for this double-definition, and enable the same easy re-definition of method behavior at runtime. My point wasn't that it is impossible in Java or other static languages, just that one of the strengths of a more dynamic language is that flexible runtime behavior alteration is ~2 lines of code, versus the 39769 lines of code in eg Java's Mockito.


I've worked with people who would prefer a complex function to generate a list of properties from a data source over a much simpler hardcoded list, on the basis that if a new option is added it's easier. They used this pattern for things like asset classes, which admittedly did change about once every 5 years. It made me sad.


Still peeved about times I was forced to do this, when any meaningful change would also necessitate a code change anyway.


I think "what if it changes?" can also be used to argue for more concrete, simpler code that is less DRY and therefore can be rewritten or deleted with greater ease.

I wouldn't refer to overly generic code as easier to change.


Exactly. It's also easy to demonstrate that if it changes we just change the code, duh!! Simple concrete code is far easier to change.


Yes. That was the joke.


> Never let anyone explore the answer to your "what if it changes?" question. The impact of such a change is irrelevant! For the question to retain its power, fear must live in the imagination. The change's impact must be unstated, horrible, and so fraught with Lovecraftian terrors that only your approach can defend against it.

I see this in a lot of contexts at work, not just defensive coding. Red tape that slows us down a lot, supposedly protecting us against unimaginably horrible things which no one can seem to articulate under questioning.

I mean, look how red that tape is! It must be protecting us against something pretty dangerous. Red means danger, right?


Say you’re piloting a sailboat downwind and there’s an obstacle between you and your destination. You will need to plot a course to circumvent the obstacle. You have a choice to turn the boat toward the wind (edit: not through the wind, just toward it) to pass the obstacle on one side, or away from the wind to pass on the other side. One of these two options retains optionality at little or no cost; that’s likely to be the better choice. Likewise in software, look for zero cost opportunities to retain optionality (eg. free ways to defer decisions till later).

This is a bit sophisticated to teach junior developers, so we just teach them “consider the implications of your design decisions with regard to supporting future needs, including those not yet known.” Yes, you can certainly over-index on this dimension, but that doesn’t make it a useless or necessarily harmful consideration. (Not implying the article disagrees with this; it does appear to be satire)


In the subset of competitive video games known as fighting games, like Street Fighter, there is a slang term for a very similar tactic, called an “option select”[0]. Put simply: you’re maximising the expected value of your decision, by expanding the possibility space of what can occur after your decision, instead of eagerly (and maybe mistakenly!) committing to any particular possibility and locking yourself into some subset of that possibility space.

Go figure that I’ve overheard and used “option select” in the context of technical decisions at work, among coworkers who play fighting games.

[0]: https://youtu.be/rBLpz369i7Q


> One of these two options retains optionality at little or no cost

I may misunderstand your analogy here, but it seems to me that retaining optionality has a significant up-front cost here that may never be realized?


You turn toward the wind, then you’ll be more upwind until later. So you retain optionality by having “more room to go downwind”. (Please if we have a more experienced sailor here, do explain it better!)

What up-front cost are you thinking of?

A non-sailing version might be say you’re on a bicycle, and you can drop down the mountain now and go lateral later, or go lateral now and drop down later, and either way you arrive at your destination after the same time and effort. Going lateral now retains optionality.

Very open to hearing better analogies for the same concept! Perhaps from the world of business?


I interpreted ‘turning towards the wind’ as actually trying to sail against the direction of the wind (very hard/annoying).

So trying to do that first for gains you may never realize is not a great idea.

Like trying to ride your bike uphill so that you may easily go downhill at some later point in time.

The biking example makes it clearer what you meant (to me anyhow).


Ah yes I could have been more clear. Edited.


This is an old debate. The counter-slogan is "You Aren't Going To Need It." [1] All you can say from the outside is "well, it depends what it is."

Often the best way to design for change is to make it easy to edit the code, test it, commit, and deploy, but not everything is a web app.

[1] https://martinfowler.com/bliki/Yagni.html


That was the joke. The author is delightfully subtle.

I worked at a big company that assigned its summer interns to write an online sarcasm detector. At the end of summer, they announced they had been completely successful. No one asked for evidence. Clearly the students had understood the assignment perfectly.

Anyway, it was said that this had been assigned. I never spoke to anyone actually involved. Or, admitted it.


I wouldn't say that YAGNI is the counter-slogan. Rather, it's the overarching principle. It's not going to change. You aren't going to need to code for that contingency.


"What if it needs to change?" is better than "What if it changes?"

In a business environment, write code that is meant to be extended by people other than you.

It's not mindless tyranny, it's good design.


Rewriting code is often faster than understanding obtuse code written for requirements that never come.

I once encountered a 5k LOC program designed to take the average of a series of Boolean values. The code had been designed such that the input data format could be changed, etc. Unfortunately, of the 5 metrics which could ever have been requested, only 2 were implemented. It was too difficult to work with the code to implement the other three (just finding where to implement them took 3 days).

Ultimately all the extra abstraction abstracted over the wrong things. The 5k LOC project was replaced with a 100-line file that did exactly what was needed.


Unless the person other than you is me. I’ll just delete your monstrosity and replace it with a couple hundred lines of focused, simple code and revel in the mostly red diff on GitHub.


That's fine, as long as you are still solving the original problem in the required time. The IRS won't excuse that you missed filing your withholding data on time because you were making the reporting tool easier to extend upon later.


The idea should be that it probably will change, and to prepare for that you need to code it in such a way that if it does change, you have to update the code in only one or two places; you haven't scattered the knowledge of that detail all over the code.


If you overdo that, you end up with more abstraction than is helpful, which will hurt your ability to refactor as well. When I was younger, I did a lot of abstractions because doing abstractions was cool. Now that I'm older and wiser, I only add interfaces if they're really needed. Even if I know that some abstraction will be coming in the future, I've found it more helpful to add the abstraction only right before implementing the second use case for it, because by then I have thought a lot about how the second use case is to be implemented.


I think an important skill you pick up after a couple decades is to know the difference between “this is likely to change” and “my gut says what-if-it-changes but my experience tells me YAGNI”

My general approach is to write my code in a way that could easily be turned into an interface once a second or third implementation needs to be introduced and hope a junior doesn’t come along and fuck it up in the meantime. I think one reason people go to the abstract interface early (when there is only one impl) is that they see it as a guard against someone else coming along and changing the well thought out layering. But it doesn’t work and just makes things harder to read and work with.


I don't like this author's approach. Always ask the question, but never add complexity because you assume you know the answer. If you have "multiple layers of indirection" that are "unused" because of paranoia that you might have to refactor your code, then you're doing it wrong. Write code such that you never have to be afraid to refactor it. That's not the same as writing code that never has to be refactored. In fact, refactor often and liberally. And if things break when you refactor, then that code is bad code. But don't add worthless layers of indirection because you write code that can't be tested, is full of side effects, requires entire components to be rewritten from scratch, etc.

The top commenter says good code rarely needs to be changed. I think that's foolish. Good code is constantly changed, because it's good enough that it can be.


Try to produce useful interfaces and data structures, even internally. When data is stored, make it easy to verify that it's in a recognized, expected format, possibly with a version scheme that can be extended later.

Historically successful protocols often have a basic feature level and later extend it with features that are not required (at the time) but improve functionality. All the more so when users with accounts need more features than federated (not authenticated) users from other platforms.
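
As an illustration of the version-scheme idea, a hedged Java sketch (the Profile format and field layout here are entirely made up): every stored record leads with a version number, and readers refuse anything they don't recognize instead of silently misparsing it.

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    record Profile(String name, String email) {}

    class ProfileFormat {
        // Readers check the version first, then parse the layout they know.
        static Profile read(DataInputStream in) throws IOException {
            int version = in.readUnsignedShort();
            return switch (version) {
                case 1 -> new Profile(in.readUTF(), "");            // v1 had no email field
                case 2 -> new Profile(in.readUTF(), in.readUTF());  // v2 added it
                default -> throw new IOException("unknown profile version " + version);
            };
        }

        static byte[] writeV2(Profile p) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeShort(2);          // version goes first
            out.writeUTF(p.name());
            out.writeUTF(p.email());
            return bytes.toByteArray();
        }

        public static void main(String[] args) throws IOException {
            byte[] stored = writeV2(new Profile("Ada", "ada@example.com"));
            System.out.println(read(new DataInputStream(new ByteArrayInputStream(stored))));
        }
    }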


Look up Poe's Law.


My personal bugaboo is worrying about whether the size of `int` changes. It's not going to change. It's 32 bits. 25 years ago was the end of 16 bits. 25 years ago.

Even if you do want to target real mode DOS, good luck getting your modern app to fit in 64K. Heck, just upper-casing a Unicode string takes 640K.


It's 32 bits only if you ignore microcontrollers and DSPs. Also, the size of 'long int' is definitely not constant if your code needs to be portable between Windows and something else.

I have written some DSP code that was portable between architectures that did not have the same sized _char_. Thankfully it was almost 20 years ago, and the TMS320C55x, which had 16-bit char and 16-bit int, is AFAIK dead.


> It's 32 bits only if you ignore microcontrollers and DSPs.

I know. But you're not likely to port any apps between microcontrollers, DSPs, and general purpose machines.

> Also, the size of 'long int' is definitely not constant if your code needs to be portable between Windows and something else.

It's also not portable between Linux and Mac OSX. This makes `long int` a completely useless type. Use `int` and `long long`.

> that did not have same sized _char_

I remember one person who responded to me about a DSP that had sizeof(char)==32, and how it was wonderful that the C Standard accommodated that so code could be portable.

I challenged him to port the C implementation of `diff` to it and see how far he gets with it. I predicted total failure :-)

The C and C++ standards would do the programming world a favor by standardizing on:

1. 2's complement

2. fixed sizes for char, short, int and long

3. dumping any character sets other than Unicode

Machines that don't support that will require a customized compiler anyway, and it's very unlikely any code will be portable to it without significant rewrite.


> I remember one person who responded to me about a DSP that had sizeof(char)==32, and how it was wonderful that the C Standard accommodated that so code could be portable.

The support for 32-bit char doesn't automatically make code portable between architectures with different char sizes. But it's what makes it possible to write a C compiler at all for those weird architectures. It certainly doesn't make code that assumes things portable to architectures where those assumptions do not hold.

> The C and C++ standards would do the programming world a favor by standardizing on: 1. 2's complement 2. fixed sizes for char, short, int and long 3. dumping any character sets other than Unicode

This would be a great favour to embedded developers everywhere, because it would finally free us from the C legacy. Rust and Zig look promising, and at some point I even thought D might work but now I understand why it wouldn't. I wonder if you're aware of the standard sized type aliases int8_t, int16_t etc and what's your opinion of them.


> it's what makes it possible to write a C compiler at all for those weird architectures.

Only if one is pedantic. There's no practical problem at all customizing C compiler semantics for weird architectures. After all, everybody did it for DOS C compilers.

> I even thought D might work but now I understand why it wouldn't

People do use it for embedded work. I don't know why you wouldn't think it would work.

> I wonder if you're aware of the standard sized type aliases int8_t, int16_t etc

I am. After all, I wrote stdint.h for Digital Mars C.

> and what's your opinion of them.

There are three of each:

    typedef long int32_t;
    typedef long int_least32_t;
    typedef long int_fast32_t;
1. Too many choices. I have experience with what happens when programmers have too many choices, with the differences between them being slight, esoteric and likely not substantive. They blindly pick one.

2. People have endless trouble with C's implicit integral conversions. This makes it combinatorically much worse.

3. int32_t makes sense for the first hour. Then it becomes annoying, and looks ugly. `int` is much better.

4. `int` is 32 bits anyway. No point in bothering with stdint.h.


> Only if one is pedantic. There's no practical problem at all customizing C compiler semantics for weird architectures. After all, everybody did it for DOS C compilers.

Resorting to non-standard extensions locks you in with that specific compiler. This is the exact reason why standards exist in the first place.

> People do use it for embedded work. I don't know why you wouldn't think it would work.

The lead developer seems to have strong knee-jerk reactions to things that he does not understand and limited understanding of what microcontrollers are or what embedded software does. Who really cares about porting diff to a system that doesn't have filesystem or text console?

I agree that integral type promotions are a great way to shoot oneself in the foot, but the other explanations are not really convincing. If you did read the comments you are responding to, you should already know that int is not always 32 bits.


You're already locked in with a specialized compiler for those unusual architectures, and despite being Standard conforming, it still isn't remotely portable.

> you should already know that int is not always 32 bits.

I also wrote a 16 bit compiler for DOS, with 16 bit ints. I know all about it :-) I've also developed 8 bit software for embedded systems I designed and built. I've written code for 10 bit systems, and 36 bit systems.

> Who really cares about porting diff to a system that doesn't have filesystem or text console?

I infer you agree the software is not portable, despite being standard conforming. As a practical matter, it simply doesn't matter if the compiler is standard conforming or not when dealing with unusual architectures. It doesn't make your porting problems go away at all.

I went through the Great Migration of moving 16 bit DOS code to 32 bits. Interestingly, everyone who thought they'd followed best portability practices in their 16 bit code found they had to do a lot of rewrites for 32 bit code. The people who were used to moving between the 16 and 32 bit worlds had little trouble.

C++ is theoretically portable to the 16 bit world, but in practice it doesn't work. Supporting exception handling and RTTI consumes all of the address space, leaving no space left for code. Even omitting EH and RTTI leaves one with a crippled compiler if extensions are not added to support the segmented memory model.

How do I know this? I wrote one. I lived it.


> I also wrote a 16 bit compiler for DOS, with 16 bit ints. I know all about it :-) I've also developed 8 bit software for embedded systems I designed and built. I've written code for 10 bit systems, and 36 bit systems.

I'm not sure how this means that int is 32-bit.

> I infer you agree the software is not portable, despite being standard conforming. As a practical matter, it simply doesn't matter if the compiler is standard conforming or not when dealing with unusual architectures. It doesn't make your porting problems go away at all.

I guess we could venture into arguing what "the software" and "portable" mean here. What I mean is that I was working on a standard-conforming C codebase that worked correctly on both architectures I mentioned above. This is what I consider portable. Having a standard-conforming compiler for both does not make the problems go away, but it makes things much easier than having two non-conforming almost-but-not-exactly-C compilers or totally proprietary languages.

I know that DOS was bad. It's been more than 20 years now. Let's get over with it.


https://www.analog.com/en/products/landing-pages/001/sharc-p... has a 32-bit word length. It's been a few years since I worked with them, but the compiler had a "native" and a "compatible" mode -- the first where `char` is 32 bits, the second where we multiply each pointer by four before exposing it to C so we can pretend that we're addressing individual 8 bit char elements within the 32 bit hardware word. Much less efficient, much easier to program for. Don't try running code in compatibility mode without enabling compiler optimisations.


C does require 2’s complement now and removed trigraphs. I’m not sure they have much contact with Unicode since nobody uses the “wide char” stuff.

As for variable sized int types, it’s a nice idea but it’s really unlikely your C program is actually portable across them. You haven’t got good enough test coverage for one thing. If someone wants to do that in a new language users should be required to declare which sizes of char/long they actually expect to work.


In D it is ridiculously simple:

    byte = 8 bits
    short = 16 bits
    int = 32 bits
    long = 64 bits

    add `u` prefix for unsigned version
22 years of experience shows this works very, very well, and is very portable.

The only variable sizes are size_t and ptrdiff_t, but we're kinda stuck there for obvious reasons.

There have been some issues with printf formats mismatching when moving D code between systems, but we solved that by adding a diagnostic to the compiler to scan the printf format against the argument list and tell you how to fix the format.


I think that makes perfect sense, but for some reason people want their max file sizes and max array sizes to be different on different CPU architectures without testing it or anything.


It goes the other way too. One of my worst memories from Java was when the OpenJDK authors decided that they wanted the maximum array size to be the same (signed int32 range) on different CPU architectures and heap sizes without testing it or anything. It turned out that with enough small objects, that counter overflows. Of course, having 2^29 tiny objects around was a huge overhead to begin with, so I understand why it hadn't been tested and why it doesn't usually happen in real life, but it would have been pretty easy to avoid with a simple "if it's a size, use size_t" heuristic.

After you accept that size_t can be platform specific and there's nothing scary about it, the format string issue is solved with something like "%zu".


I used to be a good citizen and used size_t for all my buffer sizes, data sizes, memory allocation sizes, etc but I have abandoned that completely.

If my program needs to support >4GB sizes, I need 64-bit always; it makes no sense to pretend "size_t data_size" could be anything else. And if it doesn't need to support that, using an 8 byte size_t on 64-bit machines makes no sense either, just wasting memory/cache line space for no reason.


They do have some use as a “currency type” - it’s fine if all your code picks i32, but then you might want to use a library that uses i64 and there could be bugs around the conversions.

And C also gives you the choice of unsigned or not. I prefer Google’s approach here (never use unsigned unless you want wrapping overflow) but unfortunately that’s definitely something everyone else disagrees on. And size_t itself is unsigned.


All software is incorrect given a long enough time frame. Whatever, just get paid and make the problem go away. Be short sighted, it works


I think game developers are at the front lines of changing requirements, and that is why they build scriptable engines and not programs per se.

I think engines and libraries are the ultimate solution if you want to program defensively.


I always find it funny that short, not very interesting (imho) articles always get a tonne of discussion here on HN. Reminds me of Parkinson’s Law of Triviality, but that one’s for inside of an organization.


This article is very humorous. It is also a saddening reflection on software development as it is often practiced.


What is sad is all the people who took it dead seriously, and agreed.


I'll take the problem of occasionally propagating a disruptive change through a large codebase over daily wading through a dozen 2-line functions in multiple repos just to make any progress.


It's satire, calm down y'all!


In the OOP world things can be hard to change if there is an over-reliance on encapsulation and inheritance.

For some people, making things easy to change means making them more complicated, adding more layers of abstractions and patterns.

I am using a more Data Oriented approach [1] in which I keep the data separated from the code, try to use immutable data structures, and try to use methods and functions with as few side effects as possible. I am also trying to architect the software to be modular, i.e. split the code into classes grouped by functionality, split it into different projects.

And I have success at this even though I am using a language that's very object oriented and being part of a community that used to put a very high value on object orientation, abstraction and design patterns. I'm referring to C#/.Net.

[1] https://www.manning.com/books/data-oriented-programming

PS. This book and some articles and videos made by others on Data Oriented Programming have been huge helpers and big time savers for me. I'm no longer thinking in objects, I'm thinking in data and the functions which process that data.

I'm no longer a prisoner in the Kingdom of Nouns. [2]

[2] http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom...
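
For illustration, roughly the same separation in a Java-flavored sketch (the commenter's actual code is C#; Order/OrderLine and the numbers are hypothetical): plain immutable data on one side, small side-effect-free functions that transform it on the other.

    import java.util.ArrayList;
    import java.util.List;

    // Plain immutable data...
    record OrderLine(String sku, int quantity, double unitPrice) {}
    record Order(String id, List<OrderLine> lines) {}

    // ...and side-effect-free functions that work on it. No behavior lives on the data.
    final class Orders {
        private Orders() {}

        static double total(Order order) {
            return order.lines().stream()
                    .mapToDouble(l -> l.quantity() * l.unitPrice())
                    .sum();
        }

        static Order withLine(Order order, OrderLine line) {
            // Returns a new Order instead of mutating the old one.
            List<OrderLine> lines = new ArrayList<>(order.lines());
            lines.add(line);
            return new Order(order.id(), List.copyOf(lines));
        }

        public static void main(String[] args) {
            Order o = new Order("o-1", List.of(new OrderLine("sku-1", 2, 9.99)));
            System.out.println(total(withLine(o, new OrderLine("sku-2", 1, 5.00))));
        }
    }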


I wish the author had provided more examples of the kind of code they wish had not been written because "it might change in the future", because I'm not quite sure what they're advocating against.

Wrapping external libraries? It depends on the level of maturity of the library. I have seen HTTP libraries change their API often enough over a long time. I wrapped them because they did _change_. (Although I'd rather have fetch_my_foo wrapping calls to the lib that does the HTTP call, rather than having fetch_my_foo calling a wrapper that calls a lib.)

Wrapping input / output? That can have _some_ uses in testing. (Although it might be better to have unwrapped code doing the I/O, calling code that does the logic.)

Hiding a class behind an interface that will only have a single Impl? If the class is doing I/O, it might very well need at least a test implementation.

'new FileInputStream(new BufferedStream(new Buffer(new Encoding( new ...' ? Yeah, this is typically the case where a proper API needs to have 'File.read("xxx")' from the get-go, rather than forcing people to either wrap or share a wrapper.

Etc,etc...


Instead of preparing for possible future requirements we should strive to make the code easy to understand. THEN if the need arises to change it, that becomes easy. Not because the code was designed to handle possible future use-cases, but because it was designed to be easy to understand, and thus easy to modify without causing more bugs.

If it turns out the possible future use-cases never materialize, we still benefit from the easy to understand code when we need to fix bugs in it.

There are two different types of reuse:

a) Supporting multiple use-cases of the same code, with different parameters.

b) Modifying the code to do something else than it does currently.

If the code is easy to understand it is easy to change for both cases. But if you try to make it ready for multiple use-cases from the start it becomes much harder to change, since it is already more complicated than it needs to be to support just the initial use-case.

There is no Silver Bullet. Code is our tool for dealing with complexity. But it does not remove complexity. And the more code you create, the more complexity you will have.


Reminds me of this talk, "Stop Writing Classes": https://youtu.be/o9pEzgHorH0

Changed my perspective on this newfound tool I'd learned to use called "indirection".

My favorite quote is something like "don't waste 10 minutes now to save yourself 2 minutes later".


I once inherited a codebase where everything was hidden behind two interfaces for no reason whatsoever.

A "Business Object interface" is implemented by a "Business Object implementation" which delegates its method calls to a "Data Access Object interface which defines exactly the same methods as the Business Object interface" which then is implemented by a "Data Access Object implementation" which then does a simple SQL query.

You can argue that the DAO has a right to exist. After all, you are separating concerns and organizing your codebase. The BO only exists to make me suffer through 4 layers of indirection. When I write code I just drop the DAO interface and call the original class DAO. If I ever need to migrate to something else, sure, I could just rename that damn thing to DAOImpl and create an interface with the same name as the original class and the same methods. I never had to.
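
A hedged sketch of what the collapsed version can look like (Customer/CustomerDao are hypothetical names, and an in-memory map stands in for the SQL table): one concrete class, with the interface extracted only if a second implementation ever shows up.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Optional;

    record Customer(long id, String name) {}

    // The BO-interface -> BO-impl -> DAO-interface -> DAO-impl chain collapsed
    // into one concrete class. If a second implementation appears later, rename
    // this to CustomerDaoImpl and extract the interface then.
    class CustomerDao {
        private final Map<Long, Customer> table = new HashMap<>(); // stand-in for the SQL table

        void save(Customer c) {
            table.put(c.id(), c);
        }

        Optional<Customer> findById(long id) {
            return Optional.ofNullable(table.get(id));
        }

        public static void main(String[] args) {
            CustomerDao dao = new CustomerDao();
            dao.save(new Customer(1L, "Ada"));
            System.out.println(dao.findById(1L).orElseThrow());
        }
    }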


I want to avoid closing the door on possibilities that I think are likely in the future, but a lot of that happens where you make commitments to other users, such as in an API or in command-line options. I want there to be a way to add a "v2" to the API if I feel I have to, and I like the command-line option that says --build=android rather than --android because I feel I can more smoothly add allowed options that way. This is the area where one needs to try to be forward-thinking.

One can way overdo even "door opening" and create pain and complexity for users.

As for the code ..... I like to be able to read it. I prefer not to have to jump around every 5 seconds if possible. I like it to be easy to write tests for. If it's easy to test and I have lots of tests then changing it shouldn't be too stressful.

I prefer to do things the simplest way until I have a good reason why it's not enough.

