“What if it changes?” (chriskiehl.com)
431 points by goostavos on May 22, 2022 | 293 comments



"What if it changes?" is a reasonable question to ask. But every time you do you are walking a tightrope. My rule of thumb is that we look at what is in use TODAY, and then write a decent abstraction around that. If something is used once, ignore any abstractions. If it's used twice, just copy it, it's better. If it's used 3 or more times, look at writing an abstraction that suits us TODAY not for the future. Bonus points if the abstraction allows us to extend easily in the future, but nothing should be justified with a "what if".

The reason a lot of Java or C# code is written with all these abstractions is that it aids unit testing. But I've come to love just doing integration testing. I still use unit testing to test complex logic, but things like "does this struct mapper work correctly" are ignored; we'll find out from our integration tests. If our integration tests pass, we've fulfilled our part of the contract, and that's all we care about. Focus on writing them and making them fast and easy to run. It's virtually no different from unit testing, just 10x easier to maintain.


> If something is used once, ignore any abstractions. If it's used twice, just copy it, it's better. If it's used 3 or more times, look at writing an abstraction...

That is a good rule of thumb, and I often follow it too. But it does take some discernment to recognize cases where something would benefit from an abstraction or some common code, even if it is only used twice.

I used to work for a company that imported airspace data from the FAA (the US Federal Aviation Administration) and other sources. The FAA has two main kinds of airspace: Class Airspace and Special Use Airspace.

The data files that describe these are rather complex, but about 90% of the format is common between the two. In particular, the geographical data is the same, and that's what takes the most code to process.

I noticed that each of these importers was about 3000 lines of C++ code and close to 1000 lines of protobuf (protocol buffer) definitions. As you may guess, about 90% of the code and protobufs were the same between the two.

It seemed clear that one was written first, and then copied and pasted and edited here and there to make the second. So when a bug had to be fixed, it had to be fixed both places.

There wasn't any good path toward refactoring this code to reduce the duplication. Most of the C++ code referenced the protobufs directly, and even if most of the data in one had the same names as in the other, you couldn't just interchange or combine them.

When I asked the author about this code duplication, they cited the same principle of "copy for two, refactor for three" that you and I approve of.

But this was a case where it was spectacularly misapplied.


I think your example illustrates why it's so important to choose the right way to generalize/share code depending on the circumstances. I've found that when there's a 90% overlap between 2-3 use cases, many people tend to go with "one common code path for all that's shared and then inject the 10% difference in via components/callbacks/config vars". This works reasonably well when the flow of execution is the same and what changes is just the specifics of some of those steps. But if the differences are also in which steps even happen, then in my experience this approach couples the whole thing too tightly and makes it harder to reason about what actually happens in a given configuration.

What I like to do instead is break the shared code paths into a palette of smaller subfunctions/subcomponents and then have each use case have its own high level code path that picks and chooses from these subfunctions: One does ABCDE, another does ACDEX. It makes it supremely easy to reason about what each of them actually does, because they read almost like a recipe. It becomes a sequence of high level steps, some of which are used by several use cases, while others are unique. I've found this way of generalizing is almost "cost free" because it doesn't really couple things at a high level, and it's the kind of readability refactor that you'd often want to do anyway even if the code wasn't being shared.
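
Something like this, to sketch it in Python (the step names and the two use cases are made up):

    # Shared palette of small steps (names and data shape are hypothetical)
    def load_order(order_id):
        return {"id": order_id, "total": 100.0}

    def validate(order):
        assert order["total"] >= 0
        return order

    def apply_discount(order):
        return {**order, "total": order["total"] * 0.9}

    def charge(order):
        print(f"charging {order['total']:.2f}")
        return order

    def send_receipt(order):
        print(f"receipt for order {order['id']}")

    def export_to_erp(order):
        print(f"exporting order {order['id']} to the ERP")

    # Each use case gets its own short, recipe-like code path
    def checkout(order_id):            # does A B C D E
        order = load_order(order_id)
        order = validate(order)
        order = apply_discount(order)
        order = charge(order)
        send_receipt(order)

    def backoffice_import(order_id):   # does A C D X - no validation, no receipt
        order = load_order(order_id)
        order = apply_discount(order)
        order = charge(order)
        export_to_erp(order)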


> ...break the shared code paths into a palette of smaller subfunctions/subcomponents and then have each use case have its own high level code path that picks and chooses from these subfunctions: One does ABCDE, another does ACDEX. It makes it supremely easy to reason about what each of them actually does, because they read almost like a recipe. It becomes a sequence of high level steps, some of which are used by several use cases, while others are unique.

Isn't this just the Command pattern? - https://en.wikipedia.org/wiki/Command_pattern


I love this. Refactor the first time. Remix the rest of the times.


Do you know if there’s a name for this pattern? I admire it all the time in Peter Norvig’s code. It leads to very approachable code.


I don't know if there is an official name, but in my head I call it "helpers/components/mixins are better than frameworks." Or, "if one happens to want to write a framework, one ought to try hard to refactor it 'inside-out' to a set of composable components."

The most important (though not only) issue with frameworks is that you typically can't compose/mix more than one together - every framework is "exclusive" and takes control of the code flow. Whereas "components" can usually be easily mixed with each other, and leave control of the code flow to the programmer.


I generally think of this as the same principle of "prefer composition over inheritance". Leave the top-level free to compose the behaviour it requires rather than inheriting the framework's behaviour, for exactly the reasons you describe.


This is frameworks vs libraries. In the first case the framework is calling the code with config and hooks to change behaviour. In the second case there are common library functions called from completely separate “application” code.


I don't know an official name for it. It seems like it's almost too basic - "subdivide into helper functions" - to make it into the Gang of Four or other design pattern collections. But in my head I'm calling it the "Recipe Pattern"


It sounds like a version of the strategy pattern to me.

https://en.wikipedia.org/wiki/Strategy_pattern


> and it's the kind of readability refactor that you'd often want to do anyway even if the code wasn't being shared.

Couldn't disagree more tbh. Some of the worst code I've ever had to work with has been over-abstracted "recipe" code where I'm trying to discern complex processes based off two-word descriptions of them in function names.

Doing this too much is a great way to turn a readable 100 line algorithm into a 250 line clusterfuck spread across 16 files.


> Doing this too much

ok, so you're talking about overdoing it. It's still a good approach when done right.


Not really, unless "done right" is for like a 2000 line function or something.

If code is running once in order, there's no reason to break it up into functions and take it out of execution order. That's just stupid.


Martin Fowler, in his book "Refactoring", outlines circumstances where you can leave bad code alone.

Basically if it works and you don't have to touch it to change it, leave it alone.


I think you've completely missed the point.


Oh god that reminds me. Our company did this but for a whole project.

It was back when a bunch of social networks released app platforms after Facebook's success. When hi5 released their platform, rather than refactoring our codebase to work on multiple social networks... someone ended up just copying the whole fucking thing and doing a global rename of Facebook to Hi5.

For the 3rd social network I refactored our Facebook codebase to work with as many as we wanted. But we never reined in Hi5, because it had diverged dramatically since the copy. So we basically had two completely separate codebases: one that handled hi5, and one that had been refactored to be able to handle everything else (facebook, bebo, myspace, etc)


No bets on which one is buggier. Or which one's bugs (and also their fixes) break more networks.


Hi5 was less buggy because new features were just never ported to it - it was deemed not worth the effort.


I also got this heuristic from Martin Crawford. However I believe it applies to snippets (<100 lines of code at the very most) only, for the reason you gave. But even then, it sometimes happens that you find a bug in a 4 line snippet that you know was duplicated once, and have to hope you can find it through grep or commit history. So while being careful not to over-engineer and apply KISS/YAGNI ('you ain't gonna need it'), one-time duplication can be a pain.


I cannot edit my comment anymore, but I realized Crawford is the Martin of 'Forest Garden' fame. I was obviously meaning Martin Fowler, from the 'Refactoring' book.

Maybe we'll have 'Forest Software' in some time. 'A code forest is an ecosystem where the population of bugs, slugs, trees and weed balance themselves, requiring very little input from the engineer'.


> There wasn't any good path toward refactoring this code to reduce the duplication. Most of the C++ code referenced the protobufs directly, and even if most of the data in one had the same names as in the other, you couldn't just interchange or combine them.

That makes it sound like the problem is more of a spaghetti mess than duplication.

But I think the advice to copy something when you need two versions is supposed to be applied to specific functions or blocks or types. Not entire files. Then it wouldn't have duplicated the geographical code.

It's also important to have a good answer to how you'll identify duplicated bugs. I'm not sure how to best handle that.


If I had to guess: they probably referenced the protobufs directly because there are always two, and "You have to tell it which one!".


What if the FAA updates the code for the coordinates of one and not the other? Then your abstraction is moot.


Of course not, abstraction works even better there! Every point that differs will have either a conditional or an abstract part to be implemented by child classes. So the abstraction lets you know at a glance what the key points to look for are.


> If something is used once, ignore any abstractions.

This is terrible advice. According to this, a classic program that loads data from a file, processes it, then writes the results to another file should be a single giant main() that mixes input parsing, computation and output formatting. Assuming file formats don't change, all of those would be used only once. CS 101 style. :D

The primary reason for building abstractions is not removing redundancy (DRY) nor allowing big changes, but making things simpler to reason about.

It is way simpler to analyze a program that separates input parsing from processing from output formatting. Such separation is valuable even if you don't plan to ever change the data formats. Flexibility is just added bonus.

If the implementation complexity (the "how") is a lot higher than the interface (the "what") then hiding such complexity behind an abstraction is likely a good idea, regardless of the number of uses or different implementations.


Nah, I’ll take your 50 line main() every day over those 10 files with 10 lines of boilerplate and one line of working code each. But at the end of the day you just need to roll with the style of the org you’re working with.

I drop in to Java shops from time to time, and am more than happy to port my simple class structures that make sense and do things into the 18 level hierarchies described in the article. I just assume there is somebody there that is really invested in all those interfaces, adapters, and impls and I’m not here to start silly fights with them. The code will still work no matter how many pieces it’s cut into and how many unnecessary redirections you add so no worries.

But for my own stuff I like to keep things compact and readable.


Where did I write it would be 50 lines of code only? And where do you get the 10:1 boilerplate to real code ratio? Maybe just use a more expressive language if you can't build proper abstractions and need a lot of boilerplate?

And why go so extreme? A main() calling into 3 functions like load, process and save will still be plenty better than a single blob of IO mixed with computation and would still contain no boilerplate.
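
Roughly this shape, sketched in Python (the file names and the filtering rule are invented):

    def load(path):
        # input parsing only: turn the file into plain Python data
        with open(path) as f:
            return [line.rstrip("\n").split(";") for line in f]

    def process(rows):
        # pure computation: no IO, easy to reason about and test
        return [row for row in rows if row and row[0] != ""]

    def save(rows, path):
        # output formatting only
        with open(path, "w") as f:
            f.writelines(";".join(row) + "\n" for row in rows)

    def main():
        save(process(load("input.csv")), "output.csv")

    if __name__ == "__main__":
        main()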

> I drop in to Java shops from time to time, and am more than happy to port my simple class structures that make sense and do things into the 18 level hierarchies described in the article.

I certainly agree with that, but that has nothing to do with abstraction. Abstraction and indirection are different things. Those terrible FizzBuzz Enterprise like hierarchies are typically a mixture of insufficient abstraction and far too much indirection. Abstraction reduces complexity, while indirection increases it. AbstractFactoryOfFactoryOfProblems is indirection, not abstraction, contrary to what the name suggests.


> And why go so extreme?

I wouldn’t. I’d break it up the same as you, with those 3 functions. After we’d shown we were going to be doing lots of similar things. But given the choice between too complicated and too simple, that’s the direction I’d lean.

Apologies if that wasn’t clear in context.


> And why go so extreme? A main() calling into 3 functions like load, process and save will still be plenty better than a single blob of IO mixed with computation and would still contain no boilerplate.

Sure. But a main() ordered into loading, processing, and saving would be similar amounts of better, despite not using the abstraction of functions.


Code style / formatting is a secondary thing. If someone made the effort of splitting it into 3 well-organized pieces, and denoted those pieces somehow (by comments?), that also counts as an abstraction to me, even though it is not my preferred code style.


If you consider organization in general to be abstraction, then I think that might cause some overselling of abstraction and miscommunication with others.

Unless I'm the one using words weirdly here.


It is not just reordering the lines of code.

In order to organize code that way, you need to establish e.g. some data structures to represent input and output that are generic enough that they don't depend on the actual input/output formatting. There you have the abstraction.

The key thing is to be able to understand the processing code without the need to constantly think the data came from CSV delimited by semicolons. ;)
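
For example, a made-up record type for the semicolon-delimited CSV above:

    from dataclasses import dataclass

    @dataclass
    class Measurement:
        station: str
        value: float

    def parse_line(line):
        # the only place that knows about semicolons
        station, value = line.split(";")
        return Measurement(station, float(value))

    def average(measurements):
        # processing code sees Measurement objects, not CSV details
        return sum(m.value for m in measurements) / len(measurements)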


>> If something is used once, ignore any abstractions.

>

> This is terrible advice. According to this, a classic program that loads data from a file, processes it, then writes the results to another file should be a single giant main() that mixes input parsing, computation and output formatting. Assuming file formats don't change, all of those would be used only once.

I broadly agree with you, but devil's advocate time: not all abstractions are at the same level.

Writing a static function `slurp()` that takes in a filename and returns the file contents isn't an abstraction in the same sense as having a `FILE *` type that the caller cannot look into which functions like `fprintf()` and `fscanf()` use to operate on files.

I think opaque datatypes (like `FILE`) are "more abstract" than static functions defined in the same file you are currently reading.

IOW, "Abstraction" is not a binary condition, it is a spectrum from full transparency to full opacity.

Static functions in C would be full transparency (no abstraction at all).

Opaque datatypes in C would be full opacity (no visibility into the datatype's fields unless you have the sources, which you may not have).

C++ classes would be something in-between (the private fields are visible to the human reading the header).


I agree, and that's why I said that good abstractions are those which have a good implementation complexity vs interface complexity ratio. The file abstraction is a perfect example of this - a simple concept you can explain in 5 minutes to a kid, but implementations are often several thousand lines of code long.

Also, the simpler the interface, usually the more contexts it can be used in. So those abstractions with nice interfaces naturally tend to be more reusable. But I argue this is the consequence, not the primary reason. You probably won't end up with good abstractions by mercilessly applying DRY.


> This is terrible advice. According to this, a classic program that loads data from a file, processes it, then writes the results to another file should be a single giant main() that mixes input parsing, computation and output formatting. Assuming file formats don't change, all of those would be used only once. CS 101 style. :D

Yes, if the program will be written and tested exactly once, with no change requests to come later, it's perfectly fine to write it as one big main().

It all depends on what the stakeholders need; clear communication with them is the real trick.


Well, what if the program suddenly crashes and gives you a stacktrace pointing to main()? Assuming you were not the original author of the code, you'd have to read most of the code to understand it.

If the main was split into well defined, separate pieces, at least you could quickly rule out quite a lot of complexity. If it crashed in parsing, you wouldn't need to understand the processing logic, etc.

Sure it is easy to read one blob of code, if it is only 100 lines of code. But it is a different story if it is 10000 lines and now you have to figure out which of the 100 variables are responsible for keeping the state of the input parser and which are responsible for "business logic".


But writing it would be harder, no?

I mean if it's only like 50 short lines, that would be okay-ish, but in this case why do it in C and not use perl or awk? (I suppose you want fast text processing, so I won't suggest python). If the processing is hard, then you will need debugging (which is better in segregated functions) and to prototype a bit (unless I'm the only one who does that?).


I think the specific example mentioned might be subjective, but I agree with your point.

In my mind, the common emphasis on the DRY/WET thing with abstractions leads many people to miss the point of abstractions. They're not about eliminating repetition or removing work, they're about making the work a better fit for the problem. Code elimination is a common byproduct of abstractions, but occasionally the opposite may happen too.

I see an abstraction as being composed of a model and a transformation. The villain isn't premature abstractions, it's abstractions where the model is no better (or worse!) for the problem than what's being abstracted over.


I could not agree more with this.

I would add, though, that in my experience you can often identify parts of a design that are more likely to change than others (for example, due to "known unknowns").

I’ve used microservices to solve this problem in the past. Write a service that does what you know today, and rewrite it tomorrow when you know more. The first step helps you identify the interfaces, the second step lets you improve the logic.

In my experience this approach gives you a good trade off between minimal abstraction and maximum flexibility.

(Of course lots of people pooh-pooh microservices as adding a bunch of complexity, but that hasn’t been my experience at all - quite the opposite in fact)


Microservices is just OOP/dependency-injection, but with RPCs instead of function calls.

The same criticisms for microservices (claims that it adds complexity, or too many pieces) are also seen for OOP.

Curiously, while folks sometimes complain about breaking up a system into smaller microservices or smaller classes, nobody ever complains about being asked to break up an essay into paragraphs.


I don't think the paragraph metaphor works well since written works are often read front to back, and the organizational hierarchy isn't so important on such a linear medium. There are books that buck the trends and IMO you don't really notice the weirdness once you get going. E.g. books with long sentences that take up the whole paragraph, or paragraphs that take up the whole page, or both at the same time. Some books don't have paragraphs at all, and some books don't have chapters.

Splitting material into individual books makes a little more sense as a metaphor, especially if it's not a linear series of books. You can't just split a mega-book into chunks. Each book needs to be somewhat freestanding. Between books, there is an additional purchasing decision introduced. The end of one book must convince you to go buy the next book, which must have an interesting cover and introduction so that you actually buy it. It might need to recap material in a previous book or duplicate material that occurs elsewhere non-linearly.

A new book has an expected cost and length. We expect to pay 5-20 dollars for a few hundred pages of paperback to read for many hours. We wouldn't want to pay cents for a few pages at a time every 5 minutes. (or if we did, it would require significantly different distribution like ereaders with micropayments or advertising). Some books are produced as serials and come with tradeoffs like a proliferation of chapters and a story that keeps on going.

Anyway, it's a very long way to say that some splitting is merely style, some splitting has deeper implications, the splits can be too big or too small, and some things might not need splits at all.


I'd like to argue against [quote].

[author] uses the [simile] to argue the [argument].

The obvious flaw in the [argument] is of course [counterargument].

[quote]: Curiously, while folks sometimes complain about breaking up a system into smaller microservices or smaller classes, nobody ever complains about being asked to break up an essay into paragraphs.

[author]: Mr_P

[simile]: microservices or smaller classes are like paragraphs in an essay.

[argument]: since no one complains about breaking up an essay into paragraphs, no one should complain about breaking up a system into smaller microservices or classes.

[counterargument]: breaking up a system in smaller microservices or classes is not at all like breaking up an essay into paragraphs, which I think this comment has demonstrated.


> Curiously, while folks sometimes complain about breaking up a system into smaller microservices or smaller classes, nobody ever complains about being asked to break up an essay into paragraphs.

There are orders of magnitude different amounts of work in each of these cases. (I’m not saying it’s a lot of work but it’s still significantly more in some of those cases relative to the others.)


Perhaps "break up your book into chapters" is a better metaphor for microservices. Breaking a chapter into paragraphs makes me think more of OO design or functional decomposition.


It's breaking up into whole books. Each is stored, distributed, addressed and built separately. You have to become an expert at making the implied overhead efficient, because it will dominate everything you do.


> Curiously, while folks sometimes complain about breaking up a system into smaller microservices or smaller classes, nobody ever complains about being asked to break up an essay into paragraphs.

They would if each paragraph of that essay lived at a different domain/url.


Even if each paragraph was its own file. It's just a bad metaphor.


A microservice contains many classes. Those classes are organized into packages and so many of them are necessarily "public." The microservice boundary is a new kind of grouping, where even this collection of packages and public classes presents only one small interface to the rest of the architecture. AFAIK this is not a common or natural pattern in OOP, and normal visibility rules don't support or encourage it.


My favorite books are the ones where you read a paragraph and realize, after the fact, that it's just 1 sentence.


  > If something is used once, ignore any abstractions. If it's used twice, just copy it, it's better. If it's used 3 or more times, look at writing an abstraction

I refactor for the second time. I don't like chasing bugs in multiple places.

My rule of thumb is that there are only three quantities in the software development industry: 0, 1 and infinity. If I have more than 1 of something, I support (a reasonable approximation of) infinite quantities of that something.


Agreed, except avoid the term "abstraction". When one starts to talk about abstractions, one stops thinking.

The right word is "generalization", and that's what you are actually doing: you start with a down-to-earth, "solve the problem you've got!" approach, and then when something similar comes up you generalize your first solution.

Perhaps part of the problem is that in OO, inheritance usually promotes the opposite: you have a base class and then you specialize it. So the base class has to be "abstract" from day one, especially if you are a true follower of the Open-Closed Principle. I don't know about others, but for me abstractions are not divine revelations. I can only build an abstraction from a collection of cases that exhibit similarities. Abstracting from one real case and imaginary cases is more like "fabulation" than "abstraction".

The opposite cult is "plan to throw one away", except more than just one. Not very eco-friendly, some might say; it does not look good at all when you are used to spending days writing abstractions, writing implementations, debugging them, and testing them. That's a hassle, but at least once you are done, you can comfort yourself with the idea that you can just extend it... Hopefully. Provided the new feature (that your salesman just sold without asking if you could do it, pretending they thought your product did that already) is "compatible" with your design.

The one thing people may not know is how much faster, smaller and better the simpler design is. Simple is not that easy in unexpected ways. In my experience, "future proofing" and other habitual ways of doing things can be deeply embedded in your brain. You have to hunt them down. Simplifying feels to me like playing Tetris: a new simplification idea falls down, which removes two lines, and then you can remove one more line with the next simplification, etc.


Java in particular is missing certain language features necessary for easily changing code functionality. This leads to abstractions getting written in to the code so that they can be added if needed later.

A specific example is getters and setters for class variables. If another class directly accesses a variable, you have to change both classes to replace direct access with methods that do additional work. In other languages (Python specifically), you can change the callee so that direct access gets delegated to specific functions, and the caller doesn't have to care about that refactor.
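
A small Python sketch of what I mean (class and attribute names invented): a plain attribute can later be routed through a property without touching any caller.

    # Before: a plain attribute; callers write account.balance directly
    class Account:
        def __init__(self, balance):
            self.balance = balance

    # After: the callee routes the same attribute through a property,
    # so callers keep writing account.balance and never notice the refactor
    class Account:
        def __init__(self, balance):
            self._balance = balance

        @property
        def balance(self):
            return self._balance

        @balance.setter
        def balance(self, value):
            if value < 0:
                raise ValueError("balance cannot be negative")
            self._balance = value

    acct = Account(10)
    acct.balance = 5      # unchanged caller code still works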


Getter and setters are unnecessary. The thing that most people are trying to avoid by using these is mutating state. However a getter or setter does nothing to prevent this. A simple `const` keyword goes so much farther than adding useless indirection everywhere.

Edit: I suppose it may be argued that you need to set some other state when you set a member variable. If that's the case, then it's no longer a getter or a setter and the function should be treated differently.


Getters and setters are much more useful when accessing or setting the element should require some other function calls. Caching, memoization, and event-logging are examples where you might want this to happen.

You can say that's not a getter/setter, but then your definition is just different than the people you're responding to.


Caching, memoization, and event-logging can be handled by wrapper objects that implement the interface so the base object doesn't need to contain all these layers of outside concerns. Let each class focus on its single area of use.

    interface Store { String query(); }

    // these all have the query() method
    class MySQL implements Store {
        public String query() { return "rows from MySQL"; }
    }

    class Cache implements Store {
        private final Store inner;
        Cache(Store inner) { this.inner = inner; }
        // a real cache would check itself before delegating
        public String query() { return inner.query(); }
    }

    class Logger implements Store {
        private final Store inner;
        Logger(Store inner) { this.inner = inner; }
        public String query() { System.out.println("query()"); return inner.query(); }
    }

    class App {
        Store db = new Logger(new Cache(new MySQL()));
    }


However Getters/Setters are often the worst place to implement cross-cutting concerns, like caching, memoization and logging.

Of course, in more limited languages/environments they're probably the only tool you have, so there's that.


Getters and setters are not just for keeping state immutable. They allow an API to control _how_ state changes. The most obvious example is maintaining thread-safety in multi-threaded environments.

I get they can be cumbersome, but using them really matters especially as a project grows... an API that has a simple single client today may have many different (and concurrent!) ones tomorrow. The pain of using S&Gs now saves refactoring later.


The number of getters and setters I've written that never got changed into anything more than read/change variable has to be _hundreds_ of times more than the ones that ever did anything else.

At what point is it cheaper to just refactor into getters/setters later when needed? That point _has_ to be miles behind me.


True.

Another problem (from a class/library-consumer point of view) is having getters/setters suddenly becoming more expensive to call, blocking, or even having side effects after an update.

It often only affects the runtime behavior of the code.

Changing the interface, however, will give me a hint that something else has changed.


OOP languages shouldn't need getters and setters because there shouldn't be even a concept of variable access and mutation, just all method calls - that's what OOP is all about, after all, not just putting variables into bags and staying in a procedural mindset.


Smalltalk-style OO anyway: All You Can Do Is Send A Message.

That isn't the only type of OO. Look at CLOS in Common Lisp for a counterexample: https://wiki.c2.com/?HowObjectOrientedIsClos


That will just make everything more convoluted and less flexible. When you send a message over websockets you want a Datatype for each message type. It's not going to have any complicated method calls. You just insert the data or retrieve it on the other side. Since the framework expects you to define setters and getters you do it reluctantly.


I think the concern is: It's currently a getter/setter but might change later.

Maybe for debugging you want to log a callstack every time the field gets accessed, for example.

Or when you set the field, you should invalidate some cached value that uses it.
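
For example, a quick Python sketch of the second case (names invented), where writing a field invalidates a cached value derived from it:

    class Rectangle:
        def __init__(self, width, height):
            self._width = width
            self._height = height
            self._area = None            # cached derived value

        @property
        def width(self):
            return self._width

        @width.setter
        def width(self, value):
            self._width = value
            self._area = None            # invalidate the cache on write

        @property
        def area(self):
            if self._area is None:
                self._area = self._width * self._height
            return self._area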


That's a design choice though -- if you're structuring your code to avoid mutable state, you're not going to have setters. And if you're structuring your code such that you're telling objects what to do, rather than pulling data out of them and acting on them remotely, then you're not necessarily going to have getters either.


To be fair, the Open-Closed principle is basically an article of faith in Java (along with the rest of SOLID).


The getter/setter nonsense is 99% compliance with specific frameworks like Hibernate or, shudder, JSF, but it caught on and now nobody wants to be seen without using ugly getters and setters, which would be perfectly fine if the language natively supported them.


> If something is used once, ignore any abstractions. If it's used twice, just copy it, it's better.

That is just as bad a general rule as "What if it ever changes, we need to abstract over it!". As always: it depends. If the abstraction to build is very simple, like making a magic number a named variable which is threaded through some function calls, at the same time making things more readable, then I will rather do that than copy it and introduce the chance of bugs in the future by only updating one place. If the abstraction requires me to introduce 2 new design patterns to the code, which are only used in this one case ... well, yes, I would rather make a new function or object or class or what have you. Or I would think about my overall design and try to find a better one.

Generally, if one finds oneself in a situation, where one seems to be nudged towards duplicating anything, one should think about the general approach to the problem and whether the approach and the design of the solution are good. One should ask oneself: Why is it, that I cannot reuse part of my program? Why do I have to make a copy, to implement this feature? What is the changing aspect inside the copy? These questions will often lead to a better design, which might avoid further abstraction for the feature in question and might reflect the reality better or even in a simpler way.

This is similar in a way to starting to program inside configuration files (only possible in some formats). Generally it should not be done and a declarative description of the configuration should be found, on top of which a program can make appropriate decisions.


I agree that counting the number of times you repeat yourself is not the right metric to determine whether or not to introduce an abstraction. Abstraction is not compression. But I don't think it depends on how simple any abstraction would be either. Simplicity does play a role for pragmatic reasons of course but it's not the key question in this case.

The key question is whether there is a functional dependency or just a similarity between some lines of code. If there is a functional dependency, it should be modeled as such the first time it is repeated. If there is only coincidental similarity then introducing a dependency is simply incorrect, regardless of how often any code happens to get repeated.
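
A tiny made-up example of that distinction:

    # Coincidental similarity: these happen to be the same today, but there is no
    # rule saying they must stay in sync, so sharing one constant would be wrong.
    STANDARD_DISCOUNT = 0.15
    REFERRAL_BONUS = 0.15

    # Functional dependency: gross price is *defined* in terms of the tax rate,
    # so the relationship should be modelled the first time it appears.
    TAX_RATE = 0.21

    def gross(net):
        return net * (1 + TAX_RATE)

    def tax_amount(net):
        return net * TAX_RATE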


I agree! Maybe one could say: Not all repetitions are of equal nature in terms of what causes them, and to understand the cause is important.


> If something is used once, ignore any abstractions. If it's used twice, just copy it, it's better. If it's used 3 or more times, look at writing an abstraction...

As others have said, this is a good rule of thumb in many cases because finding good abstractions is hard and so we often achieve code re-use through bad abstractions.

But really good abstractions add clarity to the code.

And thus, a good abstraction may be worth using when there are only two instances of something, or even just one.

If an abstraction causes a loss of clarity, developers should try to think if they can structure it better.

EDIT: This comment below talks about good example of how a good abstraction adds clarity, while a bad abstraction takes it away: https://news.ycombinator.com/item?id=31476408


When I'm asked "what if it changes?", I usually answer with something like "we'll solve it when, and if, it happens". I'm a fan of solving the task at hand, not more, not less. If I know for sure that we're going to add feature X in a future version, sure I'll prepare my code for its addition in advance. But if I don't know for certain whether something will happen, I act as if it won't. It's fine to refactor your code as the problem it solves evolves. You can't predict the future, and if you try, you'll have to be able to deal with mispredictions too.


It takes me twice as long to get my integration tests to work if I don't have unit tests making sure the parts work along the way.

If you write an integration test, and it fails, what's broken?


> It takes me twice as long to get my integration tests to work if I don't have unit tests making sure the parts work along the way.

That's a valid concern, but if your unit-tests are only for making sure the part you just wrote works as expected, then just have a test case up for that specific part, and change it when you move to the next part.

The value of unit-tests is supposed to be regression testing: when you change something that breaks a different unit in a different part of the stack.

> If you write an integration test, and it fails, what's broken?

Well, I debug it the same way I debug any bug. After all, most bug reports are from a full execution in the field; I am probably already set up to debug the full application anyway[1].

[1] Once a bug reproduction is set up in a fairly automated way.


you know... you debug and find out


The thing about unit tests is that the better they are, the less you have to debug.

A very good one, with effective use of matchers, you don't even have to read the test to know what you did wrong. You just know from the error message what you broke.


> The thing about unit tests is that the better they are, the less you have to debug.

>

> A very good one, with effective use of matchers, you don't even have to read the test to know what you did wrong. You just know from the error message what you broke.

Agree, and agreed. The counterpoint is that unit-tests take time to write and time to maintain - you have to balance that time spent against the time that you would spend debugging an integration test.


Integration tests take far more time to maintain. Major functionality changes can affect all of your tests. With unit tests they may invalidate a few, but that’s okay because they were cheap to begin with.

If your unit tests are hard, you need to refactor your code.


> Integration tests take far more time to maintain.

So? You're going to have them anyway or else you can't deploy.

> Major functionality changes can affect all of your tests. With unit tests they may invalidate a few, but that’s okay because they were cheap to begin with.


Orders of magnitude matter, and if you have a testing pyramid instead of a testing ice cream cone, there can be up to two orders of magnitude difference between the number of unit tests and integration tests.

If you start with unit tests, then the integration tests are just verifying the plumbing. That only changes when the architecture changes, which is hopefully a lot less than how often the requirements change substantially.


Not to mention most unit tests are utterly useless in reality and test things we know to be true (1 + 1 -level nonsense), not real edge cases.

The logic that usually gets ignored in unit tests is the logic that actually needs to be tested, but it gets skipped because it is too difficult and might involve a few trips to the database, which makes it tricky (in some scenarios you need valid data to get a valid test result, but you cannot just go grab a copy of production data to run some tests).

And then there is the problem of testing-related code, packages and artifacts being deployed to production, which is really gross in my mind and bloats everything further.

A team I've worked on has resorted to building actual endpoints to trigger test code that lives alongside other normal code (basically not a testing framework), so that they could trigger tests and "prove the system works" by testing against production data at runtime.


Your message is just a collection of ad-hoc points with no structure, context or justification for any of them.


The "message" is a response to the last paragraph.

>The reason a lot of Java or C# code is written with all these abstractions is because it aids unit testing.

That is the justification for talking about testing. Code is being ripped apart to make it easier to test, while the tests that are used as a justification for ripping apart the code are low quality as 99% of the work in unit testing is thinking of and setting up the test case, not the actual test code.


"Copy code if it's used twice" is terrible advice. You are creating a landmine for future maintainers of your code (often yourself, of course). Someone will inevitably change only one of the two versions at some point in the future and then you're going to have to rely on tests to catch the issue - except that your tests will probably also reflect the duplication and you'll also forget to change the 2nd test.

The only possible justification for duplicating code would be that creating an appropriate abstraction is harder. Given that there are generally economies of testing when you factor out common code, that's usually just not true.

"Duplication is evil" is a more reliable mantra.


It's a rule of thumb, not a hard rule. If I have something I need to use in a separate project, I'm copying it. I'm not going to write a library just so I can import it into 2 different projects.

Yes it means stuff needs to be changed in 2 places. Yes it means someone can change one and not the other. But it also means that each thing can be maintained on its own without worrying about another. In the early stages you don't know how much can truly be reused and whether you're just cornering yourself. I've had scenarios in the past where we've written an abstraction around some common code and then the third application we want to use it in just does not fit the model we initially thought of. Could we change the library? Yes, obviously. But are we going to face the same issue on the 4th project to use this library? Probably. It's a large maintenance load. At some point you end up making breaking changes to the library and you're committed to either maintaining multiple major versions, or maintaining an abstraction that is supposed to work for every scenario, which can be a huge time sink.

There are tradeoffs to be made. I'd rather lose the maintenance burden of a library when consumers have vastly different needs and just take the hit of having to do a Sourcegraph search for usages of some code. This search would need to be done to find all consumers of the code anyway if it was a library. So the end result is rarely different in my experience.


Imo, the correct rule is "copy if it is more likely to be modified separately" and "create one method if it likely has to change at the same time".


Excellent advice! I wish more programmers paid attention to this rather than just the 2-3 rule. The 2-3 rule tends to create unintentional tight coupling between things that becomes an iron bar that is even more evil to rectify.


>Someone will inevitably change only one of the two versions at some point in the future and then you're going to have to rely on tests to catch the issue

It works both ways

Someone modifies code that's used in both places and breaks the other thing


It is important to carefully look into the functional context where that abstraction is used.

If you are looking, for example, into System Integration, Data Integration, ETL and so on, not using a canonical format from the beginning will get you into the kind of almost exponential growth in mappings between sources and targets.

https://www.bmc.com/blogs/canonical-data-model/

https://www.enterpriseintegrationpatterns.com/CanonicalDataM...
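
A rough sketch of the idea in Python (formats and field names invented): each source and each target maps only to/from the canonical shape, instead of every source mapping directly to every target.

    # Canonical shape everything maps to/from
    CANONICAL_FIELDS = ("customer_id", "amount", "currency")

    def from_crm(record):
        # one mapping per source, into the canonical shape
        return {"customer_id": record["CustId"], "amount": record["Total"], "currency": "USD"}

    def from_webshop(record):
        return {"customer_id": record["user"], "amount": record["price"], "currency": record["ccy"]}

    def to_billing(canonical):
        # one mapping per target, out of the canonical shape
        return {"acct": canonical["customer_id"], "due": canonical["amount"]}

    def to_warehouse(canonical):
        return [canonical[f] for f in CANONICAL_FIELDS]

    # 2 sources + 2 targets = 4 mappings here, instead of 2 x 2 direct ones
    row = to_billing(from_crm({"CustId": 42, "Total": 99.5}))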


I think the test pyramid still has legs. Write both.

I do agree a lot of abstractions in C#/Java seem to be testing implementation stuff leaking into the abstraction layer. A lot of inversion of control in these languages seems to exist purely to allow unit testing, which is kind of crazy.

Personally I prefer the "write everything in as functional a style as possible, then you'll need less IoC/DI". This can be done in C# and Java too, especially the modern versions.
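
A small sketch in Python of what I mean (names invented); the same shape works in C# or Java: pass the dependency in as an ordinary argument, and the test needs no IoC container or mocking framework.

    def price_with_tax(net, tax_rate_lookup, country):
        # its only "dependency" is an ordinary callable argument
        return net * (1 + tax_rate_lookup(country))

    # production wiring
    def tax_rate_from_config(country):
        return {"SE": 0.25, "DE": 0.19}.get(country, 0.0)

    total = price_with_tax(100.0, tax_rate_from_config, "SE")

    # test wiring: just pass a lambda, no container or mock framework needed
    assert price_with_tax(100.0, lambda _: 0.25, "XX") == 125.0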


I have a general rule:

Once is an incident. Deal with it.

Twice is a co-incident. Deal with it. But keep an eye out for it...

Third time? Ok, this needs properly sorting out.


Mr Bond, they have a saying in Chicago: 'Once is happenstance. Twice is coincidence. The third time it's enemy action' - Goldfinger


I got handed a little online customer service chat application at a previous job. It had been written by someone I'd put at a similar skill level to mine, but with different personality traits. One of his personality traits was to code to the spec and not consider "what if it changes".

This online chat had two functionalities: chat with a worker, and leave a message for a worker with suggestions as to what to look at in response. There was no connection between these two functionalities specified, and so my friend had written it without connection. It was difficult, without doing a full rewrite, to get state information from one part of the application to another (this was written in jQuery).

Anyway, 6 months+ down the line it got respecified: now it needed shared state between the two parts of the application, which meant either a significant rewrite or hacks, so hacks were chosen. Ugly hacks, but they worked. (I think ugly hacks were definitely the correct choice here, because the chat application was almost completely scrapped a year later in favour of a bot.)

After I was done I asked, "But why write it like that? It was specified that no state was needed between the two parts." "Yeah, but it should be obvious that is going to change; they would keep wanting to add functionality to it and probably share state across the two communication channels."

tldr: there are some potential changes that seem more likely than others, and the architecture should take those potential changes into consideration.


Wow, I always believed this but was too scared to admit it because it's not fashionable to think this way.


Reminds me of FizzBuzzEnterpriseEdition. https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...

You never know when you might need to change the implementation of how the "Fuzz" string is returned, so you need a FuzzStringReturner.

And you never know when you might need multiple different ways of returning "Fuzz", so you need a FuzzStringReturnerFactory.

And for SOLID it's important to separate concerns, so you want your FuzzStringReturnerFactory separate from your FuzzStringPrinterFactory.

And that barely scratches the surface of what you need!


Yip, dealing with a code base right now that's like this. Devs do this for job security. Making the thing incomprehensible to others guarantees you can't be fired, right? It also guarantees a promotion right up to team lead or dev manager.

Another note: it's not what's the best SOLID design, it's what the original dev thinks is the best SOLID design. SOLID in itself is loose enough that you can have designs that vary massively but are still technically SOLID.


A good dev should comment their readable code so well that anyone can drop in and figure it out.

Unfortunately that good dev is no longer employed at this company...


FizzBuzzEnterprise is just "closed to modification" / "open to extension" taken to the extreme on a trivial problem.


Except that sometimes it’s hard to tell when the problem you’re working on is trivial. I saw this pattern used as a source of data for a drop down list of office locations. It should have been “select id, name from office_locations”. It really was that simple. But it was an “enterprise app”, so instead we had 5 classes and so much more.


It would be easier if people were encouraged to find the simplest solutions in their education, rather than being indoctrinated with somebody's long list of Patterns that will somehow make your code Good if you do them enough.


Maybe we just work at completely different kinds of places, but I’ve worked with a ton of people who don’t know what patterns are and very few who over-apply them


Conversely, we have a single React component for displaying locations, and a whole plethora of different parameters to send into the backend call with a variety of different effects. Having a factory or interface pattern here would have been really nice.


Sure, if that’s the use case. But I think the point of this thread is that in many cases we add functionality despite the lack of immediate necessity, and this causes problems.


From the test suite:

    this.doFizzBuzz(TestConstants.INT_14, TestConstants._1_2_FIZZ_4_BUZZ_FIZZ_7_8_FIZZ_BUZZ_11_FIZZ_13_14);
This is hilarious


I haven't seen anyone else point out: one of the ways you can often identify really experienced programmers is by them being able to pretty accurately separate things-that-probably-will-change from things-that-probably-won't.

Finding a good way to balance YAGNI with this-will-definitely-change-in-a-few-months-because-it-always-does is incredibly hard, and I've really appreciated working with engineers who make that prediction correctly.


I'd say it goes even beyond that into being able to separate the hypothetical future problems that will be a minor irritation from the ones that will wreck your whole month.

As you get to learn more failure modes for software, you start to realize you can't plug all the holes in the dyke, not if you had absolute control of every hand on your team. You can stop four problems or you can hedge against twenty. The problem is that hedging is way harder to explain to people. It's how we ended up with unsatisfying concepts like 'code smells'. It's not evil, it might not even be that bad, but... don't do it anyway.


I think there is also very low-hanging fruit that doesn't take more effort but makes something way more future-proof. Experienced engineers can identify it, and most of that low-hanging fruit seems to be DB design related.


> Finding a good way to balance YAGNI with this-will-definitely-change-in-a-few-months-because-it-always-does is incredibly hard, and I've really appreciated working with engineers who make that prediction correctly.

Is this a generalizable skill? To me, it feels like changes are more often driven by shifting business requirements and ephemeral victories of internal political battles rather than sound technical reasons.


Agreed. I think being able to predict what is likely to change is heavily dependent on knowing the business domain well since that is going to be the driver of change. I think it is a specific application of curiosity. It's also a matter of being forward thinking in general. I think both are skills one can foster and can apply more generally.


That's the point, though. What makes someone senior is being good at everything that isn't writing code. Such as being in tune with the business and how its demands behave over time.


Since OP works at Amazon he might be familiar with the mental model of "one and two way doors". A one way door is a decision that is impossible or very difficult to undo or change. A two way door is easy to change. The idea is to spend most of your energy on one way door decisions and little on two way doors. This acts as a remedy to things like bike shedding. If something is easy to change, just go ahead and do it!

The relevance here is that we can apply this concept in reverse. If we make something easy to change, it is close to a two way door. Hence we reduce the time we need to spend on its design/consideration/etc.

Personally I like to write code that is more on the flexible side to increase my optionality. I can then iterate faster, throwing things at the wall and changing my mind as needed. Of course this flexibility doesn't come for free. Overengineering and the cost of carrying it are real, so apply your best judgement.


"One and two-way doors" is a nice way to phrase it. I use a similar heuristic to figure out where to spend more effort designing things upfront. For most web apps, the one-way door is usually going to be the database schema - data migrations are trickier to do once you have to deal with real data.

The other big class of "one-way" decisions are with regards to code that live in environments you don't control, e.g. mobile and desktop apps.

One tip I would offer is when building something new, you should try and delay making one-way decisions as long as you can, until you have a clearer picture of how things should work.


This is really good advice. Flexibility is leverage. It’s not a “what if” question, but deliberately preparing for exploration and fast iteration.


This is probably a bit of an aside to the implied problem at hand but.. considering this from the React side of my company's codebase, I wish this question was at least kept softly in mind when designing components. I frequently encounter components that were very clearly created as "one-shot"s that are then unfortunately extended by piling on more props and conditional behavior by the next developers who need something like the current component but ever so slightly different.

Often the solution initially would have been to separate out the presentation side of the component from the behavioral wrapper that chooses what data needs to be shown / what actions are performed by interactions. By the time the component arrives at my lap (because I too need something same same but different), however, it has become a monstrosity that can take a long time to disentangle via ADD (anger driven development).

I think asking oneself a simple question such as "how would someone make this search box work with a different data source?" would probably result in components that are decomposed into simpler, smaller parts that allow for much easier reuse and adaptation.

On the flip side, I'm also of the belief that the second developer to touch a component is necessarily better equipped to answer that question, so the onus should probably be on them to make the proper generalizing changes.

(I'm still trying to figure out how to write a document that expresses this idea more concretely to my coworkers because it often isn't quite this simple..)

¯\_(ツ)_/¯


> On the flip side, I'm also of the belief that the second developer to touch a component is necessarily better equipped to answer that question, so the onus should probably be on them to make the proper generalizing changes.

This.

Given a long enough timeline, pretty much all abstractions fail. People are too timid to replace them when they do, and given enough churn, the code gets out of hand.


I think there are some quick wins that don't take too much dev time but make components reusable, or at least make it easy to make them reusable in the future.

- Like you've already said, prefer splitting components into purely presentational components("dumb components") and components with some logics ("smart components")

- If you are using design components, make sure to pass on those props. e.g. if you are using Chakra, pass BoxProps to <Box>.

- try to split out logic into hooks. This can be very specific to the current use case.

These aren't hard things to do but lets you quickly create a generic component by only changing the smart component/hooks or only reuse the dumb components with another specific hook, if the use case is a bit different.


Yes! This is such a great summary of the pains of front-end, although I'm sure these pains transcend any one arm of software development. What laypeople may see as a simple search box is probably a monstrosity of spaghetti code that was written by a single person under a surprise deadline and interfaces with 3 different APIs (recent suggestions, search-on-type quick results, full search...). And don't forget the many different states (focus, active, disabled, loading) that you have to consider and build styling around. Everything is tightly coupled and semantically horrific because that's what the designers and managers demand for this year's particular flavor of the product's design, under whatever deadline has been sprung on those involved in actually building the thing.

I think the interactive nature of these things and years of 'training' that the average computer user has endured to expect how a search box behaves tends to hide the hidden complexity lurking in UI elements everywhere. Another great example is a <select> element.


Most abstractions will accept one layer of ugly hacks for situations they were never meant to deal with. I'd recommend waiting until the second layer of hacks starts to form, then refactoring with what you've learned, since that second layer of hacks is when things start to really fall apart.


The second developer that touches it might be new to the company, or up against a deadline; you never know.


Right - the former case is one that’s unavoidable and, in the case the first developer didn’t get it quite right, should ideally be covered by code review, but doing a good code review is IMO pretty hard, and giving that sort of feedback requires that the reviewer also understands the problems at hand (which in our case is frequently not the case as our development teams are small and we’ll often have a backend focused dev reviewing another backend focused dev’s frontend work).

And in the latter case, at least in my organization, we don’t often have hard “must meet” deadlines, but instead have more self-imposed deadlines that become externalized and concrete when these “deadlines” (aka ship date estimates) are communicated org wide.

So all I can figure to do is give zoom talks, write documentation that’s shared to all developers, and try to encourage coworkers to take the opportunity to push soft deadlines back if needed to pay down tech debt.


Good code rarely needs to change because it's complete. It's meant to be built on top of, rather than modified for every new consumer. Think standard libraries. There is no reason for the linked list module to ever change unless it's for bug fixes or performance improvements.

Business logic needs to change all the time, because businesses are always changing. This is why we separate it out cleanly, so it can change easily.

Know what type of code you're writing so you can plan and design appropriately.


Meh. Good code is a weasel term. Code can be easy to extend. Easy to change. Easy to throw away. Sometimes tradeoffs are made between those options.

This is like saying a good piece of furniture is easy to add to. I mean, maybe?


To my mind, one of the important aspects is that code must be easy to understand, so as to write it correctly.


Every virtue is easy to argue for in isolation. All else being equal, who wouldn't prefer their code to be easy to read? Or easy to extend, performant, simple, small, secure, packed with features, well tested, and so on.

The trick is that writing code that's easy to understand often takes more time. Making code performant will often make it harder to read, and harder to change. Adding lots of features will make your code harder to change.

It's easy to point at the virtues of good code. And it's easy to pick out some personal favorites. But the difference between an intermediate and expert programmer is knowing how (and when) to trade those values off against each other. When you're prototyping, write messy code that you can change easily. When you're writing a web browser, agility doesn't matter as much as security and performance. If lives depend on your code (eg in medical, rocketry, etc) then testing becomes a lot more important. Working with a lot of junior engineers? Try to write code they can understand and maintain more easily. And so on.

It's a fine thing to have a personal style when programming. But the mark of excellence is whether you can adapt your style to suit the actual problem you're trying to solve.


I like the idea of thinking about various code components as furniture!

In practice you can get by with a wild collection of non-matching furniture. It won't be aesthetic, but it will work.

Code projects are quite similar in that regard. On the other hand, sometimes a piece will be such a poor fit that it starts breaking things. Does that also apply to furniture?

Perhaps if you put the bed too close to the closet you can no longer open its door all the way...


There are also furniture pieces that are made to connect to each other. Some are obvious like peg boards, but things like sectional couches, too.

Then there are the myriad cabinet doors. And folks like heavy built-in cabinets, but Ikea ones are just fine. They typically have better hardware. But nobody wants to be the equivalent of an installer in software, it seems.

Since you said beds, consider getting the wrong size bed for a room. Or mismatched head and foot boards.


Hard disagree. I wish the world worked that way, but it doesn't. So much happens in the time between the product inception and delivery. Initial proposal of value, prototypes, customer feedback and requests, known performance issues to work around, making hooks for non-devs to execute important overrides... the list goes on, and contracts have dates that have to be met. Stuff can't be rewritten to address every turn. This is why Agile (however abused it is) largely beat Waterfall. Software should serve people and yield to their needs. Not the other way around.


I don't know what you're disagreeing with, but I suppose you're welcome to open up a PR to change the standard library for your specific use case so you can meet a contract date. I imagine you'll be quite handily laughed at.


Many companies I worked at had forks of standard libraries internally


I've seen this often as well: typically some library will get flagged by programs like Snyk, giving you a "high" score. The way Snyk scores packages is completely asinine. It favors libraries that are constantly being updated over, say, a library that is feature complete and in maintenance mode. One way around this is to literally pull all the source code and paste it into your repo.


Same here. And every time that was a mistake and the fork sucks.


Standard libraries was one example you cited for a broad claim about "good code". I'm not arguing about standard libraries.


I have replaced pieces of C++ standard library quite a few times.

std::list calls malloc/free for every node instead of allocating them in batches, which is expensive. It is also doubly linked; for some applications a singly linked list is a better fit.

std::vector<bool> lacks a data() method, which makes serializing/deserializing it prohibitively expensive.

Even something as simple as std::min / std::max for float and double types isn't using the optimal implementation on AMD64, which would be a single instruction like _mm_min_ss / _mm_max_sd.


Was going to swoop in to say this!

I would add that sometimes leaving oneself room to expand/respond to changes is just what your code needs. Expansion points. Whatever you want to call it.

The maxim I use is, "avoid being overly specific." If you have polymorphism in your language the less you say about the type of a variable the more places it can be used and the more ways it can be composed with other functions. This requires a style of design that pushes side effects to the edges of the program (consequently where they're the easiest to change).

With this style of programming, responding to change is straightforward to reason about. No need for complicated indirection between objects and tracing behaviors through v-tables. If you are using OOP, keep your data and behaviors separate.
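
To make the maxim concrete, here is a small Java-flavored sketch (class, method, and number choices are made up for illustration): the less specific version says nothing about the concrete collection or element type, so it composes with more call sites, and it returns a value instead of printing, pushing the side effect out to the caller.

    import java.util.List;

    public class Totals {
        // Overly specific version (kept as a comment for contrast): it would only
        // accept ArrayList<Integer> and would print the result, baking in a side effect.
        //   static void printTotal(ArrayList<Integer> xs) { System.out.println(...); }

        // Less specific: accepts any Iterable of any Number and returns a value,
        // leaving the side effect (printing, logging, etc.) to the caller.
        static double total(Iterable<? extends Number> xs) {
            double sum = 0;
            for (Number x : xs) sum += x.doubleValue();
            return sum;
        }

        public static void main(String[] args) {
            System.out.println(total(List.of(1, 2, 3)));   // works with List<Integer>
            System.out.println(total(List.of(1.5, 2.5)));  // ...and with List<Double>
        }
    }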

The stuff that solidifies rarely changes. The stuff built on it changes a lot. And as time goes on you'll find that refactors will start pushing more upper layers down where they will eventually solidify.


Pointing to a list makes the problem look too easy, because it’s such a clearly defined abstraction. Of course a linked list mostly doesn’t need to change—it’s an easy problem.

Give me a definition of good code that applies to:

- the standard library

- a gui toolkit

- the kernel

- the apps built on top of vendor provided gui toolkits that change

- a web application backend

- a web application front-end

- a database

- more things I’m forgetting


Corollary of that is that MVP for library code is very different than MVP for business code.

MVP for business code is a great way to get the tool in front of the users and get traction, request for more work. Once you release your library, desire for changes basically drops to 0.

It's working. If it's clunky, the clunkiness just gets wrapped into a utility class somewhere deep in the belly of your client application, with about 1 commit per year to change the copyright notice.

Similarly, your corporate leverage falls to 0. You made a library to save people time; congrats, you did it. Every update you ask people to do that doesn't bring a new feature they need reduces your value. Good luck justifying the ROI of a cosmetic change.


If you make good code and build a bunch of things on top of it, but find out it wasn’t matched to a use case and you need to move to a new architecture, that good code is going to be a pain to remove or change. “Good” shouldn’t be conflated with “everywhere”. A lot of bad code is everywhere because fixing it seems too painful.

The linked list module may have a poor interface or an awful bug with a workaround. But you can’t change it until the next version of the language in 3 years, because everyone relies on it. It’s not good, just a dependency. It needs to change and can’t. I want devs to embrace that changing heavily relied-upon things can be good even if it’s painful.


If you switch from int to float, you don’t “refactor” or “extend” the int type to be a float. You build a new system and switch over.

If you switch from mergesort to quicksort you don’t “refactor” or “extend” mergesort. You write quicksort and replace the calls to mergesort with calls to quicksort.


> Good code rarely needs to change because it's complete.

Maybe. But on the other hand 99% of code is not good, and thus is more likely to need to change. And that is a reality that we need to deal with unfortunately.


I don't think people think like this enough, which is why you will get bespoke double-entry accounting code mixed up with all the other business logic in almost every application that needs it.

The idea of making an abstract, extensible double-entry accounting module its own thing is rare. It's probably Conway's law at play: no one is responsible for doing so, the teams were set up with their own goals, and everyone shares that code.


good code is code that gets you promoted, increases your net worth, and helps you retire faster

depending on your work environment, good code may actually be bad code.


Hear, hear! I'd say most of the time it's the bad code.


Most any code can be library code given enough growth in a business.


> Developers from certain languages [Java]...

I am mostly a C++ developer but I have been on some Java projects recently, and I am a bit shocked by the "what if it changes?" culture. Lots of abstraction, loose coupling, design patterns, etc... It looks like a cult of the Gang of Four.

Of course, it is not exclusive to Java, and these patterns are here for a reason, but I have a feeling that Java developers tend to overdo it.


Java is actually a great language. I think the Spring culture ruins it. As you say most of the abstractions are out of control.


A language is nothing but the code it engenders. Bad code, bad language.

A language that evolves "is" the new code that is written in it, with deep legacy substrata written to previous versions. You can have a good top level but an embarrassing legacy. We should always strive to make our legacy embarrassing, because that marks improvement.

If ten-year-old code in your language is not an embarrassment, your language is stagnating.


Modern Java is actually quite great, with many cool features: records, switch expressions, pattern matching, etc.


Yeah, but how many devs are utilizing anything beyond Java 8?


I think a large part of the difference is that interfaces and polymorphism are effectively free in Java, whereas virtual methods in C++ come at a cost.


Virtual dispatch always has a cost. It's "free" in Java in the sense that it's always done at the VM level, so you might as well just use it; even final methods are just an agreement with the compiler, the VM doesn't care, it will dynamically look up the method in the class hierarchy like God intended. C++ makes it painful and obvious what you're getting yourself into.

The JVM is of course very clever and is, I'm sure, doing tons of shenanigans to reduce the cost, but that's not free: that's someone else investing tons of time and effort and complexity to reduce the cost of a fundamentally expensive operation.


> even final methods are just an agreement with the compiler, the VM doesn't care, it will dynamically look up the method

You’re right about the “final” keyword being a placebo, but you got the rest exactly backwards.

The JVM is ridiculously aggressive in optimizing for throughput over latency: It assumes that everything is final and compiles the code with that assumption until proven otherwise. If it sees a method getting overridden, it will go back and recompile all the callers and everything that was incorrectly inlined.

A lot of Java code depends on this. For example if you only load one of several plugins at runtime, there’s no overhead vs implementing that plugin’s feature in the main code base.


The single-plugin case is kind of optimistic, wishful thinking. Sure, this case happens. Sometimes.

But in real code you often have plenty of things like iterators or lambdas, and you'll have many types of those. So the calls to them will be megamorphic, and no JVM magic can do anything about it.

In the C++ world you'd use templates, or in Rust you'd use traits, which are essentially zero cost, guaranteed.


>While in C++ world you'd use templates or in Rust you'd use traits, which are essentially zero cost, guaranteed.

Templates create more code

If the code becomes too large to fit in the cache, it becomes very slow


Somehow I never noticed it happening in practice. In all the cases where cache was the problem, it was caused by data, not code. CPUs prefetch the code into cache quite well.


Are you talking about type erasure and generics? If so I agree, but that’s unrelated to devirtualization


They are kind of related, in that Java's implementation of generics does not help with devirtualization, while C++ templates / Rust traits do help by not needing virtual calls in the first place.

Consider the pre-Java 1.5 sort method:

Collections.sort(List list, Comparator comparator);

If you load more than one Comparator type, then the calls to comparator are megamorphic and devirtualization won't happen unless the whole sort is inlined.

In languages like C++, you'd make it a template and the compiler would always know the target type, so no need for virtual.
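
To make that concrete, here is a small hedged Java sketch (class and comparator names are hypothetical): once several distinct Comparator implementations flow through the same sort call, the comparator.compare() call site inside the sort sees multiple receiver types, and the JIT typically stops inlining it and falls back to virtual dispatch.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    public class MegamorphicSketch {
        // Three distinct Comparator implementations (each lambda is its own class).
        static final Comparator<Integer> ASC  = (a, b) -> Integer.compare(a, b);
        static final Comparator<Integer> DESC = (a, b) -> Integer.compare(b, a);
        static final Comparator<Integer> ABS  = (a, b) -> Integer.compare(Math.abs(a), Math.abs(b));

        static void sortWith(List<Integer> data, Comparator<Integer> cmp) {
            // The compare() call inside the sort sees every comparator type that
            // flows through here; with several receiver types HotSpot generally
            // gives up on inlining it.
            Collections.sort(data, cmp);
        }

        public static void main(String[] args) {
            List<Integer> data = new ArrayList<>(List.of(3, -1, 2));
            sortWith(data, ASC);
            sortWith(data, DESC);
            sortWith(data, ABS);
            System.out.println(data);
        }
    }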


Default virtual was among the dumber design mistakes in Java, but it has lots of competition.


Why? The JVM has complete knowledge over the entire code base at runtime. It knows which methods require virtual function calls and which ones are just regular function calls. If nothing extends the class, then there will be no virtual functions in the entire class. If something extends the class but it does not override any methods then there again will be no virtual functions. If a class overrides a single method, only that method is going to be a virtual function.


See my reply to the GP. Default virtual is the only thing that makes sense given how the JVM works.


You understand the JVM was designed at the same time as the language? It could work any way they liked. And does.


Sure, but if you have the same preferences (throughput over latency), you’ll find that there are no performance benefits to be gained from non-virtual functions in any JITed language. The “final” keyword is just there for documentation.


Virtual or not isn't about performance, it is about system architecture. Virtual is structurally about implementation. Exactly to the degree that the public interface matches the inheritance interface, the abstraction is a failure.

At least, if you are being object-oriented, which Java tries to force on you. Of course, you are free to violate that expectation, and sometimes must since Java offers no other means of organization; so if you do, more power to you.


Free except for you've written 3x as much code and it's 10x harder to understand.


Eh, it's a bit boilerplaty, but much of that stuff is typically done through an IDE.

Don't know about harder to understand, the entire point is to remove confusing implementation details from callers.


I'd rather see how it's implemented.

In my experience enterprisy abstractions are a lot of motion without any progress. They impede change and stymie understanding.

The cynical part of me thinks that is the whole point.


The issue is that when you are trying to understand/modify someone else's code, it always comes down to confusing implementation details rather than the abstract architecture on top of them.


Oh yeah, I've heard this so many times - the JVM can optimize all that dynamism out. Except in cases when it can't or just won't.

The reality is, it is very far from free. Most Java developers are simply not aware of the real cost. Then they are surprised when the code gets a 5x speedup and needs 20x less memory after being rewritten in Rust or C++.


You don't have the option of telling Java whether to use "that dynamism". All Java calls are virtual calls; in most simple cases the JIT optimizes it out (e.g. a single implementer of an interface), sometimes it can't.

This isn't just speculation about what the JVM does, you can examine the machine code generated by the JIT to verify whether this optimization happens.

Do you have real-world examples of this 5x speedup from this decade? 20x memory I can sort of see because the GC overhead can be pretty nasty in some edge cases, but I'd expect closer to 1.5-2x speedup from C++ if the Java code is anywhere near well written.


> Do you have real-world examples of this 5x speedup from this decade?

Does https://github.com/pkolaczk/latte count? Or the Optional cost described on my blog here: https://pkolaczk.github.io/overhead-of-optional/?

(I have more such examples, but many I can't share).

> you can examine the machine code generated by the JIT to verify whether this optimization happens

And what do I do with that knowledge if it turns out the optimization didn't happen?


> Does https://github.com/pkolaczk/latte count?

This is just Rust code. Where is the equivalent Java code?

Like the 50x-100x memory consumption is highly suspicious. If Java uses 50x the memory compared to C++, how come I can allocate a long[SINT_MAX-10] on a machine with 32 GB of RAM? Shouldn't the process require on the order of 0.8 TB of RAM if this statement is correct? Or can C++ allocate a 2 bn array of longs in 40 MB of RAM? If so let me know, I would be very interested in using this novel compression technology.

> Or the Optional cost described on my blog here: https://pkolaczk.github.io/overhead-of-optional/?

Why aren't you using OptionalLong[1]? You shouldn't use Optional<Long>, that's never a good choice. At any rate, nobody should be claiming Java optionals are free; they're a high level abstraction and absolutely do not belong in hot codepaths.

In general it's fairly easy to construct benchmarks that favor any particular language, which is why you constantly see these blog posts about how high level interpreted languages (JS, PHP, Haskell) are faster than C++.

You can easily construct "comparisons" that make JVM languages look superior to C++ as well, just carelessly allocate throwaway objects of different sizes and lifetimes (like you can in Java), oh no, why is C++ slowing down? Surely there's no heap fragmentation! That's a bad faith benchmark though. It doesn't really demonstrate anything other than that C++ following Java idioms isn't very good.

> And what do I do with that knowledge if it turns out the optimization didn't happen?

The way the JIT works is by aggressively overassuming, and then recompiling with more generalized interpretations of the code when assumptions turn out to be false. But the wider problem of compilers occasionally generating suboptimal instructions isn't something that is Java specific.

[1] https://docs.oracle.com/en/java/javase/12/docs/api/java.base...


> At any rate, nobody should be claiming Java optionals are are free, they're a high level abstraction and absolutely do not belong in hot codepaths.

1. In this particular case you might be lucky, because someone provided a hand-coded, specialized workaround. But that was not the purpose of that benchmark. And in bigger code bases you often are not that lucky or don't have time to roll your own, so you must rely on generic optimizations. Sure, you may get Java quite close to C by forgetting OOP and implementing everything on ints or longs in a giant array. But that defeats the purpose of using Java, and it would make it lower-level and less productive than C.

2. One of the commenters on Reddit actually tried OptionalLong, and it did not help. See the comments section; there should be a link somewhere.

3. I can use this high-level abstraction in C++ at negligible cost in hot paths.

> This is just Rust code. Where is the equivalent Java code?

You probably won't find exactly equivalent code for software bigger than a tiny microbenchmark. The closest you can get are other tools built for a similar purpose, e.g. cassandra-stress or nosqlbench. I can assure you that the majority of CPU consumption in those benchmarking tools comes from the database driver, not the tool itself. And comparing tools using a well-optimized, battle-tested Java driver with a similar tool using a C++ or Rust driver can already tell you something about the performance of those drivers. Generally I found that all of the C++ drivers and the Rust driver for Cassandra are significantly more efficient than the Java one. Fortunately, outside of the area of benchmarking, that might not matter at all, because in many cases it is the server that is the bottleneck. Actually all those drivers have excellent performance and have been optimized far more than typical application code out there.

> Like the 50x-100x memory consumption is highly suspicious.

This isn't a linear scaling factor. It applies to this particular case only. And the reasons this number is so huge are: 1. the Rust tool runs in a fraction of the memory that is needed even for the JVM alone; Rust has a very tiny runtime. 2. the Java tools are configured for speed, so they don't even specify -Xmx and just let GC autotuning configure that. And I guess the GC overshoots by a large margin, because it often ends up at levels of ~1 GB. So it could likely be tuned down, but at the expense of speed.


On the other hand, Java projects are better tested and more portable in my experience.


Better tested, maybe, I don't know but I believe you.

More portable, yes, but it is complicated. Java runs on a VM, so it gets portability from there: if your platform runs the VM, it will run the project. However, as a user, I still had issues with using the right VM at the right version; OpenJDK and Oracle JDK are not completely interchangeable. The same goes for messing with the classpath and libraries. Not so different from C actually, but at the JVM level instead of the platform level, the advantage being that it is easier to switch JVMs than to switch platforms.


More portable... to what? Are we talking about desktop applications here? I can't remember the last Java program I used on the desktop.


Minecraft is written in Java.


It relies on native libraries like LWJGL, so it does still have portability issues.

And most game consoles can’t run Java so they had to totally rewrite it there.


Most game consoles are amd64 or ARM. Running java there is trivial.


Minecraft for 3DS, Xbox 360, etc already exists and is a rewrite in C++. It’s older than current game consoles.

But they don’t have JREs even if they have the right kind of CPU in them.



Not sure how I missed seeing this before today, but thank you very much for bringing it up. That is a particularly good article. It has a lot of food for thought.


I love tef's writing - and now he takes pictures of crows full time! :)


That's great!

Thanks!


I think the most important mind shift is from "let's make this extendable by plugins/scriptable so we can modify it while it's live" to "if requirements change, let's just change the source code and redeploy".

I also disagree with the SOLID principles. KISS is more important than adding extra code and sacrificing performance to allow extension without touching the original source files. Unless your goal is explicitly that.

You're trying to write the simplest, most straightforward encoding of the solution. If you can avoid duplication and make the code read well, you're golden.


> "I think the most important mind shift is from "let's make this extendable by plugins/scriptable so we can modify it while it's live" to "if requirements change, let's just change the source code and redeploy"."

People who develop web apps can go ahead and have this epiphany, but this obviously doesn't work where you don't have complete control over "redeploy", which is a large class of software.


Sure, deployment can be tricky. However, the big question then is: if you make it extendable by plugins or scriptable, is the process you will use for testing, managing and deploying those plugins or scripts any easier?

I have seen cases where you go to pretty much the same effort as the "core" deployment, so the deployment benefits of the configurable option were fake but the complexity costs were real.


> if requirements change, let's just change the source code and redeploy

The intractable problem ends up being "fuck, half our code base implicitly depended on the current behavior and now we can't change it without half our tests failing."

This is why 10,000 ft abstractions can sometimes be nice because now all your business logic exists in a fantasy world of interfaces you control.


I think "don't make everything plugin-able" doesn't mean "Let's hard glue everything". And if change a single file can break half of unit tests. I think your test is bad at best. (Or there is only integration tests because you already hot glued them?)


Agree, this is why I like microservices, because they enable exactly this, and make it easy.

Sort out your abstractions at a high level (auth, storage, routing, messaging, etc) and then write small programs to implement the logic. If one of them is wrong - redeploy.


My only criticism of this piece, is that it's so dry and well articulated, some might not realize it's satire.

There is some conversation here about the number of instances something has to happen before you should abstract it, which is a handy rule of thumb. You should also consider the tradeoff of complexity: sometimes even if you have 5, 10, 15 snippets of code that are almost the same, you still don't need to abstract them, because the differences are not complex to manage, but an abstraction would be.


There’s a lot of value to being able to command-click a function or method call and jump to one single definition. This substantially reduces friction when reading/understanding code or change sets.

One of the best things about dynamic languages like Typescript is that in these languages you can avoid interface/implementation duality while still being able to mock or test code by using test-only runtime mutation of class instances or modules.


This has been doable in static languages for a long time now, as well.


You are correct, and for some reference, Mockito 1.0 came out in 2014.

@Mock, @InjectMocks, and the ability to wire up an entire dependency tree has been something Typescripters have still not perfected like crusty Java devs.
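
For anyone who hasn't used it, a minimal Mockito-style sketch (Greeter/GreetingService are hypothetical names, and it assumes mockito-core is on the classpath): the mock is generated at runtime, so there's no hand-written interface/impl duplication just for testability.

    import static org.mockito.Mockito.*;

    class GreetingService {
        String greetingFor(String name) { return "Hello, " + name; }
    }

    class Greeter {
        private final GreetingService service;
        Greeter(GreetingService service) { this.service = service; }
        String greet(String name) { return service.greetingFor(name) + "!"; }
    }

    public class GreeterTest {
        public static void main(String[] args) {
            // Mock the concrete class directly; stub only the method under test.
            GreetingService mockService = mock(GreetingService.class);
            when(mockService.greetingFor("world")).thenReturn("Hi, world");

            Greeter greeter = new Greeter(mockService);
            System.out.println(greeter.greet("world")); // prints "Hi, world!"
            verify(mockService).greetingFor("world");
        }
    }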


Depends what you mean by "static". Swapping out a method or class implementation at runtime is pretty inherently a dynamic task.


Apologies, I assumed "static typed" was a fairly safe assumption.


Can you put that second sentence into simple english for those imposters such as myself?


They seem to be referring to "duck typing". The typing principle exemplified by the statement: If it looks like a duck, walks like a duck, and quacks like a duck, then it's a duck. Does it matter that it's a waddling man in a duck costume saying "quack"? Nope, still a duck. You don't need an explicit interface and an explicit declaration that you're implementing it, or to implement it fully, you just need to implement the operations relevant to the use of the object.

You have a procedure or something that you want to test and it takes, as a parameter, a logging service? You don't want to instantiate a full logging service and set up the database, because that's heavyweight for a test and irrelevant for the particular test? Fine, you throw together a quick and dirty logger that answers the method `log` and pass that in instead. No need to know what the precise interface has to be, or implement or stub out all its other capabilities. You know it has a `log` method because the procedure under test uses it, so that's what you give your quick and dirty mock logger. No more, no less.


In Typescript or other dynamic languages, here's what you need to do to mock a single method on an instance of a class:

    const instance = new MyClass()
    instance.myMethod = () => getMockResult()
Usually test frameworks provide a helper to do this kind of thing for you, but it's easily accomplished with no magic, using built-in language features.

In nominally-typed languages with static type systems, it's more difficult to alter the behavior of specific instances of classes at runtime. Instead, convention is to declare an interface that specifies some set of methods, and then declare a class that implements the interface. Instead of using a concrete type in business logic, you use the interface type instead. Then, in your tests you can use a different class that also implements the interface:

    interface MyInterface {
      myMethod(): ResultType
    }
    
    class MyClass implements MyInterface {
      myMethod() {
        return new ResultType()
      }
    }
    
    class MyInterfaceMock implements MyInterface {
      private myMethodImpl: () => ResultType
      constructor(myMethodImpl: () => ResultType) {
        this.myMethodImpl = myMethodImpl
      }

      myMethod() {
        return this.myMethodImpl()
      }
    }

    const instance: MyInterface = new MyInterfaceMock(() => getMockResult())
The "duality" I'm referring to is that in this pattern, every type needs to be defined twice, first as an interface, and then as an implementation.

As other commenters said, there are tools like Mockito that eliminate the need for this double-definition, and enable the same easy re-definition of method behavior at runtime. My point wasn't that it is impossible in Java or other static languages, just that one of the strengths of a more dynamic language is that flexible runtime behavior alteration is ~2 lines of code, versus the 39769 lines of code in eg Java's Mockito.


I've worked with people who would prefer a complex function to generate a list of properties from a data source over a much simpler hardcoded list, on the basis that if a new option is added it's easier. They used this pattern for things like asset classes, which admittedly did change about once every 5 years. It made me sad.


Still peeved about times I was forced to do this, when any meaningful change would also necessitate a code change anyway.


I think "what if it changes?" can also be used to argue for more concrete, simpler code that is less DRY and therefore can be rewritten or deleted with greater ease.

I wouldn't refer to overly generic code as easier to change.


Exactly. It's also easy to demonstrate that if it changes we just change the code, duh!! Simple concrete code is far easier to change.


Yes. That was the joke.


> Never let anyone explore the answer to your "what if it changes?" question. The impact of such a change is irrelevant! For the question to retain its power, fear must live in the imagination. The change's impact must be unstated, horrible, and so fraught with Lovecraftian terrors that only your approach can defend against it.

I see this in a lot of contexts at work, not just defensive coding. Red tape that slows us down a lot, supposedly protecting us against unimaginably horrible things which no one can seem to articulate under questioning.

I mean, look how red that tape is! It must be protecting us against something pretty dangerous. Red means danger, right?


Say you’re piloting a sailboat downwind and there’s an obstacle between you and your destination. You will need to plot a course to circumvent the obstacle. You have a choice to turn the boat toward the wind (edit: not through the wind, just toward it) to pass the obstacle on one side, or away from the wind to pass on the other side. One of these two options retains optionality at little or no cost; that’s likely to be the better choice. Likewise in software, look for zero cost opportunities to retain optionality (eg. free ways to defer decisions till later).

This is a bit sophisticated to teach junior developers, so we just teach them “consider the implications of your design decisions with regard to supporting future needs, including those not yet known.” Yes, you can certainly over-index on this dimension, but that doesn’t make it a useless or necessarily harmful consideration. (Not implying the article disagrees with this; it does appear to be satire)


In the subset of competitive video games known as fighting games, like Street Fighter, there is a slang term for a very similar tactic, called an “option select”[0]. Put simply: you’re maximising the expected value of your decision, by expanding the possibility space of what can occur after your decision, instead of eagerly (and maybe mistakenly!) committing to any particular possibility and locking yourself into some subset of that possibility space.

Go figure that I’ve overheard and used “option select” in the context of technical decisions at work, among coworkers who play fighting games.

[0]: https://youtu.be/rBLpz369i7Q


> One of these two options retains optionality at little or no cost

I may misunderstand your analogy here, but it seems to me that retaining optionality has a significant up-front cost here that may never be realized?


You turn toward the wind, then you’ll be more upwind until later. So you retain optionality by having “more room to go downwind”. (Please if we have a more experienced sailor here, do explain it better!)

What up-front cost are you thinking of?

A non-sailing version might be say you’re on a bicycle, and you can drop down the mountain now and go lateral later, or go lateral now and drop down later, and either way you arrive at your destination after the same time and effort. Going lateral now retains optionality.

Very open to hearing better analogies for the same concept! Perhaps from the world of business?


I interpreted ‘turning towards the wind’ as actually trying to sail against the direction of the wind (very hard/annoying).

So trying to do that first for gains you may never realize is not a great idea.

Like trying to ride your bike uphill so that you may easily go downhill at some later point in time.

The biking example makes it clearer what you meant (to me anyhow).


Ah yes I could have been more clear. Edited.


This is an old debate. The counter-slogan is "You Aren't Going To Need It." [1] All you can say from the outside is "well, it depends what it is."

Often the best way to design for change is to make it easy to edit the code, test it, commit, and deploy, but not everything is a web app.

[1] https://martinfowler.com/bliki/Yagni.html


That was the joke. The author is delightfully subtle.

I worked at a big company that assigned its summer interns to write an online sarcasm detector. At the end of summer, they announced they had been completely successful. No one asked for evidence. Clearly the students had understood the assignment perfectly.

Anyway, it was said that this had been assigned. I never spoke to anyone actually involved. Or, admitted it.


I wouldn't say that YAGNI is the counter-slogan. Rather, it's the overarching principle. It's not going to change. You aren't going to need to code for that contingency.


"What if it needs to change?" is better than "What if it changes?"

In a business environment, write code that is meant to be extended by people other than you.

It's not mindless tyranny, it's good design.


Rewriting code is often faster than understanding obtuse code written for requirements that never come.

I once encountered a 5k LOC program designed to take the average of a series of Boolean values. The code had been designed such that the input data format could be changed, etc. Unfortunately, of the 5 metrics which could ever have been requested, only 2 were implemented. It was too difficult to work with the code to implement the other three (just finding where to implement them took 3 days).

Ultimately all the extra abstraction abstracted over the wrong things. The 5k LOC project was replaced with a 100-line file that did exactly what was needed.


Unless the person other than you is me. I’ll just delete your monstrosity and replace it with a couple hundred lines of focused, simple code and revel in the mostly red diff on GitHub.


That's fine, as long as you are still solving the original problem in the required time. The IRS won't excuse that you missed filing your withholding data on time because you were making the reporting tool easier to extend upon later.


The idea should be that it probably will change, and to prepare for that you need to code it in such a way that if it does change, you have to update the code in only one or two places; you haven't scattered the knowledge of that detail all over the code.


If you overdo that, you end up with more abstraction than is helpful, which will hurt your ability to refactor as well. When I was younger, I did a lot of abstractions because doing abstractions was cool. Now that I'm older and wiser, I only add interfaces if they're really needed. Even if I know that some abstraction will be coming in the future, I've found it more helpful to add the abstraction only right before implementing the second use case for it, because by then I have thought a lot about how the second use case is to be implemented.


I think an important skill you pick up after a couple decades is to know the difference between “this is likely to change” and “my gut says what-if-it-changes but my experience tells me YAGNI”

My general approach is to write my code in a way that could easily be turned into an interface once a second or third implementation needs to be introduced and hope a junior doesn’t come along and fuck it up in the meantime. I think one reason people go to the abstract interface early (when there is only one impl) is that they see it as a guard against someone else coming along and changing the well thought out layering. But it doesn’t work and just makes things harder to read and work with.


I don't like this author's approach. Always ask the question, but never add complexity because you assume you know the answer. If you have "multiple layers of indirection" that are "unused" because of paranoia that you might have to refactor your code, then you're doing it wrong. Write code such that you never have to be afraid to refactor it. That's not the same as writing code that never has to be refactored. In fact, refactor often and liberally. And if things break when you refactor, then that code is bad code. But don't add worthless layers of indirection because you write code that can't be tested, is full of side effects, requires entire components to be rewritten from scratch, etc.

The top commenter says good code rarely needs to be changed. I think that's foolish. Good code is constantly changed, because it's good enough that it can be.


Try to produce useful interfaces and data structures, even internally. When data is stored, make it easy to verify that it's in a recognized, expected format, possibly with a version scheme that can be extended later.

Historically successful protocols often have a basic feature level and later extend it with features that are not required (at the time) but improve functionality. All the more so when users with accounts need more features than federated (not authenticated) users from other platforms.
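
As an illustration of the version-scheme idea, a hedged Java sketch (the Profile format and field layout here are entirely made up): every stored record leads with a version number, and readers refuse anything they don't recognize instead of silently misparsing it.

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    record Profile(String name, String email) {}

    class ProfileFormat {
        // Readers check the version first, then parse the layout they know.
        static Profile read(DataInputStream in) throws IOException {
            int version = in.readUnsignedShort();
            return switch (version) {
                case 1 -> new Profile(in.readUTF(), "");            // v1 had no email field
                case 2 -> new Profile(in.readUTF(), in.readUTF());  // v2 added it
                default -> throw new IOException("unknown profile version " + version);
            };
        }

        static byte[] writeV2(Profile p) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeShort(2);          // version goes first
            out.writeUTF(p.name());
            out.writeUTF(p.email());
            return bytes.toByteArray();
        }

        public static void main(String[] args) throws IOException {
            byte[] stored = writeV2(new Profile("Ada", "ada@example.com"));
            System.out.println(read(new DataInputStream(new ByteArrayInputStream(stored))));
        }
    }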


Look up Poe's Law.


My personal bugaboo is worrying about whether the size of `int` changes. It's not going to change. It's 32 bits. 25 years ago was the end of 16 bits. 25 years ago.

Even if you do want to target real mode DOS, good luck getting your modern app to fit in 64K. Heck, just upper-casing a Unicode string takes 640K.


It's 32 bits only if you ignore microcontrollers and DSPs. Also, the size of 'long int' is definitely not constant if your code needs to be portable between Windows and something else.

I have written some DSP code that was portable between architectures that did not have the same sized _char_. Thankfully it was almost 20 years ago, and the TMS320C55x, which had 16-bit char and 16-bit int, is AFAIK dead.


> It's 32 bits only if you ignore microcontrollers and DSPs.

I know. But you're not likely to port any apps between microcontrollers, DSPs, and general purpose machines.

> Also, the size of 'long int' is definitely not constant if your code needs to be portable between Windows and something else.

It's also not portable between Linux and Mac OSX. This makes `long int` a completely useless type. Use `int` and `long long`.

> that did not have same sized _char_

I remember one person who responded to me about a DSP that had sizeof(char)==32, and how it was wonderful that the C Standard accommodated that so code could be portable.

I challenged him to port the C implementation of `diff` to it and see how far he gets with it. I predicted total failure :-)

The C and C++ standards would do the programming world a favor by standardizing on:

1. 2's complement

2. fixed sizes for char, short, int and long

3. dumping any character sets other than Unicode

Machines that don't support that will require a customized compiler anyway, and it's very unlikely any code will be portable to it without significant rewrite.


> I remember one person who responded to me about a DSP that had sizeof(char)==32, and how it was wonderful that the C Standard accommodated that so code could be portable.

The support for 32-bit char doesn't automatically make code portable between architectures with different char sizes. But it's what makes it possible to write a C compiler at all for those weird architectures. It certainly doesn't make code that assumes things portable to architectures where those assumptions do not hold.

> The C and C++ standards would do the programming world a favor by standardizing on: 1. 2's complement 2. fixed sizes for char, short, int and long 3. dumping any character sets other than Unicode

This would be a great favour to embedded developers everywhere, because it would finally free us from the C legacy. Rust and Zig look promising, and at some point I even thought D might work but now I understand why it wouldn't. I wonder if you're aware of the standard sized type aliases int8_t, int16_t etc and what's your opinion of them.


> it's what makes it possible to write a C compiler at all for those weird architectures.

Only if one is pedantic. There's no practical problem at all customizing C compiler semantics for weird architectures. After all, everybody did it for DOS C compilers.

> I even thought D might work but now I understand why it wouldn't

People do use it for embedded work. I don't know why you wouldn't think it would work.

> I wonder if you're aware of the standard sized type aliases int8_t, int16_t etc

I am. After all, I wrote stdint.h for Digital Mars C.

> and what's your opinion of them.

There are three of each:

    typedef long int32_t;
    typedef long int_least32_t;
    typedef long int_fast32_t;
1. Too many choices. I have experience with what happens when programmers have too many choices, with the differences between them being slight, esoteric and likely not substantive. They blindly pick one.

2. People have endless trouble with C's implicit integral conversions. This makes it combinatorically much worse.

3. int32_t makes sense for the first hour. Then it becomes annoying, and looks ugly. `int` is much better.

4. `int` is 32 bits anyway. No point in bothering with stdint.h.


> Only if one is pedantic. There's no practical problem at all customizing C compiler semantics for weird architectures. After all, everybody did it for DOS C compilers.

Resorting to non-standard extensions locks you in with that specific compiler. This is the exact reason why standards exist in the first place.

> People do use it for embedded work. I don't know why you wouldn't think it would work.

The lead developer seems to have strong knee-jerk reactions to things that he does not understand and limited understanding of what microcontrollers are or what embedded software does. Who really cares about porting diff to a system that doesn't have filesystem or text console?

I agree that integral type promotions are a great way to shoot oneself in the foot, but the other explanations are not really convincing. If you did read the comments you are responding to, you should already know that int is not always 32 bits.


You're already locked in with a specialized compiler for those unusual architectures, and despite being Standard conforming, it still isn't remotely portable.

> you should already know that int is not always 32 bits.

I also wrote a 16 bit compiler for DOS, with 16 bit ints. I know all about it :-) I've also developed 8 bit software for embedded systems I designed and built. I've written code for 10 bit systems, and 36 bit systems.

> Who really cares about porting diff to a system that doesn't have filesystem or text console?

I infer you agree the software is not portable, despite being standard conforming. As a practical matter, it simply doesn't matter if the compiler is standard conforming or not when dealing with unusual architectures. It doesn't make your porting problems go away at all.

I went through the Great Migration of moving 16 bit DOS code to 32 bits. Interestingly, everyone who thought they'd followed best portability practices in their 16 bit code found they had to do a lot of rewrites for 32 bit code. The people who were used to moving between the 16 and 32 bit worlds had little trouble.

C++ is theoretically portable to the 16 bit world, but in practice it doesn't work. Supporting exception handling and RTTI consumes all of the address space, leaving no space left for code. Even omitting EH and RTTI leaves one with a crippled compiler if extensions are not added to support the segmented memory model.

How do I know this? I wrote one. I lived it.


> I also wrote a 16 bit compiler for DOS, with 16 bit ints. I know all about it :-) I've also developed 8 bit software for embedded systems I designed and built. I've written code for 10 bit systems, and 36 bit systems.

I'm not sure how this means that int is 32-bit.

> I infer you agree the software is not portable, despite being standard conforming. As a practical matter, it simply doesn't matter if the compiler is standard conforming or not when dealing with unusual architectures. It doesn't make your porting problems go away at all.

I guess we could venture into arguing what "the software" and "portable" mean here. What I mean is that I was working on a standard-conforming C codebase that worked correctly on both architectures I mentioned above. This is what I consider portable. Having a standard-conforming compiler for both does not make the problems go away, but it makes things much easier than having two non-conforming almost-but-not-exactly-C compilers or totally proprietary languages.

I know that DOS was bad. It's been more than 20 years now. Let's get over with it.


https://www.analog.com/en/products/landing-pages/001/sharc-p... has a 32-bit word length. It's been a few years since I worked with them, but the compiler had a "native" and a "compatible" mode -- the first where `char` is 32 bits, the second where we multiply each pointer by four before exposing it to C so we can pretend that we're addressing individual 8 bit char elements within the 32 bit hardware word. Much less efficient, much easier to program for. Don't try running code in compatibility mode without enabling compiler optimisations.


C does require 2’s complement now and removed trigraphs. I’m not sure they have much contact with Unicode since nobody uses the “wide char” stuff.

As for variable sized int types, it’s a nice idea but it’s really unlikely your C program is actually portable across them. You haven’t got good enough test coverage for one thing. If someone wants to do that in a new language users should be required to declare which sizes of char/long they actually expect to work.


In D it is ridiculously simple:

    byte = 8 bits
    short = 16 bits
    int = 32 bits
    long = 64 bits

    add `u` prefix for unsigned version
22 years of experience shows this works very, very well, and is very portable.

The only variable sizes are size_t and ptrdiff_t, but we're kinda stuck there for obvious reasons.

There have been some issues with printf formats mismatching when moving D code between systems, but we solved that by adding a diagnostic to the compiler to scan the printf format against the argument list and tell you how to fix the format.


I think that makes perfect sense, but for some reason people want their max file sizes and max array sizes to be different on different CPU architectures without testing it or anything.


It goes the other way too. One of my worst memories from Java was when the OpenJDK authors decided that they wanted the maximum array size to be the same (signed int32 range) on different CPU architectures and heap sizes without testing it or anything. It turned out that with enough small objects, that counter overflows. Of course, having 2^29 tiny objects around was a huge overhead to begin with, so I understand why it hadn't been tested and why it doesn't usually happen in real life, but it would have been pretty easy to avoid with a simple "if it's a size, use size_t" heuristic.

After you accept that size_t can be platform specific and there's nothing scary about it, the format string issue is solved with something like "%zu".


I used to be a good citizen and used size_t for all my buffer sizes, data sizes, memory allocation sizes, etc but I have abandoned that completely.

If my program needs to support >4GB sizes, I need 64-bit always; it makes no sense to pretend "size_t data_size" could be anything else. And if it doesn't need to support that, using an 8 byte size_t on 64-bit machines makes no sense either, just wasting memory/cache line space for no reason.


They do have some use as a “currency type” - it’s fine if all your code picks i32, but then you might want to use a library that uses i64 and there could be bugs around the conversions.

And C also gives you the choice of unsigned or not. I prefer Google’s approach here (never use unsigned unless you want wrapping overflow) but unfortunately that’s definitely something everyone else disagrees on. And size_t itself is unsigned.


All software is incorrect given a long enough time frame. Whatever, just get paid and make the problem go away. Be short sighted, it works


I think game developers are at the front lines of changing requirements, and that is why they build scriptable engines and not programs per se.

I think engines and libraries are the ultimate solution if you want to program defensively.


I always find it funny that short, not very interesting (imho) articles always get a tonne of discussion here on HN. Reminds me of Parkinson’s Law of Triviality, but that one’s for inside of an organization.


This article is very humorous. It is also a saddening reflection on software development as it is often practiced.


What is sad is all the people who took it dead seriously, and agreed.


I'll take the problem of occasionally propagating a disruptive change through a large codebase over daily wading through a dozen 2-line functions in multiple repos just to make any progress.


It's satire, calm down y'all!


In the OOP world things can be hard to change if there is an over-reliance on encapsulation and inheritance.

For some people, making things easy to change means making them more complicated, adding more layers of abstractions and patterns.

I am using a more Data Oriented approach [1] in which I keep the data separated from the code, try to use immutable data structures, and try to use methods and functions with as few side effects as possible. I am also trying to architect the software to be modular, i.e. split the code into classes grouped by functionality, split it into different projects.

And I have success at this even though I am using a language that's very object oriented and being part of a community that used to put a very high value on object orientation, abstraction and design patterns. I'm referring to C#/.Net.

[1] https://www.manning.com/books/data-oriented-programming

PS. This book and some articles and videos made by others on Data Oriented Programming have been huge helpers and big time savers for me. I'm no longer thinking in objects, I'm thinking in data and the functions which process that data.

I'm no longer a prisoner in the Kingdom of Nouns. [2]

[2] http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom...
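
For illustration, roughly the same separation in a Java-flavored sketch (the commenter's actual code is C#; Order/OrderLine and the numbers are hypothetical): plain immutable data on one side, small side-effect-free functions that transform it on the other.

    import java.util.ArrayList;
    import java.util.List;

    // Plain immutable data...
    record OrderLine(String sku, int quantity, double unitPrice) {}
    record Order(String id, List<OrderLine> lines) {}

    // ...and side-effect-free functions that work on it. No behavior lives on the data.
    final class Orders {
        private Orders() {}

        static double total(Order order) {
            return order.lines().stream()
                    .mapToDouble(l -> l.quantity() * l.unitPrice())
                    .sum();
        }

        static Order withLine(Order order, OrderLine line) {
            // Returns a new Order instead of mutating the old one.
            List<OrderLine> lines = new ArrayList<>(order.lines());
            lines.add(line);
            return new Order(order.id(), List.copyOf(lines));
        }

        public static void main(String[] args) {
            Order o = new Order("o-1", List.of(new OrderLine("sku-1", 2, 9.99)));
            System.out.println(total(withLine(o, new OrderLine("sku-2", 1, 5.00))));
        }
    }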


I wish the author had provided more examples of the kind of code they wish had not been written because "it might change in the future", because I'm not quite sure what they're advocating against.

Wrapping external libraries? It depends on the level of maturity of the library. I have seen HTTP libraries change their API often enough over a long time. I wrapped them because they did _change_. (Although I'd rather have fetch_my_foo wrapping calls to the lib that does the HTTP call, rather than having fetch_my_foo calling a wrapper that calls a lib.)

Wrapping input / output? That can have _some_ uses in testing. (Although it might be better to have unwrapped code doing the I/O, calling code that does the logic.)

Hiding a class behind an interface that will only have a single Impl? If the class is doing I/O, it might very well need at least a test implementation.

'new FileInputStream(new BufferedStream(new Buffer(new Encoding( new ...' ? Yeah, this is typically the case where a proper API needs to have 'File.read("xxx")' from the get-go, rather than forcing people to either wrap or share a wrapper.

Etc,etc...


Instead of preparing for possible future requirements we should strive to make the code easy to understand. THEN if the need arises to change it, that becomes easy. Not because the code was designed to handle possible future use-cases, but because it was designed to be easy to understand, and thus easy to modify without causing more bugs.

If it turns out the possible future use-cases never materialize, we still benefit from the easy to understand code when we need to fix bugs in it.

There are two different types of reuse:

a) Supporting multiple use-cases of the same code, with different parameters.

b) Modifying the code to do something else than it does currently.

If the code is easy to understand it is easy to change for both cases. But if you try to make it ready for multiple use-cases from the start it becomes much harder to change, since it is already more complicated than it needs to be to support just the initial use-case.

There is no Silver Bullet. Code is our tool for dealing with complexity. But it does not remove complexity. And the more code you create, the more complexity you will have.


Reminds me of this talk, "Stop Writing Classes": https://youtu.be/o9pEzgHorH0

Changed my perspective on this newfound tool I'd learned to use called "indirection".

My favorite quote is something like "don't waste 10 minutes now to save yourself 2 minutes later".


I once inherited a codebase where everything was hidden behind two interfaces for no reason whatsoever.

A "Business Object interface" is implemented by a "Business Object implementation" which delegates its method calls to a "Data Access Object interface which defines exactly the same methods as the Business Object interface" which then is implemented by a "Data Access Object implementation" which then does a simple SQL query.

You can argue that the DAO has a right to exist. After all, you are separating concerns and organizing your codebase. The BO only exists to make me suffer through 4 layers of indirection. When I write code I just drop the DAO interface and call the original class DAO. If I ever need to migrate to something else, sure, I could just rename that damn thing to DAOImpl and create an interface with the same name as the original class and the same methods. I never had to.
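
A hedged sketch of what the collapsed version can look like (Customer/CustomerDao are hypothetical names, and an in-memory map stands in for the SQL table): one concrete class, with the interface extracted only if a second implementation ever shows up.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Optional;

    record Customer(long id, String name) {}

    // The BO-interface -> BO-impl -> DAO-interface -> DAO-impl chain collapsed
    // into one concrete class. If a second implementation appears later, rename
    // this to CustomerDaoImpl and extract the interface then.
    class CustomerDao {
        private final Map<Long, Customer> table = new HashMap<>(); // stand-in for the SQL table

        void save(Customer c) {
            table.put(c.id(), c);
        }

        Optional<Customer> findById(long id) {
            return Optional.ofNullable(table.get(id));
        }

        public static void main(String[] args) {
            CustomerDao dao = new CustomerDao();
            dao.save(new Customer(1L, "Ada"));
            System.out.println(dao.findById(1L).orElseThrow());
        }
    }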


I want to avoid closing the door on possibilities that I think are likely in the future, but a lot of that happens where you make commitments to other users, such as in an API or in command-line options. I want there to be a way to add a "v2" to the API if I feel I have to, and I like the command-line option that says --build=android rather than --android because I feel I can more smoothly add allowed options that way. This is the area where one needs to try to be forward-thinking.

One can way overdo even "door opening" and create pain and complexity for users.

As for the code ..... I like to be able to read it. I prefer not to have to jump around every 5 seconds if possible. I like it to be easy to write tests for. If it's easy to test and I have lots of tests then changing it shouldn't be too stressful.

I prefer to do things the simplest way until I have a good reason why it's not enough.

