The amount of broken code that exists in production seems to disagree with that ...

jeffreygoesto · on Sept 4, 2022

Why though? Most broken code I saw is due to not getting the requirements and edge cases right. Getting to know those is communication and wanting to know them is experience.

dagss · on Sept 4, 2022

I have seen lots of code where the author did not understand DB transactions (at all!), code without idempotency even where it was critical, and test suites that call getters and setters for coverage stats but without any actual assertions; those kinds of things.
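To make the transaction and idempotency points concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table names, the `request_id` column, and the helpers `make_db`, `apply_payment`, and `balance` are all hypothetical, made up for illustration; real systems will differ.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("CREATE TABLE processed (request_id TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO accounts VALUES ('alice', 0)")
    conn.commit()
    return conn


def apply_payment(conn, request_id, account, amount):
    """Apply a payment at most once per request_id; return True if applied."""
    # `with conn` commits on success and rolls back if an exception escapes,
    # so the marker row and the balance update succeed or fail together.
    with conn:
        try:
            # The PRIMARY KEY rejects a replayed request.
            conn.execute(
                "INSERT INTO processed (request_id) VALUES (?)", (request_id,)
            )
        except sqlite3.IntegrityError:
            return False  # already handled: a retry changes nothing
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, account),
        )
        return True


def balance(conn, account):
    """Read the current balance of one account."""
    row = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account,)
    ).fetchone()
    return row[0]
```

A test with actual assertions would call `apply_payment` twice with the same `request_id` and check that the second call returns `False` and the balance moved only once, rather than just exercising the function for coverage.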

ironmagma · on Sept 4, 2022

I doubt anything someone writes here could change your mind. There are just so many things: knowing when to add tests, checking multiple places when fixing a bug, avoiding mutation, knowing when to do a rewrite, avoiding complexity… these are all just part of knowing how to code, and they are difficult.

dahart · on Sept 4, 2022

It was a valid question, no need to try to project hard-headedness. You probably won’t change any minds that way. So how do people come to know these things? Can you learn them in a vacuum sitting by yourself without communicating with anyone? Knowing when and how to add tests is a company-specific task, it’s different every place you’ll work, someone has to tell you. What if the primary reason code breaks really is because people weren’t communicating over the process, requirements, dependencies or design enough? All the things you mentioned are things a mentor and code reviews and documentation are meant to address, i.e., communication.

ironmagma · on Sept 4, 2022

I think all these things can be learned even as a company of one, without ever speaking to your customers. It will take longer and the lessons will be harder to learn, but you will get all of it because it’s just problems with the programmer himself/herself, not problems with other people.

We can “what if” all day.. and yes, mentoring can improve these things but that can be said about lots of things that aren’t caused by poor communication. It just really sounds like you need the answer to be poor communication which is why I reacted that way.

dahart · on Sept 4, 2022

> It just really sounds like you need the answer to be poor communication which is why I reacted that way.

Please. This is out of order. You didn’t react to me originally, and nobody here demanded that the answer is communication. You’re the only one insisting. You were asked a simple question, and the (as yet unanswered) question to you was why you attribute problems in production to coding and not communication, since there are plenty of examples of miscommunication leading to production bugs. This ad-hominem nonsense is you just making assumptions.

I can easily agree that you’re sometimes right, that there are some bugs caused by lack of knowledge or skill or schooling, and that there are bugs caused by individuals and that communication plays little to no role. I can also safely say, after a career that spans about the same length as the article author’s, and after doing decades of both programming and management, that the majority of problems that matter in production and the biggest and worst problems I’ve seen were caused by poor communication. One example would be that I’ve twice watched engineering departments decide to rewrite a large codebase from scratch, and it turned into a many year effort costing many millions of dollars, with thousands of production bugs and issues all resulting from the decision that wasn’t well planned. There are of course also bugs that are both, caused by individual knowledge or skill, but could have been saved with more oversight.

Feel free to share some concrete examples if you have some that you feel demonstrate production issues are caused more often by pure code and are not miscommunication. I’ve seen some and I have no doubt there are some. I’m open to hearing your answer and examples. Keep in mind this thread so far from my perspective is textbook miscommunication; the claim that production bugs mean code is the cause and communication is not is so broad and so vague and so ill-defined that of course I have no idea what you really mean. Feel free to elaborate and illustrate your point more clearly.

ironmagma · on Sept 4, 2022

> I’ve twice watched engineering departments decide to rewrite a large codebase from scratch

This is the problem, not that they communicated about it wrong. If your rewrite fails, it's because you either didn't know what you were getting into (scoping issues, happens with a 1-person team too), didn't have enough follow through (again, not unique to teams), or didn't have a gradual transition plan to the new codebase and thus couldn't devote enough bandwidth to it while retaining customers. The failure of these are not related to communication, they're just execution problems. An executive somewhere failed at these companies, not a team.

dahart · on Sept 4, 2022

> The failure of these are not related to communication, they're just execution problems. An executive somewhere failed at these companies, not a team.

Hard disagree, this is full of assumption. The teams were complicit in failing to plan well enough. I was there. Expecting an executive to handle this, or placing the blame, sounds like a very bad expectation. Execs can’t plan something like this on their own, they rely on the team to even know what needs rewriting and how to do it and how long it’ll take.

(You still haven’t answered the question.)

ironmagma · on Sept 4, 2022

Executives are there to take the blame; that's their whole job. If a team can't do the project an executive decided they were going to do, it's the exec's fault for deciding that's what they should try to do. This is why executives are paid stock/bonuses (because their pay should depend on the company's performance) and it's also why execs are sometimes fired through no fault of their own.

Teams are supposed to be complicit; that's why the company hired execs. If you want a company that has no hierarchy, you can always just not hire any executives. "Execs can’t plan something like this on their own" – well yeah, execs can't do anything substantial on their own. If they did, they would cease to be execs.

By question, I'll assume you mean your challenge to provide examples. Well, let's start with the CVEs. How many thousands of those are caused by poor hygiene in the form of buffer overflows? Then there are the crashes that aren't security bugs, like NullPointerExceptions. Actually, the most common security issue is misconfiguration, and there are a few examples here: https://www.insightsforprofessionals.com/it/leadership/famou.... Now, you can probably explain this away by saying any kind of organizational incompetence is a sign of poor communication. You can probably even argue with some validity that poor hygiene is also a result of poor communication, but this is why I am not really here to change minds.

jeffreygoesto · on Sept 4, 2022

Ok, fair. My understanding of "coding" is much narrower then, I sort those skills more under "sw engineering". But that is really just taste.

ironmagma · on Sept 4, 2022

Idk, I think mutation can be pretty easily classified as a major source of errors in software. Avoiding it is as fundamental IMO as avoiding writing code that just behaves randomly depending on a call to a PRNG, even though many people don’t know this.
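As a small illustration of the kind of error mutation invites, here is a sketch in Python. The function names and the `Order` class are hypothetical, invented for this example.

```python
from dataclasses import dataclass, replace


def add_tag_mutating(tags, tag):
    """Appends in place: the caller's list changes as a side effect."""
    tags.append(tag)
    return tags


def add_tag_pure(tags, tag):
    """Builds a new list: the caller's list is left untouched."""
    return [*tags, tag]


@dataclass(frozen=True)
class Order:
    """Frozen dataclass: assigning to a field raises an error."""
    id: int
    status: str


def ship(order):
    """Return a shipped copy instead of editing the order in place."""
    return replace(order, status="shipped")
```

With `add_tag_mutating`, every other piece of code holding a reference to the same list sees it change without being told. With `add_tag_pure` and `ship`, the old value stays valid and the new one is explicit.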

I think of SW engineering as taking some process that’s inherently broken and making it work more often than not. Coding seems more like doing the thing correctly in the first place.

end_of_line · on Sept 4, 2022

Still you do have some requirements. Many legacy systems are 'as-is'. Just more or less working source code with a huge label 'don't touch it'.

jeffreygoesto · on Sept 4, 2022

True. Question is, what makes you succeed in that situation? Try to elicit why it became what it is or just run? ;)

end_of_line · on Sept 4, 2022

Understanding that you don't really need to understand all of it; you only need to understand part of it. If you try otherwise, you run into overtime :-)

The most popular technique to 'just do the thing' is 'if my case, do #1, else do #2', mostly because nobody really understands the business logic. There are many reasons: no single real document with the spec, only a wiki called Confluence where you have half-truths, barely-truths and statements that contradict each other.

In such a situation you will almost never understand the logic, because there is not even a list of use cases, so all you can get are regression tests. It accumulates over the years: many externals / consultants with a half-life of 1 year, and so it continues. Chaos in specs (sometimes even transferring from one tool to another and losing info) and high turnover in companies too dependent on externals.

So instead of doing a simple if-else, it's better to use a wrapper / decorator or, even better, a proxy, or a chain of them, if things get more complicated.
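A rough sketch of what the wrapper approach could look like in Python, assuming the legacy code can be called as a plain function. Everything here (`legacy_price`, `with_discount`, `with_minimum`, the order fields) is hypothetical and only stands in for the real thing.

```python
def legacy_price(order):
    """Stand-in for the legacy code labelled 'don't touch it'."""
    return order["qty"] * order["unit_price"]


def with_discount(pricing_fn, customer, rate):
    """Wrap a pricing function with one special case, without editing it."""
    def wrapped(order):
        price = pricing_fn(order)
        if order.get("customer") == customer:
            return price * (1 - rate)
        return price
    return wrapped


def with_minimum(pricing_fn, minimum):
    """Second wrapper: never charge less than `minimum`."""
    def wrapped(order):
        return max(minimum, pricing_fn(order))
    return wrapped


# Chain the wrappers; legacy_price itself is never modified.
price = with_minimum(with_discount(legacy_price, "acme", 0.5), 5)
```

The special case still needs a condition somewhere, but it lives in its own small layer that can be tested and removed separately, instead of being one more branch inside code nobody fully understands.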

It all depends and this textbox is too short to explain the fun facts :-)