Hacker News
Learnings from 5 years of tech startup code audits (kenkantzer.com)
798 points by lordofmoria on May 26, 2022 | 258 comments



That's a very interesting set of findings. What is important to realize when reading this is that it is a case of survivorship bias. The startups that were audited were obviously still alive; any that suffered from flaws severe enough to be fatal had most likely already left the pool.

In 15 years of doing technical due diligence (200+ jobs) I have yet to come across a company where the tech was what eventually killed them. But business-case problems, for instance being unaware of the true cost of fielding a product and losing money on every transaction, are extremely common. Improper cost allocation, product-market mismatch, wishful thinking, founder conflicts, founder-investor conflicts, relying on non-existent technology while faking it for the time being, and so on have all killed quite a few companies.

Tech can be fixed, and if everything else is working fine there will be budget to do so. These other issues usually can't be fixed, no matter what the budget.


I've found that slowdown from tech debt killed as many companies as any other issue. It's usually caused by business owners constantly pivoting, but being too slow on the pivot and too slow to bring customer wishes to fruition (due to poor technical decisions and tech debt) is probably one of the top 5 reasons for dead companies I've seen.


That's a good point; tech debt can be a killer. But the more common pattern I've seen is that companies that accumulate tech debt while doing well commercially eventually gather enough funds to address it, whereas the companies that try to get the tech 'perfect' the first time out of the gate lose a lot more time, and so run a much larger chance of dying. The optimum is to allow some tech debt to accumulate but to address it periodically, either by abandoning those parts through small, local rewrites (never the whole thing all at once, which is bound to fail spectacularly) or by marking out time for refactoring.

The teams that drown in tech debt tend to have roadmaps that consist strictly of customer-facing work. That can get you very far, but in the end you'll stagnate in ways that are not easy to fix; technical work done right, once you know exactly what you need, pays off.


Maybe once you get to that stage it doesn't really matter. Maybe if you're going for a billion dollar earth shaking idea, it doesn't really matter.

However, I've worked for a small company for quite a while now. We've had several successful projects and several failures.

In my experience, technical debt taken too early can easily double the time it takes you to find out if a project is a dud. That matters to us.

My general rule is: take on technical debt as late as you can. Always leave code slightly better than you found it. Fix problems as you recognize them.

I think a big mistake developers make is thinking "make code better" should be on some roadmap. You should be making code better every time you touch it. Nothing about writing a new feature says that you have to integrate it in the sloppiest way possible.


> I think a big mistake developers make is thinking "make code better" should be on some roadmap. You should be making code better every time you touch it. Nothing about writing a new feature says that you have to integrate it in the sloppiest way possible.

I vehemently agree.

One of my first jobs was working for a mathematician at a bank, who could code well enough to compute his mathematical ideas, but not a software engineer so hired me to do more of the coding for his team.

He would say, "Jim, just get this done, but don't spend time making it fancy." That was his way of saying: don't spend time refactoring or cleaning up the code.

I would say "sure" and then proceed to refactor and clean up the code as I went. It took less time than it would have to write the code "ugly" then deal with the accumulated tech debt, and I finished everything on time so he was happy.


Even just commenting on weirdness you discovered while working on tech-debt code can be invaluable; a little note on the two hours you spent on it could save days later when trying to figure out why something isn't working. Sometimes the problems can't be fixed right then, but you can mark them so that when something does break later, you have a hint as to what is going wrong.


What are the technical debt issues you've run into that've crippled your development velocity?


Technical debt has the second order effect of crippling morale.


Exactly this. People forget that the point of the metaphor is that debt is a tool you use to grow faster.

Credit card debt (e.g. sloppy code and test-free critical backend processes) is pretty bad and should be paid down ASAP.

Mortgage debt (e.g. no UI tests on the front-end) is quite safe and you can kick the can down the road.


In my experience, when you don't design and deliver code with testing or accessibility in mind, you end up rewriting entire components. This drastically adds to the end costs. Most leadership thinks this is "efficient", but it's not really. If you do it correctly the first time, you can consistently deliver features throughout the entire year rather than having to take several months to duct-tape everything to keep it from falling apart.

I never liked the "debt" metaphor. If a housing developer neglected to build a proper foundation, would you call that "debt"? I feel like tech debt is very similar: it's a bad metaphor for a concept that has very little to do with finance.


That's missing the case where the tech debt results in lowered commercial performance, as things necessary to keep customers happy enough to provide the cash flow are getting harder and harder.


They’re saying that in their experience this isn’t nearly as common as it is typically portrayed.


> They’re saying that in their experience this isn’t nearly as common as it is typically portrayed.

How would you know if the poor commercial performance was due to tech debt or not though? It's the intangibility of tech debt that makes it so insidious.


Can't speak for the person I was citing. But I think I get what they're saying.

My personal experience has been that tech debt is more often caused by business level decisions and not engineering decisions. Deadline on this contract is next week so let's ship what we got and worry about it later. Hey, good news everyone we just pivoted 180 degrees so let's try to salvage what we've got.

So yes it very well might be that a mountain of tech debt was the final nail in the coffin. But why was that tech debt there in the first place? I was understanding the GP as saying they saw business decisions leading to poor engineering instead of engineering just doing dumb things on their own. I've seen plenty of examples of the latter but a lot more of the former in my travels.


The double challenge here is doing this all whilst essentially keeping it in the background out of any customer sight. Even if you know exactly what you need after a while from a business perspective, you still need to reimplement it in a way that doesn't cause your product / service / platform to lose customers. I find this always to be an extreme challenge. It's a bit of a treadmill too: doing it this way (without causing breaking changes) certainly takes longer too. So it all piles up into a big messy stack of work :)


I feel like the biggest problem from a business strategy perspective is all we have are these personal opinions and gut feels. Even this article mentions having done 20 code audits, but presents nothing but qualitative findings. Ideally, some business school out there would be embedding researchers in randomly selected startups to know for sure how often you fail because of tech debt versus failing because of worrying too much about tech debt. That's an empirical question, yet all we get are informed expert opinions, but no auditable, reproducible research evidence. It's all so unscientific.

Not to say you're wrong, but we have no real way of even deciding. All I can do is lean on my own experience, but I've seen nowhere near every product team out there and the ones I have seen are nothing close to randomly sampled or blinded.


It’s one of those systemic health type things. It’s really hard to die of tech debt on its own, but if you move slower, you’ll die more often from other shocks.

Another way of thinking about it is that you have N months of runway, and based on your velocity you can pull off a pivot in M months, and the more tech debt, the more time it will take to successfully pivot. If you don’t have a full pivot worth of runway remaining, and you need to pivot, you die. (Of course this oversimplifies by holding pivot magnitude equal but hopefully this illustrates the point.)
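That runway arithmetic can be sketched as a toy model (all numbers hypothetical, and pivot magnitude held equal as noted):

```python
def survives_pivot(runway_months: float, base_pivot_months: float,
                   debt_drag: float) -> bool:
    """Toy model: tech debt acts as a multiplier on pivot time.

    debt_drag = 1.0 means no debt; higher values mean the same
    pivot takes proportionally longer. The company survives only
    if the slowed-down pivot still fits in the remaining runway.
    """
    effective_pivot_months = base_pivot_months * debt_drag
    return effective_pivot_months <= runway_months

# 12 months of runway, a pivot that would take 6 months debt-free:
print(survives_pivot(12, 6, 1.0))  # no drag: 6 <= 12 -> True
print(survives_pivot(12, 6, 2.5))  # heavy drag: 15 > 12 -> False
```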

I do agree that away from the margin, companies that are incredibly successful can afford to punt harder on tech debt. I suppose “know thyself” might be useful advice here; it’s probably not good advice for the median startup to ignore tech debt completely IMO.

I think the main point though is to optimize for agility; tech debt can let you move faster in the short and even medium term, so sometimes it’s right to tactically take on some debt. But not so much that you get bogged down later; make sure you carve out time to fix the stuff that starts to be painful.


I’m not sure I agree. Technical debt is a symptom, it’s the consequence of bad management that leads to working on the wrong things.

If you’re running a startup and haven’t yet found your feet in terms of a product offering, and you’re building your product(s) in such a way that technical debt builds up through continuously layering half-baked on half-baked, it’s indicative that you’re not actually pivoting and not actually evolving, you’re just adding new half-baked ideas to a half-baked system… and being able to do that at twice the speed isn’t going to address the real problem: half-baked ideas don’t make a product, whether that’s 10 half-baked ideas or 100.

My experience is that any company in which evolution/experiments/pivoting is constrained within the boundaries of what already exists because of the sunk cost fallacy has made a grave error at a leadership level, not at a code level. If you can’t validate something without mashing more code into your system, that’s the problem to address.

I’ve seen companies with horrendous tech debt die, and you could certainly frame their death as being a consequence of the tech debt (“if they had just got the perfect system…”) but that assumes the perfect system would somehow prevent them from making the mistakes that got them there in the first place. It wouldn’t. The technical debt is an expression of their mistakes, not the cause. You could dump the perfect system at their feet and they’d be surrounded by garbage again a few years from now.


I worked at a company that was mired in tech debt. At least 4 different UI frameworks were in use, one of which was no longer supported at all. Multiple versions of the app were left accessible, with links from the new to the old, because the new version was not feature-complete. "Feature flags" were expressed in at least 3 different ways; it was a nightmare to figure out if something was on or off, and why. The back end ran on an unsupported language version, with several old, deprecated third-party packages as a result. The company appeared organized, superficially, but at the lower levels of implementation it was a total dumpster fire.

They were constantly "pivoting", but leaving the old junk around.


There’s tech debt and then there’s poor engineering leadership. There’s no valid reason for a startup looking for market fit to switch frameworks or feature-flag systems multiple times unless you’re just being clueless and looking for silver bullets. Just pick a few “boring” technologies and you’ll be perfectly capable of building anything “web” for at least a decade without messing around.


You are right, but there's always folks pushing for a "better" framework, even if the same old boring stuff works. If one of them is fairly vocal and a little bit persuasive, a new project will start using it... on and on it goes.


Technical debt is a sensible strategy when you are a startup aiming for growth. If you become successful, you can hire enough developers to pay back the debt in due time. If you fail, the debt doesn't matter.

Take Facebook: they built an empire on PHP. They have since built some clever compilers on top of PHP in order to make it safe and performant without breaking their existing piles of legacy code. Overall this is probably ridiculously inefficient compared to just using a safe and performant platform from the beginning, but using PHP allowed them to move fast in the critical growth phase.


I really struggle with the analogy of technical debt as equivalent to financial debt. The analogy works great in theory, but it doesn't translate to the real world. The technical decisions we make today will influence the decisions we make tomorrow, and the decisions we make tomorrow will influence the decisions we make next week... and so the system we have a year from now will be layers and layers of deeply interwoven technical debt that you can't just have your accountant pay off at the click of a button.

If we're married to the financial-debt analogy, then technical debt has compounding interest like a payday loan. Payday loans are typically used in very distressed circumstances, and are very dangerous. There are appropriate times to take a payday loan, and there are appropriate times to take on technical debt, but it has to be handled with great care and be an immediate wake-up call to address the underlying cause.


Some tech debt behaves like a payday loan with usurious compounding interest.

The more insidious kind behaves more like a completely unhedged call option - fine, until it very suddenly isn't.


Yeah, compounding interest is part of the metaphor. As long as you grow faster than the compounding interest, you are good. If the options for a startup are to keep growing or die, then taking on technical debt is reasonable.

Of course it is different for a steady-state company or organization. There you need to keep technical debt at a manageable level.
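As a toy illustration of "grow faster than the interest" (every number here is hypothetical):

```python
def outgrows_debt(growth_rate: float, interest_rate: float,
                  years: int = 5) -> bool:
    """Toy model: normalized team output vs. the compounding drag of debt.

    Start with output 1.0 and a drag worth 20% of that output;
    both compound annually at their respective rates.
    """
    output, drag = 1.0, 0.2
    for _ in range(years):
        output *= 1 + growth_rate
        drag *= 1 + interest_rate
    return output > drag

# A hypergrowth startup outruns even usurious interest on its debt:
print(outgrows_debt(growth_rate=0.50, interest_rate=0.60))  # True
# A steady-state organization does not:
print(outgrows_debt(growth_rate=0.05, interest_rate=0.60))  # False
```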


The developers you hire may hate you for it, though. And won't stick around. That'll create churn that creates even more debt.


Does Facebook have problems recruiting?


I think all companies are having problems recruiting in the current environment. FB for a myriad of reasons.


>If you become successful, you can hire enough developers to pay back the debt in due time.

How many successful companies actually do this?

At one point in my career I transitioned from a startup to one that had been acquired some 12 years before, and found it to be even more chaotic than the startup. Instead of playing a frantic game of whack-a-mole with all the pivots and feature ideas of the founders, you had a few dozen teams playing whack-a-mole with the pet projects of their respective product managers who were trying to make a name for themselves. Which was much worse because you had to coordinate with every other of those teams, and of course work with all the integrations with the parent company.

Charitably speaking, maybe these older successful companies are bad simply because the field of software engineering was still too immature when they came about, and today's startups will actually pay back their debt when they become successful in the future. Sure, we have better tools now than then, but we still don't have a static analysis tool that can determine if we built the right or wrong thing for an ever changing market.


Sounds like a version of the mythical man month. Throwing 100 or 1000 developers at it will not reduce tech debt alone. It is probably harder to eliminate debt with more developers.


I have not found working on the wrong things to be problematic so long as you take the time to eliminate the wrong things once they have established themselves as being wrong.

Not taking time is, at heart, where tech debt is born. That can manifest debt across all areas of the development process. Pressure to not take time can certainly come from management, but I have also witnessed on numerous occasions the reverse, where management asks developers to slow down and take the time to do the work well; sometimes to no avail.

Either way, your underlying thesis is quite true that given the perfect system an imperfect team will quickly reintroduce said problems into the system. This is why many software companies have become hyper-picky (even before the tech crash) about hiring. They want to try and avoid building a team that wants to shortcut that time.


What killed them was that they never found PMF. Eventually the tech debt slowed them down so much that they couldn't take as many swings at finding PMF.

But in the counterfactual, trying really hard to avoid tech debt would have slowed them down at the beginning. Not to mention there are plenty of organizations that write very complex, abstract code to avoid tech debt, and end up making the code base incredibly painful to work with. So overall, did they get fewer swings?

I've worked on a lot of old code bases and the biggest issues I've run into, issues that crippled development velocity, were 95% boneheaded decisions and overengineering. And never the types of code quality issues someone like Uncle Bob talks about in Clean Code.


Well, why are those companies pivoting so often in the first place? Isn't the root cause probably in GP's list?


And keeping 100% of all features, instead of removing the least-used ones as you add new ones, lets tech debt grow indefinitely until it reaches a point where new features take months to ship.


we need data on this


> What is important to realize when reading this that it is a case of survivorship bias.

This is totally true, but taken too seriously it leads to inability to learn anything from almost any information whatsoever. What’s more, whatever you do (whether you take the advice of those who have gone before or not), you will not be able to decide whether you made good decisions or merely “survived”.

How does one proceed when anything can be survivorship bias, and determining cause and effect for large-scale operations like running a business is essentially impossible?

(When I say “anything can be survivorship bias” I specifically mean that no matter the cohort you cannot decide whether you’ve accidentally excluded unknown failures, and hence you have no assurance of the actual strength of any analysis you do).


> In 15 years of doing technical due diligence (200+ jobs) I have yet to come across a company where the tech was what eventually killed them.

Not my experience..

What does a “tech failure” look like? Do the servers catch fire? Is the web site down? Maybe people are unable to login to their stations?

Hi-tech business is “Tech”, so the failure of the business is in fact the tech failing. More specifically, the business was unable to direct the tech to solve real problems and solve them well enough.. New hires took too long to onboard.. Engineers were only superficially productive.. Communication between the stakeholders and engineers was lacking.. etc.. etc..

Take note that in all scenarios above “work” is being done, “progress” is being made.. ceremonies are everywhere and success is seemingly around the corner.. Or is it?

It’s just very hard to see these issues, they are hidden under layers of meetings, firings, hiring, pivots, milestones with little progress in actual business value.


I think the harder you scrutinize the distinction between a tech problem and a business problem, the harder it becomes to find it.

When there appears to be such a distinction, that's usually a manifestation of something like Conway's law, a symptom that there exists an unhealthy organizational divide between business and technology.


I suppose that is my point: claims such as "I haven't seen a single start-up failing due to tech" are not possible to defend.


>relying on non-existent technology while faking it for the time being and so on have all killed quite a few companies

This doesn’t count as a tech problem?


Obviously not. Deciding to fake non-existent tech is a problem, but it's a management decision rather than a problem with the technology itself. There is an infinite number of things that don't exist, no matter how much you want them to. And if you are not capable of coming up with a working solution, and instead rely on the world around you moving fast enough to bail you out, then I would say that is a psychological problem more than anything else.

A common theme right now is 'AAI', using people to fake an AI that may not come into being at all, let alone before your runway (inevitably) runs out.


I saw one "secretary AI" that schedules meetings in your calendar over email. Just cc it to start using it (once you've signed up). The idea seemingly being: fake it with low-cost outsourcing to prove there's demand, and then build the real thing.

The developers you'd hire to make it an actual AI and the developers you'd hire to make it a Mechanical Turk are very different skill sets.


I wouldn't count Theranos' failure a tech problem. I would consider it a fraud problem.


I'd hazard a guess that many cases of startup fraud start out as good-faith delusions of grandeur, and only pivot to bad-faith fraud when the founders realize it's the only option to keep the lights on. Because the product results aren't there.

That is, plan A is Stripe, plan B is Theranos.


You may well be on to something here. But I've seen a couple where plan A was Theranos.


I was in a startup that failed in part due to tech issues. The AI model just didn’t work. There were a lot of other problems but if the tech worked, they could have easily gotten paying customers.


> yet to come across a company where the tech was what eventually killed them

I would think that a poor-quality product, or one not as good as the competitors', would be a big killer. Google, Facebook, and Amazon have amazingly superior products. I think you're missing something.


> In 15 years of doing technical due diligence (200+ jobs) I have yet to come across a company where the tech was what eventually killed them.

How about the cases where it caused fines due to failed security compliance, which didn't help the situation? I'm thinking of fintech companies especially.


I've yet to come across a company killed by fines, are there any examples of those? If anything I think the fines are still much too light.


The fines are trivial in the US.


Interesting. Have you written more about this somewhere ? If not you should.


Never really thought about it, I'm typically under NDA but in aggregate I could probably do something with this without breaking those NDAs.


> Simple Outperformed Smart. As a self-admitted elitist, it pains me to say this, but it’s true: the startups we audited that are now doing the best usually had an almost brazenly ‘Keep It Simple’ approach to engineering.

I have written before that, as an industry, we have made writing software complex for complexity's sake.

> imagine a world where there wasn't a concept of clean code, but you just wrote code as simply as possible. not having thousands of classes with indirection.

what if your code logic was simple functions and a bunch of if statements. not clever, but it would work.

what if your hiring process was not optimizing for algorithmic efficiency but for whether something simply works reliably.

imagine a world where the tooling used by software engineers wasn't fragile but simple to use and learn. oh, the world would be a wonderful place. but the thing is, most people don't know how to craft software, and here we are building software on a house of cards [0]

[0] https://news.ycombinator.com/item?id=30166677


Hot take: the current trend of writing code, AND hiring engineers, is the way it is because everyone thinks they're gonna be the next FAANG-sized company, and need to be able to write FAANG-quality code and engineer FAANG-quality architecture from the start, with respect to scalability.

Have you seen the personal blogs of devs today? What should be a simple HTML + CSS website with the simplest hosting option possible, is now written in a framework with thousands of dependencies, containerized, hosted on some enterprise level cloud service using k8s.

That's great and all if you suddenly need to scale your blog to a LARGE N number of readers, but the mentality is persistent: when one should be focused on core features and functionality in the simplest way possible, you're instead bogged down trying to configure and glue together enterprise-level software.

Maybe it's a bit unfair to put it that way; a lot of engineers know the various systems and services inside and out, and prefer to do even the simplest things that way. But I've lost count of how many times I've encountered devs that BY DEFAULT start with the highest level of complexity to solve the simplest problems, for no other reason than "but what if" and "it feels wrong that it should be that easy".


So a couple of thoughts.

> Have you seen the personal blogs of devs today?

I don't know that this is a fair comparison, because side projects can be, and often are, a way to explore ideas, understand tech, play around, etc. So I don't know that I'd agree it's a great extrapolation to the way an engineer works, based on side projects or a blog that may have different objectives.

I do agree with the sentiment though, that we want to be watching for indicators to how a team member approaches problems.

> the way it is because everyone thinks they're gonna be the next FAANG-sized company, and need to be able to write FAAANG-quality code and engineer FAANG-quality architecture from the start

I don't know if it's fair to say everyone, but it is something I agree companies, especially startups, should filter for. When I acted as hiring manager and was trying to build SRE, as an example, I would remind candidates and the team continuously that we're not Google. So while we want to bring in ideas and approaches from what Google has published as "SRE", we need to consciously leave out the parts that aren't appropriate to our needs and stage of maturity.


I disagree that they go down the path they do because they think they’re going to be FAANG-sized, but rather it’s a case of cargo-culting, “we’ll use these tools/architectures because the best companies use them, therefore they must be the best tools/architectures.”


I don't even know if it's cargo culting as much as engineers using their day jobs as opportunities to learn marketable skills for job hopping.

Nearly every new technology introduced at places I've worked was there because someone was keen to get it onto their resume.


I believe this might fall under "resume driven development"


But why are the skills marketable, if no business needs them?

Why are companies hiring people who proudly put Overengineer as their job title on their resume?


New and shiny > old and dull.

At least as far as marketing goes, when trying to hire young and hungry devs. More so at startups.

My old job was quite spread out geographically and organizationally: lots of small offices with engineers that had more or less total freedom when it came to tooling. It was actually a gov. agency, so that might surprise someone, but it was one of those places transitioning to the digital age, and it therefore didn't really have much solid structure.

The various teams pretty much used the tools they wanted to solve the problems at hand; I think we had three different version control systems at play, and multiple different databases. Working with data across the organization was a total nightmare.

But we did have a common platform for communication, sharing stuff, and all that. I think we were around 250 devs and engineers, and a survey showed that we used over 20 different programming languages.

One thing I DO remember was that some people in most teams were constantly pushing for the latest (as in 1-3 year old) tools. Someone's writing an API in Flask? No, screw that, FastAPI is where it's at. Team x is still writing RESTful APIs? We're doing GraphQL. And that's how it went.

When some of these guys would end up in dev. blogs or being interviewed, they'd of course push the "See, we're not old and stuffy anymore. We've hired lots of young engineers, and right now we're using [trendy stack]"


Clearly there are businesses that serve millions or billions of users, and have a serious need for engineers with experience with the tools to do so. Engineers seeking those jobs are then motivated to use those tools to engineer systems serving orders of magnitude fewer customers simply so they can claim that experience.


I think your general point is true but the personal blogs of devs angle is maybe not the most illustrative one.

We tend to apply industrial strength tools to our personal projects because it's some combination of what we already know, or we're trying to learn or refine an unfamiliar skill.

If you just gave me a linux shell I would not be able to confidently provision a secure webserver for static hosting. But I do know how to write cloudformation and deploy it. Sure this is a personal moral weakness by the standards of HN whatever, but it's where my career has led me so these are the tools I have.


> If you just gave me a linux shell I would not be able to confidently provision a secure webserver for static hosting. But I do know how to write cloudformation and deploy it. Sure this is a personal moral weakness by the standards of HN whatever, but it's where my career has led me so these are the tools I have.

I wouldn't say it's a moral weakness, maybe more of a failing of the tech education ecosystem. It seems bizarre to me that in software, we teach complex high-level things before we teach simple low-level things. Programming students learn very complex high level languages in year 1, and then maybe by year 3 or 4 learn assembly, or what a CPU register is, or how RAM and cache works. It's like teaching a carpenter how to build a high-rise apartment building before teaching them how to measure or use a hammer.


Well I didn't have any formal tech education, just what I picked up on the job and through my own curiosity.

But I mean you don't teach a car mechanic metallurgy and aerodynamics, except to the extent they'll need to apply that knowledge towards specific goals. At some point the discipline is mature enough that people genuinely don't need to, and can't, know every level of it from the ground up.

I think coding is approaching or already at the point where "cs/fundamentals of computation" should be a different degree from "professional software development."


I don't think anyone using Jekyll or whatever for their blog is doing it because they use Jekyll at work.


side note: FAANG is an obsolete acronym according to The Economist, it's now Microsoft - Amazon - Meta - Apple - Alphabet, leading to the new acronym, MAMAA, which has the nice result that we can now talk about the outsized influence of Big MAMAA in the tech world.


I never understand this idea of picking an arbitrary set of language features and saying, “what if your code logic was simple functions and a bunch of if statements”. The complexity won't magically go away, it'll just appear in a different set of problems.


I think it's helpful to divide complexity into complexity in the business logic / problem you're trying to solve, which cannot be eliminated from a pure technical perspective (you should still try to simplify it through discussions with stakeholders though!), and complexity that isn't necessary to solve the problem.

Oftentimes the latter category could be necessary if you were at much higher scale, or if the business evolved in some way, etc., which is where this sort of stuff tends to originate. Just yesterday we were talking at my company about extracting a service in Go, since it's very high scale, very simple, and doesn't change much. On one hand, it's pretty likely we'll need to do that at some point, but on the other, it's not causing any issues right now, so there's not much point in doing it at the moment. Had we gone forward, that would have added complexity for a theoretical concern that may or may not happen in the future.


If this were the case we wouldn't have the AbstractFactory problem that has plagued the Java ecosystem. If this were the case Golang wouldn't be here seeking to simplify things by not having classes, and by having __err__ handling like it does. It's not pretty but it works. I pick on Java because its ecosystem is broad. However, the over-engineered complexity that resides there makes you wanna stay away.


One could also use this example to argue the other way. The AbstractFactory pattern would not be needed if Java had had a richer feature set to begin with, in this particular example anonymous functions (which I believe it nowadays has). Patterns emerge when the foundation isn't solid enough to stand on by itself.

People needed modularity, DI and callback functions (essential complexity) but since the only way to do that with the language was classes, you had to invent AbstractFactory pattern (accidental complexity).
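The essential-vs-accidental split above can be sketched in a few lines. Python here purely for illustration (all the names are made up): the first version is roughly the class-based workaround pre-lambda Java forced on you, the second is what first-class functions buy you.

```python
# Class-based indirection: a class and a factory exist only to deliver
# one configurable behavior.
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hello, {self.name}"

class GreeterFactory:
    def create(self, name):
        return Greeter(name)

# With first-class functions, the same modularity needs no factory class:
def make_greeter(name):
    return lambda: f"Hello, {name}"

# Both produce identical behavior; only one needed two extra classes.
assert GreeterFactory().create("world").greet() == make_greeter("world")()
```

Same essential complexity (a configurable callback), noticeably less accidental complexity.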


Everyone does lip service to simplicity, but in reality simplicity is really difficult.

If you have seven conditions driving a decision, a bunch if's might be the simplest implementation. If you have hundreds of conditions, a tree of if's becomes impenetrable. There is no one-size-fits-all when it comes to simplicity.

Some problems are inherently complex. You can't design a payroll system or tax calculation system which is simpler than the set of rules and regulations it has to implement.
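You can't make the rules simpler than the regulation, but you can keep the rule-walking code small by treating the conditions as data instead of a tree of if's. A toy sketch (Python for illustration; the brackets and rates are invented, not any real tax code):

```python
# Hypothetical progressive brackets: (upper bound, marginal rate).
# When the regulation changes, you edit this table, not the control flow.
BRACKETS = [
    (10_000, 0.10),
    (40_000, 0.20),
    (float("inf"), 0.30),
]

def tax(income):
    owed, lower = 0.0, 0
    for upper, rate in BRACKETS:
        if income <= lower:
            break
        owed += (min(income, upper) - lower) * rate
        lower = upper
    return owed

print(tax(50_000))  # roughly 10000.0 with these made-up brackets
```

The rule set stays exactly as complex as the regulation demands; the code that interprets it stays a dozen lines.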


> If you have hundreds of conditions, a tree of if's becomes impenetrable.

I mean, it worked for Amazon. I saw the code.


Fair enough, but you probably wouldn't call it simple.


> a tree of if's becomes impenetrable

Even in that case, a tree of if's isn't that bad (it's not great), but far worse is when you have the same set of if statements copied and pasted around dozens of places. Because you will forget to update one of them at some point.
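A tiny illustration of that copy-paste failure mode (Python; the fields are hypothetical): the fix is rarely clever, just one named predicate every call site shares.

```python
# Before: this exact condition pasted in a dozen places, each drifting
# independently as requirements change:
#     if user["age"] >= 18 and user["verified"] and not user["banned"]: ...

# After: one named predicate, updated in exactly one place.
def can_purchase(user):
    return user["age"] >= 18 and user["verified"] and not user["banned"]

assert can_purchase({"age": 21, "verified": True, "banned": False})
assert not can_purchase({"age": 16, "verified": True, "banned": False})
```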


Thousands of classes with indirection is not clean code and write code as simply as possible is tautology. Of course it should be as simple as possible. The interesting question is what counts as simple.

Setting that aside though, the author seemed to mostly be talking about architectural simplicity in the article. He specifically called out "premature move to microservices, architectures that relied on distributed computing, and messaging-heavy designs" which I think is spot on. Distributed systems are fundamentally hard and involve a lot of difficult tradeoffs. But somehow we have convinced ourselves as a profession that distributed systems are somehow easier.


The hardest job as a software engineer is to come up with simple and obvious solutions to a hard problem.

Or you can stitch together eight different cloud services and let someone else debug that crap in prod. Not to mention subpar performance and an astronomical cloud bill.


It takes a lot of knowledge, experience and smarts to find simple solutions for hard problems


> write code as simply as possible is tautology

What? That has nothing whatsoever to do with tautology. It's just a statement you agree with. If everyone else agreed with it, it might at most be a truism or an uninteresting statement, but evidently they do not. (They might claim to, but reality shows they optimise for other things - in my experience the simplest work, which does not always mean the simplest code, especially when you're accustomed to the mystic rituals of the Javanese tribes.)


Fair enough, maybe tautology is the wrong word, but I do think everyone agrees with it. Who ever says "we need to have more complicated code"? The question is how do you define simplicity, because it is not always obvious. Every overly-abstracted mess I've ever seen was done in the name of "simplicity". Basically, let's add an abstraction so we can "simply" swap in another database in the future, or handle X hypothetical use case by only changing configurations. Likewise, I've seen 1500 line methods with dizzying, incomprehensible control flow that was nevertheless composed entirely of "simple" if/then/else statements. And a well chosen abstraction or two made things much simpler to read, understand and modify.


Thousands of classes with indirection is absolutely Clean Code. It's in the book.


PipeWire is an example of building a Linux audio daemon on "microservices, architectures that relied on distributed computing, and messaging-heavy designs":

- It takes 3 processes to send sound from Firefox to speakers (pipewire-pulse to accept PulseAudio streams from Firefox, pipewire to send audio to speakers, and wireplumber to detect speakers, expose them to pipewire, and route apps to the default audio device).

- pipewire and pipewire-pulse's functionality is solely composed of plugins (SPA) specified in a config file and glued together by an event loop calling functions, which call other functions through dynamic dispatch through C macros (#define spa_...). This makes reading the source less than helpful to understand control flow, and since the source code is essentially undocumented, I've resorted to breakpointing pipewire in gdb to observe its dynamic behavior (oddly I can breakpoint file:line but not the names of static TU-local functions). In fact I've heard you can run both services in a single daemon by merging their config files, though I haven't tried.

- wireplumber's functionality is driven by a Lua interpreter (its design was driven by the complex demands of automotive audio routing, which is overkill on desktops and makes stack traces less than helpful when debugging infinite-loop bugs).

- Apps are identified by numeric ids, and PipeWire (struct pw_map, not to be confused with struct spa_dict) immediately reuses the IDs of closed apps. Until recently rapidly closing and reopening audio streams caused countless race conditions in pipewire, pipewire-pulse, wireplumber, and client apps like plasmashell's "apps playing audio" list. (I'm not actually sure how they resolved this bug, perhaps with a monotonic counter ID alongside reused IDs?)

I feel a good deal of this complexity is incidental (queues pushed to in one function and popped from synchronously on the same thread and event callback, perhaps there's a valid reason or it could be removed in refactoring; me and IDEs are worse at navigating around macro-based dynamic dispatch than C++ virtual functions; perhaps there's a way to get mixed Lua-C stacktraces from wireplumber). I think both the multi-process architecture and ID reuse could've been avoided without losing functionality. Building core functionality using a plugin system rather than a statically traceable main loop may have been part of the intrinsic complexity of building an arbitrarily-extensible audio daemon, but I would prefer a simpler architecture with constrained functionality, either replacing pipewire, or as a more learnable project (closer to jack2) alongside pipewire.


> we have made writing software complex for complexity's sake.

I think it’s rather that complexity naturally expands to fill up the available capacity (of complexity-handling ability). That is, unless conscious and continuous effort is spent to contain and reduce complexity, it will naturally grow up to the limit where it causes too many problems to remain viable (like a virus killing its host and thus preventing further spread). This, in turn, means that the software industry tends to continually live on the edge of the maximum complexity its members can (barely) handle.


> I think it’s rather that complexity naturally expands to fill up the available capacity (of complexity-handling ability). That is, unless conscious and continuous effort is spent to contain and reduce complexity, it will naturally grow up to the limit where it causes too many problems to remain viable

I disagree that this is something that "naturally" happens. A lot of this thread is about how adding complexity is either a deliberate choice made by software developers or just that the developer simply was never taught how to do it the simple way--both of which illustrate a gap in software development education. When the tutorial about How To Create a TODO App starts with "Step 1: Install Kubernetes", I'd argue we have an education problem.


I’d argue that the fact these choices are being made is natural (otherwise you’d have to explain what the “unnatural” root causes are), and preventing or counteracting them exactly requires the conscious and continuous effort mentioned.


The problem is that simple means different things in a small codebase than in a big one. A bunch of if statements in code that is small enough to understand completely is OK, but when it becomes big it's hard to understand the flow of data.

I do favor simple code, but some complexity/abstraction is needed to make it easier to understand.


But picking the right abstractions that aren't leaky in any of the aspects you really care about is critical, hard to measure (leakiness isn't obvious, nor what kind of aspects you care about), hard to get right, and hard to maintain (because your abstraction may need to evolve, which is extra tricky).

Obviously, getting that right makes subsequent developments much, much easier, but it's hardly a simple route to success.


I see tech debt and simplicity as a mixture of the 'tyranny of small decisions' and each individual coder's 'cleanliness' level.

Each individual coder has a code cleanliness level, similar to how every friend's mom growing up would always remark "Sorry the house is a mess" when it was spotless. If you're used to a 9/10 and it's a 7, that looks like a wreck. If you're used to a 5 and it's a 7, that looks great. I urge other coders to increase their cleanliness level, and to look to others with high cleanliness for guidance. If you are coding next to people to whom a 5 looks good, no matter how much they try to pay down technical debt, they never will.

I think the tyranny of small decisions is ultimately showing us that the tooling we currently have makes it much trickier than it should be to evolve those abstractions. Partially this is because bad abstractions caused bad tooling and bad tooling caused bad abstractions. Because it's so difficult, we don't do it. We take the small decision and work slightly harder in a slightly buggier environment to get the new thing done. But of course now the problem is bigger, which means it's even less likely we'll ever actually pay down that debt.

> “I’m sorry I wrote you such a long letter. I didn’t have time to write you a short one.” – Blaise Pascal


> there wasn't a concept of clean code but to write code as simply as possible

Sounds good "on paper" - in fact, is tautologically true - but it's hard to find two people who agree on the definition of "simple". You say "not having thousands of classes with indirection", and I've definitely seen that over-design of class hierarchies create an untouchable mess, but I've seen designs in the other direction (one giant main routine with duplicated code instead of reusing code) that were defended as "simple".


A lot of complexity comes from premature scaling due to cargo cult or ergonomics.

But I argue a lot of complexity and bugs comes from poor/unclear/conflicting thinking. Especially when it crosses boundaries between multiple developers who had to modify it but didn't truly internalize that part/design of the code.


A bunch of if statements can be described as not simple. Some things in code can only be described as simple. Do those things.


I've seen most of the architectural problems in consulting - it's amazing how a team of clever engineers can take a simple thing and make it sooo convoluted.

Microservices, craptons of cloud-only dependencies, no way to easily create environments, ORMs and tooling that wraps databases creating crazy schemas... The list goes on and on; if you're early, you could do a lot worse than create a sensible monolith in a monorepo that uses just postgres, redis, and nginx and deploys using 'git pull' ...


The worst architecture I ever saw came from consultants, who built the initial bits of a startup I was hired into. It was nice to have a no-longer-present scapegoat to shake fists at when frustrated, but over time I came to realize their most maddening choices were at the behest of one of our founders, who had no software experience.


I saw the same thing. Founders asking the world of consultants who would try to deliver and then fail to be a responsible engineer. I started my previous job by telling the founders they were asking for the wrong things and the consultants work needed to be thrown out. Thankfully they listened and we ended up with a TypeScript monorepo monolith deployed to Heroku.


Nitpick: no need for Redis if you have Postgres. It can have comparable performance when similar tradeoffs are used.


That’s just not true as a categorical statement. Performance aside redis has all sorts of interesting data types, operations and primitives that pg doesn’t that you might want to leverage. It fulfills a different role


Can you elaborate? Is postgres viable for caching?


> Microservices, craptons of cloud-only dependencies, no way to easily create environments, ORMs and tooling that wraps databases

So, Spring Boot you mean?


A Spring Boot service doesn't have to be a microservice - you can happily fatten it up into a monolith. Cloud-only dependencies come into play for Spring Cloud (or something using cloud-specific features) - for a "vanilla" CRUD app they are not needed. Creating virtual/physical environments is out of Spring Boot's scope and better left to external tools, though it has support for separate environments via profiles. ORMs/tooling that wraps the database doesn't have to be part of Spring Boot - using Hibernate/JPA isn't mandatory; plain JdbcTemplate with hand-written SQL works fine.


>>> Business logic flaws were rare, but when we found one they tended to be epically bad.

oh yes ...

I always bang on to my junior staff that their job was known as "analyst programmer" for a reason. The analyst part matters probably even more than the programmer part. In large companies just discovering what needs to be coded is 90% of the job (securely coding it within the constraints of the enterprise is the other 90%, while the final 90% is marketing your solution internally).

Anyway .. yes


> In large companies just discovering what needs to be coded is 90% of the job

Yes, but that is a massive dysfunction of those companies. Meaning, we can yell at analyst-programmers as much as we want; what really needs to be fixed is the process that makes finding out requirements so ridiculously hard.

And yes, I work in one of those companies, it very clearly is dysfunction.


I think this can only change when we, as a society, expect code literacy from every person that finishes high school.

I don't mean expert programmers, but at least being able to read basic pseudocode algorithms.

It's hard to describe a problem if you don't even understand any language.


Oh hell yes. Software literacy in my book (30,000 words, still no end in sight) is literally that: literacy.

Look, I automate almost everything I can see. And where I put effort and focus, the software is a force multiplier for my brain (or a bicycle for the mind, if you like).

But so often in a large company or normal life, there is a great gulf that the virtual world cannot - yet - cross. But more and more it shall.

One thing that's just silly: I take photos on my iPhone of bills and letters. I cannot be arsed to navigate the awful Dropbox API, but I would like to store them under "insurance" or whatever. Fuck having some AI monster read the bill. So I played with Pythonista and can just run an action after a photo - and it gets moved. It's my solution, not an app. And that's software literacy - where you can write, not on paper, but on the world.


The "magic AI" has undone years of coaching management about software expectations.


> discovering what needs to be coded is 90% of the job,

But you still have to predict based on a two-sentence description in a JIRA ticket how many "story points" it's going to take with 95% accuracy a dozen times within the span of a single "sprint planning session" every two weeks.


Oh my god - I hadn’t heard the phrase “story points” in a few weeks and now I will have nightmares tonight!


This goes with doing the first 90% of the work, then the second 90% of the work then the last 90% of the work.

And engineers multiplying their initial estimate by 3, the project manager then multiplying that by 3 and rounding it up to be ten times more than the initial estimate.


>I always bang on to my junior staff that their job was known as "analyst programmer" for a reason.

I can't help but think about Tobias Fünke. Especially with you banging on your junior staff.


I suspect it's a British (perhaps commonwealth) colloquialism - 'to bang on [about something]' is to go on and on and on talking about it, with some implication of 'too much' or obsessiveness.

(Also, notice it's 'bang on to' the staff, not 'bang on' them. That is, the staff are the indirect object; the thing which is being said - banged on about - is the direct object.)


Yes, I bang on to my staff (talk endlessly to them) rather than bang my staff (have sex with them) ... or another colloquialism, to "bang my staff" which is a solitary activity that frankly you can guess from here.


I'm not sure you watched Arrested Development, but I was thinking about this scene where the character says he sees himself as the world's first analyst and therapist, analrapist for short.

He always says things that are non-sexual but have an almost sexual ring to them. And while I did understand the bang on to from context, it was exactly the kind of thing he would be saying. Together with the analyst and programmer, analpro for short, or something of the sort.


And I explained it because (as a native BrE speaker as I suspected the one you replied to was) it didn't have any such ring to me at all, it read perfectly naturally.

(I have seen Arrested Development fwiw. Didn't care for it, but I've seen it.)


The world's first combined analyst and programmer -- an Analrammer for short.


Nice, I was thinking Analpro, but that's also good!


> ...just discovering what needs to be coded is 90% of the job,...

Absolutely. The tech part is relatively easy. Deciding what to build, that's where the friction and magic happens.


Your wording is ambiguous.

Are senior staff also analysts? Why or why not?


> Generally, the major foot-gun that got a lot of places in trouble was the premature move to microservices, architectures that relied on distributed computing, and messaging-heavy designs.

Finally, someone said it


It is interesting. I've been at a company for a few years now and we've been slowly trying to break apart the two monoliths that most of the revenue-generating traffic passes through. I am on board with the move to microservices. It's been a lot of fun, but a crazy amount of work, time, and effort has been spent to do this.

I've pondered both sides of this argument. On one hand, if this move had been done earlier it might not have been as difficult a multi-year project. On the other hand, when I look at the Rails application in particular, it was coded SO poorly that if it had just been written better initially, it wouldn't even need to be deconstructed at this point. Also, if the same engineers who wrote that Rails app had tried to write a bunch of distributed, event-driven microservices instead, we would probably be in even worse shape. (ᵔ́∀ᵔ̀)


Have you considered a serious refactor instead of a migration?

I mean, just start with a cleanup session and proceed from there. Work on one bit at a time and don't get too far from a working system.


Are you me? o_0. Shockingly similar situation.


You two might be colleagues lol


Usually a link to a humorous YouTube video would be inappropriate on HN, but this classic and brief satire of microservices is actually quite on point about precisely what is so dangerous about a microservices architecture run amok:

https://www.youtube.com/watch?v=y8OnoxKotPQ

Summary: really trivial sounding feature requests like displaying a single new field on a page can become extremely difficult to implement and worse, hard to explain to stakeholders why.


This was 100% true for that startup I worked for as a side job. They would have been so much better off just building a standard java, PHP or .NET back end and calling it a day.

The head engineer (who had known the guy funding the thing since childhood) had no clue how node, stateless architecture, or asynchronous code worked. He had somehow figured out how to get access to one particular worker of a node instance, through some internal ID or something, and used that to make stateful exchanges where one worker in some microservice was pinned to another. Which goes against everything node and microservices is about.

I tried to talk some sense into them but they didn’t want to hear it. So for the last six months I just did my part and drained the big guy’s money like everyone else. I hate doing that - way more stressful than working your ass off.


It's kind of discouraging to see the part where he says almost no one gets web tokens right the first time. Working on projects as someone entering the industry, it's pretty clear that security is the most important part of a web app, and it's so seemingly easy to get woefully wrong, especially if you're learning this stuff on your own and trying to build working CRUD projects.


It's a chicken egg problem. Developers use JWTs because it's what they think they know. Companies build libraries to support what developers are using. Security engineers say JWTs are easy to screw up [1]. Newer frameworks offer ways to move off of JWTs. New programming language comes out. New frameworks built for that programming language. What is someone most likely to build first as an integration? What developers are using. JWTs become defacto for a new framework. Security engineers report the same bugs they've seen. Even more languages and frameworks come out. Rinse. Lather. Repeat. Write up the same OAuth bug for the 15th time.

[1] http://cryto.net/~joepie91/blog/2016/06/19/stop-using-jwt-fo...

Edit: I was actually writing this code tonight myself for a project instead of it already being baked into the platform framework because SSO is only available as an "enterprise" feature and it's $150 a month for static shared password authentication. So market forces incentivize diverging standards.
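For the curious, the bug that keeps getting written up is usually algorithm confusion: the verifier reads the algorithm out of the attacker-controlled token header (sometimes honoring "none") instead of pinning it server-side. A stdlib-only sketch of the token shape, not a real JWT library; the SECRET and claim names are invented:

```python
import base64, hashlib, hmac, json

SECRET = b"server-side-secret"  # invented for the sketch

def b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign(claims: dict) -> str:
    header = b64(json.dumps({"alg": "HS256"}).encode())
    body = b64(json.dumps(claims).encode())
    sig = hmac.new(SECRET, f"{header}.{body}".encode(), hashlib.sha256).hexdigest()
    return f"{header}.{body}.{sig}"

def verify(token: str) -> dict:
    header, body, sig = token.split(".")
    # The classic bug is re-reading "alg" from the (attacker-controlled)
    # header here. Instead the algorithm is pinned: always HMAC-SHA256.
    expected = hmac.new(SECRET, f"{header}.{body}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    return json.loads(base64.urlsafe_b64decode(body + "=" * (-len(body) % 4)))

assert verify(sign({"sub": "alice"}))["sub"] == "alice"
```

Real libraries let you express the same pinning (e.g. an explicit allow-list of algorithms); the point is that the verifier, not the token, decides.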


That flow chart in the shared link is very funny! Just this year, I was forced to migrate to a new internal authentication framework that... drumroll... uses JWTs for session management. Google tells me that it was already discussed on HN here: https://news.ycombinator.com/item?id=18353874


JWTs solve problems about statelessness. Most companies don’t have these problems and are better off with stateful basic auth tokens/cookies that are widely understood and supported and can be easily revoked.

Also, signed and/or encrypted communication is usually easier to implement without involving JWTs.

Best thing to do in security is to not roll your own and instead use trusted libraries that have industry-reviewed sane defaults. One way to check: look at the issues and PRs in the public repo and see if security-focused issues are promptly addressed, especially including keeping docs up-to-date. Security professionals are pedantic (for good reason).
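As a sketch of how small the stateful alternative is (Python; the in-memory dict stands in for whatever database you already run):

```python
import secrets

SESSIONS = {}  # stand-in for a sessions table in your existing database

def log_in(user_id):
    token = secrets.token_urlsafe(32)  # opaque: means nothing to the client
    SESSIONS[token] = user_id
    return token

def current_user(token):
    return SESSIONS.get(token)  # one indexed lookup per request

def revoke(token):
    SESSIONS.pop(token, None)  # instant, per-session revocation

t = log_in("alice")
assert current_user(t) == "alice"
revoke(t)
assert current_user(t) is None
```

Per-session revocation falls out for free, which is exactly the property stateless tokens give up.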


Asymmetric cryptography solves problems of statelessness: i.e. encrypt your sensitive|read-only data with your public key, decrypt it with your private key, beep boop, you can now use your client as a database. JWTs are a whole other unnecessary lasagne of complexity – not good complexity but random complexity, like the human appendix – which invites bugs and adds nothing above the former in most implementations. (Hell, my current company generates JWTs and then uses them as plain old 'random' keys to look up the session data in a database. It's hilarious but also awful.)


Well, asymmetric cryptography is not even needed in the most common case, i.e. when you are using the client as a database. Symmetric crypto is enough, because it's your server that both encrypts/signs and decrypts/verifies. Asymmetric crypto may be strictly needed only if the sender and the recipient are different. And there is still an issue that the malicious client can return old and outdated but validly signed data - which you can't solve without either a server-side database or accepting old data up to a certain limit.


Yeah, that's true, actually. As best I recall, I just meant that that is what people use JWT for regardless, and I wanted to convey that the only part doing the useful work there is the 'asymmetric crypto' part. I didn't want to get into the territory of providing alternative suggestions, only breaking down what is useful about JWTs when used for that purpose.

As for old and outdated data, I should think that's easily solved by having a 'created' and 'modified' stamp in the encrypted data, much like you have on an inode.
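A sketch of that timestamp idea with symmetric crypto only (Python stdlib; the key, max age, and payload are invented). It only signs rather than encrypts, which is enough to show the freshness check: validly signed but outdated blobs get rejected, bounding the stale-data problem.

```python
import hashlib, hmac, json, time

KEY = b"symmetric-server-key"  # invented; the server is both sender and recipient
MAX_AGE = 3600  # accept validly signed data up to an hour old

def seal(data: dict) -> str:
    blob = json.dumps({"data": data, "issued": time.time()})
    mac = hmac.new(KEY, blob.encode(), hashlib.sha256).hexdigest()
    return f"{mac}.{blob}"

def unseal(sealed: str, now=None) -> dict:
    mac, blob = sealed.split(".", 1)
    good = hmac.new(KEY, blob.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, good):
        raise ValueError("tampered")
    wrapper = json.loads(blob)
    if (now or time.time()) - wrapper["issued"] > MAX_AGE:
        raise ValueError("stale")  # validly signed, but outdated: reject it
    return wrapper["data"]

assert unseal(seal({"cart": ["book"]})) == {"cart": ["book"]}
```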


> Most companies don’t have these problems

Can anyone cite a single real world example of a fully stateless system being run for the purpose of business? I ask this every time JWTs come up and no one can answer it.

As soon as you tap the database on a request for any reason, whether it's for authorization or anything else, you might as well kiss JWTs goodbye.

Then again, just don't use them anyway, because they have no benefit. Zero. Disagree? Prove it. I'm sure there's some infinitesimally small benefit if you could measure it, but the reality is that JWTs are saving you from an operation that databases and computers themselves are designed to be extremely good at.

Don't use JWTs. They're academic flim-flam being used to sell services like Auth0.


They can be helpful if you have services that need to call other services on behalf of a user request.

For instance, user A calls Product service for Product information but that response also includes Recommended Products and Advertisements from those two services. Product service can pass the JWT from the client to Recommended Products and Advertisements which removes the need to establish trust between those internal services (since authentication and authorization info are just passed around from what the client provided).

You can also use them in federated auth schemes where the issuing system is separate from the recipient. I think the use cases are pretty similar to SAML for this type of system but with a smaller "auth token" size.

Just because you're accessing a database on a request doesn't mean you're accessing the database that stores the authorization and authentication info.


The problematic word is "THE" database. The subsystem that you hit can be stateful, but it can use a separate database that doesn't contain authentication data.


I can only provide verification of the counter example.

Having worked on some VERY large web services, the session was tracked on the back end and instantly and trivially revocable.


It's nuts to me that so many companies have moved off cookies for web app auth state. They're simple, they're well supported, they require very little work on the browser side, and the abstractions around them on the server side are basically bulletproof at this point.

I see all this talk about authentication, and it's just literally never been a problem or concern for my company.


Aren't JWTs just fancy cookies?


JWTs are frequently stored in LocalStorage which means that any XSS is able to leak the JWT.

Cookies, on the other hand, can be configured to be HTTP-Only and inaccessible to JavaScript on the page. That prevents somebody with XSS from leaking the value without a second server-side vulnerability or weakness.

In addition, JWTs are impossible to revoke without revoking _all_ sessions. This is the biggest weakness, imo, and the reason that they shouldn't be used client-side.

I'm a huge fan of the approach Ory is taking with Oathkeeper and Kratos: https://www.ory.sh/docs/kratos/guides/zero-trust-iap-proxy-i...
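The cookie hardening described above is a handful of attributes. A sketch using Python's stdlib (the token value is just a placeholder):

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session"] = "opaque-random-token"  # placeholder value
cookie["session"]["httponly"] = True   # invisible to page JavaScript, so XSS can't read it
cookie["session"]["secure"] = True     # only ever sent over HTTPS
cookie["session"]["samesite"] = "Lax"  # basic cross-site request mitigation
header = cookie.output(header="Set-Cookie:")
print(header)
```

Any web framework exposes the same three flags; localStorage has no equivalent of HttpOnly, which is the core of the argument above.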


Can't you revoke them the same way you revoke any other auth token and put the token ID in a database somewhere?


It won't be stateless then, and at that point you might as well just use traditional sessions.


Sure, but browsers have done a lot of work to make cookies far more convenient (they're automatically sent with requests, you have browser APIs to work with them), and secure (Secure, HttpOnly, SameSite, etc.)


Cookies are a storage and transport mechanism and JWTs are signed JSON blobs. You could put a JWT inside a cookie.


Why not look into an open source auth solution such as supertokens? It's almost free and you can self-host. That way you implement your own auth system but the security issues are mostly dealt by them.


Yesterday I was working on updating code that implements Microsoft Open ID Connect (produces a JWT).

Their documentation [1] is exceptional - all the gotchas and reasons for practices are clearly explained and there’s a first class library to handle token validation for you. I even ended up using the same library to validate tokens from Google.

Perhaps not all vendors produce equally well written documentation but I think it’s a lot easier to get it right today than it was 5 years ago.

1. https://docs.microsoft.com/en-us/azure/active-directory/deve...


That's usually because security is bolted on instead of baked into the control and data structures themselves. Too many people interpret "Make It Work, Make It Right, Make It Fast" to mean security is implemented at the "Make It Right" stage, when it should be at the "Make It Work" stage. And that's if they're the lucky ones who get security designed into the architecture from the beginning.

We're paying for the sins of that in Unix these days; the kernel attack surface is infeasibly large to remediate to correctness anytime soon (if ever?).


I think there is still more to it than just not taking it seriously or planning for it.

JWT in particular has weird quirks you need to know about to prevent algorithm-swapping attacks, and I'm sure there are more traps I myself am not aware of. At this point I think security can be seen on the same plane as legal: assuming a random dev will be able to plan for and navigate all the issues by sheer common sense hasn't been a viable approach for a long time now.


> At this point I think security can be seen on the same plane as legal:...

Considering how Uber ignored legal ramifications of ride sharing intersecting with incumbent regulations until they were dragged into courts, that paints a potentially rather grim picture of the equivalent in software security. But your gist sounds more along the lines of, "include the experts along at the beginning of the ride".

When I said security as a "bolt-on", I should have been more clear. Most of the time when I see it happening, it has been at the behest of the business stakeholders overriding the earnest developers trying to include the security teams from the beginning, but waved off with "it can be added later".

The business stakeholders see in their real life housing contractors walk into finished houses, attach some doodads, pop in some batteries to wireless sensors and the central base station, and ta da!, they "have security"! And think, "just how hard can it be to do the same in software?", dismissing what their tech leads try to tell them.

There is a large element of the principal-agent problem here as well. Shiny proofs of concept and shallow implementations get immediate bonuses and promotions. Taking 1.1-2.0X as long to implement the right way, the result of which is no drama and no discernible difference to the casual business user, gets no or even negative recognition. The incentives structure the choices. There are no incentives that structure payouts over the long haul tying back to original historical choices, with an increasing gradient of the payout the longer the original choices prove sound. Naturally, since measuring that accurately would be impossible.

The closest I've come to an analogy that works in these discussions but not as often as I'd like is this. I don't throw together four tilt-walls, top off with a roof, move in with a 20-ton safe, open the doors for business and call it a regional bank depository. There are bedrock anchors, sensors, inner reinforced concrete walls, SOP's, audits, man traps, insurance reviews, and on and on, that get designed in before the foundation is even poured.

Clients who didn't find this convincing wave it off with a, "haha, this isn't that important lol". I want a better analogy.


That's an interesting angle. Uber ignoring legal ramifications had wildly different effects depending on the country: some completely shut out Uber as a result, while more lax places accepted dealing with the consequences that surfaced one after the other.

I'm in a country from the former Eastern Bloc, and see a bunch of naive projects pitched by the business side that get shut down pretty fast by the legal team as nightmares in the making (e.g. stuff that boils down to "shouldn't it be easier to take money from a variety of sources and move it to other users?") that would sink the whole company when shit hits the fan.

My hopes would be on more security issues slowly becoming legal issues (not unlike GDPR, breach disclosure duty and associated penalties etc.) but I can understand how dire it feels in countries where legal grounds were shaky in the first place.


That's how a software implementation by a newbie works. You can't expect a newbie to take security into account before the software is implemented. Instead, there should be a practice of rectifying all the security errors at the end, before the software is pushed to the server.


That’s an almost impossible task. Code gets immensely more expensive to understand or modify based on its age. If you don’t bother thinking about security until the 11th hour, it’s too late. Things will slip through.


This is an interesting write up!

The only question I have is around your point on monorepos - every monorepo I’ve seen has been a discoverability nightmare with bespoke configurations and archaic incantations (and sometimes installing Java!) necessary to even figure out what plugs in to what.

How do you reason about a mono repo with new eyeballs? Do you get read in on things from an existing engineer? I struggle to understand how they’d make the job of auditing a software stack easier, except for maybe 3rd party dependency versions if they’re pulled in and shared.


Monorepos do require upkeep beyond that of single-product repositories. You need some form of discipline for how code is organized (is it by Java package? by product? etc). You need to decide how ownership works. You need to decide on (and implement) a common way to set up local environments. Crucially, you need to reevaluate all these decisions periodically and make changes.

On the other hand... this is all work you'd have to do anyways with multiple repositories. In the multi-repo scenario, it's even tougher to coordinate the dev environment, ownership, and organization principles - but the work isn't immediately obvious on checkout, so people don't always consider it.

Regarding auditing, I have always found that having all the code in one place is tremendously useful in terms of discoverability! Want to know where that class comes from? Guaranteed if it's not third-party, you know where it is.

Not to minimize the pain of poorly-managed monorepos - it's not a one-size-fits-all solution, and can definitely go sideways if left untended.


Probably because:

1) It's easy to miss a repo, if you don't have a list of them all somewhere.

2) It's easy to get out of sync with what version of your software corresponds to what branch/tag in each repo.


> 2) It's easy to get out of sync with what version of software corresponds to what branch/tag in each repo.

I'd like to hear how others solve this. The way I've addressed this is I bake into the build pipeline some way to dump to a text file all the version control metadata I could ever want to re-build the software from scratch. Then this text file is further embedded into the software primary executable itself, in some platform-compatible manner. Then I make sure the support team has the tooling to identify it in a trivial manner, whether a token-auth curl call to retrieve it over a REST API, or what have you. This goes well beyond the version number the users see, and supports detailed per-client patching information for temporary client-specific branches until they can be merged back into main without exposing those hairy details into the version number.
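A minimal sketch of the capture step described above (function and field names are illustrative, not from the comment; the command runner is injectable purely so it can be exercised without a real checkout):

```python
import json
import subprocess


def capture_build_metadata(run=None):
    """Collect the version-control metadata needed to re-build from scratch.

    `run` defaults to shelling out to git; pass a stub for testing.
    """
    if run is None:
        run = lambda *args: subprocess.check_output(
            ["git", *args], text=True
        ).strip()
    return {
        "commit": run("rev-parse", "HEAD"),
        "branch": run("rev-parse", "--abbrev-ref", "HEAD"),
        "describe": run("describe", "--always", "--dirty"),
    }


def to_embeddable_json(info):
    """Serialize deterministically so the embedded blob is byte-reproducible."""
    return json.dumps(info, sort_keys=True, separators=(",", ":"))
```

The resulting string can then be packaged into the executable and surfaced through whatever authenticated endpoint the support tooling already uses.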

While this works for me and no support teams have come to me yet with problems using this approach, it strikes me as inelegant and I'm for some reason dissatisfied with "it ain't broke so don't fix it".


In our case we abandoned individual repos and went back to a monorepo to solve this issue. In theory the separation of code was nice, but in practice it was a real pain when a service added new APIs and you wanted to update another service to use them.

All of our services do also print out in their startup logs what version they are based on git branch name and commit. Monorepo or not this was useful.


We have a releases repo that takes in the git version SHA for each application and handles deploys. It's... ok I guess. Just another example of complexity to meet the growing complexity of the system.


> 2) It's easy to get out of sync with what version of your software corresponds to what branch/tag in each repo.

That's what the `[dependencies] my-lib = "1.0"` was supposed to solve.


The thing I'm working on has 5 main repos that all run (yarn start) for the app to be fully functional.

I need to write down somewhere the startup order and which branches go together.


  find / -type d -name .git


As an auditor you don't have anything checked out locally yet, so no .git will exist. If you ask an individual developer or randomly picked developers, they will only have their specific repos checked out. If you look at the server hosting the repos then yes you may get them all. Assuming they are all on one server...


Once I worked on a team where none of the engineers knew that the JWT payload was readable on the frontend. They were in shock when I extracted the payload and started asking questions about the data structure.


It's kinda baffling that JWTs are unencrypted by default, to be fair.


It's the whole point - they're signed, not encrypted.

You should use opaque tokens instead if you don't want the frontend or other services that have access to the token to read it.


In many cases, the front end doesn't need to read the JWT, just pass it on to some API.

An encrypted JWT is still convenient as it can be decrypted and deserialized into a common data structure using existing libraries.


One benefit of JWT as specced is that the APIs you pass it on to don't need to share an encryption key; a shared key would make rolling the key without causing downtime impractical. With OIDC, for example, frequent key rotation helps you create a better security posture.

The benefit of signing versus encryption is many services are able to verify the authenticity without needing a shared secret. That includes untrusted services, which is frequently the case with OAuth 2.

You can encrypt a JWT token, but at that point it's not semantically a JWT anymore. It can be any JSON at all and doesn't need to match the JWT structure. The first part of a JWT is a header naming the signing algorithm, and the last part is the signature.


I, for one, enjoy not needing to coordinate an encryption key between my service and my IdP.


I also enjoy not worrying about how the next field I add to my JWT can be exploited after a base64 decode :-)


How else could the frontend read them? If you don't need this then regular cookies are better.


It's the other way round - the front-end shouldn't need to read JWTs, just pass them on.


if your frontend is interrogating the jwt you're doing it wrong


Isn't it pretty common to read the expiration so you know when to refresh tokens?


It is, among other things like username or user e-mail address.

This is also, together with backend scalability, a major selling point for JWTs. Otherwise one might just as well use regular session ids in cookies.


I mean, I'd be rather surprised too. What were you using JWTs for, if not asymmetric crypto? Presumably you weren't using it to sign the tokens, if they were surprised the client could access them? And I can't see many contexts where you would use it with a shared secret, where just sending JSON over HTTPS wouldn't suffice. (I'm assuming 'frontend' here denotes a client on the other side of the trust boundary.)


I'm not getting your comment. The payload is not encrypted. I think you refer to the signature. The payload can always be decoded. It's just JSON into base64.
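That decoding needs nothing but the standard library; a minimal sketch (no key or real token involved):

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Read a JWT's claims without any key.

    The payload segment is just base64url-encoded JSON; the signature
    only protects integrity, not confidentiality.
    """
    payload_b64 = token.split(".")[1]
    # base64url encoders often strip the '=' padding; restore it first.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))
```

Anyone holding the token (browser devtools included) can do this, which is why secrets must never go in JWT claims.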


Ah, sorry, that was what I was referring to when I said "Presumably you weren't using it to sign the tokens, if they were surprised the client could access them?". I classed that as too obvious for it to be what you meant.


For SSO? The biggest advantage (besides being stateless) of a JWT is that it is signed with an asymmetric key and the client can validate the authenticity of the content. You can encrypt the content of the token, but that does not make too much sense (because the client needs to decrypt it anyway).


> For example because it’s so fast, [MD5 is] often used in automated testing to quickly generate a whole lot of sudo-random GUIDs.

Actually, it’s because programmers are lazy. GUIDs or UUIDs are 128-bits and MD5 produces 128-bits. A string like “not-valid” is not a valid UUID, but MD5(“not-valid”) is both possible to format like a UUID when output as hex (with dashes) but also self-descriptive - so you can name the token when generating it in a fixture function and know how to regenerate it later in a test, for example.

All the normal ways of generating UUIDs, including v6 and v7, are about trying to make them unique and collision resistant. But that’s nonsense when you want deterministic, reproducible tests. Hard-coding 32 characters is too much work, ain’t nobody got time for that. Magic numbers? Pfft. Just MD5 and write your own text…

Pro tip: have data model creator helper functions include a counter that resets every test (every time the database resets) and then assign a UUID like MD5(`InsertTableName-${counter}`) that way you have a unique ID that’s also easy to predict/regenerate.
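A sketch of that helper in Python (names are illustrative; the comment's original was a JS-style template string):

```python
import hashlib
import itertools
import uuid

_counter = itertools.count(1)


def reset_ids():
    """Call wherever the test database gets reset."""
    global _counter
    _counter = itertools.count(1)


def next_test_id(table_name: str) -> uuid.UUID:
    """Deterministic, self-descriptive test UUID: MD5 of 'Table-N'.

    MD5's 128-bit digest is exactly UUID-sized, which is the trick
    described above; the counter keeps IDs unique yet reproducible.
    """
    digest = hashlib.md5(f"{table_name}-{next(_counter)}".encode()).digest()
    return uuid.UUID(bytes=digest)
```

Because the sequence restarts on every reset, a failing test can name the exact row ("User-3") instead of a random 32-character blob.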

That said… I’ve always personally considered simple database IDs generally preferable to UUIDs. It’s easier to understand THING 20 as an ID than 32-odd characters. But UUIDs are an industry standard, so they end up in your test code everywhere anyway…


> I’ve always personally considered simple database IDs generally preferable to UUIDs.

Unless you start migrating data between environments and want references to be alive.

Anyway, if you need a hardcoded GUID for tests or what, paste this into PowerShell: [Guid]::NewGuid()

Not arguing, just developing for a system that uses guids as primary IDs and writing tests for that system. I don't even need to hardcode GUID, as within test bootstrap I'm creating objects with generated IDs I can reference later for comparison.


I’ve done that before too - but it’s always possible if you run tests often enough that you’ll get an ID collision that randomly fails a test and causes a developer some grief. Easier to not use random sources of data as a rule of thumb within your unit tests.


Re. Security, predictable identifiers are often a vulnerability. Hence, don't present database IDs in public (ie. anywhere). Instead, generate unique non-predictable identifiers at creation time, and use a UNIQUE constraint (or similar). https://cwe.mitre.org/data/definitions/340.html


It’s true that in production, if it’s a security risk that IDs can be guessed, don’t make them predictable. But by that same logic you would have to stop using REST because it can let you guess an ID?

This advice is classified as "varies by context" because it doesn’t always apply. In test cases, predictable behaviour is better than randomness. There are exceptions, of course. Chaos monkey, fuzzing, and literally testing algorithms for uniform randomness, etc.

That said, you could get the best of both worlds if you used MD5 HMAC to create a UUID from a predictable number and a secret preventing guessing. If that’s your goal…

Of course, the secret could be trivially reverse engineered with MD5 if someone knew the ID number and algorithm to generate it, but I’m not sure we have the patience or need to use PBKDF2 or similar to create predictable, unguessable ID numbers… after all, it would be just as easy to use regular guessable numbers and put strong authentication so it doesn’t matter if you guess correctly.
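A sketch of that HMAC variant, assuming a per-environment secret (all names here are illustrative):

```python
import hashlib
import hmac
import uuid


def unguessable_id(secret: bytes, n: int) -> uuid.UUID:
    """Derive an unguessable-but-reproducible UUID from a sequence number.

    HMAC-MD5 emits a 128-bit tag, exactly UUID-sized; without the secret
    an outsider can't enumerate adjacent IDs, while anyone holding the
    secret can regenerate the ID for row n on demand.
    """
    tag = hmac.new(secret, str(n).encode(), hashlib.md5).digest()
    return uuid.UUID(bytes=tag)
```

As the comment notes, this only resists casual guessing; MD5 is not the tool for an adversary with real resources.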


Clean separation of concerns is good architectural practice. Whilst you are of course correct that you can potentially rely on mitigations (eg. authenticated APIs) if those subsystems change in future you have an emergent scenario producing undocumented vulnerabilities. Security people call this 'defense in depth' - ie. make sure you cover your ass religiously, all the time.


At what point is there something beyond a framework (a SaaS in a box, perhaps) that just avoids many of these basic problems, and the HR, legal, etc. problems, of starting a startup? Startups are not snowflakes apart from that one little core competency. In short, most serial founders say the second one was easier, simply because they followed the template ground out in the first.

Would it be easier to start with that template?


There are a ton of products like this out there that build on popular frameworks:

Saas Pegasus (https://www.saaspegasus.com/) for Python/Django, Bullet Train (https://bullettrain.co/) and JumpStart (https://jumpstartrails.com/) for Rails, Spark (https://spark.laravel.com/) for Laravel, Gravity (https://usegravity.app/) for JS

You can find an even bigger list here: https://github.com/smirnov-am/awesome-saas-boilerplates though those are the market leaders (I make one of them and follow things closely)


There are some good open source options like https://getzero.dev/


In the Rails world we have https://bullettrain.co and https://jumpstartrails.com which both have open source templates for building SaaS services.


I've seen boilerplate applications for <insert tech stack> but the open-source ones tend not to be great, and the closed source ones could be great - but I'm not willing to pay $XXX for code I haven't seen.


All of that exists as SaaS products that target non-technical cofounders, and it is very hard to justify to co-founders, investors, or advisors any investment of time in something that is not your core problem, and I think that's for a good reason.


Does learnings ever mean anything different than lessons? How did this enter corporate-speak?


It pains my prescriptivist instincts to say so, but FWIW I do interpret them differently, frequently as the complementary sides of a single event:

A learning is a successfully learned thing. Or a received lesson.

A lesson is a taught thing. When effective, this would be one path to a learning for the receiver.


I suppose that makes sense. Personally, I would write lessons learned rather than learnings in part to get rid of the red squiggle. My dictionaries flag "learnings" as a typo.


"Lesson" also bears this meaning of something learned, though that would make "learnings" more precise, and therefore distinct.

My experience seems to be that people who use "learnings" are referring to the lessons learned by others, usually subordinates, and it is used instead of "lesson" out of sensitivity to how harsh it sounds to say "group X learned several lessons".


Ugh, seriously. Like utilize instead of use.


In earlier days, I thought it'd be fun to have two versions of my resume. They had parallel content, but one was fluffed up in corporate-speak, and the other was human English.

I included links, e.g. resume-fluffy.html or resume-direct.html, and (somewhat seriously) suggested that hiring managers read the first and tech evaluators the second.

It made for some light humor in discussions with hiring groups. And also some effectively-paralyzed recruiters, which added to the fun of the former.


Learnings can be more easily interpreted as "something I learned', while lessons can come across as 'lessons for you'.


I came here for this comment.


it's almost - almost - as bad as 'vinyls'


Nit: "...or example because it’s so fast, it’s often used in automated testing to quickly generate a whole lot of sudo-random GUIDs."

ITYM: "pseudo-random"

Although I do like the mash-up concept of "sudo random"-ness.


It's higher-privileged randomness. As in, all GUIDs are random, but some are more random than others.


> All the really bad security vulnerabilities were obvious.

All the really bad security vulnerabilities that were found were obvious?

One is more likely to find things that are obvious?


But the auditors were experts and used all the latest and greatest tools. I think they are implying that if they couldn't find it with code inspection, then a hacker wouldn't find it by probing.

Of course, they might not find zero-days but most hackers wouldn't find those either.


When a team is so focused on the todo list, they sometimes forget the obvious mistakes they still need to fix.


Yeah, this was a great article overall, but that stood out as sus. Also the "last few hours found the most stuff". Seems like they could probably stop the audit once they found enough problems, which would skew findings hard toward the easy-to-find, or toward whatever turned up in the last hours of looking.


Although I strongly agree with it in principle, I'm growing seriously tired of the "simpler is better" argument. It hides all the nuance, hard work and, guess what, complexity, that goes into making something simple.

Simplicity is different to each person. What seems like unnecessary abstractions with complex inner workings often exist to actually hide other complexity away.

Know the in and outs of Kubernetes? Maybe it's easier (simpler) for you than directly provisioning different pieces of infra.

Have a team of over 10 [1] working on the same monolithic codebase? Productivity while maintaining sane separation of concerns might increase going for a more domain-service-oriented architecture [2].

How can we teach what simplicity is instead of just calling it better or saying arrogant platitudes like KISS?

[1] yes, the number is that low, and often lower

[2] yes, "micro" services do seem like a mistake in most cases


(op here) I actually completely agree - you're right: "simple outperformed smart" doesn't point to a useful, nuanced solution. I wrote more in-depth here about slightly-more-specifically where there are problems, curious your thoughts, feel free to DM me or comment on the blog (this thread is kinda dead)! https://kenkantzer.com/5-software-engineering-foot-guns/.


As a developer, I've come to believe that complexity is the worst sin we commit. Everything we talk about can be traced back to this issue.

This is largely due to paying attention to Rich Hickey and learning Clojure.

https://www.youtube.com/watch?v=SxdOUGdseq4


Ah, yes, I’ve felt the pain of an unnecessary microservices migration. It ate time for years and the core was still a mess


I think people really exaggerated with the microservices trend. Today, I recommend to keep code in the same executable unless there is a good reason not to. Good reasons include:

- Stateful vs stateless: databases and message queues should be your first (hopefully off-the-shelf) "microservices".

- Different lifecycles: API serving vs background task

- Different security needs: Frontoffice vs Backoffice code

- Different teams: But make sure to introduce a clear customer-vendor relationship.


> Custom fuzzing was surprisingly effective. A couple years into our code auditing, I started requiring all our code audits to include making custom fuzzers to test product APIs, authentication, etc.

Any recommendations for a good fuzzing tool for testing both web-based APIs and language specific APIs (C and Java in my case)?



Paid but integrates with CI/CD - https://www.code-intelligence.com/


I'm thinking about introducing fuzzing too. And property based testing. Manual testing only is too limited.


> Surprisingly, sometimes the most impressive products with the broadest scope of features were built by the smaller teams.

Would probably not be surprising to Fred Brooks author of the _The Mythical Man-Month_, but as much as we think that book is famous/impactful, it still surprises us!


"All the really bad security vulnerabilities were obvious."

I used to work for a company that did a lot of acquisitions and I was often involved in working with teams at newly acquired companies - although it wasn't my main focus, I used to ask some simple security questions and it was remarkable what these uncovered. I literally had people run from the meeting to fix services after I had asked a simple question....


Can you give some examples of some simple questions you would ask?


By the nature of that particular domain a lot of systems delivered important documents (often containing data of rather extreme commercial sensitivity) to customer organisations.

A standard question I always asked was "given a URL that links to a document how do you authorise access" i.e. what happens if someone who is logged in to the site in question gets a link to a document and passes it to a friend via instant messaging.


Ha recognisable. A very annoying problem to solve with web tech too - there’s no perfect solution to this problem (that I know of).


Interesting that he feels the default state of software security has improved a lot in the last few years.

Anecdotally I'd also agree with that. Certainly better defaults and more secure libraries is a major factor. I haven't noticed a huge increase in developer security awareness, although I'd say it is also better than 10 years ago.


Unfortunately, I get the feeling that that is compensated by increasing risk. Attackers have found clever ways to monetize their work beyond just "fun". Hence, I feel the overall "security damage" has kind of stayed constant.


For sure, the threat level hasn't dropped. What is different is that attackers have to use different techniques, since the software isn't as easily exploitable as it used to be. Ten years ago, any pen test of a web application revealed loads of vulnerabilities. These days I rarely find anything really significant (although maybe I work at better places!).

This is not to say that software isn't exploitable any more, only that the cost has been raised sufficiently to make cheaper attacks more attractive (e.g. phishing).


yes. When I get called in as a senior consultant for some business app, it's always for the same reason: development speed has crawled to an almost stop. And it is always caused by unnecessary complexity.

I blame the fact that design patterns and specific architectures are being taught to people who don't understand the problem those things are trying to solve and just apply them everywhere.

Any senior dev or architect should always live by this maxim: make it as simple as possible.


A lot of unnecessary complexity comes from the use of library-like objects instead of plain functions + data.

A recurring theme is "refactoring" specific functionality away into a generic object, and the consequence is a disconnect between the problem you are solving and the problem the object is solving. I often see objects that handle every possible input, ignoring that the business is only concerned with a small subset of inputs. You end up with a lot of "if impossible_condition_if_you_actually_look_at_your_data { /*some_dead_code*/ }".

Another side-effect can be similar/identical input validation done at different levels of the stack. If you have object A calling object B calling object C, you sometimes notice how each one of those does the same exact thing in isolation of the others. You end up with a lot of extra checks and error handling because developers insist on writing their code in complete isolation from the context, pretending they don't know how it will be used.

Of course, everything I described can also be "achieved" with plain functions + data, but (anecdotally) they usually produce better results, perhaps because it helps the devs not think in terms of objects.


Great article.

To expand a little on why “Keep It Simple” is so powerful: less code = less bugs and less security issues. Less code = easier to change.


Also cleverness is overrated. You might be able to be clever once, but mid-term you will struggle to keep up with "collective cleverness". Sure today you might implement a better authentication code than the one offered in your favorite framework, but will you keep up with the new cleverness that will pour into the framework tomorrow?


I don't think less code = less security issues. Often, using those secure-by-default frameworks requires more code.

The simplest example in PHP (highlighted in the article for its default-insecurity):

    echo '<h1>Hello ' . $_GET['name'] . '</h1>';
is vulnerable to XSS.

    echo '<h1>Hello ' . htmlentities($_GET['name']) . '</h1>';
is not vulnerable


I’ll go much further…

I think it creates severe cultural problems. It creates the belief that problems are more difficult than they might be, it creates the belief that a particular solution may be more valuable than it actually is, and then it biases future team expansion and retention. Perhaps most importantly, if the complexity creeps in before the real challenge arrives, it radically affects the team’s ability to reason about it.


Less code and simple are not often related.


Thanks for writing this down @Ken. You're another example that learning the failure modes is the main benefit of being a consultant for many clients. Since I'm sure you began each audit meeting with the CTO/VPE and possibly others like senior devs/architects, how much of what you ended up finding in the audits was predictable based on those meetings? (I'm guessing almost everything).

My follow-up question is that once you heard about their snazzy microservices architecture, were you ever surprised by it being a good decision based on the product type and how well it was engineered?


Honestly, early on in our code auditing days, there were surprises - a lot of the more meta-lessons in here crystallized in the last few years and, looking back, would NOT have been something I’d have thought early on.

On the other hand, regarding the micro-services question: no, not even one surprised us positively. Now keep in mind, we didn’t audit absolutely massive FANG companies, where microservices are probably necessary for org reasons (though we did audit a few unicorns/near-unicorns).


Tangentially, I'm also guessing you can learn a lot by asking if they have an API for partners/customers, and if their application developers use the API internally, and then by looking at the API to see how well it is architected. When we integrate with 3rd party systems it's pretty easy to detect the well engineered systems from the ones built with baling wire and duct tape.


I've been a part of 3 startups, 2 of which failed and are no longer around. What they all could have benefited from was a business audit.


Interesting to see the JWT issue. I have recently found a vulnerability in a publicly traded CRM SaaS that was also about JWT claims validation. It’s also quite amazing that popular Auth SaaS rely so heavily on JWTs with 1 hour expiry times, making it impossible to log users out, as you can’t invalidate the token for the next hour.


I think this causes so much confusion, but it really shouldn't. A bearer token means just that: if you have this token (JWT or otherwise) then it proves you have access to something, period. Unlike opaque tokens, JWTs have a built-in expiry mechanism so they can be used for time-limited operations, which is why people use them for authentication.

Yes, if you issue a long-lived token, you cannot normally revoke that after-the-fact but that is the point of the token, to avoid multiple lookups to an auth service for every single API access. In a distributed/scaled/microservices architecture, this would be unmanageable.

Now people often proffer some kind of backend system to try and maintain expired lists etc. but what is the problem you are trying to solve that couldn't be mitigated with a reasonably short-lived JWT like 1-2 minutes? Issuing a new one every 2 minutes while the user needs to do something is relatively painless compared to, perhaps 100+ calls to APIs each needing an auth call in the same time.
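The short-lived-token idea can be sketched with just the standard library; this is an HS256-style toy for illustration only, not a substitute for a vetted JWT library, and all names are made up:

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def issue_token(secret: bytes, sub: str, ttl_seconds: int = 120, now=None) -> str:
    """Sign a short-lived header.payload.signature token."""
    now = int(time.time()) if now is None else now
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"sub": sub, "exp": now + ttl_seconds}).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"


def verify_token(secret: bytes, token: str, now=None) -> dict:
    """Check the signature and expiry; raise ValueError on failure."""
    now = int(time.time()) if now is None else now
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims["exp"] <= now:
        raise ValueError("token expired")
    return claims
```

With a 2-minute `ttl_seconds`, the revocation window shrinks to the point where a blocklist often isn't worth its cost.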

When you logout, the tokens should be deleted by your system. If someone copies the token before it is deleted, then they had access to the system anyway so that doesn't present a risk imho. If they gave the token to someone else, they are delegating their access so they lose out.

All of that said, if you do not have a heavily API-based system, it might be easier to just use creds that need checking with each call and do it the traditional way.


Cool write-up of centralized vs decentralized access control!



"...making it impossible to log users out as you can’t invalidate the token for the next hour."

I have no idea what you are talking about here, can you explain this?

I work with systems that have a one-minute expiry time. The only issue is that the clocks on all clients need to be in sync with the auth server.


I believe they are referring to the fact that most JWT-based auth systems use one-hour token expiry and have no ability to remotely revoke tokens. You can only revoke the user's ability to get the next token. This often leaves a one hour window between when you want the user locked out of your system and when they are practically logged out.

The only way I know of to implement instant revocation in a system like this is to keep a blocklist of users/tokens that is constantly checked, which can be slow and removes some of the benefits of JWTs in the first place (that they carry all the auth information you need).
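One hedged sketch of such a blocklist, bounded by the tokens' own expiry window (class and field names are illustrative):

```python
class TokenBlocklist:
    """In-memory revocation list keyed by token ID (jti).

    Entries are dropped once the token's natural expiry passes, so the
    list stays bounded by the expiry window rather than growing forever.
    """

    def __init__(self):
        self._revoked = {}  # jti -> exp timestamp

    def revoke(self, jti: str, exp: int):
        self._revoked[jti] = exp

    def is_revoked(self, jti: str, now: int) -> bool:
        # Purge entries whose tokens would have expired on their own.
        self._revoked = {j: e for j, e in self._revoked.items() if e > now}
        return jti in self._revoked
```

Every request still pays a lookup, which is exactly the cost the stateless JWT design was meant to avoid; in practice this lives in something like Redis shared across services.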


Ah! Yes this is why we use an expiration of one minute. For us the extra load that the refreshes give is not a problem.

Keeping a blocklist seems unnecessary to me, you can just lower the expiration time.


> Simple Outperformed Smart. As a self-admitted elitist, it pains me to say this, but it’s true: the startups we audited that are now doing the best usually had an almost brazenly ‘Keep It Simple’ approach to engineering.

My hunch is that this is related to the nature of product-market fit. If a company is very successful, there's a decent shot that market demand became overwhelming at some point early on. That demand, in turn, becomes a strong motivator to keep things simple and ship quickly, instead of writing code The Right Way.

Facebook using PHP might be one of the best examples of this: if their user base hadn't exploded, maybe they would've taken the time to carefully rewrite their code in Java or Python. But the fact that they would've had time to do that would've made it less likely that they'd be a $500b company today.


You're inverting the relationship. Simple solutions (technologically) can approach product/market fit much faster.

Businesses fail for reasons besides tech, but on the tech side when businesses fail (in my experience), it's usually either from unwillingness to serve the sales cycle, or creating a technological solution that is not malleable.


Guilty as charged. I bootstrapped our startup with just myself and 2 junior engineers in the last year during Covid. Junior in the sense that they are young. But actually they outperform many older engineers I've worked with. We are starting to close some pretty big deals and I'm dreading the moment where I have to turn this into a normal development team. In my experience velocity drops when you do that and you lose a lot of momentum. 3 people can do a lot. 6 people don't do that much more. I'm not so young myself and quite experienced. But I lean heavily on my team for doing the work. I'm the CTO, the CPO, and I need to worry about team management, sales, and a few other things. So, less than half of my time is spent coding. This is the reality of startups. You have to do all of it.

I made some technology choices early on. We use docker but not Kubernetes. There is one server, it's a monolith. There is one language, it is Kotlin. And we even use it on our frontend (web only).

The latter is not something I would do normally or recommend. But both my junior engineers only knew Kotlin and we just went with it and never actually ended up regretting this. This surprised me and at this point I don't feel React/TypeScript has anything that I need or want. We're doing a lot of asynchronous stuff, websockets, maps (via libremaps), etc. And it all runs smoothly and responsively. Kotlin is kind of awesome for this actually.

Originally our frontend was Android only. We ditched that app in favor of a Kotlin-js based web app that started out as a proof of concept that just more than proved the concept and became the actual thing. At the time we had demand for iOS and web, and no iOS or web developers on the team. Hence Kotlin for the web. When this looked like it was workable we actually lost our Android developer. So the decision to forget about that app was pretty easy. At that point it was half working and full of bugs and technical debt. Fixing that would only half fix our problem because we'd still need iOS and web. So we did web first. And we are packaging it up with Cordova for those people that want something from an app store.

It's a good lesson on prototyping. If it works, do more of it. At the same time, I normally recommend minimizing risk and not building too many things in parallel. Like building 3 apps for 3 platforms instead of just a web app.

Our server is Kotlin/Spring Boot and we use a lot of Elasticsearch because that's what I've been using for the last decade. A little bit of Redis, and I've so far found no excuse to use a relational database. But I'd probably end up with MySQL if that ever comes up. Done right, Elasticsearch makes for a nice key-value store without transactions but with optimistic locking on documents. If I get some time, I'll add a database for safety at some point. But fewer moving parts means fewer headaches. Having just one language means the distinction between backend and frontend is a bit blurry. We have an API client that we use in our Spring tests that also compiles to kotlin-js. That library contains a lot of code that we use in our front-end: model classes, caching layers, functions that call our various APIs, etc. And it's all covered in tests. All the business logic basically. If we ever need to do native apps, we'll use that there as well.
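The "optimistic locking on documents" pattern mentioned here corresponds to Elasticsearch's compare-and-swap on a document's sequence number (`if_seq_no`). A minimal in-memory model of the idea, in illustrative Python rather than the actual ES client API:

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale read (like ES's 409 conflict)."""

class VersionedStore:
    """Toy key-value store with optimistic locking, loosely modelled on
    Elasticsearch's if_seq_no compare-and-swap. Illustrative only."""

    def __init__(self):
        self._data = {}  # key -> (seq_no, document)

    def get(self, key):
        """Return (seq_no, doc); the seq_no is passed back on update."""
        return self._data[key]

    def put(self, key, doc, if_seq_no=None):
        """Write doc; fail if someone else updated the key since we read it."""
        current = self._data.get(key)
        if if_seq_no is not None and (current is None or current[0] != if_seq_no):
            raise ConflictError(f"seq_no mismatch for {key!r}")
        next_seq = 0 if current is None else current[0] + 1
        self._data[key] = (next_seq, doc)
        return next_seq

store = VersionedStore()
store.put("order:1", {"status": "new"})
seq, doc = store.get("order:1")
store.put("order:1", {**doc, "status": "paid"}, if_seq_no=seq)   # succeeds
try:
    store.put("order:1", {"status": "lost"}, if_seq_no=seq)      # stale seq_no
except ConflictError:
    pass  # the usual recovery: re-read the document and reapply the change
```

Without transactions, this read-modify-write-with-retry loop is what keeps concurrent updates from silently clobbering each other.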

On the devops front I'm a combination of very pragmatic but also focused. We use stuff that works that doesn't distract us. So, no terraform for a setup we only create once; in 20 minutes. Not worth spending weeks automating but worth documenting. But we do have CI/CD via github actions. So we don't manually deploy anything. And we have lots of API integration tests. If it builds, it ships. No rollbacks; roll forward only. Keeps things simple.

We use Google Cloud and keep our costs low. A couple of VMs, a load balancer, a managed Redis, and a managed Elastic Cloud cluster. That's it. Nice and simple.


Hiring and building fast can lead to huge costs, loads of bugs and performance issues

Hiring and building slow leads to multiple rounds of performance tuning early on, which can also lead to lower costs, and gives you a chance to focus on improving the product by focusing on your user experience because you're not in panic mode to raise funds, overhire and conquer the world

We could have many more good software products if companies were focused on long term quality and didn't obsess over growth


Conway's law.

The teams and management structure will immediately become a technical debt.

If we let the product ”decide” where boundaries actually exist and team up accordingly there’s a chance to scale and maintain a bit of velocity.

It requires constant introspection, monitoring and scrutiny though. Something I’m constantly thinking about is how to scale that beyond 20-25 developers. Gitlab have a nice section[0] in their handbook on releases and flow of small bits and pieces, and internalizing something like that together with clear domain boundaries could be a ticket.

Basically - never try to resource optimize, always figure out what good flow looks like and find ways to keep it flowing.

[0]https://about.gitlab.com/company/culture/#freedom-to-iterate


> So, no terraform for a setup we only create once; in 20 minutes.

While this may be right, if you do not have a way to "bootstrap" from scratch in a small enough unit of time (minute, hour, day, whatever you find an acceptable disruption), then you are gonna get screwed badly.

You don't have to take infrastructure up/down every day. But that one time will freak you out enough that you'll want more than just the docs. Now this doesn't mean you have to have crazy infra. I just have 3 docker hosts, each running a compose.yml -> but if I lose the docker/compose files it's gonna take 2 weeks for me to get back.


It's about 30 minutes. All relevant files live in git of course. And I tend to be diligent about documenting things because having to figure out the same shit months later really sucks.

I have plenty of experience doing this stuff; so I know what I'm opting out of. IMHO the price of devops automation can be unreasonably high for small teams. You quickly hit the point where you start considering having somebody do this full time. IMHO that is too high of a price in most small startups. In my case, either I do feature development or devops. Meaning that if I have to pause development on a project for some massive open ended devops project, I might lose weeks/months on a tight schedule. It's never simple. You always get blocked on weird shit for hours/days on end. So, I try to take as much of the pain away. Terraform is a bit of pain that doesn't solve a problem I have. Having to manually recreate something in the case that it somehow blows itself up is OK with me. Unlikely to happen very often. Not worth spending 3 months automating something that might take me hours to figure out. I have better uses for those 3 months.


If people would just admit, and adapt to, the fact that the browser won over native and Oracle won over Sun, we would avoid this situation with armies of Java developers making Rube Goldberg variations of basic relational, sysadmin, and GUI programming tasks. But then again, how would we employ all the people with these extreme productivity multipliers? As long as our politics and economic system keep pretending we just had the industrial revolution and need to man the assembly lines, Parkinson's law will apply to tech work just like everything else.


When you say browser won over native, are you referring to the fact that software is more commonly accessed via web instead of software actually downloaded and installed on a user’s machine?


I assume this isn't content marketing because PKC doesn't seem to exist anymore. But this post made me really want to get Ken to audit our code.

Are there any vendors that do similar work that people here recommend?


> the major foot-gun that got a lot of places in trouble was the premature move to microservices

I sometimes wonder if the move to microservices isn't just a weird consequence of Conway's law in reverse: make a department of each developer, let them have their thing.

(See also this amazing video about Conway's law: https://www.youtube.com/watch?v=5IUj1EZwpJY )


This is absolutely what microservices are about. It's arguably their greatest strength, because (at least in theory) I can decouple my team from your team and we can _only_ communicate over a strict interface.


You can do that without introducing a HTTP/RPC boundary.


You can, but it requires discipline and/or tooling. With microservices you are strongly incentivized (I would say forced) to keep the separation.


>You can, but it requires discipline and/or tooling.

Pretty much every language comes with a way of exposing a limited API to other parts of the application. Java, as an example, requires you to specifically export the parts of your module that other modules are allowed to consume. If you only export a public API then you've achieved the same benefit as a microservice except now it's type checked and doesn't encounter the pitfalls of a network call.


I agree with you. There are ways, and they work. But if you have different teams stepping on each other's toes, they might be disincentivized to keep the separation. Ideally they will not, but without someone enforcing it a team might end up in this situation. I see it as a potential social question (like Conway's law).


Exactly! To be clear to parent commenter, I'm not endorsing microservices to solve this organizational problem, just pointing out it's part of the reason to choose microservices.


At what point should I push for a code audit?

I don't think any of the codebases I worked on ever had a "real" audit. Best case was reviews pre/post acquisitions. An external audit seems like a good thing, but I have no idea how to argue for such a thing.


Point 2. I think this is a common misunderstanding in what engineering is. To come up with good simple solutions to complex problems often takes a lot of experience and domain knowledge.


> the major foot-gun (which I talk about more in a previous post on foot-guns) that got a lot of places in trouble was the premature move to microservices, architectures that relied on distributed computing, and messaging-heavy designs.

Preach! Microservices are a solution to a problem that affects effectively 0 startup-sized systems. That problem is scale. Microservices are hard. WAY harder than monoliths. They become necessary only once your physical hardware can no longer keep up in a monolithic fashion and parts of your system need dedicated compute and/or storage.

And no, they are not automatically necessary once your engineering team reaches N size either. Introducing network boundaries as a way to scale your engineering organization is a bad idea.


This is a great list.

One minor criticism on...

> Monorepos are easier to audit.

> Speaking from the perspective of security researcher ergonomics, it was easier to audit a monorepo than a series of services split up into different code bases. There was no need to write wrapper scripts around the various tools we had. It was easier to determine if a given piece of code was used elsewhere. And best of all, there was no need to worry about a common library version being different on another repo.

This is much more dependent on the auditor's personal workflows (as well as the relative hygiene of any team's monorepos), rather than being universal. I've found the opposite to be true for e.g. the current orgs that I am auditing: individually split up repos tend to be idiomatically structured, and "just work" as expected more often than monorepos, which more often than not have a lot of custom glue or unusual monorepo-management init scripts.

Comments on the other (generally very good) points in the list:

> Writing secure software has gotten remarkably easier in the last 10 years. I don’t have statistically sound evidence to back this up

I suspect compiling such statistical evidence would also be impossible as detection of security issues has also improved, so any data would never be comparable over time.

> The counterargument to this is that heavily weighting discoverability perpetuates ”Security by Obscurity,” since it relies so heavily on guessing what an attacker can or should know. But again, personal experience strongly suggests that in practice, discoverability is a great predictor of actual exploitation.

This is a tough circle to square because security by obscurity works. It's probably the best security measure you can have in place. But it's bad for two reasons:

(1) The process of obscuring often (doesn't need to, but very often) obscures auditing, which means you end up relying upon obscurity solely. It's not a worthwhile trade-off.

(2) In a simplistic marketing world, the idea of obscurity as a standalone measure is so tempting to non-technical decision makers that I believe it requires a bit of innocent dishonesty about its effectiveness to dissuade them.

> (on auditing dependencies) Node and npm were absolutely terrifying in this regard—the dependency chains were just not auditable.

I agree with the overarching bullet point this is said within, but I see this point about NPM a lot, and I'm not sure how people are going about auditing or how many language ecosystems they're looking at. I have found Node/NPM to be the best or second-best popular system for auditing dependency chains.

I have significant experience in this area: the relative consistency of package management config across the JS/TS ecosystem is enormously helpful for software composition analysis. The only package manager config I've found that may be slightly better is Composer, but the inconsistent usage of Composer by many PHP devs still makes it a little worse than NPM in practice.

pip/PyPI/setuptools is an inconsistent moving target of requirements.txt (is it a lockfile?), Pipfile.lock, setup.cfg -vs- setup.py, pyproject.toml, and whatever else. Maven is a nightmare of multiple registry endpoints and issues parsing custom <dependencyManagement> directives and extensions (without even starting on Maven wrappers and pom.xml templating strings). Don't get me started on Gradle. Go's idea of package management is: just pull it from Git; good luck automating that if you've got private repos with any kind of secure SSH auth. I have less personal experience with Rust/Cargo.
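The consistency point is concrete: an npm lockfile (lockfileVersion 2+) flattens every installed package, transitive or not, into one `packages` map, so enumerating the whole chain is a few lines. A sketch with an inline stand-in for a real package-lock.json:

```python
import json

# Inline stand-in for json.load(open("package-lock.json")); the package
# names and versions here are made up for illustration.
lockfile = json.loads("""
{
  "lockfileVersion": 3,
  "packages": {
    "": {"name": "my-app"},
    "node_modules/express": {"version": "4.18.2"},
    "node_modules/express/node_modules/cookie": {"version": "0.5.0"}
  }
}
""")

def installed_packages(lock):
    """Yield (name, version) for every installed dependency in the lockfile."""
    for path, meta in lock["packages"].items():
        if path == "":
            continue  # the root project itself, not a dependency
        # the text after the last "node_modules/" is the package name
        name = path.split("node_modules/")[-1]
        yield name, meta["version"]

print(sorted(installed_packages(lockfile)))
# → [('cookie', '0.5.0'), ('express', '4.18.2')]
```

Compare that with having to reconcile requirements.txt against setup.py, or resolving Maven property placeholders, and the "best or second-best for auditing" claim is easier to see.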

> for some reason, PHP developers love to serialize/deserialize objects instead of using JSON

PHP serialize/deserialize predates the existence of the JSON spec, so that might have something to do with it. A lot of PHP code is old.

> Almost no one got JWT tokens and webhooks right on the first try.

Nor the second try...


> All the really bad security vulnerabilities were obvious.

Isn't that just tautological? They are bad because they are obvious?



