Hacker News
Learnings from 5 years of tech startup code audits (kenkantzer.com)
798 points by lordofmoria on May 26, 2022 | 258 comments



That's a very interesting set of findings. What is important to realize when reading this is that it is a case of survivorship bias. The startups that were audited were obviously still alive; any that suffered from flaws severe enough to be fatal had most likely already left the pool.

In 15 years of doing technical due diligence (200+ jobs) I have yet to come across a company where the tech was what eventually killed them. But business-case problems, for instance being unaware of the true cost of fielding a product and losing money on every transaction, are extremely common. Improper cost allocation, product-market mismatch, wishful thinking, founder conflicts, founder-investor conflicts, relying on non-existent technology while faking it for the time being, and so on have all killed quite a few companies.

Tech can be fixed, and if everything else is working fine there will be budget to do so. These other issues usually can't be fixed, no matter what the budget.


I've found that slowdown from tech debt killed as many companies as any other issue. It's usually caused by business owners constantly pivoting, but being too slow on the pivot and too slow to bring customer wishes to fruition (due to poor technical decisions and tech debt) is probably one of the top 5 reasons for dead companies I've seen.


That's a good point; tech debt can be a killer. But the more common pattern I've seen is that companies that accumulate tech debt while doing well commercially eventually gather enough funds to address it, whereas the companies that try to get the tech 'perfect' the first time out of the gate lose a lot more time, and so run a much larger chance of dying. The optimum is to allow some tech debt to accumulate but to address it periodically, either by abandoning those parts through small, local rewrites (never the whole thing all at once, which is bound to fail spectacularly) or by marking out time for refactoring.

The teams that drown in tech debt tend to have roadmaps that consist strictly of customer-facing work. That can get you very far, but in the end you'll stagnate in ways that are not easy to fix; technical work done right, once you know exactly what you need, pays off.


Maybe once you get to that stage it doesn't really matter. Maybe if you're going for a billion dollar earth shaking idea, it doesn't really matter.

However, I've worked for a small company for quite a while now. We've had several successful projects and several failures.

In my experience, technical debt taken too early can easily double the time it takes you to find out if a project is a dud. That matters to us.

My general rule is: take on technical debt as late as you can. Always leave code slightly better than you found it. Fix problems as you recognize them.

I think a big mistake developers make is thinking "make code better" should be on some roadmap. You should be making code better every time you touch it. Nothing about writing a new feature says that you have to integrate it in the sloppiest way possible.


> I think a big mistake developers make is thinking "make code better" should be on some roadmap. You should be making code better every time you touch it. Nothing about writing a new feature says that you have to integrate it in the sloppiest way possible.

I vehemently agree.

One of my first jobs was working for a mathematician at a bank, who could code well enough to compute his mathematical ideas, but not a software engineer so hired me to do more of the coding for his team.

He would say, "Jim, just get this done, but don't spend time making it fancy." That was his way of saying: don't spend time refactoring or cleaning up the code.

I would say "sure" and then proceed to refactor and clean up the code as I went. It took less time than it would have to write the code "ugly" then deal with the accumulated tech debt, and I finished everything on time so he was happy.


Even just commenting on weirdness you discovered while working on tech-debt code can be invaluable; a little note on the two hours you spent on it could save days later when trying to figure out why something isn't working. Sometimes the problems can't be fixed right then, but you can mark them so that when something does break later, you have a hint as to what is going wrong.


What are the technical debt issues you've run into that've crippled your development velocity?


Technical debt has the second order effect of crippling morale.


Exactly this. People forget that the point of the metaphor is that debt is a tool you use to grow faster.

Credit card debt (e.g. sloppy code and test-free critical backend processes) is pretty bad and should be paid down ASAP.

Mortgage debt (e.g. no UI tests on the front-end) is quite safe and you can kick the can down the road.


In my experience, when you don't design and deliver code with testing or accessibility in mind, you end up rewriting entire components. This drastically adds to the end costs. Most leadership thinks this is "efficient", but it's not really. If you do it correctly the first time, you can consistently deliver features throughout the entire year rather than having to take several months to duct-tape everything to keep it from falling apart.

I never liked the "debt" metaphor. If a housing developer neglected to build a proper foundation, would you call that "debt"? I feel like tech debt is very similar: it's a bad metaphor for a concept that has very little to do with finance.


That's missing the case where the tech debt results in lowered commercial performance, as things necessary to keep customers happy enough to provide the cash flow are getting harder and harder.


They’re saying that in their experience this isn’t nearly as common as it is typically portrayed.


> They’re saying that in their experience this isn’t nearly as common as it is typically portrayed.

How would you know if the poor commercial performance was due to tech debt or not though? It's the intangibility of tech debt that makes it so insidious.


Can't speak for the person I was citing. But I think I get what they're saying.

My personal experience has been that tech debt is more often caused by business level decisions and not engineering decisions. Deadline on this contract is next week so let's ship what we got and worry about it later. Hey, good news everyone we just pivoted 180 degrees so let's try to salvage what we've got.

So yes it very well might be that a mountain of tech debt was the final nail in the coffin. But why was that tech debt there in the first place? I was understanding the GP as saying they saw business decisions leading to poor engineering instead of engineering just doing dumb things on their own. I've seen plenty of examples of the latter but a lot more of the former in my travels.


The double challenge here is doing this all whilst essentially keeping it in the background out of any customer sight. Even if you know exactly what you need after a while from a business perspective, you still need to reimplement it in a way that doesn't cause your product / service / platform to lose customers. I find this always to be an extreme challenge. It's a bit of a treadmill too: doing it this way (without causing breaking changes) certainly takes longer too. So it all piles up into a big messy stack of work :)


I feel like the biggest problem from a business strategy perspective is all we have are these personal opinions and gut feels. Even this article mentions having done 20 code audits, but presents nothing but qualitative findings. Ideally, some business school out there would be embedding researchers in randomly selected startups to know for sure how often you fail because of tech debt versus failing because of worrying too much about tech debt. That's an empirical question, yet all we get are informed expert opinions, but no auditable, reproducible research evidence. It's all so unscientific.

Not to say you're wrong, but we have no real way of even deciding. All I can do is lean on my own experience, but I've seen nowhere near every product team out there and the ones I have seen are nothing close to randomly sampled or blinded.


It’s one of those systemic health type things. It’s really hard to die of tech debt on its own, but if you move slower, you’ll die more often from other shocks.

Another way of thinking about it is that you have N months of runway, and based on your velocity you can pull off a pivot in M months, and the more tech debt, the more time it will take to successfully pivot. If you don’t have a full pivot worth of runway remaining, and you need to pivot, you die. (Of course this oversimplifies by holding pivot magnitude equal but hopefully this illustrates the point.)
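That runway arithmetic can be sketched as a toy model (all numbers hypothetical, and pivot magnitude held equal as noted):

```python
def survives_pivot(runway_months: float, base_pivot_months: float,
                   debt_drag: float) -> bool:
    """Toy model: tech debt acts as a multiplier on pivot time.

    debt_drag = 1.0 means no debt; higher values mean the same
    pivot takes proportionally longer. The company survives only
    if the slowed-down pivot still fits in the remaining runway.
    """
    effective_pivot_months = base_pivot_months * debt_drag
    return effective_pivot_months <= runway_months

# 12 months of runway, a pivot that would take 6 months debt-free:
print(survives_pivot(12, 6, 1.0))  # no drag: 6 <= 12 -> True
print(survives_pivot(12, 6, 2.5))  # heavy drag: 15 > 12 -> False
```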

I do agree that away from the margin, companies that are incredibly successful can afford to punt harder on tech debt. I suppose “know thyself” might be useful advice here; it’s probably not good advice for the median startup to ignore tech debt completely IMO.

I think the main point though is to optimize for agility; tech debt can let you move faster in the short and even medium term, so sometimes it’s right to tactically take on some debt. But not so much that you get bogged down later; make sure you carve out time to fix the stuff that starts to be painful.


I’m not sure I agree. Technical debt is a symptom, it’s the consequence of bad management that leads to working on the wrong things.

If you’re running a startup and haven’t yet found your feet in terms of a product offering, and you’re building your product(s) in such a way that technical debt builds up through continuously layering half-baked on half-baked, it’s indicative that you’re not actually pivoting and not actually evolving, you’re just adding new half-baked ideas to a half-baked system… and being able to do that at twice the speed isn’t going to address the real problem: half-baked ideas don’t make a product, whether that’s 10 half-baked ideas or 100.

My experience is that any company in which evolution/experiments/pivoting is constrained within the boundaries of what already exists because of the sunk cost fallacy has made a grave error at a leadership level, not at a code level. If you can’t validate something without mashing more code into your system, that’s the problem to address.

I’ve seen companies with horrendous tech debt die, and you could certainly frame their death as being a consequence of the tech debt (“if they had just got the perfect system…”) but that assumes the perfect system would somehow prevent them from making the mistakes that got them there in the first place. It wouldn’t. The technical debt is an expression of their mistakes, not the cause. You could dump the perfect system at their feet and they’d be surrounded by garbage again a few years from now.


I worked at a company that was mired in tech debt. At least 4 different UI frameworks were in use, one of which was no longer supported at all. Multiple versions of the app were left accessible, with links from the new to the old, because the new version was not feature-complete. "Feature flags" were expressed in at least 3 different ways; it was a nightmare to figure out if something was on or off, and why. The back end ran on an unsupported language version, with several old, deprecated third-party packages as a result. The company appeared organized, superficially, but at the lower levels of implementation it was a total dumpster fire.

They were constantly "pivoting", but leaving the old junk around.


There’s tech debt and then there’s poor engineering leadership. There’s no valid reason for a startup looking for market fit to switch frameworks or feature-flag systems multiple times unless you’re just being clueless and looking for silver bullets. Just pick a few “boring” technologies and you’ll be perfectly capable of building anything “web” for at least a decade without messing around.


You are right, but there's always folks pushing for a "better" framework, even if the same old boring stuff works. If one of them is fairly vocal and a little bit persuasive, a new project will start using it... on and on it goes.


Technical debt is a sensible strategy when you are a startup aiming for growth. If you become successful, you can hire enough developers to pay back the debt in due time. If you fail, the debt doesn't matter.

Take Facebook: they built an empire on PHP. They have since built some clever compilers on top of PHP in order to make it safe and performant without breaking their existing piles of legacy code. Overall this is probably ridiculously inefficient compared to just using a safe and performant platform from the beginning, but using PHP allowed them to move fast in the critical growth phase.


I really struggle with the analogy of technical debt as equivalent to financial debt. The analogy works great in theory, but it doesn't translate to the real world. The technical decisions we make today will influence the decisions we make tomorrow, and the decisions we make tomorrow will influence the decisions we make next week... and so the system we have a year from now will be layers and layers of deeply interwoven technical debt that you can't just have your accountant pay off at the click of a button.

If we're married to the financial-debt analogy, then technical debt has compounding interest like a payday loan. Payday loans are typically used in very distressed circumstances, and are very dangerous. There are appropriate times to take a payday loan, and there are appropriate times to take on technical debt, but it has to be handled with great care and be an immediate wake-up call to address the underlying cause.


Some tech debt behaves like a payday loan with usurious compounding interest.

The more insidious kind behaves more like a completely unhedged call option - fine, until it very suddenly isn't.


Yeah, compounding interest is part of the metaphor. As long as you grow faster than the compounding interest, you are good. If the options for a startup are to keep growing or die, then taking on technical debt is reasonable.

Of course it is different for a steady-state company or organization. There you need to keep technical debt at a manageable level.
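As a toy illustration of "grow faster than the interest" (every number here is hypothetical):

```python
def outgrows_debt(growth_rate: float, interest_rate: float,
                  years: int = 5) -> bool:
    """Toy model: normalized team output vs. the compounding drag of debt.

    Start with output 1.0 and a drag worth 20% of that output;
    both compound annually at their respective rates.
    """
    output, drag = 1.0, 0.2
    for _ in range(years):
        output *= 1 + growth_rate
        drag *= 1 + interest_rate
    return output > drag

# A hypergrowth startup outruns even usurious interest on its debt:
print(outgrows_debt(growth_rate=0.50, interest_rate=0.60))  # True
# A steady-state organization does not:
print(outgrows_debt(growth_rate=0.05, interest_rate=0.60))  # False
```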


The developers you hire may hate you for it, though. And won't stick around. That'll create churn that creates even more debt.


Does Facebook have problems recruiting?


I think all companies are having problems recruiting in the current environment. FB for a myriad of reasons.


>If you become successful, you can hire enough developers to pay back the debt in due time.

How many successful companies actually do this?

At one point in my career I transitioned from a startup to one that had been acquired some 12 years before, and found it to be even more chaotic than the startup. Instead of playing a frantic game of whack-a-mole with all the pivots and feature ideas of the founders, you had a few dozen teams playing whack-a-mole with the pet projects of their respective product managers who were trying to make a name for themselves. Which was much worse because you had to coordinate with every other of those teams, and of course work with all the integrations with the parent company.

Charitably speaking, maybe these older successful companies are bad simply because the field of software engineering was still too immature when they came about, and today's startups will actually pay back their debt when they become successful in the future. Sure, we have better tools now than then, but we still don't have a static analysis tool that can determine if we built the right or wrong thing for an ever changing market.


Sounds like a version of the mythical man month. Throwing 100 or 1000 developers at it will not reduce tech debt alone. It is probably harder to eliminate debt with more developers.


I have not found working on the wrong things to be problematic so long as you take the time to eliminate the wrong things once they have established themselves as being wrong.

Not taking time is, at heart, where tech debt is born. That can manifest debt across all areas of the development process. Pressure to not take time can certainly come from management, but I have also witnessed on numerous occasions the reverse, where management asks developers to slow down and take the time to do the work well; sometimes to no avail.

Either way, your underlying thesis is quite true that given the perfect system an imperfect team will quickly reintroduce said problems into the system. This is why many software companies have become hyper-picky (even before the tech crash) about hiring. They want to try and avoid building a team that wants to shortcut that time.


What killed them was that they never found PMF. Eventually the tech debt slowed them down so much that they couldn't take as many swings at finding PMF.

But in the counterfactual, trying really hard to avoid tech debt would have slowed them down at the beginning. Not to mention there are plenty of organizations that write very complex, abstract code to avoid tech debt, and end up making the code base incredibly painful to work with. So overall, did they get fewer swings?

I've worked on a lot of old code bases and the biggest issues I've run into, issues that crippled development velocity, were 95% boneheaded decisions and overengineering. And never the types of code quality issues someone like Uncle Bob talks about in Clean Code.


Well, why are those companies pivoting so often in the first place? Isn't the root cause probably in GP's list?


And keeping 100% of all features, instead of removing the least-used ones as you add new ones, lets tech debt grow indefinitely until it reaches a point where new features take months to ship.


we need data on this


> What is important to realize when reading this that it is a case of survivorship bias.

This is totally true, but taken too seriously it leads to inability to learn anything from almost any information whatsoever. What’s more, whatever you do (whether you take the advice of those who have gone before or not), you will not be able to decide whether you made good decisions or merely “survived”.

How does one proceed when anything can be survivorship bias, and determining cause and effect for large-scale operations like running a business is essentially impossible?

(When I say “anything can be survivorship bias” I specifically mean that no matter the cohort you cannot decide whether you’ve accidentally excluded unknown failures, and hence you have no assurance of the actual strength of any analysis you do).


> In 15 years of doing technical due diligence (200+ jobs) I have yet to come across a company where the tech was what eventually killed them.

Not my experience..

What does a “tech failure” look like? Do the servers catch fire? Is the web site down? Maybe people are unable to login to their stations?

Hi-tech business is “Tech”, so the failure of the business is in fact the tech failing. More specifically, the business was unable to direct the tech to solve real problems and solve them well enough.. New hires took too long to onboard.. Engineers were only superficially productive.. Communication between the stakeholders and engineers was lacking.. etc.. etc..

Take note that in all scenarios above “work” is being done, “progress” is being made.. ceremonies are everywhere and success is seemingly around the corner.. Or is it?

It’s just very hard to see these issues, they are hidden under layers of meetings, firings, hiring, pivots, milestones with little progress in actual business value.


I think the harder you scrutinize the distinction between a tech problem and a business problem, the harder it becomes to find it.

When there appears to be such a distinction, that's usually a manifestation of something like Conway's law, a symptom that there exists an unhealthy organizational divide between business and technology.


I suppose that is my point: claims such as "I haven't seen a single start-up failing due to tech" are not possible to defend.


>relying on non-existent technology while faking it for the time being and so on have all killed quite a few companies

This doesn’t count as a tech problem?


Obviously not. Deciding to fake non-existent tech is a problem, but it's a management decision rather than a problem with the technology itself. There is an infinite number of things that don't exist, no matter how much you want them to. And if you are not capable of coming up with a working solution, and instead rely on the world around you moving fast enough to bail you out, then I would say that is a psychological problem more than anything else.

A common theme right now is 'AAI', using people to fake an AI that may not come into being at all, let alone before your runway (inevitably) runs out.


I saw one "secretary AI" that schedules meetings in your calendar over email. Just cc it to start using it (once you've signed up). The idea seemingly being: fake it with low-cost outsourcing to prove there's demand, and then build the real thing.

The developers you'd hire to make it an actual AI and the developers you'd hire to make it a Mechanical Turk are very different skill sets.


I wouldn't count Theranos' failure a tech problem. I would consider it a fraud problem.


I'd hazard a guess that many cases of startup fraud start out as good-faith delusions of grandeur, and only pivot to bad-faith fraud when the founders realize it's the only option to keep the lights on. Because the product results aren't there.

That is, plan A is Stripe, plan B is Theranos.


You may well be on to something here. But I've seen a couple where plan A was Theranos.


I was in a startup that failed in part due to tech issues. The AI model just didn’t work. There were a lot of other problems but if the tech worked, they could have easily gotten paying customers.


> yet to come across a company where the tech was what eventually killed them

I would think that a poor-quality product, or one not as good as the competitors', would be a big killer. Google, Facebook, and Amazon have amazingly superior products. I think you're missing something.


> In 15 years of doing technical due diligence (200+ jobs) I have yet to come across a company where the tech was what eventually killed them.

How about the cases where it caused fines due to failed security compliance, which didn't help the situation? I'm thinking of fintech companies especially.


I've yet to come across a company killed by fines, are there any examples of those? If anything I think the fines are still much too light.


The fines are trivial in the US.


Interesting. Have you written more about this somewhere ? If not you should.


Never really thought about it, I'm typically under NDA but in aggregate I could probably do something with this without breaking those NDAs.


> Simple Outperformed Smart. As a self-admitted elitist, it pains me to say this, but it’s true: the startups we audited that are now doing the best usually had an almost brazenly ‘Keep It Simple’ approach to engineering.

I have written before that, as an industry, we have made writing software complex for complexity's sake.

> imagine a world where there wasn't a concept of clean code, but you just wrote code as simply as possible. not having thousands of classes with indirection.

what if your code logic was simple functions and a bunch of if statements. not clever, but it would work.

what if your hiring process was not optimizing for algorithmic efficiency but for whether something simply works reliably.

imagine a world where the tooling used by software engineers wasn't fragile but simple to use and learn. oh, the world would be a wonderful place. but the thing is, most people don't know how to craft software, and here we are building software on a house of cards [0]

[0] https://news.ycombinator.com/item?id=30166677


Hot take: the current trend of writing code, AND hiring engineers, is the way it is because everyone thinks they're gonna be the next FAANG-sized company, and need to be able to write FAANG-quality code and engineer FAANG-quality architecture from the start, with respect to scalability.

Have you seen the personal blogs of devs today? What should be a simple HTML + CSS website with the simplest hosting option possible, is now written in a framework with thousands of dependencies, containerized, hosted on some enterprise level cloud service using k8s.

That's great and all if you suddenly need to scale your blog to a LARGE N number of readers, but the mentality is persistent: when one should be focused on core features and functionality in the simplest way possible, you're instead bogged down trying to configure and glue together enterprise-level software.

Maybe it's a bit unfair to put it that way; a lot of engineers know the various systems and services inside and out, and prefer to do even the simplest things that way. But I've lost count of how many times I've encountered devs that BY DEFAULT start with the highest level of complexity to solve the simplest problems, for no other reason than "but what if" and "it feels wrong that it should be that easy".


So a couple of thoughts.

> Have you seen the personal blogs of devs today?

I don't know that this is a fair comparison, because side projects can be, and often are, a way to explore ideas, understand tech, play around, etc. So I don't know that I'd agree it's a great extrapolation to the way an engineer works, based on side projects or a blog that may have different objectives.

I do agree with the sentiment though, that we want to be watching for indicators to how a team member approaches problems.

> the way it is because everyone thinks they're gonna be the next FAANG-sized company, and need to be able to write FAAANG-quality code and engineer FAANG-quality architecture from the start

I don't know if it's fair to say everyone, but it is something I agree companies, especially startups, should filter for. When I acted as hiring manager and was trying to build SRE, as an example, I would remind candidates and the team continuously that we're not Google. So while we want to bring in ideas and approaches from what Google has published as "SRE", we need to consciously leave out the parts that aren't appropriate to our needs and stage of maturity.


I disagree that they go down the path they do because they think they’re going to be FAANG-sized, but rather it’s a case of cargo-culting, “we’ll use these tools/architectures because the best companies use them, therefore they must be the best tools/architectures.”


I don't even know if it's cargo culting as much as engineers using their day jobs as opportunities to learn marketable skills for job hopping.

Nearly every new technology introduced at places I've worked was there because someone was keen to get it onto their resume.


I believe this might fall under "resume driven development"


But why are the skills marketable, if no business needs them?

Why are companies hiring people who proudly put Overengineer as their job title on their resume?


New and shiny > old and dull.

At least as far as marketing goes, when trying to hire young and hungry devs. More so at startups.

My old job was quite spread out geographically and organizationally: lots of small offices with engineers that had more or less total freedom when it came to tooling. It was actually a gov. agency, so that might surprise someone, but it was one of those places transitioning to the digital age, and it therefore didn't really have much solid structure.

The various teams pretty much used the tools they wanted to solve the problems at hand; I think we had three different version control systems at play, and multiple different databases. Working with data across the organization was a total nightmare.

But we did have a common platform for communication, sharing stuff, and all that. I think we were around 250 devs and engineers, and a survey showed that we used over 20 different programming languages.

One thing I DO remember was that some people in most teams were constantly pushing for the latest (as in 1-3 year old) tools. Someone's writing an API in Flask? No, screw that, FastAPI is where it's at. Team x is still writing RESTful APIs? We're doing GraphQL. And that's how it went.

When some of these guys would end up in dev. blogs or being interviewed, they'd of course push the "See, we're not old and stuffy anymore. We've hired lots of young engineers, and right now we're using [trendy stack]"


Clearly there are businesses that serve millions or billions of users, and have a serious need for engineers with experience with the tools to do so. Engineers seeking those jobs are then motivated to use those tools to engineer systems serving orders of magnitude fewer customers simply so they can claim that experience.


I think your general point is true but the personal blogs of devs angle is maybe not the most illustrative one.

We tend to apply industrial strength tools to our personal projects because it's some combination of what we already know, or we're trying to learn or refine an unfamiliar skill.

If you just gave me a linux shell I would not be able to confidently provision a secure webserver for static hosting. But I do know how to write cloudformation and deploy it. Sure this is a personal moral weakness by the standards of HN whatever, but it's where my career has led me so these are the tools I have.


> If you just gave me a linux shell I would not be able to confidently provision a secure webserver for static hosting. But I do know how to write cloudformation and deploy it. Sure this is a personal moral weakness by the standards of HN whatever, but it's where my career has led me so these are the tools I have.

I wouldn't say it's a moral weakness, maybe more of a failing of the tech education ecosystem. It seems bizarre to me that in software, we teach complex high-level things before we teach simple low-level things. Programming students learn very complex high level languages in year 1, and then maybe by year 3 or 4 learn assembly, or what a CPU register is, or how RAM and cache works. It's like teaching a carpenter how to build a high-rise apartment building before teaching them how to measure or use a hammer.


Well I didn't have any formal tech education, just what I picked up on the job and through my own curiosity.

But I mean you don't teach a car mechanic metallurgy and aerodynamics, except to the extent they'll need to apply that knowledge towards specific goals. At some point the discipline is mature enough that people genuinely don't need to, and can't, know every level of it from the ground up.

I think coding is approaching or already at the point where "cs/fundamentals of computation" should be a different degree from "professional software development."


I don't think anyone using Jekyll or whatever for their blog is doing it because they use Jekyll at work.


side note: FAANG is an obsolete acronym according to The Economist, it's now Microsoft - Amazon - Meta - Apple - Alphabet, leading to the new acronym, MAMAA, which has the nice result that we can now talk about the outsized influence of Big MAMAA in the tech world.


I never understand this idea of picking an arbitrary set of language features and saying, “what if your code logic was simple functions and a bunch of if statements”. The complexity won't magically go away, it'll just appear in a different set of problems.


I think it's helpful to divide complexity into complexity in the business logic / problem you're trying to solve, which cannot be eliminated from a pure technical perspective (you should still try to simplify it through discussions with stakeholders though!), and complexity that isn't necessary to solve the problem.

Oftentimes the latter category could be necessary if you were at much higher scale, or if the business evolved in some way, etc., which is where this sort of stuff tends to originate. Just yesterday we were talking at my company about extracting a service in Go, since it's very high scale, very simple, and doesn't change much. On one hand, it's pretty likely we'll need to do that at some point, but on the other, it's not causing any issues right now, so there's not much point in doing it at the moment. Had we gone forward, that would have added complexity for a theoretical concern that may or may not happen in the future.


If this were the case we wouldn't have the AbstractFactory problem that has plagued the Java ecosystem. If this were the case Golang wouldn't be here seeking to simplify things by not having classes, and by having __err__ handling like it does. It's not pretty but it works. I pick on Java because its ecosystem is broad. However, the over-engineered complexity that resides there makes you wanna stay away.


One could also use this example to argue the other way. The AbstractFactory pattern would not be needed if Java had had a richer feature set to begin with, in this particular example anonymous functions (which I believe it nowadays has). Patterns emerge when the foundation isn't solid enough to stand on by itself.

People needed modularity, DI and callback functions (essential complexity) but since the only way to do that with the language was classes, you had to invent AbstractFactory pattern (accidental complexity).
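The essential-vs-accidental split above can be sketched in a few lines. Python here purely for illustration (all the names are made up): the first version is roughly the class-based workaround pre-lambda Java forced on you, the second is what first-class functions buy you.

```python
# Class-based indirection: a class and a factory exist only to deliver
# one configurable behavior.
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hello, {self.name}"

class GreeterFactory:
    def create(self, name):
        return Greeter(name)

# With first-class functions, the same modularity needs no factory class:
def make_greeter(name):
    return lambda: f"Hello, {name}"

# Both produce identical behavior; only one needed two extra classes.
assert GreeterFactory().create("world").greet() == make_greeter("world")()
```

Same essential complexity (a configurable callback), noticeably less accidental complexity.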


Everyone does lip service to simplicity, but in reality simplicity is really difficult.

If you have seven conditions driving a decision, a bunch if's might be the simplest implementation. If you have hundreds of conditions, a tree of if's becomes impenetrable. There is no one-size-fits-all when it comes to simplicity.

Some problems are inherently complex. You can't design a payroll system or tax calculation system which is simpler than the set of rules and regulations it has to implement.
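You can't make the rules simpler than the regulation, but you can keep the rule-walking code small by treating the conditions as data instead of a tree of if's. A toy sketch (Python for illustration; the brackets and rates are invented, not any real tax code):

```python
# Hypothetical progressive brackets: (upper bound, marginal rate).
# When the regulation changes, you edit this table, not the control flow.
BRACKETS = [
    (10_000, 0.10),
    (40_000, 0.20),
    (float("inf"), 0.30),
]

def tax(income):
    owed, lower = 0.0, 0
    for upper, rate in BRACKETS:
        if income <= lower:
            break
        owed += (min(income, upper) - lower) * rate
        lower = upper
    return owed

print(tax(50_000))  # roughly 10000.0 with these made-up brackets
```

The rule set stays exactly as complex as the regulation demands; the code that interprets it stays a dozen lines.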


> If you have hundreds of conditions, a tree of if's becomes impenetrable.

I mean, it worked for Amazon. I saw the code.


Fair enough, but you probably wouldn't call it simple.


> a tree of if's becomes impenetrable

Even in that case, a tree of if's isn't that bad (it's not great), but far worse is when you have the same set of if statements copied and pasted around dozens of places. Because you will forget to update one of them at some point.
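A tiny illustration of that copy-paste failure mode (Python; the fields are hypothetical): the fix is rarely clever, just one named predicate every call site shares.

```python
# Before: this exact condition pasted in a dozen places, each drifting
# independently as requirements change:
#     if user["age"] >= 18 and user["verified"] and not user["banned"]: ...

# After: one named predicate, updated in exactly one place.
def can_purchase(user):
    return user["age"] >= 18 and user["verified"] and not user["banned"]

assert can_purchase({"age": 21, "verified": True, "banned": False})
assert not can_purchase({"age": 16, "verified": True, "banned": False})
```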


Thousands of classes with indirection is not clean code and write code as simply as possible is tautology. Of course it should be as simple as possible. The interesting question is what counts as simple.

Setting that aside though, the author seemed to mostly be talking about architectural simplicity in the article. He specifically called out "premature move to microservices, architectures that relied on distributed computing, and messaging-heavy designs" which I think is spot on. Distributed systems are fundamentally hard and involve a lot of difficult tradeoffs. But somehow we have convinced ourselves as a profession that distributed systems are somehow easier.


The hardest job as a software engineer is to come up with simple and obvious solutions to a hard problem.

Or you can stitch together eight different cloud services and let someone else debug that crap in prod. Not to mention subpar performance and an astronomical cloud bill.


It takes a lot of knowledge, experience and smarts to find simple solutions for hard problems


> write code as simply as possible is tautology

What? That has nothing whatsoever to do with tautology. It's just a statement you agree with. If everyone else agreed with it, it might at most be a truism or an uninteresting statement, but evidently they do not. (They might claim to, but reality shows they optimise for other things - in my experience the simplest work, which does not always mean the simplest code, especially when you're accustomed to the mystic rituals of the Javanese tribes.)


Fair enough, maybe tautology is the wrong word, but I do think everyone agrees with it. Who ever says "we need to have more complicated code"? The question is how do you define simplicity, because it is not always obvious. Every overly-abstracted mess I've ever seen was done in the name of "simplicity". Basically, let's add an abstraction so we can "simply" swap in another database in the future, or handle X hypothetical use case by only changing configurations. Likewise, I've seen 1500 line methods with dizzying, incomprehensible control flow that was nevertheless composed entirely of "simple" if/then/else statements. And a well chosen abstraction or two made things much simpler to read, understand and modify.


Thousands of classes with indirection is absolutely Clean Code. It's in the book.


PipeWire is an example of building a Linux audio daemon on "microservices, architectures that relied on distributed computing, and messaging-heavy designs":

- It takes 3 processes to send sound from Firefox to speakers (pipewire-pulse to accept PulseAudio streams from Firefox, pipewire to send audio to speakers, and wireplumber to detect speakers, expose them to pipewire, and route apps to the default audio device).

- pipewire and pipewire-pulse's functionality is solely composed of plugins (SPA) specified in a config file and glued together by an event loop calling functions, which call other functions through dynamic dispatch through C macros (#define spa_...). This makes reading the source less than helpful to understand control flow, and since the source code is essentially undocumented, I've resorted to breakpointing pipewire in gdb to observe its dynamic behavior (oddly I can breakpoint file:line but not the names of static TU-local functions). In fact I've heard you can run both services in a single daemon by merging their config files, though I haven't tried.

- wireplumber's functionality is driven by a Lua interpreter (its design was driven by the complex demands of automotive audio routing, which is overkill on desktops and makes stack traces less than helpful when debugging infinite-loop bugs).

- Apps are identified by numeric ids, and PipeWire (struct pw_map, not to be confused with struct spa_dict) immediately reuses the IDs of closed apps. Until recently rapidly closing and reopening audio streams caused countless race conditions in pipewire, pipewire-pulse, wireplumber, and client apps like plasmashell's "apps playing audio" list. (I'm not actually sure how they resolved this bug, perhaps with a monotonic counter ID alongside reused IDs?)

I feel a good deal of this complexity is incidental (queues pushed to in one function and popped from synchronously on the same thread and event callback, perhaps there's a valid reason or it could be removed in refactoring; me and IDEs are worse at navigating around macro-based dynamic dispatch than C++ virtual functions; perhaps there's a way to get mixed Lua-C stacktraces from wireplumber). I think both the multi-process architecture and ID reuse could've been avoided without losing functionality. Building core functionality using a plugin system rather than a statically traceable main loop may have been part of the intrinsic complexity of building an arbitrarily-extensible audio daemon, but I would prefer a simpler architecture with constrained functionality, either replacing pipewire, or as a more learnable project (closer to jack2) alongside pipewire.


> we have made writing software complex for complexity's sake.

I think it’s rather that complexity naturally expands to fill up the available capacity (of complexity-handling ability). That is, unless conscious and continuous effort is spent to contain and reduce complexity, it will naturally grow up to the limit where it causes too many problems to remain viable (like a virus killing its host and thus preventing further spread). This, in turn, means that the software industry tends to continually live on the edge of the maximum complexity its members can (barely) handle.


> I think it’s rather that complexity naturally expands to fill up the available capacity (of complexity-handling ability). That is, unless conscious and continuous effort is spent to contain and reduce complexity, it will naturally grow up to the limit where it causes too many problems to remain viable

I disagree that this is something that "naturally" happens. A lot of this thread is about how adding complexity is either a deliberate choice made by software developers or just that the developer simply was never taught how to do it the simple way--both of which illustrate a gap in software development education. When the tutorial about How To Create a TODO App starts with "Step 1: Install Kubernetes", I'd argue we have an education problem.


I’d argue that the fact these choices are being made is natural (otherwise you’d have to explain what the “unnatural” root causes are), and preventing or counteracting them exactly requires the conscious and continuous effort mentioned.


The problem is that simple means different things in a small codebase than in a big one. A bunch of if statements in code that is small enough to understand completely is OK, but when it becomes big it's hard to understand the flow of data.

I do favor simple code, but some complexity/abstraction is needed to make it easier to understand.


But picking the right abstractions that aren't leaky in any of the aspects you really care about is critical, hard to measure (leakiness isn't obvious, nor what kind of aspects you care about), hard to get right, and hard to maintain (because your abstraction may need to evolve, which is extra tricky).

Obviously, getting that right makes subsequent developments much, much easier, but it's hardly a simple route to success.


I see tech debt and simplicity as a mixture of the 'tyranny of small decisions' and each individual coder's 'cleanliness' level.

Each individual coder has a code cleanliness level, similar to how every friend's mom growing up would always remark "Sorry the house is a mess" when it was spotless. If you're used to a 9/10 and it's a 7, that looks like a wreck. If you're used to a 5 and it's a 7, that looks great. I urge other coders to increase their cleanliness level, and to look to others with high cleanliness for guidance. If you are coding next to people to whom a 5 looks good, no matter how much they try to pay down technical debt, they never will.

I think the tyranny of small decisions is ultimately showing us that the tooling we currently have makes it much trickier than it should be to evolve those abstractions. Partially this is because bad abstractions caused bad tooling and bad tooling caused bad abstractions. Because it's so difficult, we don't do it. We take the small decision and work slightly harder in a slightly buggier environment to get the new thing done. But of course now the problem is bigger, which means it's even less likely we'll ever actually pay down that debt.

> “I’m sorry I wrote you such a long letter. I didn’t have time to write you a short one.” – Blaise Pascal


> there wasn't a concept of clean code but to write code as simply as possible

Sounds good "on paper" - in fact, is tautologically true - but it's hard to find two people who agree on the definition of "simple". You say "not having thousands of classes with indirection", and I've definitely seen that over-design of class hierarchies create an untouchable mess, but I've seen designs in the other direction (one giant main routine with duplicated code instead of reusing code) that were defended as "simple".


A lot of complexity comes from premature scaling due to cargo cult or ergonomics.

But I argue a lot of complexity and bugs comes from poor/unclear/conflicting thinking. Especially when it crosses boundaries between multiple developers who had to modify it but didn't truly internalize that part/design of the code.


A bunch of if statements can be described as not simple. Some things in code can only be described as simple. Do those things.


I've seen most of the architectural problems in consulting - it's amazing how a team of clever engineers can take a simple thing and make it sooo convoluted.

Microservices, craptons of cloud-only dependencies, no way to easily create environments, ORMs and tooling that wraps databases creating crazy schemas... The list goes on and on; if you're early, you could do a lot worse than create a sensible monolith in a monorepo that uses just postgres, redis, and nginx and deploys using 'git pull' ...


The worst architecture I ever saw came from consultants, who built the initial bits of a startup I was hired into. It was nice to have a no-longer-present scapegoat to shake fists at when frustrated, but over time I came to realize their most maddening choices were at the behest of one of our founders, who had no software experience.


I saw the same thing. Founders asking the world of consultants who would try to deliver and then fail to be a responsible engineer. I started my previous job by telling the founders they were asking for the wrong things and the consultants work needed to be thrown out. Thankfully they listened and we ended up with a TypeScript monorepo monolith deployed to Heroku.


Nitpick: no need for Redis if you have Postgres. It can have comparable performance when similar tradeoffs are used.


That’s just not true as a categorical statement. Performance aside redis has all sorts of interesting data types, operations and primitives that pg doesn’t that you might want to leverage. It fulfills a different role


Can you elaborate? Is postgres viable for caching?


> Microservices, craptons of cloud-only dependencies, no way to easily create environments, ORMs and tooling that wraps databases

So, Spring Boot you mean?


A Spring Boot service doesn't have to be a microservice - you can happily fatten it up into a monolith. Cloud-only dependencies come into play for Spring Cloud (or something using cloud-specific features) - for a "vanilla" CRUD app they are not needed. Creating virtual/physical environments is out of Spring Boot's scope and better left to external tools, though it has support for separate environments via profiles. ORMs/tooling that wraps the database doesn't have to be part of Spring Boot - using Hibernate/JPA isn't mandatory; plain JdbcTemplate with hand-written SQL works fine.


>>> Business logic flaws were rare, but when we found one they tended to be epically bad.

oh yes ...

I always bang on to my junior staff that their job was known as "analyst programmer" for a reason. The analyst part matters probably even more than the programmer part. In large companies just discovering what needs to be coded is 90% of the job (securely coding it within the constraints of the enterprise is the other 90%, while the final 90% is marketing your solution internally).

Anyway .. yes


> In large companies just discovering what needs to be coded is 90% of the job

Yes, but that is a massive dysfunction of those companies. Meaning, we can yell at analyst-programmers as much as we want; what really needs to be fixed is the process that makes finding out requirements so ridiculously hard.

And yes, I work in one of those companies, it very clearly is dysfunction.


I think this can only change when we, as a society, expect code literacy from every person that finishes high school.

I don't mean expert programmers, but at least being able to read basic pseudocode algorithms.

It's hard to describe a problem if you don't even understand any language.


Oh hell yes. Software literacy in my book (30,000 words, still no end in sight) is literally that: literacy.

Look, I automate almost everything I can see. And where I put effort and focus, the software is a force multiplier for my brain (or a bicycle for the mind, if you like).

But so often in a large company or normal life, there is a great gulf that the virtual world cannot - yet - cross. But more and more it shall.

One thing that's just silly: I take photos on my iPhone of bills and letters. I cannot be arsed to navigate the awful Dropbox API, but I would like to store them under "insurance" or whatever. Fuck having some AI monster read the bill. So I played with Pythonista and can just run an action after a photo - and it gets moved. It's my solution, not an app. And that's software literacy - where you can write, not on paper, but on the world.


The "magic AI" has undone years of coaching management about software expectations.


> discovering what needs to be coded is 90% of the job,

But you still have to predict based on a two-sentence description in a JIRA ticket how many "story points" it's going to take with 95% accuracy a dozen times within the span of a single "sprint planning session" every two weeks.


Oh my god - I hadn’t heard the phrase “story points” in a few weeks and now I will have nightmares tonight!


This goes with doing the first 90% of the work, then the second 90% of the work then the last 90% of the work.

And engineers multiplying their initial estimate by 3, the project manager then multiplying that by 3 and rounding it up to be ten times more than the initial estimate.


>I always bang on to my junior staff that their job was known as "analyst programmer" for a reason.

I can't help but think about Tobias Fünke. Especially with you banging on your junior staff.


I suspect it's a British (perhaps commonwealth) colloquialism - 'to bang on [about something]' is to go on and on and on talking about it, with some implication of 'too much' or obsessiveness.

(Also, notice it's 'bang on to' the staff, not 'bang on' them. That is, the staff are the indirect object; the thing which is being said - banged on about - is the direct object.)


Yes, I bang on to my staff (talk endlessly to them) rather than bang my staff (have sex with them) ... or another colloquialism, to "bang my staff" which is a solitary activity that frankly you can guess from here.


I'm not sure you watched Arrested Development, but I was thinking about this scene where the character says he sees himself as the world's first analyst and therapist, analrapist for short.

He always says things that are non-sexual but have an almost sexual ring to them. And while I did understand the bang on to from context, it was exactly the kind of thing he would be saying. Together with the analyst and programmer, analpro for short, or something of the sort.


And I explained it because (as a native BrE speaker as I suspected the one you replied to was) it didn't have any such ring to me at all, it read perfectly naturally.

(I have seen Arrested Development fwiw. Didn't care for it, but I've seen it.)


The world's first combined analyst and programmer -- an Analrammer for short.


Nice, I was thinking Analpro, but that's also good!


> ...just discovering what needs to be coded is 90% of the job,...

Absolutely. The tech part is relatively easy. Deciding what to build, that's where the friction and magic happens.


Your wording is ambiguous.

Are senior staff also analysts? Why or why not?


> Generally, the major foot-gun that got a lot of places in trouble was the premature move to microservices, architectures that relied on distributed computing, and messaging-heavy designs.

Finally, someone said it


It is interesting. I've been at a company for a few years now and we've been slowly trying to break apart the two monoliths that most of the revenue-generating traffic passes through. I am on board with the move to microservices. It's been a lot of fun, but a crazy amount of work, time, and effort has been spent to do this.

I've pondered both sides of this argument. On one hand, if this move had been done earlier it might not have been as difficult a multi-year project. On the other hand, when I look at the Rails application in particular, it was coded SO poorly that if it had just been written better initially, it wouldn't even need to be deconstructed at this point. Also, if the same engineers who wrote that Rails app had tried to write a bunch of distributed, event-driven microservices instead, we would probably be in even worse shape. (ᵔ́∀ᵔ̀)


Have you considered a serious refactor instead of a migration?

I mean, just start with a cleanup session and proceed from there. Work on one bit at a time and don't get too far from a working system.


Are you me? o_0. Shockingly similar situation.


You two might be colleagues lol


Usually a link to a humorous YouTube video would be inappropriate on HN, but this classic and brief satire of microservices is actually quite on point about precisely what is so dangerous about a microservices architecture run amok:

https://www.youtube.com/watch?v=y8OnoxKotPQ

Summary: really trivial sounding feature requests like displaying a single new field on a page can become extremely difficult to implement and worse, hard to explain to stakeholders why.


This was 100% true for that startup I worked for as a side job. They would have been so much better off just building a standard java, PHP or .NET back end and calling it a day.

The head engineer (who had known the guy funding the thing since childhood) had no clue how node, stateless architecture, or asynchronous code worked. He had somehow figured out how to get access to one particular worker of a node instance, through some internal ID or something, and used that to make stateful exchanges where one worker in some microservice was pinned to another. Which goes against everything node and microservices is about.

I tried to talk some sense into them but they didn’t want to hear it. So for the last six months I just did my part and drained the big guy’s money like everyone else. I hate doing that - way more stressful than working your ass off.


It's kind of discouraging to see the part where he says almost no one gets web tokens right the first time. Working on projects as someone entering the industry, it's pretty clear that security is the most important part of a web app, and it's so seemingly easy to get woefully wrong, especially if you're learning this stuff on your own and trying to build working CRUD projects.


It's a chicken egg problem. Developers use JWTs because it's what they think they know. Companies build libraries to support what developers are using. Security engineers say JWTs are easy to screw up [1]. Newer frameworks offer ways to move off of JWTs. New programming language comes out. New frameworks built for that programming language. What is someone most likely to build first as an integration? What developers are using. JWTs become defacto for a new framework. Security engineers report the same bugs they've seen. Even more languages and frameworks come out. Rinse. Lather. Repeat. Write up the same OAuth bug for the 15th time.

[1] http://cryto.net/~joepie91/blog/2016/06/19/stop-using-jwt-fo...

Edit: I was actually writing this code tonight myself for a project instead of it already being baked into the platform framework because SSO is only available as an "enterprise" feature and it's $150 a month for static shared password authentication. So market forces incentivize diverging standards.
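For the curious, the bug that keeps getting written up is usually algorithm confusion: the verifier reads the algorithm out of the attacker-controlled token header (sometimes honoring "none") instead of pinning it server-side. A stdlib-only sketch of the token shape, not a real JWT library; the SECRET and claim names are invented:

```python
import base64, hashlib, hmac, json

SECRET = b"server-side-secret"  # invented for the sketch

def b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign(claims: dict) -> str:
    header = b64(json.dumps({"alg": "HS256"}).encode())
    body = b64(json.dumps(claims).encode())
    sig = hmac.new(SECRET, f"{header}.{body}".encode(), hashlib.sha256).hexdigest()
    return f"{header}.{body}.{sig}"

def verify(token: str) -> dict:
    header, body, sig = token.split(".")
    # The classic bug is re-reading "alg" from the (attacker-controlled)
    # header here. Instead the algorithm is pinned: always HMAC-SHA256.
    expected = hmac.new(SECRET, f"{header}.{body}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    return json.loads(base64.urlsafe_b64decode(body + "=" * (-len(body) % 4)))

assert verify(sign({"sub": "alice"}))["sub"] == "alice"
```

Real libraries let you express the same pinning (e.g. an explicit allow-list of algorithms); the point is that the verifier, not the token, decides.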


That flow chart in the shared link is very funny! Just this year, I was forced to migrate to a new internal authentication framework that... drumroll... uses JWTs for session management. Google tells me that it was already discussed on HN here: https://news.ycombinator.com/item?id=18353874


JWTs solve problems about statelessness. Most companies don’t have these problems and are better off with stateful basic auth tokens/cookies that are widely understood and supported and can be easily revoked.

Also, signed and/or encrypted communication is usually easier to implement without involving JWTs.

Best thing to do in security is to not roll your own and instead use trusted libraries that have industry-reviewed sane defaults. One way to check: look at the issues and PRs in the public repo and see if security-focused issues are promptly addressed, especially including keeping docs up-to-date. Security professionals are pedantic (for good reason).
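As a sketch of how small the stateful alternative is (Python; the in-memory dict stands in for whatever database you already run):

```python
import secrets

SESSIONS = {}  # stand-in for a sessions table in your existing database

def log_in(user_id):
    token = secrets.token_urlsafe(32)  # opaque: means nothing to the client
    SESSIONS[token] = user_id
    return token

def current_user(token):
    return SESSIONS.get(token)  # one indexed lookup per request

def revoke(token):
    SESSIONS.pop(token, None)  # instant, per-session revocation

t = log_in("alice")
assert current_user(t) == "alice"
revoke(t)
assert current_user(t) is None
```

Per-session revocation falls out for free, which is exactly the property stateless tokens give up.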


Asymmetric cryptography solves problems of statelessness: i.e. encrypt your sensitive|read-only data with your public key, decrypt it with your private key, beep boop, you can now use your client as a database. JWTs are a whole other unnecessary lasagne of complexity – not good complexity but random complexity, like the human appendix – which invites bugs and adds nothing above the former in most implementations. (Hell, my current company generates JWTs and then uses them as plain old 'random' keys to look up the session data in a database. It's hilarious but also awful.)


Well, asymmetric cryptography is not even needed in the most common case, i.e. when you are using the client as a database. Symmetric crypto is enough, because it's your server that both encrypts/signs and decrypts/verifies. Asymmetric crypto may be strictly needed only if the sender and the recipient are different. And there is still an issue that the malicious client can return old and outdated but validly signed data - which you can't solve without either a server-side database or accepting old data up to a certain limit.


Yeah, that's true, actually. As best I recall, I just meant that that is what people use JWT for regardless, and I wanted to convey that the only part doing the useful work there is the 'asymmetric crypto' part. I didn't want to get into the territory of providing alternative suggestions, only breaking down what is useful about JWTs when used for that purpose.

As for old and outdated data, I should think that's easily solved by having a 'created' and 'modified' stamp in the encrypted data, much like you have on an inode.
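A sketch of that timestamp idea with symmetric crypto only (Python stdlib; the key, max age, and payload are invented). It only signs rather than encrypts, which is enough to show the freshness check: validly signed but outdated blobs get rejected, bounding the stale-data problem.

```python
import hashlib, hmac, json, time

KEY = b"symmetric-server-key"  # invented; the server is both sender and recipient
MAX_AGE = 3600  # accept validly signed data up to an hour old

def seal(data: dict) -> str:
    blob = json.dumps({"data": data, "issued": time.time()})
    mac = hmac.new(KEY, blob.encode(), hashlib.sha256).hexdigest()
    return f"{mac}.{blob}"

def unseal(sealed: str, now=None) -> dict:
    mac, blob = sealed.split(".", 1)
    good = hmac.new(KEY, blob.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, good):
        raise ValueError("tampered")
    wrapper = json.loads(blob)
    if (now or time.time()) - wrapper["issued"] > MAX_AGE:
        raise ValueError("stale")  # validly signed, but outdated: reject it
    return wrapper["data"]

assert unseal(seal({"cart": ["book"]})) == {"cart": ["book"]}
```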


> Most companies don’t have these problems

Can anyone cite a single real world example of a fully stateless system being run for the purpose of business? I ask this every time JWTs come up and no one can answer it.

As soon as you tap the database on a request for any reason, whether it's for authorization or anything else, you might as well kiss JWTs goodbye.

Then again, just don't use them anyway, because they have no benefit. Zero. Disagree? Prove it. I'm sure there's some infinitesimally small benefit if you could measure it, but the reality is that JWTs are saving you from an operation that databases and computers themselves are designed to be extremely good at.

Don't use JWTs. They're academic flim-flam being used to sell services like Auth0.


They can be helpful if you have services that need to call other services on behalf of a user request.

For instance, user A calls Product service for Product information but that response also includes Recommended Products and Advertisements from those two services. Product service can pass the JWT from the client to Recommended Products and Advertisements which removes the need to establish trust between those internal services (since authentication and authorization info are just passed around from what the client provided).

You can also use them in federated auth schemes where the issuing system is separate from the recipient. I think the use cases are pretty similar to SAML for this type of system but with a smaller "auth token" size.

Just because you're accessing a database on a request doesn't mean you're accessing the database that stores the authorization and authentication info.


The problematic word is "THE" database. The subsystem that you hit can be stateful, but it can use a separate database that doesn't contain authentication data.


I can only provide verification of the counter example.

Having worked on some VERY large web services, the session was tracked on the back end and instantly and trivially revocable.


It's nuts to me that so many companies have moved off cookies for web app auth state. They're simple, they're well supported, they require very little work on the browser side, and the abstractions around them on the server side are basically bulletproof at this point.

I see all this talk about authentication, and it's just literally never been a problem or concern for my company.


Aren't JWTs just fancy cookies?


JWTs are frequently stored in LocalStorage which means that any XSS is able to leak the JWT.

Cookies, on the other hand, can be configured to be HTTP-Only and inaccessible to JavaScript on the page. That prevents somebody with XSS from leaking the value without a second server-side vulnerability or weakness.

In addition, JWTs are impossible to revoke without revoking _all_ sessions. This is the biggest weakness, imo, and the reason that they shouldn't be used client-side.

I'm a huge fan of the approach Ory is taking with Oathkeeper and Kratos: https://www.ory.sh/docs/kratos/guides/zero-trust-iap-proxy-i...
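The cookie hardening described above is a handful of attributes. A sketch using Python's stdlib (the token value is just a placeholder):

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session"] = "opaque-random-token"  # placeholder value
cookie["session"]["httponly"] = True   # invisible to page JavaScript, so XSS can't read it
cookie["session"]["secure"] = True     # only ever sent over HTTPS
cookie["session"]["samesite"] = "Lax"  # basic cross-site request mitigation
header = cookie.output(header="Set-Cookie:")
print(header)
```

Any web framework exposes the same three flags; localStorage has no equivalent of HttpOnly, which is the core of the argument above.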


Can't you revoke them the same way you revoke any other auth token and put the token ID in a database somewhere?


It won't be stateless then, and at that point you might as well just use traditional sessions.


Sure, but browsers have done a lot of work to make cookies far more convenient (they're automatically sent with requests, you have browser APIs to work with them), and secure (Secure, HttpOnly, SameSite, etc.)


Cookies are a storage and transport mechanism and JWTs are signed JSON blobs. You could put a JWT inside a cookie.


Why not look into an open source auth solution such as supertokens? It's almost free and you can self-host. That way you implement your own auth system but the security issues are mostly dealt by them.


Yesterday I was working on updating code that implements Microsoft Open ID Connect (produces a JWT).

Their documentation [1] is exceptional - all the gotchas and reasons for practices are clearly explained and there’s a first class library to handle token validation for you. I even ended up using the same library to validate tokens from Google.

Perhaps not all vendors produce equally well written documentation but I think it’s a lot easier to get it right today than it was 5 years ago.

1. https://docs.microsoft.com/en-us/azure/active-directory/deve...


That's usually because security is bolted on instead of baked into the control and data structures themselves. Too many people interpret "Make It Work, Make It Right, Make It Fast" to mean security is implemented at the "Make It Right" stage, when it should be at the "Make It Work" stage. And that's if they're the lucky ones who get security designed into the architecture from the beginning.

We're paying for the sins of that in Unix these days; the kernel attack surface is infeasibly large to remediate to correctness anytime soon (if ever?).


I think there is still more to it than just not taking it seriously or planning for it.

JWT in particular has weird quirks you need to know about to prevent algorithm-swapping attacks, and I'm sure there are more traps I myself am not aware of. At this point I think security can be seen on the same plane as legal: assuming a random dev will be able to plan for and navigate all the issues by sheer common sense hasn't been a viable approach for a long time now.


> At this point I think security can be seen on the same plane as legal:...

Considering how Uber ignored legal ramifications of ride sharing intersecting with incumbent regulations until they were dragged into courts, that paints a potentially rather grim picture of the equivalent in software security. But your gist sounds more along the lines of, "include the experts along at the beginning of the ride".

When I said security as a "bolt-on", I should have been more clear. Most of the time when I see it happening, it has been at the behest of the business stakeholders overriding the earnest developers trying to include the security teams from the beginning, but waved off with "it can be added later".

The business stakeholders see in their real life housing contractors walk into finished houses, attach some doodads, pop in some batteries to wireless sensors and the central base station, and ta da!, they "have security"! And think, "just how hard can it be to do the same in software?", dismissing what their tech leads try to tell them.

There is a large element of the principal-agent problem here as well. Shiny proofs of concept and shallow implementations get immediate bonuses and promotions. Taking 1.1-2.0X as long to implement the right way, the result of which is no drama and no discernible difference to the casual business user, gets no or even negative recognition. The incentives structure the choices. There are no incentives that structure payouts over the long haul tying back to original historical choices, with an increasing gradient of the payout the longer the original choices prove sound. Naturally, since measuring that accurately would be impossible.

The closest I've come to an analogy that works in these discussions but not as often as I'd like is this. I don't throw together four tilt-walls, top off with a roof, move in with a 20-ton safe, open the doors for business and call it a regional bank depository. There are bedrock anchors, sensors, inner reinforced concrete walls, SOP's, audits, man traps, insurance reviews, and on and on, that get designed in before the foundation is even poured.

Clients who didn't find this convincing wave it off with a, "haha, this isn't that important lol". I want a better analogy.


That's an interesting angle. Uber ignoring legal ramifications had wildly different effects depending on the country: some completely shut out Uber as a result, while more lax places accepted dealing with the consequences that surfaced one after the other.

I'm in a country from the former Eastern Bloc, and see a bunch of naive projects pitched by the business side that get shut down pretty fast by the legal team as nightmares in the making (e.g. stuff that boils down to "shouldn't it be easier to take money from a variety of sources and move it to other users?") that would sink the whole company when shit hits the fan.

My hopes would be on more security issues slowly becoming legal issues (not unlike GDPR, breach disclosure duty and associated penalties etc.) but I can understand how dire it feels in countries where legal grounds were shaky in the first place.


That's how a software implementation by a newbie works. You can't expect a newbie to take security into account before the software is implemented. Instead, there should be a practice of rectifying all the security errors at the end, before the software is pushed to the server.


That’s an almost impossible task. Code gets immensely more expensive to understand or modify based on its age. If you don’t bother thinking about security until the 11th hour, it’s too late. Things will slip through.


This is an interesting write up!

The only question I have is around your point on monorepos - every monorepo I’ve seen has been a discoverability nightmare with bespoke configurations and archaic incantations (and sometimes installing Java!) necessary to even figure out what plugs in to what.

How do you reason about a mono repo with new eyeballs? Do you get read in on things from an existing engineer? I struggle to understand how they’d make the job of auditing a software stack easier, except for maybe 3rd party dependency versions if they’re pulled in and shared.


Monorepos do require upkeep beyond that of single-product repositories. You need some form of discipline for how code is organized (is it by Java package? by product? etc). You need to decide how ownership works. You need to decide on (and implement) a common way to set up local environments. Crucially, you need to reevaluate all these decisions periodically and make changes.

On the other hand... this is all work you'd have to do anyways with multiple repositories. In the multi-repo scenario, it's even tougher to coordinate the dev environment, ownership, and organization principles - but the work isn't immediately obvious on checkout, so people don't always consider it.

Regarding auditing, I have always found that having all the code in one place is tremendously useful in terms of discoverability! Want to know where that class comes from? Guaranteed if it's not third-party, you know where it is.

Not to minimize the pain of poorly-managed monorepos - it's not a one-size-fits-all solution, and can definitely go sideways if left untended.


Probably because:

1) It's easy to miss a repo, if you don't have a list of them all somewhere.

2) It's easy to get out of sync with what version of your software corresponds to what branch/tag in each repo.


> 2) It's easy to get out of sync with what version of software corresponds to what branch/tag in each repo.

I'd like to hear how others solve this. The way I've addressed this is I bake into the build pipeline some way to dump to a text file all the version control metadata I could ever want to re-build the software from scratch. Then this text file is further embedded into the software primary executable itself, in some platform-compatible manner. Then I make sure the support team has the tooling to identify it in a trivial manner, whether a token-auth curl call to retrieve it over a REST API, or what have you. This goes well beyond the version number the users see, and supports detailed per-client patching information for temporary client-specific branches until they can be merged back into main without exposing those hairy details into the version number.
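A minimal sketch of the capture step described above (function and field names are illustrative, not from the comment; the command runner is injectable purely so it can be exercised without a real checkout):

```python
import json
import subprocess


def capture_build_metadata(run=None):
    """Collect the version-control metadata needed to re-build from scratch.

    `run` defaults to shelling out to git; pass a stub for testing.
    """
    if run is None:
        run = lambda *args: subprocess.check_output(
            ["git", *args], text=True
        ).strip()
    return {
        "commit": run("rev-parse", "HEAD"),
        "branch": run("rev-parse", "--abbrev-ref", "HEAD"),
        "describe": run("describe", "--always", "--dirty"),
    }


def to_embeddable_json(info):
    """Serialize deterministically so the embedded blob is byte-reproducible."""
    return json.dumps(info, sort_keys=True, separators=(",", ":"))
```

The resulting string can then be packaged into the executable and surfaced through whatever authenticated endpoint the support tooling already uses.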

While this works for me and no support teams have come to me yet with problems using this approach, it strikes me as inelegant and I'm for some reason dissatisfied with "it ain't broke so don't fix it".


In our case we abandoned individual repos and went back to a monorepo to solve this issue. In theory the separation of code was nice, but in practice it was a real pain when a service added new APIs and you wanted to update another service to use them.

All of our services do also print out in their startup logs what version they are based on git branch name and commit. Monorepo or not this was useful.


We have a releases repo that takes in the git version SHA for each application and handles deploys. It's... ok I guess. Just another example of complexity to meet the growing complexity of the system.


> 2) It's easy to get out of sync with what version of your software corresponds to what branch/tag in each repo.

That's what the `[dependencies] my-lib = "1.0"` was supposed to solve.


The thing I'm working on has 5 main repos that all run (yarn start) for the app to be fully functional.

I need to write down somewhere the startup order and which branches go together.


  find / -type d -name .git


As an auditor you don't have anything checked out locally yet, so no .git will exist. If you ask an individual developer or randomly picked developers, they will only have their specific repos checked out. If you look at the server hosting the repos then yes you may get them all. Assuming they are all on one server...


Once I worked on a team where none of the engineers knew that the JWT payload was readable on the frontend. They were in shock when I extracted the payload and started asking questions about the data structure.


It's kinda baffling that JWTs are unencrypted by default, to be fair.


It's the whole point - they're signed, not encrypted.

You should use opaque tokens instead if you don't want the frontend or other services that have access to the token to read it.


In many cases, the front end doesn't need to read the JWT, just pass it on to some API.

An encrypted JWT is still convenient as it can be decrypted and deserialized into a common data structure using existing libraries.


One benefit of JWT as specced is that the APIs you pass it on to don't need to share an encryption key; a shared key would make rolling the key without causing downtime impractical. With OIDC, for example, frequent key rotation helps you create a better security posture.

The benefit of signing versus encryption is many services are able to verify the authenticity without needing a shared secret. That includes untrusted services, which is frequently the case with OAuth 2.

You can encrypt a JWT token, but at that point it's not semantically a JWT anymore. It can be any JSON at all and doesn't need to match the JWT structure. The first part of a JWT is a header naming the signing algorithm, and the last part is the signature.


I, for one, enjoy not needing to coordinate an encryption key between my service and my IdP.


I also enjoy not worrying about how the next field I add to my JWT can be exploited after a base64 decode :-)


How else could the frontend read them? If you don't need this then regular cookies are better.


It's the other way round - the front-end shouldn't need to read JWTs, just pass them on.


if your frontend is interrogating the jwt you're doing it wrong


Isn't it pretty common to read the expiration so you know when to refresh tokens?


It is, among other things like username or user e-mail address.

This is also, together with backend scalability, a major selling point for JWTs. Otherwise one might just as well use regular session ids in cookies.


I mean, I'd be rather surprised too. What were you using JWTs for, if not asymmetric crypto? Presumably you weren't using it to sign the tokens, if they were surprised the client could access them? And I can't see many contexts where you would use it with a shared secret, where just sending JSON over HTTPS wouldn't suffice. (I'm assuming 'frontend' here denotes a client on the other side of the trust boundary.)


I'm not getting your comment. The payload is not encrypted. I think you refer to the signature. The payload can always be decoded. It's just JSON into base64.
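That decoding needs nothing but the standard library; a minimal sketch (no key or real token involved):

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Read a JWT's claims without any key.

    The payload segment is just base64url-encoded JSON; the signature
    only protects integrity, not confidentiality.
    """
    payload_b64 = token.split(".")[1]
    # base64url encoders often strip the '=' padding; restore it first.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))
```

Anyone holding the token (browser devtools included) can do this, which is why secrets must never go in JWT claims.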


Ah, sorry, that was what I was referring to when I said "Presumably you weren't using it to sign the tokens, if they were surprised the client could access them?". I classed that as too obvious for it to be what you meant.


For SSO? The biggest advantage (besides being stateless) of a JWT is that it is signed with an asymmetric key and the client can validate the authenticity of the content. You can encrypt the content of the token, but that does not make too much sense (because the client needs to decrypt it anyway).


> For example because it’s so fast, [MD5 is] often used in automated testing to quickly generate a whole lot of sudo-random GUIDs.

Actually, it’s because programmers are lazy. GUIDs or UUIDs are 128-bits and MD5 produces 128-bits. A string like “not-valid” is not a valid UUID, but MD5(“not-valid”) is both possible to format like a UUID when output as hex (with dashes) but also self-descriptive - so you can name the token when generating it in a fixture function and know how to regenerate it later in a test, for example.

All the normal ways of generating UUIDs, including v6 and v7, are about trying to make them unique and collision resistant. But that’s nonsense when you want deterministic, reproducible tests. Hard-coding 32 characters is too much work, ain’t nobody got time for that. Magic numbers? Pfft. Just MD5 and write your own text…

Pro tip: have data model creator helper functions include a counter that resets every test (every time the database resets) and then assign a UUID like MD5(`InsertTableName-${counter}`) that way you have a unique ID that’s also easy to predict/regenerate.
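A sketch of that helper in Python (names are illustrative; the comment's original was a JS-style template string):

```python
import hashlib
import itertools
import uuid

_counter = itertools.count(1)


def reset_ids():
    """Call wherever the test database gets reset."""
    global _counter
    _counter = itertools.count(1)


def next_test_id(table_name: str) -> uuid.UUID:
    """Deterministic, self-descriptive test UUID: MD5 of 'Table-N'.

    MD5's 128-bit digest is exactly UUID-sized, which is the trick
    described above; the counter keeps IDs unique yet reproducible.
    """
    digest = hashlib.md5(f"{table_name}-{next(_counter)}".encode()).digest()
    return uuid.UUID(bytes=digest)
```

Because the sequence restarts on every reset, a failing test can name the exact row ("User-3") instead of a random 32-character blob.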

That said… I’ve always personally considered simple database IDs generally preferable to UUIDs. It’s easier to understand THING 20 as an ID than 32-odd characters. But UUIDs are an industry standard, so they end up in your test code everywhere anyway…


> I’ve always personally considered simple database IDs generally preferable to UUIDs.

Unless you start migrating data between environments and want references to be alive.

Anyway, if you need a hardcoded GUID for tests or what, paste this into PowerShell: [Guid]::NewGuid()

Not arguing, just developing for a system that uses guids as primary IDs and writing tests for that system. I don't even need to hardcode GUID, as within test bootstrap I'm creating objects with generated IDs I can reference later for comparison.


I’ve done that before too - but it’s always possible if you run tests often enough that you’ll get an ID collision that randomly fails a test and causes a developer some grief. Easier to not use random sources of data as a rule of thumb within your unit tests.


Re. Security, predictable identifiers are often a vulnerability. Hence, don't present database IDs in public (ie. anywhere). Instead, generate unique non-predictable identifiers at creation time, and use a UNIQUE constraint (or similar). https://cwe.mitre.org/data/definitions/340.html


It’s true that in production, if it’s a security risk that IDs can be guessed, don’t make them predictable. But by that same logic you would have to stop using REST because it can let you guess an ID?

This advice is classified as "varies by context" because it doesn’t always apply. In test cases, predictable behaviour is better than randomness. There are exceptions, of course. Chaos monkey, fuzzing, and literally testing algorithms for uniform randomness, etc.

That said, you could get the best of both worlds if you used MD5 HMAC to create a UUID from a predictable number and a secret preventing guessing. If that’s your goal…

Of course, the secret could be trivially reverse engineered with MD5 if someone knew the ID number and algorithm to generate it, but I’m not sure we have the patience or need to use PBKDF2 or similar to create predictable, unguessable ID numbers… after all, it would be just as easy to use regular guessable numbers and put strong authentication so it doesn’t matter if you guess correctly.
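A sketch of that HMAC variant, assuming a per-environment secret (all names here are illustrative):

```python
import hashlib
import hmac
import uuid


def unguessable_id(secret: bytes, n: int) -> uuid.UUID:
    """Derive an unguessable-but-reproducible UUID from a sequence number.

    HMAC-MD5 emits a 128-bit tag, exactly UUID-sized; without the secret
    an outsider can't enumerate adjacent IDs, while anyone holding the
    secret can regenerate the ID for row n on demand.
    """
    tag = hmac.new(secret, str(n).encode(), hashlib.md5).digest()
    return uuid.UUID(bytes=tag)
```

As the comment notes, this only resists casual guessing; MD5 is not the tool for an adversary with real resources.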


Clean separation of concerns is good architectural practice. Whilst you are of course correct that you can potentially rely on mitigations (eg. authenticated APIs) if those subsystems change in future you have an emergent scenario producing undocumented vulnerabilities. Security people call this 'defense in depth' - ie. make sure you cover your ass religiously, all the time.


At what point is there something beyond a framework (a SaaS in a box, perhaps) that just avoids many of these basic problems, and the HR, legal, etc. problems, of starting a startup? Startups are not snowflakes apart from that one little core competency. In short, most serial founders say the second one was easier, simply because they followed the template ground out in the first.

Would it be easier to start with that template?


There are a ton of products like this out there that build on popular frameworks:

Saas Pegasus (https://www.saaspegasus.com/) for Python/Django, Bullet Train (https://bullettrain.co/) and JumpStart (https://jumpstartrails.com/) for Rails, Spark (https://spark.laravel.com/) for Laravel, Gravity (https://usegravity.app/) for JS

You can find an even bigger list here: https://github.com/smirnov-am/awesome-saas-boilerplates though those are the market leaders (I make one of them and follow things closely)


There are some good open source options like https://getzero.dev/


In the Rails world we have https://bullettrain.co and https://jumpstartrails.com which both have open source templates for building SaaS services.


I've seen boilerplate applications for <insert tech stack> but the open-source ones tend not to be great, and the closed source ones could be great - but I'm not willing to pay $XXX for code I haven't seen.


All of that exists as SaaS products that target non-technical cofounders, and it is very hard to justify to co-founders, investors, or advisors any investment of time in something that is not your core problem, and I think that's for a good reason.


Does learnings ever mean anything different than lessons? How did this enter corporate-speak?


It pains my prescriptivist instincts to say so, but FWIW I do interpret them differently, frequently as the complementary sides of a single event:

A learning is a successfully learned thing. Or a received lesson.

A lesson is a taught thing. When effective, this would be one path to a learning for the receiver.


I suppose that makes sense. Personally, I would write lessons learned rather than learnings in part to get rid of the red squiggle. My dictionaries flag "learnings" as a typo.


"Lesson" also bears this meaning of something learned, though that would make "learnings" more precise, and therefore distinct.

My experience seems to be that people who use "learnings" are referring to the lessons learned by others, usually subordinates, and it is used instead of "lesson" out of sensitivity to how harsh it sounds to say "group X learned several lessons".


Ugh, seriously. Like utilize instead of use.


In earlier days, I thought it'd be fun to have two versions of my resume. They had parallel content, but one was fluffed up in corporate-speak, and the other was human English.

I included links, e.g. resume-fluffy.html or resume-direct.html, and (somewhat seriously) suggested that hiring managers read the first and tech evaluators the second.

It made for some light humor in discussions with hiring groups. And also some effectively-paralyzed recruiters, which added to the fun of the former.


Learnings can be more easily interpreted as "something I learned', while lessons can come across as 'lessons for you'.


I came here for this comment.


it's almost - almost - as bad as 'vinyls'


Nit: "...or example because it’s so fast, it’s often used in automated testing to quickly generate a whole lot of sudo-random GUIDs."

ITYM: "pseudo-random"

Although I do like the mash-up concept of "sudo random"-ness.


It's higher-privileged randomness. As in, all GUIDs are random, but some are more random than others.


> All the really bad security vulnerabilities were obvious.

All the really bad security vulnerabilities that were found were obvious?

One is more likely to find things that are obvious?


But the auditors were experts and used all the latest and greatest tools. I think they are implying that if they couldn't find it with code inspection, then a hacker wouldn't find it by probing.

Of course, they might not find zero-days but most hackers wouldn't find those either.


When a team is so focused on the todo list, they sometimes forget the obvious mistakes they still need to fix.


Yeah, this was a great article overall, but that stood out as sus. Also the "last few hours found the most stuff". Seems like they could probably stop the audit once they found enough problems, which would skew findings hard toward the easy-to-find, or toward whatever turned up in the last hours of looking.


Although I strongly agree with it in principle, I'm growing seriously tired of the "simpler is better" argument. It hides all the nuance, hard work and, guess what, complexity, that goes into making something simple.

Simplicity is different to each person. What seems like unnecessary abstractions with complex inner workings often exist to actually hide other complexity away.

Know the in and outs of Kubernetes? Maybe it's easier (simpler) for you than directly provisioning different pieces of infra.

Have a team of over 10 [1] working on the same monolithic codebase? Productivity while maintaining sane separation of concerns might increase going for a more domain-service-oriented architecture [2].

How can we teach what simplicity is instead of just calling it better or saying arrogant platitudes like KISS?

[1] yes, the number is that low, and often lower

[2] yes, "micro" services do seem like a mistake in most cases


(op here) I actually completely agree - you're right: "simple outperformed smart" doesn't point to a useful, nuanced solution. I wrote more in-depth here about slightly-more-specifically where there are problems, curious your thoughts, feel free to DM me or comment on the blog (this thread is kinda dead)! https://kenkantzer.com/5-software-engineering-foot-guns/.


As a developer, I've come to believe that complexity is the worst sin we commit. Everything we talk about can be traced back to this issue.

This is largely due to paying attention to Rich Hickey and learning Clojure.

https://www.youtube.com/watch?v=SxdOUGdseq4


Ah, yes, I’ve felt the pain of an unnecessary microservices migration. It ate time for years and the core was still a mess


I think people really exaggerated with the microservices trend. Today, I recommend to keep code in the same executable unless there is a good reason not to. Good reasons include:

- Stateful vs stateless: databases and message queues should be your first (hopefully off-the-shelf) "microservices".

- Different lifecycles: API serving vs background task

- Different security needs: Frontoffice vs Backoffice code

- Different teams: But make sure to introduce a clear customer-vendor relationship.


> Custom fuzzing was surprisingly effective. A couple years into our code auditing, I started requiring all our code audits to include making custom fuzzers to test product APIs, authentication, etc.

Any recommendations for a good fuzzing tool for testing both web-based APIs and language specific APIs (C and Java in my case)?



Paid but integrates with CI/CD - https://www.code-intelligence.com/


I'm thinking about introducing fuzzing too. And property based testing. Manual testing only is too limited.


> Surprisingly, sometimes the most impressive products with the broadest scope of features were built by the smaller teams.

Would probably not be surprising to Fred Brooks author of the _The Mythical Man-Month_, but as much as we think that book is famous/impactful, it still surprises us!


"All the really bad security vulnerabilities were obvious."

I used to work for a company that did a lot of acquisitions and I was often involved in working with teams at newly acquired companies - although it wasn't my main focus, I used to ask some simple security questions and it was remarkable what these uncovered. I literally had people run from the meeting to fix services after I had asked a simple question....


Can you give some examples of some simple questions you would ask?


By the nature of that particular domain a lot of systems delivered important documents (often containing data of rather extreme commercial sensitivity) to customer organisations.

A standard question I always asked was "given a URL that links to a document how do you authorise access" i.e. what happens if someone who is logged in to the site in question gets a link to a document and passes it to a friend via instant messaging.


Ha recognisable. A very annoying problem to solve with web tech too - there’s no perfect solution to this problem (that I know of).


Interesting that he feels the default state of software security has improved a lot in the last few years.

Anecdotally I'd also agree with that. Certainly better defaults and more secure libraries is a major factor. I haven't noticed a huge increase in developer security awareness, although I'd say it is also better than 10 years ago.


Unfortunately, I get the feeling that that is compensated by increasing risk. Attackers have found clever ways to monetize their work beyond just "fun". Hence, I feel the overall "security damage" has kind of stayed constant.


For sure, the threat level hasn't dropped. What is different is that attackers have to use different techniques, since the software isn't as easily exploitable as it used to be. Ten years ago, any pen test of a web application revealed loads of vulnerabilities. These days I rarely find anything really significant (although maybe I work at better places!).

This is not to say that software isn't exploitable any more, only that the cost has been raised sufficiently to make cheaper attacks more attractive (e.g. phishing).


yes. When I get called in as a senior consultant for some business app, it's always for the same reason: development speed has crawled to an almost stop. And it is always caused by unnecessary complexity.

I blame the fact that design patterns and specific architectures are being taught to people who don't understand the problem those things are trying to solve and just apply them everywhere.

Any senior dev or architect should always live by this maxim: make it as simple as possible.


A lot of unnecessary complexity comes from the use of library-like objects instead of plain functions + data.

A recurring theme is "refactoring" specific functionality away into a generic object, and the consequence is a disconnect between the problem you are solving and the problem the object is solving. I often see objects that handle every possible input, ignoring that the business is only concerned with a small subset of inputs. You end up with a lot of "if impossible_condition_if_you_actually_look_at_your_data { /*some_dead_code*/ }".

Another side-effect can be similar/identical input validation done at different levels of the stack. If you have object A calling object B calling object C, you sometimes notice how each one of those does the same exact thing in isolation of the others. You end up with a lot of extra checks and error handling because developers insist on writing their code in complete isolation from the context, pretending they don't know how it will be used.

Of course, everything I described can also be "achieved" with plain functions + data, but (anecdotally) they usually produce better results, perhaps because it helps the devs not think in terms of objects.


Great article.

To expand a little on why “Keep It Simple” is so powerful: less code = less bugs and less security issues. Less code = easier to change.


Also cleverness is overrated. You might be able to be clever once, but mid-term you will struggle to keep up with "collective cleverness". Sure today you might implement a better authentication code than the one offered in your favorite framework, but will you keep up with the new cleverness that will pour into the framework tomorrow?


I don't think less code = less security issues. Often, using those secure-by-default frameworks requires more code.

The simplest example in PHP (highlighted in the article for its default-insecurity):

    echo '<h1>Hello ' . $_GET['name'] . '</h1>';
is vulnerable to XSS.

    echo '<h1>Hello ' . htmlentities($_GET['name']) . '</h1>';
is not vulnerable


I’ll go much further…

I think it creates severe cultural problems. It creates the belief that problems are more difficult than they might be, it creates the belief that a particular solution may be more valuable than it actually is, and then it biases future team expansion and retention. Perhaps most importantly, if the complexity creeps in before the real challenge arrives, it radically affects the team’s ability to reason about it.


Less code and simple are not often related.


Thanks for writing this down @Ken. You're another example that learning the failure modes is the main benefit of being a consultant for many clients. Since I'm sure you began each audit meeting with the CTO/VPE and possibly others like senior devs/architects, how much of what you ended up finding in the audits was predictable based on those meetings? (I'm guessing almost everything).

My follow-up question is that once you heard about their snazzy microservices architecture, were you ever surprised by it being a good decision based on the product type and how well it was engineered?


Honestly, early on in our code auditing days, there were surprises - a lot of the more meta-lessons in here crystallized in the last few years and, looking back, would NOT have been something I’d have thought early on.

On the other hand, regarding the micro-services question: no, not even one surprised us positively. Now keep in mind, we didn’t audit absolutely massive FANG companies, where microservices are probably necessary for org reasons (though we did audit a few unicorns/near-unicorns).


Tangentially, I'm also guessing you can learn a lot by asking if they have an API for partners/customers, and if their application developers use the API internally, and then by looking at the API to see how well it is architected. When we integrate with 3rd party systems it's pretty easy to detect the well engineered systems from the ones built with baling wire and duct tape.


I've been a part of 3 startups, 2 of which failed and are no longer around. What they all could have benefited from was a business audit.


Interesting to see the JWT issue. I have recently found a vulnerability in a publicly traded CRM SaaS that was also about JWT claims validation. It’s also quite amazing that popular Auth SaaS rely so heavily on JWTs with 1 hour expiry times, making it impossible to log users out, as you can’t invalidate the token for the next hour.


I think this causes so much confusion, but it really shouldn't. A bearer token means just that: if you have this token (JWT or otherwise) then it proves you have access to something, period. Unlike opaque tokens, JWTs have a built-in expiry mechanism so they can be used for time-limited operations, which is why people use them for authentication.

Yes, if you issue a long-lived token, you cannot normally revoke that after-the-fact but that is the point of the token, to avoid multiple lookups to an auth service for every single API access. In a distributed/scaled/microservices architecture, this would be unmanageable.

Now people often proffer some kind of backend system to try and maintain expired lists etc. but what is the problem you are trying to solve that couldn't be mitigated with a reasonably short-lived JWT like 1-2 minutes? Issuing a new one every 2 minutes while the user needs to do something is relatively painless compared to, perhaps 100+ calls to APIs each needing an auth call in the same time.
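The short-lived-token idea can be sketched with just the standard library; this is an HS256-style toy for illustration only, not a substitute for a vetted JWT library, and all names are made up:

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def issue_token(secret: bytes, sub: str, ttl_seconds: int = 120, now=None) -> str:
    """Sign a short-lived header.payload.signature token."""
    now = int(time.time()) if now is None else now
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"sub": sub, "exp": now + ttl_seconds}).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"


def verify_token(secret: bytes, token: str, now=None) -> dict:
    """Check the signature and expiry; raise ValueError on failure."""
    now = int(time.time()) if now is None else now
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims["exp"] <= now:
        raise ValueError("token expired")
    return claims
```

With a 2-minute `ttl_seconds`, the revocation window shrinks to the point where a blocklist often isn't worth its cost.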

When you logout, the tokens should be deleted by your system. If someone copies the token before it is deleted, then they had access to the system anyway so that doesn't present a risk imho. If they gave the token to someone else, they are delegating their access so they lose out.

All of that said, if you do not have a heavily API-based system, it might be easier to just use creds that need checking with each call and do it the traditional way.


Cool write-up of centralized vs decentralized access control!



"...making it impossible to log users out as you can’t invalidate the token for the next hour."

I have no idea what you are talking about here, can you explain this?

I work with systems that have a one-minute expiry time. The only issue is that the clocks on all clients need to be in sync with the auth server.


I believe they are referring to the fact that most JWT-based auth systems use one-hour token expiry and have no ability to remotely revoke tokens. You can only revoke the user's ability to get the next token. This often leaves a one hour window between when you want the user locked out of your system and when they are practically logged out.

The only way I know of to implement instant revocation in a system like this is to keep a blocklist of users/tokens that is constantly checked, which can be slow and removes some of the benefits of JWTs in the first place (that they carry all the auth information you need).
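One hedged sketch of such a blocklist, bounded by the tokens' own expiry window (class and field names are illustrative):

```python
class TokenBlocklist:
    """In-memory revocation list keyed by token ID (jti).

    Entries are dropped once the token's natural expiry passes, so the
    list stays bounded by the expiry window rather than growing forever.
    """

    def __init__(self):
        self._revoked = {}  # jti -> exp timestamp

    def revoke(self, jti: str, exp: int):
        self._revoked[jti] = exp

    def is_revoked(self, jti: str, now: int) -> bool:
        # Purge entries whose tokens would have expired on their own.
        self._revoked = {j: e for j, e in self._revoked.items() if e > now}
        return jti in self._revoked
```

Every request still pays a lookup, which is exactly the cost the stateless JWT design was meant to avoid; in practice this lives in something like Redis shared across services.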


Ah! Yes this is why we use an expiration of one minute. For us the extra load that the refreshes give is not a problem.

Keeping a blocklist seems unnecessary to me, you can just lower the expiration time.


> Simple Outperformed Smart. As a self-admitted elitist, it pains me to say this, but it’s true: the startups we audited that are now doing the best usually had an almost brazenly ‘Keep It Simple’ approach to engineering.

My hunch is that this is related to the nature of product-market fit. If a company is very successful, there's a decent shot that market demand became overwhelming at some point early on. That demand, in turn, becomes a strong motivator to keep things simple and ship quickly, instead of writing code The Right Way.

Facebook using PHP might be one of the best examples of this: if their user base hadn't exploded, maybe they would've taken the time to carefully rewrite their code in Java or Python. But the fact that they would've had time to do that would've made it less likely that they'd be a $500b company today.


You're inverting the relationship. Simple solutions (technologically) can approach product/market fit much faster.

Businesses fail for reasons besides tech, but on the tech side when businesses fail (in my experience), it's usually either from unwillingness to serve the sales cycle, or creating a technological solution that is not malleable.


Guilty as charged. I bootstrapped our startup with just myself and 2 junior engineers in the last year during Covid. Junior in the sense that they are young. But actually they outperform many older engineers I've worked with. We are starting to close some pretty big deals and I'm dreading the moment where I have to turn this into a normal development team. In my experience velocity drops when you do that and you lose a lot of momentum. 3 people can do a lot. 6 people don't do that much more. I'm not so young myself and quite experienced. But I lean heavily on my team for doing the work. I'm the CTO, the CPO, and I need to worry about team management, sales, and a few other things. So, less than half of my time is spent coding. This is the reality of startups. You have to do all of it.

I made some technology choices early on. We use docker but not Kubernetes. There is one server, it's a monolith. There is one language, it is Kotlin. And we even use it on our frontend (web only).

The latter is not something I would do normally or recommend. But both my junior engineers only knew Kotlin and we just went with it and never actually ended up regretting this. This surprised me and at this point I don't feel React/TypeScript has anything that I need or want. We're doing a lot of asynchronous stuff, websockets, maps (via libremaps), etc. And it all runs smoothly and responsively. Kotlin is kind of awesome for this actually.

Originally our frontend was Android only. We ditched that app in favor of a Kotlin-js based web app that started out as a proof of concept that just more than proved the concept and became the actual thing. At the time we had demand for iOS and web, and no iOS or web developers on the team. Hence Kotlin for the web. When this looked like it was workable we actually lost our Android developer. So the decision to forget about that app was pretty easy. At that point it was half working and full of bugs and technical debt. Fixing that would only half fix our problem because we'd still need iOS and web. So we did web first. And we are packaging it up with Cordova for those people that want something from an app store.

It's a good lesson on prototyping. If it works, do more of it. At the same time, I normally recommend minimizing risk and not building too many things in parallel. Like building 3 apps for 3 platforms instead of just a web app.

Our server is Kotlin/Spring Boot and we use a lot of Elasticsearch because that's what I've been using for the last decade. A little bit of Redis, and I've so far found no excuse to use a relational database. But I'd probably end up with MySQL if that ever comes up. Done right, Elasticsearch makes for a nice key-value store without transactions but with optimistic locking on documents. If I get some time, I'll add a database for safety at some point. But fewer moving parts means fewer headaches. Having just one language means the distinction between backend and frontend is a bit blurry. We have an API client that we use in our Spring tests that also compiles to kotlin-js. That library contains a lot of code that we use in our front-end: model classes, caching layers, functions that call our various APIs, etc. And it's all covered in tests. All the business logic basically. If we ever need to do native apps, we'll use that there as well.
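The "optimistic locking on documents" pattern mentioned here corresponds to Elasticsearch's compare-and-swap on a document's sequence number (`if_seq_no`). A minimal in-memory model of the idea, in illustrative Python rather than the actual ES client API:

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale read (like ES's 409 conflict)."""

class VersionedStore:
    """Toy key-value store with optimistic locking, loosely modelled on
    Elasticsearch's if_seq_no compare-and-swap. Illustrative only."""

    def __init__(self):
        self._data = {}  # key -> (seq_no, document)

    def get(self, key):
        """Return (seq_no, doc); the seq_no is passed back on update."""
        return self._data[key]

    def put(self, key, doc, if_seq_no=None):
        """Write doc; fail if someone else updated the key since we read it."""
        current = self._data.get(key)
        if if_seq_no is not None and (current is None or current[0] != if_seq_no):
            raise ConflictError(f"seq_no mismatch for {key!r}")
        next_seq = 0 if current is None else current[0] + 1
        self._data[key] = (next_seq, doc)
        return next_seq

store = VersionedStore()
store.put("order:1", {"status": "new"})
seq, doc = store.get("order:1")
store.put("order:1", {**doc, "status": "paid"}, if_seq_no=seq)   # succeeds
try:
    store.put("order:1", {"status": "lost"}, if_seq_no=seq)      # stale seq_no
except ConflictError:
    pass  # the usual recovery: re-read the document and reapply the change
```

Without transactions, this read-modify-write-with-retry loop is what keeps concurrent updates from silently clobbering each other.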

On the devops front I'm a combination of very pragmatic but also focused. We use stuff that works that doesn't distract us. So, no terraform for a setup we only create once; in 20 minutes. Not worth spending weeks automating but worth documenting. But we do have CI/CD via github actions. So we don't manually deploy anything. And we have lots of API integration tests. If it builds, it ships. No rollbacks; roll forward only. Keeps things simple.

We use Google Cloud and keep our costs low. A couple of VMs, a load balancer, a managed Redis, and a managed Elastic Cloud cluster. That's it. Nice and simple.


Hiring and building fast can lead to huge costs, loads of bugs and performance issues

Hiring and building slow leads to multiple rounds of performance tuning early on, which can also lead to lower costs, and gives you a chance to focus on improving the product by focusing on your user experience because you're not in panic mode to raise funds, overhire and conquer the world

We could have many more good software products if companies were focused on long term quality and didn't obsess over growth


Conway's law.

The teams and management structure will immediately become a technical debt.

If we let the product ”decide” where boundaries actually exist and team up accordingly there’s a chance to scale and maintain a bit of velocity.

It requires constant introspection, monitoring and scrutiny though. Something I’m constantly thinking about is how to scale that beyond 20-25 developers. Gitlab have a nice section[0] in their handbook on releases and flow of small bits and pieces, and internalizing something like that together with clear domain boundaries could be a ticket.

Basically - never try to resource optimize, always figure out what good flow looks like and find ways to keep it flowing.

[0]https://about.gitlab.com/company/culture/#freedom-to-iterate


> So, no terraform for a setup we only create once; in 20 minutes.

While this may be right, if you do not have a way to "bootstrap" from scratch in a small enough unit of time (minute, hour, day, whatever you find an acceptable disruption), then you are gonna get screwed badly.

You don't have to take infrastructure up/down every day. But that one time will freak you out enough that you'll want more than just the docs. Now this doesn't mean you have to have crazy infra. I just have 3 docker hosts, each running a compose.yml -> but if I lose the docker/compose files it's gonna take 2 weeks for me to get back.


It's about 30 minutes. All relevant files live in git of course. And I tend to be diligent about documenting things because having to figure out the same shit months later really sucks.

I have plenty of experience doing this stuff; so I know what I'm opting out of. IMHO the price of devops automation can be unreasonably high for small teams. You quickly hit the point where you start considering having somebody do this full time. IMHO that is too high of a price in most small startups. In my case, either I do feature development or devops. Meaning that if I have to pause development on a project for some massive open ended devops project, I might lose weeks/months on a tight schedule. It's never simple. You always get blocked on weird shit for hours/days on end. So, I try to take as much of the pain away. Terraform is a bit of pain that doesn't solve a problem I have. Having to manually recreate something in the case that it somehow blows itself up is OK with me. Unlikely to happen very often. Not worth spending 3 months automating something that might take me hours to figure out. I have better uses for those 3 months.


If people would just admit, and adapt to, the fact that the browser won over native and Oracle won over Sun, we would avoid this situation with armies of Java developers making Rube Goldberg variations of basic relational, sysadmin, and GUI programming tasks. But then again, how would we employ all the people with these extreme productivity multipliers? As long as our politics and economic system keep pretending we just had the industrial revolution and need to man the assembly lines, Parkinson's law will apply to tech work just like everything else.


When you say browser won over native, are you referring to the fact that software is more commonly accessed via web instead of software actually downloaded and installed on a user’s machine?


I assume this isn't content marketing because PKC doesn't seem to exist anymore. But this post made me really want to get Ken to audit our code.

Are there any vendors that do similar work that people here recommend?


> the major foot-gun that got a lot of places in trouble was the premature move to microservices

I sometimes wonder if the move to microservices isn't just a weird consequence of Conway's law in reverse: make a department of each developer, let them have their thing.

(See also this amazing video about Conway's law: https://www.youtube.com/watch?v=5IUj1EZwpJY )


This is absolutely what microservices are about. It's arguably their greatest strength, because (at least in theory) I can decouple my team from your team and we can _only_ communicate over a strict interface.


You can do that without introducing a HTTP/RPC boundary.


You can, but it requires discipline and/or tooling. With microservices you are strongly incentivized (I would say forced) to keep the separation.


>You can, but it requires discipline and/or tooling.

Pretty much every language comes with a way of exposing a limited API to other parts of the application. Java, as an example, requires you to specifically export the parts of your module that other modules are allowed to consume. If you only export a public API then you've achieved the same benefit as a microservice except now it's type checked and doesn't encounter the pitfalls of a network call.


I agree with you. There are ways, and they work. But if you have different teams stepping on each other's toes, they might be disincentivized to keep the separation. Ideally they will not, but without someone enforcing it a team might end up in this situation. I see it as a potential social question (like Conway's law).


Exactly! To be clear to parent commenter, I'm not endorsing microservices to solve this organizational problem, just pointing out it's part of the reason to choose microservices.


At what point should I push for a code audit?

I don't think any of the codebases I worked on ever had a "real" audit. Best case was reviews pre/post acquisitions. An external audit seems like a good thing, but I have no idea how to argue for such a thing.


Point 2. I think this is a common misunderstanding in what engineering is. To come up with good simple solutions to complex problems often takes a lot of experience and domain knowledge.


> the major foot-gun (which I talk about more in a previous post on foot-guns) that got a lot of places in trouble was the premature move to microservices, architectures that relied on distributed computing, and messaging-heavy designs.

Preach! Microservices are a solution to a problem that affects effectively 0 startup-sized systems. That problem is scale. Microservices are hard. WAY harder than monoliths. They become necessary only once your physical hardware can no longer keep up in a monolithic fashion and parts of your system need dedicated compute and/or storage.

And no, they are not automatically necessary once your engineering team reaches N size either. Introducing network boundaries as a way to scale your engineering organization is a bad idea.


This is a great list.

One minor criticism on...

> Monorepos are easier to audit.

> Speaking from the perspective of security researcher ergonomics, it was easier to audit a monorepo than a series of services split up into different code bases. There was no need to write wrapper scripts around the various tools we had. It was easier to determine if a given piece of code was used elsewhere. And best of all, there was no need to worry about a common library version being different on another repo.

This is much more dependent on the auditor's personal workflows (as well as the relative hygiene of any team's monorepos), rather than being universal. I've found the opposite to be true for e.g. the current orgs that I am auditing: individually split up repos tend to be idiomatically structured, and "just work" as expected more often than monorepos, which more often than not have a lot of custom glue or unusual monorepo-management init scripts.

Comments on the other (generally very good) points in the list:

> Writing secure software has gotten remarkably easier in the last 10 years. I don’t have statistically sound evidence to back this up

I suspect compiling such statistical evidence would also be impossible as detection of security issues has also improved, so any data would never be comparable over time.

> The counterargument to this is that heavily weighting discoverability perpetuates ”Security by Obscurity,” since it relies so heavily on guessing what an attacker can or should know. But again, personal experience strongly suggests that in practice, discoverability is a great predictor of actual exploitation.

This is a tough circle to square because security by obscurity works. It's probably the best security measure you can have in place. But it's bad for two reasons:

(1) The process of obscuring often (doesn't need to, but very often) obscures auditing, which means you end up relying upon obscurity solely. It's not a worthwhile trade-off.

(2) In a simplistic marketing world, the idea of obscurity as a standalone measure is so tempting to non-technical decision makers that I believe it requires a bit of innocent dishonesty about its effectiveness to dissuade them.

> (on auditing dependencies) Node and npm were absolutely terrifying in this regard—the dependency chains were just not auditable.

I agree with the overarching bullet point this is said within, but I see this point about NPM a lot, and I'm not sure how people are going about auditing or how many language ecosystems they're looking at. I have found Node/NPM to be the best or second-best popular system for auditing dependency chains.

I have significant experience in this area: the relative consistency of package management config across the JS/TS ecosystem is enormously helpful for software composition analysis. The only package manager config I've found that may be slightly better is Composer, but the inconsistent usage of Composer by many PHP devs still makes it a little worse than NPM in practice.

pip/PyPI/setuptools is an inconsistent moving target of requirements.txt (is it a lockfile?), Pipfile.lock, setup.cfg -vs- setup.py, pyproject.toml, and whatever else. Maven is a nightmare of multiple registry endpoints and issues parsing custom <dependencyManagement> directives and extensions (without even starting on Maven wrappers and pom.xml templating strings). Don't get me started on Gradle. Go's idea of package management is: just pull it from Git; good luck automating that if you've got private repos with any kind of secure SSH auth. I have less personal experience with Rust/Cargo.
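The consistency point is concrete: an npm lockfile (lockfileVersion 2+) flattens every installed package, transitive or not, into one `packages` map, so enumerating the whole chain is a few lines. A sketch with an inline stand-in for a real package-lock.json:

```python
import json

# Inline stand-in for json.load(open("package-lock.json")); the package
# names and versions here are made up for illustration.
lockfile = json.loads("""
{
  "lockfileVersion": 3,
  "packages": {
    "": {"name": "my-app"},
    "node_modules/express": {"version": "4.18.2"},
    "node_modules/express/node_modules/cookie": {"version": "0.5.0"}
  }
}
""")

def installed_packages(lock):
    """Yield (name, version) for every installed dependency in the lockfile."""
    for path, meta in lock["packages"].items():
        if path == "":
            continue  # the root project itself, not a dependency
        # the text after the last "node_modules/" is the package name
        name = path.split("node_modules/")[-1]
        yield name, meta["version"]

print(sorted(installed_packages(lockfile)))
# → [('cookie', '0.5.0'), ('express', '4.18.2')]
```

Compare that with having to reconcile requirements.txt against setup.py, or resolving Maven property placeholders, and the "best or second-best for auditing" claim is easier to see.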

> for some reason, PHP developers love to serialize/deserialize objects instead of using JSON

PHP serialize/deserialize predates the existence of the JSON spec, so that might have something to do with it. A lot of PHP code is old.

> Almost no one got JWT tokens and webhooks right on the first try.

Nor the second try...


> All the really bad security vulnerabilities were obvious.

Isn't that just tautological? They are bad because they are obvious?



