You shouldn't use Kubernetes as a startup

f0e4c2f7 · on Feb 22, 2023

I've spent a lot of time over the years trying to think up the perfect analogy for software development and I've never quite found one.

It seems like they all tend to break down but cooking is ok for some parts.

To that end I see these conversations as

"never use stainless steel to cook a hamburger"

"never use cast iron on a food truck"

"Always use an crock pot for soup"

Especially for startups it just doesn't matter that much. Use whatever you know how to cook with. If your DevOps guy can deploy kubernetes into AWS managed with GitOps designed by GitHub copilot and ChatGPT in terraform in 40 minutes and manage without waking up in the middle of the night...that's fine.

Or if you have a die hard linux admin who wants to buy servers on eBay and host them in the basement and do hardware updates at 2am...that's also really fine.

Stackoverflow literally uses windows servers.

These are all just different methods of cooking. Sure we each kind of prefer our way and insist it makes the food come out best, and sure there is truth to that at a point - infra does eventually wind up mattering.

But mostly we're just nerds arguing about what brand of chef's knife we use and that sort of thing.

moonchrome · on Feb 22, 2023

> If your DevOps guy can deploy kubernetes into AWS managed with GitOps designed by GitHub copilot and ChatGPT in terraform in 40 minutes and manage without waking up in the middle of the night...that's fine.

So your bus factor is one ?

I think it's good to have articles like these. I see overzealous intelligent people over engineering things so often.

Just recently I saw a project where silly overgeneralization made development and testing way harder than it needed to be for MVP, it stretched development by a few months, by the time they launched they missed the investment window (last year with market contracting). If they launched garbage code that worked for MVP they would have had enough users and caught the next investment round to fund whatever refactoring they needed.

twosdai · on Feb 22, 2023

Bus factor doesn't really matter for a startup. Because literally everything early on is a bus of one, maybe two. Scaling is a good problem to have later on.

To your deeper point being that if you work at a startup to make your own life easier and those who eventually will work with you; write down what you're doing and get good at teaching/explaining process to people.

rektide · on Feb 22, 2023

Fwiw, bus factor seems like an extremely good reason to use Kubernetes.

One can absolutely invite in a lot of complexity, but if you take a reasonably well integrated starting point/KISS (and you absolutely can with Kube), there shouldnt be a ton of additional complexity.

And given how self-describing via manifests-for-everything kubernetes is, someone spinning up who already is sort of familiar with Kubernetes ought be able to get a good picture of what pieces are in play & the shape of how things are setup extremely quickly. Unlike choice-making in most devops, there's still huge commonalities & massive shared base. If you & another startup make totally different choices about how you will Kube, the superstructure will still be the same. Porting people in is way easier.

j16sdiz · on Feb 22, 2023

You always start with 1 in startup

moonchrome · on Feb 22, 2023

If that 1 is DevOps/employee you are doing things wrong.

hk1337 · on Feb 22, 2023

Caveat to that, IMO, your DevOps guy needs to make it so it can easily be understood and run by other people in case they get hit by a bus.

The last thing you need is for the new guy to come in and rewrite everything because it only really worked for the previous guy.

rektide · on Feb 22, 2023

Reasonable-ish counter concerns but way overinflated in each case & loses it's persuasiveness by trying overhard to sell fear.

Plenty of people run their etcd on their control plane. If you are indeed a startup you might even consider not having dedicated control plane nodes. The idea of side-cars taking up another 100% more resources is perposterous, absurd. Running ingress takes up some overhead, but it also allows you to consider running a wholeot less load balancers if you have multiple services, letting kube do all the routing; this can save a good chunk of change on infra costs if you go for it.

Things like errors devops, ci/cd are presented in the scariest possible terms. But it's not like these are non-issues without kubernetes. I've seen man-year of time go by building AWS Beanstalk and AWS ECS scripts, just to deploy stuff (build wasn't any better!). There's such a universe of mature & solid tools for Kubernetes that encompass a much more end-to-end handling. Plenty of gitops starter kits. Many of these you can try out in a day or less, see if you like the look & feel. And abandon and swap-out just as quickly. There's a much better base of options here than there is anywhere else.

asciii · on Feb 22, 2023

I also read the author's other article [0] and it seems that they are encountering engineers / teams who have not answered why they chose Kubernetes versus others options. Any initial project will have some legwork before any gets productive on it.

> Reasonable-ish counter concerns but way overinflated in each case & loses it's persuasiveness by trying overhard to sell fear.

My buddy uses K8s at their company for many clients(while they are small to medium sized) and they aggressively screen for the best engineers who can cut through the mud quickly and be productive with containers, Kubernetes, logging, etc.

So it can work if the engineering are able to say No and work with the tools / expertise they have.

[0]: https://thetechtrailblazer.blog/2023/02/14/dont-build-micros...

msm_ · on Feb 22, 2023

> Plenty of people run their etcd on their control plane. If you are indeed a startup you might even consider not having dedicated control plane nodes. The idea of side-cars taking up another 100% more resources is perposterous, absurd.

I run my personal blog using k8s on a single-node "cluster" with a low-end VPS server [0]. So when I hear how hard and expensive it is to host k8s I die inside a little. This actually lets me save money, because when I find a better/cheaper server I just migrate: I copy my data files, install k8s, apply my yaml, and I'm done with the migration of my 20+ hosted services in less than an hour.

[0]: Yes, this is obviously an overkill, I wanted to have some personal playground for k8s. And I like how clean my server is thanks to containerising everything.

rektide · on Feb 22, 2023

I too have been running single node k8s on tiny $5/mo vps and it works fine. Less than 500MB ram used for k3s.

It also takes unbelievably little effort & is remarkably secure & useful out of the box. There's no intra-cluster permissions setup, but if you're a startup & especially if most devs are leveraging paved-road tools or gitops for ci/cd, maybe just trusting your team is enough, at least for the first years.

Metrics, logging, postgres operator, lets-encrypt storage: all just worked. I had https webapps in under an hour. I did burn some significant time writing my own helm charts for dns (coredns & bind), but there were good existing options I would have been happy to use, but I wanted to learn helm & jave my specific idea done this particular way. Then my webapp had great dns integration too!

As you say, the whole state of the world is pretty readily exportable & re-loadable. If someone does really goof up, as long as volumes & backups arent wiped out (often shipped to s3 compatible endpoints by many tools), it's bad & there's downtime yes, but the road to recovery can/should be incredibly fast. If you have history over time for backups, be easy enough to spot what changed that made things go bad.

brian-bk · on Feb 23, 2023

There's so many of these "k8s is too complex" blogs I have been thinking about where they come from. I feel like if a person strictly has an application-developer background, they just don't understand how much kubernetes has 'solved' from the time before kubernetes. Not that k8s is perfect, but they've never had to think about app runtime because ops teams helped. It can be complex if you had never finagled with webservers, config management, app segmentation, internal DNS, scaling, etc.

As for trying to get at the real problem, perhaps there's room for tools that could make k8s better and understandable for these app-focused engineers (something like lens, but without stupid licensing).

erulabs · on Feb 22, 2023

Articles like this always sound the same: That standard tool is too complex - you should roll your own home-brewed solution with no docs instead!

No thank you. Standardization brings its own simplification, even if the one standard is more "complex" than a hodgepodge of non-standard tools.

"If you add Kubernetes to this setup you will need to make your machines double the size". What? No.

"Kubernetes is not stable". This article is ridiculous.

roncesvalles · on Feb 22, 2023

Every single point in the article is generally wrong. The author clearly doesn't have experience with modern Kubernetes platforms like EKS, Fargate, AKS, GKE, Rancher, Openshift et cetera that solve many of the stated issues.

While in general it's true that startups should be wary of expensive cargo cult infrastructure designs, a) if your application doesn't run on a managed PaaS, it should probably be running in a container, b) if it's running in a container you probably need a container orchestration platform, c) although you can get away with using simpler or fully-managed container orchestration technologies early on, it won't be long before Kubernetes starts to look appealing.

stonemetal12 · on Feb 22, 2023

There is a saying in Kubernetes "containers are cattle, not pets". That is from a software perspective, the actual saying should be "hardware is cattle, not pets, and your containers are fleas on the cattle's back". If you aren't running on enough hardware for servers to be cattle not pets, then Kubernetes is solving a problem you don't have, and probably won't have for a long time.

msm_ · on Feb 22, 2023

Well, it's both. Kubernetes lets you "just build and deploy" your containers without worrying about micromanaging your servers.

Is it complicated? Sure. Are there simpler solutions to that problem? Yeah, for example docker swarm. But I like k8s, it does everything I need, it prevents "server rot" because the state is explicit, and it's not that hard to admin if you know what you're doing.

bradwood · on Feb 22, 2023

That saying predated k8s.

spmurrayzzz · on Feb 22, 2023

While I mostly agree with the implied sentiment of "you probably are better off without k8s", this article makes no persuasive (or at least not objective) arguments to that end. I worry a bit for the HN flaming that is likely imminent here.

Suggestion for OP: expand on your ideas with concrete examples instead of hand-waving in the abstract. There's plenty you can cherry pick from (https://k8s.af has a bunch)

e.g. You say at one point

> "It makes perfect sense with all the network encapsulation over the cloud network virtualization over the real network well things tend to get crazy"

I'm not sure what that sentence is meant to convey. k8s networking is indeed complex, but what is "things tend to get crazy" supposed to mean? It would be helpful if you provided examples for what you seem to treat as obvious for the rest of us.

JamesSwift · on Feb 22, 2023

If you know kubernetes, it lets you punch _way_ above your weight in terms of running lean.

BlackjackCF · on Feb 22, 2023

This. Ran K8s at a startup and knew it well. Really helped us along in terms of shipping velocity as a company.

MuffinFlavored · on Feb 22, 2023

What constitutes knowing k8s well enough for it to be a net positive at a startup?

For example in my mind, whether I tie a bunch of services and a database or cache together on a virtual machine with Terraform, Docker Compose, or k8s yaml (or kustomize or helm charts or whatever), where do the k8s benefits come from beyond that? To me my use case seems entry level/basic, but I'm not sure where the more advanced features come into play.

JamesSwift · on Feb 22, 2023

Its not even necessarily that you know k8s. Its more that you know and understand the underlying ops principles / components. K8s is just a formalized layer of putting those together. So, if you can do all teh things in AWS that k8s provides for you, e.g. know when/why to setup a load balancer, or blob storage, or network storage, or domain routing, etc, then k8s lets you do those things in a more efficient way.

Bonus points if you wrap it with something like pulumi, because now as a solo dev you can massively leverage your knowledge to be able to handle a massive amount of ops while keeping up with app development.

MuffinFlavored · on Feb 22, 2023

> e.g. know when/why to setup a load balancer, or blob storage, or network storage, or domain routing

When it comes to performance and cost, I've been able to basically get a $100/mo OVH dedicated server and do everything I need on that.

I obviously understand why somebody might need blob storage, load balancers, etc.

I just haven't ran into the use case I guess? I once worked on a project where like 80 people during peak hours needed to do some relatively "write heavy" stuff to an API (1 row per scan of a barcode per like 500 orders a day for an e-commerce site)

I'm guessing that's not a good use case for k8s?

JamesSwift · on Feb 22, 2023

Yeah I mean if you are aiming to optimize for running on single node clusters then something that is first and foremost an efficient bin-packer isn't a great fit. I'm also assuming the alternative is running on the cloud rather than a colo/dedicated server.

EDIT: to be clear K8s is not good for "POC" phase of development. Its definitely good for "ok this is a worthwhile product and we are now officially building this out for production".

MuffinFlavored · on Feb 22, 2023

I guess a better question is, how far can a single node get you performance wise (I guess you have to ignore disaster recovery/automatic rollover based on regions) without needing k8s? Like, "really far"? What's a good rule of thumb on when you should introduce k8s?

JamesSwift · on Feb 22, 2023

Its highly variable based on the specific workloads. Its going to be bottle-neck based. Thats different for everyone. And again, for moving from dedicated to non-dedicated its more of a cloud vs raw metal consideration. K8s is a "better" way to do cloud for most of the use-cases we are debating in this comment section.

MuffinFlavored · on Feb 22, 2023

> raw metal consideration

devil's advocate but couldn't you technically (if you were trying to keep an entire software business's cost under say... $500/mo) run k8s (or k3s or whatever) in a production setting on a single node "raw metal"?

i know that's frowned upon but, i feel like performance/dollar comparsion, moving it to the cloud + k8s gets wildly more expensive quickly?

freedomben · on Feb 22, 2023

This and all the other similar posts that way overconfidently state "you don't need k8s" seem to miss the most important factor IMHO: Do you know it or would you have to learn it?

If you have already been using K8s and you build a side project or launch a new startup, then K8s is probably a great choice for you even if you only have one Deployment and Ingress route. I can take a new app (assuming it already runs in docker) and have it running in a brand new k8s cluster is 20 minutes. I could have it running on a static ec2 instance/linode/droplet/whatever in 30 minutes, but then you have to solve a ton of problems yourself such as CI/CD and deployment automation, etc that k8s gives you for free. K8s is a great choice if you already know how to use it.

This also seems to miss that for startups the best path to k8s is a managed k8s solution. Linode (for example) has one that is wonderfully easy and affordable, and will scale with you a long way.

Lastly, as a person making decisions on tech to use, you are investing in your own future as well. Learning k8s is one of the most useful skill I think a dev can do, behind things like learning bash and standard *nix tools like grep/sed/awk.

Edit (and self-plug): speaking of awk, if you're interested in learning it I put together a free "course", originally a talk I gave at Linux Fest Northwest on awk that I've gotten a lot of positive feedback for. Github: https://github.com/FreedomBen/awk-hack-the-planet Youtube: https://youtu.be/43BNFcOdBlY

JamesSwift · on Feb 22, 2023

As I said in a sibling comment, its not even necessarily that you know k8s or not. Its if you know the underlying _ops_ or not. If someone understands the mechanisms that are being abstracted in k8s then its mostly just a mental mapping of terminology to figure out k8s (not 100% true since there _are_ some k8s idiosyncrasies, but true enough in practice).

dustedcodes · on Feb 22, 2023

What a load of rubbish. If you already know k8s then you pick a managed cluster like GKE has on offer and everything is set up and up and running in less than a day for less than $200/month.

JamesSwift · on Feb 22, 2023

Exactly. Its a larger upfront cost but thats mostly a static chunk (and to be fair in terms of apples to apples, the vast majority of startups are not running an equivalent system without k8s). I would not recommend managing the control plane however, just bite the bullet and go with a managed offering.

a_c · on Feb 22, 2023

Good call out. There are a lot of unnecessary complexity plaguing the tech scene.

Scalability, microservice, SRE, SPA, unit testing, design system, Restful/graphQL everything. Not saying there isn't a place for them, there is. But probably not at day 1 when the thing you intended to make your living didn't even exist.

ch4s3 · on Feb 22, 2023

I agree in general, but think that unit testing has a place even in small scale settings. Clearly chasing 100% coverage is dumb, but if you have tricky business logic you should clearly unit test it and automate running the tests. The cost/benefit ratio seems super obvious to me on this.

ansgri · on Feb 22, 2023

Unit testing is an obvious choice for something like a parser or algorithm library, for common business logic integration tests provide most of the benefit, and then some. Also, unit tests slow down development as they need to be refactored alongside code, while integration tests can use a more stable interface.

This is typically true at smaller scale, one smallish team; when build and test times or team isolation becomes a concern, it's time to rethink your testing strategy.

BizarreByte · on Feb 22, 2023

There’s definitely a value to unit tests in the right situations. When I start a new project I don’t aim for any specific coverage percent, I just write them where I know for sure I need them.

Right now I’m working on some reasonably complex reclusive logic in a personal project and unit testing that has saved an unbelievable amount of time when I inevitably mess something up.

JamesSwift · on Feb 22, 2023

> not at day 1 when the thing you intended to make your living didn't even exist

This is the most valid criticism of k8s use. Do not skip square one and roll it in k8s when you haven't even built the product POC and tested the consumer appetite. Definitely consider it after getting validation of the POC and deciding to build it out "for real".

intellectronica · on Feb 22, 2023

I manage an engineering team at one of the big 3 cloud providers advising startups on their cloud deployments. We often get earlier stage startups asking us for help with building their deployments on Kubernetes. We often suggest they consider a simpler solution, so that they can focus on building their product and not worry about managing k8s. Some of them do. Most stick with k8s. Rather than assume that they all don't know what they're doing, I tried to take their point-of-view and understand why they're making this choice. What I learnt makes a lot of sense:

1. You won't get fired for choosing K8S. Tech leads of startups, just like anyone in a position of responsibility, need to be a bit conservative with their choices. As CTO, if something went wrong with your deployments, or you're having scaling issues, the last thing you want is to have to explain to your CEO or your investors why you've chosen a less popular and recognisable product.

2. Startups are conscious about vendor lock-in (or any kind of lock-in). That makes a lot of sense - early in the days of a startup you don't really know what direction it will take, so you want to keep your options open. Make a choice that limits your freedom later down the road and it could be fatal to your startups as it grows and adapts. It's reasonable to agree to pay a price for this flexibility.

3. Hiring good engineers is always hard, and you don't want to make it harder for yourself by choosing technologies that are less familiar. A common objection would be that basic technologies like containers, VMs, and serverless functions are easy to adjust to for most engineers, but as a startup you really need to optimise for the common denominator or you risk wasting time on onboarding engineers on tech they are not familiar with.

4. Startups often don't spend as much time and energy on familiarising themselves with every new technological development and option. They are busy building a product. It could be that there's another, better option for them, other than the most recognised brand in the industry, but if they spent their time on learning and evaluating all these options that's time they wouldn't spend building and shipping features to their users.

5. Startups are optimistic by nature. They have to be. Tell them that they don't need the complexity of K8S at their current scale and they'll just remind you that they're aiming for ∞X, any day now. Most of them will not reach even 10X any time soon, but they have the right attitude to behave as if they will.

twosdai · on Feb 22, 2023

This is spot on. Source: cto at a small tech startup.

theptip · on Feb 22, 2023

I disagree with most of the claims presented here as facts. I believe that they are true experience reports for some companies, but I don’t think they generalize as strongly as claimed.

No, k8s doesn’t double the number of machines you need. You don’t need pods running more than what would be on a dedicated node.

No, you don’t need a full time engineer to run your cluster.

No, k8s isn’t unstable, I have used it in prod since v1.5 and have only seen a small handful of issues. Recent releases are boring.

I moved my startup onto k8s about 10 months in, and was our only infra engineer through series B raise @10 engineers. It took a small fraction of one expert’s time. This was widely regarded as a great idea by everyone who came later.

Maybe if you don’t have a clue how k8s works it’s going to cost that much time.

And there is the key advice where I probably agree with the author on most situational recommendations (while disagreeing on the generalization): don’t learn k8s to deploy at an early stage company. But, if you know it well already, it’s a fine choice that is unlikely to hurt short-term and might help a lot later.

Things you get: infra-as-code, with a (subjectively) way nicer API than Terraform, that you can plausibly run and test on your local machine. Easy scaling setup for unexpected growth spikes. Review apps within reach if you need them. Ditto GitOps. Managed infra because you are using GKE or EKS like a sane person (rarely need to think about patching your nodes; GKE patched the Spectre/Meltdown vulnerable for me weeks before the public announcement). Flexibility to easily run any new service (say, you want to spin up a Jupyter Hub cluster).

There are so many infra options that work for small companies that don’t have real infra challenges; if you don’t know one well already, just use Fly/App Engine until that gets too expensive. Or a single VM running a docker container; that gets you surprisingly far.

Meta: for a startup particularly pre-PMF, you should minimize the amount of learning you are doing and just iterate as fast as possible with the stack you know best. Even post-PMF this holds for quite some time until you start to hit cost/scale pain.

drewda · on Feb 22, 2023

It depends upon what your startup is doing. Maybe Kubernetes is overkill for a basic "CRUD" web app backed by a relational database and some object storage.

It's very helpful at our business that runs a variety of web services that are powered by batch data workflows. Kubernetes enables us to spread both the process workloads and the batch workloads across sets of servers, with redundancy.

For others in a similar situation, I'd highly recommend the combo of Kubernetes and Argo Workflows. We wrote a blog post about one of our workflows at https://www.interline.io/blog/scaling-openstreetmap-data-wor...

iLoveOncall · on Feb 22, 2023

"I saw people moving from, the following tools have these benefits and these drawbacks, to it is trendy so lets do it."

What is that supposed to mean? I was interested in reading, but when the very first sentence is impossible to understand it promises a painful read.

lloydatkinson · on Feb 22, 2023

I saw people moving from, "the following tools have these benefits and these drawbacks", to "it is trendy so lets do it".

jbnorth · on Feb 22, 2023

“I saw people moving from ‘the following tools have these benefits and drawbacks’ to ‘it is trendy so lets do it’.”

dpkirchner · on Feb 22, 2023

No offense intended[0] but if it takes 6 hours to debug a coredns issue why would it take less to debug a similar issue with a different dns?

I can get down with not using operators. If you're unfamiliar with k8s, adding another layer of something you're unfamiliar with makes little sense. But regular boring k8s is pretty straightforward stuff that often drives existing, established tools like ip/nftables, nginx, etc.

0: seriously, I've spent what I felt was excessive time trying to resolve bugs

kube-system · on Feb 22, 2023

If you have complex orchestration needs, then none of these five criticisms apply, as any one-off solution would be even worse. If you don't have complex orchestration needs, then there's no need to use k8s.

> Lets consider you are a small to medium size startup your application consists of a couple of backend services a database a caching service and a load balancer.

This is very squarely on the "you don't have complex orchestration needs" side of the spectrum.

hellcow · on Feb 22, 2023

Totally. Most companies will never need k8s. But tons of companies deployed it when they shouldn't have.

Now we're on the MongoDB side of the hype-curve, so people are realizing that k8s is a bad fit for managing 10 servers and a DB in the cloud.

bluehatbrit · on Feb 22, 2023

This sort of article is spot on for lots of standard CRUD apps, but doesn't mention the fact that there are plenty of cases where k8s (or similar) really would be a big benefit. If your application requires a lot of scaling up and down to be viable for instance, k8s can be a good fit. I think a better article title would probably be "you shouldn't default to using kubernetes as a startup".

flanfly · on Feb 22, 2023

How do I get continuous deployments without k8s? I value being able to deploy new code quickly, easily and fearless, especially in high speed low drag environments like startups. I don't see how I can do GitOps like workflows with any other tools that have the same large mindshare and breath of commercial, managed offerings.

acedTrex · on Feb 22, 2023

you can do continuous deployments with anything lol, hell you can do CD with a runner and an ssh key lol

bluehatbrit · on Feb 22, 2023

I'm not saying this is right for you or anyone else, but I just use a repo with some ansible / terraform in it. The repo has a CI job which runs the IaC and jobs done. Kubernetes does have a few tools which take this out of your hands but I guess the trade off is a few scripts you have to manage vs the complexity of kubernetes.

ly3xqhl8g9 · on Feb 22, 2023

As a start-up*, you shouldn't use Kubernetes managed by GKE, EKS, etc; you should use bare metal Kubernetes: "server rack" somewhere near your desk, 3 virtualized nodes with Ubuntu, MetalLB as load balancer, local-storage StorageClass with cron backups on another machine, Prometheus with Grafana for monitoring, CertManager DNS01 with Let's Encrypt for SSL, Contour/Traefik for routing, self host Docker registry, self host CI, Minio for object storage, Mongo/Postgres/etc. for database, Redis for caching. First setup time: 1-2 days, maybe one week to work out the kinks. Initial cost: ~$500 for a refurbished Dell/HP server, or go extreme with 3 RaspberryPis. Monthly cost: local electricity price, internet monthly fee. Current uptime with the above setup: 139 days (had a power outage 139 days ago, previous uptime was around 200 days).

* start-up: endeavour started with negative, $0, or close to $0 in bank counting time to market in years; not a start-up (maybe a start-middle): company with millions in VC money hoping for time to market yesterday.

JamesSwift · on Feb 24, 2023

I wouldn't recommend anyone manage the control plane except after hitting massive scale and being able to dedicate engineers to it full time. For a startup, I would doubly recommend against it. Thats the fastest way to give back all the time you've saved in using k8s in the first place. Yes its more expensive, thats the tradeoff and I 100% believe its worth it.

wikibob · on Feb 22, 2023

Is this satire?

A startup should ruthlessly prioritize doing only the one single thing that drives revenue, which is not bespoke infrastructure configuration.

ly3xqhl8g9 · on Feb 22, 2023

Check the given definition for start-up. A start-up doesn't have $200+/month for managed Kubernetes, cloud storage, cloud CI/CD pipelines, CDNs and all the other niceties. Also, not all start-ups are born in Silicon Valley or nearby, nevertheless 90% of all start-ups fail, some of them because they waste money on cloud infrastructure while hallucinating time to market tomorrow and are unable to be lean, duct taping and WD-40-ing (soon ChatGPT-ing) some disposable infrastructure for pennies.

JamesSwift · on Feb 24, 2023

I can guarantee you that they are wasting more than 200/month of an engineers time and mental energy to manage the bare metal setup vs the cloud managed offering. Its penny wise and pound foolish.

ly3xqhl8g9 · on Feb 25, 2023

I am a one-man start-up, well, trying to be for some years now, and I run a similar bare metal setup as the one above. Haven't touched it in 142 days and it costs next to nothing vs. the $200/month that I couldn't afford even if I wanted.

Again, maybe the word 'start-up' has too many meanings. Sure, once you have 1,000+ paying clients you will be able and even you will need to rely on cloud infrastructure, but 90+% of start-ups never reach 1,000+ paying clients. And again, it's not that complicated, pretty much everything is a Helm chart, and when you are merely trying things out, not even close to product-market fit or monthly recurring revenues, you can afford to have the cluster down for days on end: there is no one to care.

wtfbbqsms · on Feb 23, 2023

These are all industry-standard tools that you could also run outside k8s in docker-compose, I fail to see anything bespoke here.

If your team cant recognize some of the most common proxy, s3 storage, database, operating system, cron job, backups, load balancing tools from a few feet away or manage a server then you have a problem with vetting candidates.

mattnewton · on Feb 22, 2023

How does they author (or anyone else) propose that you do autoscaling without locking yourself into a single cloud’s apis? K8 is not something we want to invest in but it seems like the only cross-platform game in town, (which we want to take advantage of gpu prices and credits)

spmurrayzzz · on Feb 22, 2023

If multi-cloud is actually something an org cares about, which I'll concede ought to be exceedingly rare for an early stage startup, there ways you can design your infrastructure to mitigate that problem.

There is some up front work of course, but there's a ton of cloud provider-agnostic library tooling out there these days that you can use to build something like autoscaling (pulumi, terraform, et al.). You might not end up with full feature parity of what k8s offers, but you'd get 90% of the way there.

runlevel1 · on Feb 22, 2023

Using Pulumi and Terraform don't really make your setup cloud-agnostic. The config becomes cloud-specific very quickly due to the differences between the providers.

Kubernetes, in and of itself, only makes a few core things somewhat cloud agnostic (compute, LBs, and volumes). You can use those primitives to run cloud-agnostic alternatives to managed services within your cluster, but for most startups that's going to be a premature optimization.

spmurrayzzz · on Feb 22, 2023

Definitely agree that the multi-cloud thing is a premature optimization for sure.

But k8s isn't cloud agnostic either, at least not if you hold it to the same standard that you're holding pulumi/terraform.

Many of the lower level native abstractions, like CNI plugins, aren't 100% interchangeable and don't always just work depending on your use case. There's a reason AWS had to build its own VPC CNI plugin to get EKS fully functional across all of its networking services (particularly any service involving peering like DirectConnect etc).

julianlam · on Feb 22, 2023

The real question here is what makes you think you need a product that automatically scales?

If you have to think about it, chances are you don't need it. It's often the case about a lot of optimizations. I've had clients ask for help sharding their database, when that's almost never the correct course of action.

paulgb · on Feb 22, 2023

GP mentioned GPUs. GPUs + bursty traffic = either you autoscale, or you burn through a bunch of credits and VC cash on idle GPUs.

mattnewton · on Feb 22, 2023

This, we burn a lot of cash for every idle machine. Manually scaling up and down resulted in both worse performance and a lot more cost.

thundergolfer · on Feb 22, 2023

For GPUs startups can now leverage serverless GPU cloud providers: https://ramsrigoutham.medium.com/the-landscape-of-serverless....

Much simpler than setting up K8s with scale-to-zero autoscaling nodegroups.

mattnewton · on Feb 22, 2023

Will have to revisit this, when we originally evaluated banana.dev and similar platforms they lacked the ability to mount a network drive to quickly load a bunch of model weights after spinup, which is a weird requirement we had with a previous pipeline that we don’t need with other products.

thundergolfer · on Feb 23, 2023

Modal.com has a network filesystem feature called 'shared volumes'.

Disclaimer: I work for modal.com, but honestly Modal is the most comprehensive of the serverless GPU platforms because it didn't start as just a 'serverlessly run the latest open-source model' platform, it's aiming to be an end-to-end cloud platform.

LunaSea · on Feb 22, 2023

I believe that being cloud independent is a bit of a pipe dream for startups.

Sure, Kubernetes gives you some independence but then you still depend on a lot of vendor specific services like S3, RDS, SES, SQS, etc.

kube-system · on Feb 22, 2023

You don't have to use those services, and there are some abstraction you can do to make things portable. For instance, each platform may have its own block storage, but you can have different storage provisioner configurations for each platform so that you can move your application smoothly between them.

LunaSea · on Feb 22, 2023

> You don't have to use those services

Sure, you don't have to but that is the whole point of Clouds.

Otherwise, using regular old-school hosting providers is much, much cheaper.

> and there are some abstraction you can do to make things portable.

I would disagree with this point.

I'm sure that there are some APIs that try to abstract out the Cloud service used but in the end you are tied to the pricing and technical specificities of each service.

If I want to use a file storage service, I need to know how to authenticate to it, handle the access control, host static sites with it, handle CDN integration, configure access logging, etc.

All of this is possible in multiple cloud services but will be different for each provider. That is sufficient enough for it to be a leaky abstraction.

kube-system · on Feb 22, 2023

There's plenty of value in just the block storage and compute at the large cloud providers, and these are not difficult to abstract. I know because I've done it. Yes, some of the abstractions are a bit leaky, but all those leaks are variables in our helm chart. My application code is written so that it doesn't care where it's running, nor does it need to know.

> in the end you are tied to the pricing and technical specificities of each service.

That's one of the primary driving factors behind our decision to design our application to be portable.

mattnewton · on Feb 22, 2023

Ah, by cloud independence i mean we can easily switch where the machines with gpus come up, not the whole stack. We might give up on this flexibility but so far it seems like it will save us a ton of cash.

LunaSea · on Feb 22, 2023

If you're simply using EC2 hardware I agree but then you might as well go for a lower-level hosting provider which will be much cheaper.

The point of Cloud Services is to provide all these additional services.

If you don't use those services then the flexibility is relatively trivial to achieve.

kube-system · on Feb 22, 2023

I don't use cloud providers because they have their branded value-add services. I use them because they're reliable, automated, and they have APIs. I can't point terraform at whatever random IPMI a traditional hosting provider gives me. The last time I spun up a new dedicated instance at a traditional hosting provider, it wasn't an API call. It was a few emails, an invoice, and a week wait.

mattnewton · on Feb 22, 2023

Gotcha. That’s exactly the problem I am trying to solve, using just the ec2 style hardware, and being able to spin up on smaller clouds like coreweave to take advantage of availability and prices.

BadassFractal · on Feb 22, 2023

Being locked into a specific cloud is usually nowhere near the top of threats to a startup.

mattnewton · on Feb 22, 2023

True; I really mean in terms of being able to quickly move the worker machines to a new cloud to take advantage of price differences. So far we have manually moved around which is painful but saves us a bunch of money.

frankfrank13 · on Feb 22, 2023

Do you actually need autoscaling? Or is it something you're worried you might need one day

mattnewton · on Feb 22, 2023

Yes. 100% need to scale up the number of gpu workers and scale them back down based on request queue size, which is bursty. Otherwise we could spend 5 figures/month on gpus doing nothing for half the day and then still have unacceptable waits during traffic spikes

mikedelago · on Feb 22, 2023

The implication of using K8s to avoid vendor lock-in might also lead to self-hosting your databases and other supporting software.

In this case, you're just standing up normal servers in the cloud, which is pretty trivial in all of the major clouds' terraform providers. As long as the OS on the servers is standard (say, CentOS or ubuntu), your Config as Code stack (think ansible/puppet) won't really care which cloud provider you're using.

If they need to scale, they can add in another server to their infra as code, and make sure the config as code can handle more than one server. Not really that difficult with a little experience with terraform and ansible. There's also room for Auto scaling groups, which all of the major cloud providers support as well.

mattnewton · on Feb 22, 2023

Gotcha, it sounds like terraform by itself might be enough too handle the autoscaling groups, that’s mentioned a few times here and something I was looking at with k8.

alykafoury · on Feb 22, 2023

For all the people who shared, I am humbled and thankful. I am trying to make it sane for small business to work with the currently quickly evolved tech stack this is a part of mindset I am trying to reduce costs and stay agile and I am doing that in my own business.

bradwood · on Feb 22, 2023

I agree with most of this except the part at the end that advocates "normal hosting".

Don't do that either. Just go all in on a hyperscale cloud's serverless offering.

No patching, no servers, no upgrades, no networking, so much easier to manage than even a single VPS.

burcs · on Feb 22, 2023

I completely agree, there are enough PaaS out there that run containers, that are much better starting out points, that is until you need and can afford all the intricacies of k8s.

frankfrank13 · on Feb 22, 2023

Basically just an argument against premature optimization. Which is fine. In my experience, small startups tend to use whatever arch. they think is fun + gets spun up easily.

mc4ndr3 · on Feb 22, 2023

You can go beyond Kubernetes into serverless, but other than those two, your tech department would be deviating from industry standards and paddling upstream by hand.

invalidname · on Feb 23, 2023

I think https://doyouneedkubernetes.com/ said it better...

rvz · on Feb 22, 2023

TLDR: Because if you did, you will likely blow through that VC cash you were given, then you'll ask for yet another round of funding.

Maybe they will throw more money at you after blowing it up on K8s operational costs for your startup. But what if they say no this time? That is always a possibility and even today is a high likelihood than previous years.

Perhaps not using K8s as a startup might save your startup in speed-running into bankruptcy, especially when you are not making any money and chasing free users for years.

deathanatos · on Feb 22, 2023

> If you add Kubernetes to this setup you will need to make your machines double the size just the extra side-cars that you need make things functional add 3 machines to be the control plane of Kubernetes, add 3 more machines for etcd cluster, add multiple internal services like ingress controllers and log and metrics collection.

No, you really don't. I run multiple 3-node (i.e., the bare minimum to survive an AZ loss) clusters, and they are just fine. We run multiple workloads on them. It's three entire VMs of RAM and compute. If that's not sufficient, you need to dig into which of the services you're running on it are eating too much of whatever resource is falling short, and conclude through debugging whether that's a bug, or you need to scale up. That's the exact same procedure as a service on a VMs.

If your counter-example is not doing "log and metrics collection" — that's an apples-to-oranges comparison. I run both k8s & raw VMs, and … both run these "side-cars". Not only do both run it, they're often the exact same process.

Even the ingress: in most "service is deployed to VM" scenarios I've seen, often the service is fronted with nginx acting as a reverse proxy. That's not only fulfilling the same role as something like ingress-nginx, it's practically the same process.

Most startups should probably go with a managed k8s, in which case they're likely not going to even have etcd in the cluster, nor more of the rest of the control plane. (IMO: don't choose AKS, but IMO the problem there is the A, not the K.)

> You already are paying double of what you would have if you just excluded Kubernetes out of this because in a medium to small business there isn’t any feature of Kubernetes you are really using.

You shouldn't be, because again, I think the k8s example is grossly over-provisioned.

But I also think k8s's ability to move workloads between nodes easily lends it an advantage here: if I need to change the underlying makeup of the VMs, I can. That means that I can shrink-to-fit them. In my career eng spinning up VMs are far more likely to over-provision than to right-size or under-provision.

> The normal setup would cost any CI/CD engineer a couple of days to setup and normally not more than an hour or 2 to debug infrastructure related issues.

> Deploying on Kubernetes usually involves rendering templates, setting CI/CD runners, generating on the fly kubeconfigs for security and so much more that setting up CI/CD would take not less than a month to have something that is production grade ready.

I've spent time on both. "Normal" (to a VM) CI/CD flows have pathological corner cases dependent on the state of the VM. Flux or Argo is not hard to set up, and the benefit of YAML is reproducibility: pods consistently start from a clean state.

Much of the deploy work I've done with deployment scripts on VMs is re-inventing features that are part of k8s.

YAML generation is straight-forward from the need to have it be configurable. One you understand the need for it … IDK what the problem people have here is. "setting CI/CD runners"?

> The idea of deploying that many components together with different versions and different maintainers makes it impossible near to have an error free Kubernetes cluster, from network CNI to controllers to custom operators things tends to breakdown most of the https

… this is always going to be a problem…? On a VM, I've got nginx, whatever handles certs, logs, metrics, security team's spyware^Wprotection etc. … all from different vendors, at different versions.

"things tends to breakdown most of the https" ???

> It makes perfect sense with all the network encapsulation over the cloud network virtualization over the real network well things tend to get crazy.

???

> To be able to keep track of the versions of all components deployed on your cluster

(same complaint)

> And it should you are opting in for a setup intended for huge enterprises that have huge budgets and don’t mind having Kubernetes because it is actually benefiting them.

This is begging the question.

> Bigger attack surface

While the API server is certainly a non-zero attack surface, this blows it out of proportion. People run salt on VMs, … see it's security track record: k8s fairs a far bit better. I've witnessed many orgs not set up some for of SSO, and try to manage VM access via SSH keys, and old keys do not get cleaned from nodes.

The standard tools work for the API server: you can firewall it off, if it scares you?

I would caution a young startup against running k8s: if you want to do it, you either need to have someone familiar with it who can teach, or eng who are capable of learning. If you have the all-too-common eng who don't like learning and who aren't familiar with it, you are going to have a bad time. But it isn't hard to learn: k8s has great documentation. But I've witnessed a lot of frustration from people who haven't RTFM.

JamesSwift · on Feb 22, 2023

> don't choose AKS, but IMO the problem there is the A, not the K

What were your issues specifically? I've used the managed offerings on DO and AWS but not Azure. Its a possibility in the nearish future.

hnthrowaway0315 · on Feb 22, 2023

We even use K8S in local dev env...wth?