Infrastructure Mistakes Companies Should Avoid (firstround.com)
227 points by takinola on Sept 30, 2016 | 75 comments



There are basically three tiers of startup infrastructure needs, depending on your business model. Your decision process should be driven by which one you are operating in.

(1) Consumer ad-supported. Pinterest, Instagram, Buzzfeed, etc.

Your CPM is going to be pretty low, so you probably want to run your own infra -- it's all gonna come down to the margins. Dropbox notably transitioned to running their own infra after years on AWS.

(2) Freemium software with heavy data ingestion needs, e.g. enterprise messaging or CRM. Slack, Streak, etc.

You have pretty high value per customer, but you still have a ton of data streaming into your system all the time. Probably use a public cloud provider, but monitor your bill somewhat carefully lest it get out of control.

(3) Typical B2B workflow SaaS, or very high CPM consumer site. Airbnb, Zenefits, Gusto, etc.

You store a relatively low amount of data, on the order of megabytes per customer if not less. Use public cloud infra and make it widely available. Eliminate "how much will this increase our AWS bill" from discussions about e.g. event sourcing, proposed ML experiments, etc.


At my day job, we receive real money for processing each transaction, which adds maybe 1 kB of information to the database. This makes the scaling story laughably easy. By the time we're maxing out the biggest database you can buy, it's IPO time.
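A quick back-of-envelope sketch of why that is (the ~1 kB figure is from above; the usable-storage number is just an assumption for illustration):

  # Back-of-envelope sketch: how many ~1 kB transactions fit before a single
  # big database box fills up? The 10 TB usable-storage figure is an assumption.
  bytes_per_txn = 1000             # ~1 kB of data added per paid transaction
  usable_storage = 10 * 10**12     # assume ~10 TB usable on the biggest box
  txns_before_full = usable_storage // bytes_per_txn
  print("{:,} transactions".format(txns_before_full))   # ten billion paid events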

For most of my life I worked in games, which is almost the exact opposite problem. Tiny CPMs, insane traffic. Multiple providers of cheap bandwidth have gone bankrupt and left me high and dry; but the savings were worth it, and I continued to chase those deals. It was the right call.

Engineers love to design architecture that (they believe/hope/pray) will scale to Uber sizes. But if you're not having a conversation with the business as to whether that's a sensible goal, it's negligence bordering on fraud. You aren't being paid to needlessly teach yourself cool new technology to solve imaginary problems.

Which is all to say: ignore parent's advice at your peril.


> it's negligence bordering on fraud

That's the first time I've heard designing scalable software called fraudulent. I'm not sure it means what you think it does. ;)


The parent spoke about a 'sensible goal' that the business needs. If an engineer delivers something that is not sensible but overengineered, then maybe it could be called fraudulent.

The business needs a wheelbarrow and receives a Tesla instead.


Agreed, there are different solutions that make sense for different businesses. The smaller your margins, the more important it can be to run your own stuff. Or maybe you just want to cut out a layer of someone else's profit from your own revenue to give you more flexibility in your business. If you can get costs to $5/unit at the same margin when everyone else in your industry is $10/unit, that can be a big motivator.

It also depends on how your service is positioned to grow. If you have any sort of reliable, foreseeable growth then you can save money by building infrastructure yourself. You are paying for Amazon's overhead to allow you to spin up any instance anywhere at any time (and there's no guarantee it will really be there) -- even if you're buying hardware 6 months ahead of when you actually need it, as long as you're growing and not missing the mark too badly you can do a better job on cost.
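As a rough sketch of that comparison (every number below is hypothetical, purely to illustrate the buy-ahead-of-forecast idea, not real pricing):

  # Made-up numbers, only to illustrate the shape of the comparison: hardware
  # bought ~6 months ahead of need vs. paying on-demand cloud prices.
  cloud_unit_monthly = 300.0     # assumed on-demand $/month per unit of capacity
  owned_unit_capex = 4000.0      # assumed purchase price per unit of capacity
  owned_unit_opex = 60.0         # assumed $/month power/space/remote hands
  hardware_life_months = 36      # amortize purchases over three years
  overprovision = 1.3            # buying ahead means carrying idle headroom

  owned_effective = overprovision * (
      owned_unit_capex / hardware_life_months + owned_unit_opex
  )
  print(owned_effective, cloud_unit_monthly)   # ~222 vs 300 per unit-month

With those made-up inputs the owned gear comes out ahead even after paying for headroom; miss your growth forecast badly and the overprovision factor eats the advantage.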

Most importantly: are you planning for success? If you sign a giant customer, will it help you grow your margins by getting you to better economies of scale with your vendors and your expenses? Or will it crush you because now your burn rate just doubled? There are economies of scale available with AWS, but they aren't great and you have to be pretty huge to get anything meaningful.

Two of your examples stuck out to me:

Dropbox was never entirely run in AWS. That complicated things operationally to a degree, but ultimately made it a lot easier to move pieces of the service in or out of AWS as it made financial and technical sense.

Pinterest was built in AWS. Since they waited so long to start seriously considering alternatives, they're in a tough spot. Migrating out will require a substantial capital commitment (not to mention engineering and product commitments), and they'll have to pay for both environments during what is liable to be a lengthy cutover period. You have more resources to leverage, but the scale of the problem just gets harder over time.


The problem with on-site data centers is that the total cost is rarely tracked effectively; the figures often don't include salaries, for example.

Additionally, saying that building something outside of the cloud keeps you cloud-agnostic is too simplistic. There are many shops that would like to move to the cloud to reduce their costs but can't due to on-site lock-in. They then run into much more expensive problems around expanding datacenters, network or power limitations, or staffing to maintain physical machines, which have real-world MTBFs you have to do something about when they fail.

I think the issue isn't lock-in so much as apps being designed poorly: reliant on a specific feature that isn't necessary, which then means the app will only work in one cloud. However, in cases where there is no way around that feature, you're getting the huge benefit of an instant solution, so you can focus on your product rather than infrastructure. And if that cloud does provide a single killer feature you need, then just run that feature there.


I agree that there is no one right answer for all cases. Companies have definitely implemented their own datacenters poorly and been trapped as a result. You can avoid that fate, but you have to hire right and plan right.

In my experience, the personnel requirements for a service built in AWS versus one built in datacenters aren't wildly different. If you have a five-person team for the former, you might need a six- or seven-person team for the latter. You aren't necessarily dealing with hardware and vendor problems in AWS, but it's also not as simple as 'just run your app and forget the rest' (at least once you start to scale up).

Again though, this is if you're smart about how you build the datacenter. There are a lot of tricks of the trade and a bunch of gotchas that aren't necessarily apparent until you've experienced them for yourself. Hiring experience here is the key.

While you don't necessarily want to limit yourself to the lowest common denominator of cloud providers or your own hardware, any time you make use of something unique you need to make a decision. Is it worth getting locked in? What will you do if that solution goes poof in the night?

To use a real recent example: what do you do if AWS runs out of X1 instances and you really really need some more right now to do your work? It sucks to get into a situation where you have no control over the outcome and even money can't fix it, either in your own datacenter or in AWS.


You just end up wasting your time maintaining relationships with hardware vendors, troubleshooting low-level issues, and upgrading firmware. It wastes a lot of people's time across the company, not just the team's, and you're building one-off tooling to work with some vendor's half-broken management API. No thanks. :)


Sure, that happens. If you can save millions of dollars a month and move your business from non-viable to viable, I think it's a good tradeoff. That's what I do for a living, so I see value in it. Not everyone should have to deal with it though.

Also, AWS is not problem free. When low level issues (like a firmware problem) strike, which they sometimes will, you often cannot identify or fix the issue yourself. AWS instances still run on that same shitty hardware, and while your odds are better that Amazon figured out most of the issues before you encountered them, your odds are also worse that they tested your particular special use case.

So you can wind up playing the 'debug by support ticket' game. Check everything on your end. Check your deployment times for correlations to the problem. Finally give up and reach out to AWS support. Insist that the issue isn't on your end. Find out that the issue can't be resolved until their deployment next week. It's all understandable given the scope of what Amazon is doing, but they will never be as incentivized as you are to solve your problem.


I don't think non-viable to viable is exactly what's going on, however, at least in most cases. I think the costs are not being fully understood: the impact on the team's mindshare, the quality of service and redundancy, and the additional headcount and quirky management code someone is going to be toiling away on when a cloud provider can handle it at a much lower cost.

Yes, you still have to open tickets, and yes, they still have problems that suck up a lot of time; however, they absorb the brunt of the burnout from fixing those issues, not you. If you talk to folks working with physical systems, that consumes their mindshare to the point that they're not focusing on their core product anymore.


> If you talk to folks working with physical systems, that consumes their mindshare to the point that they're not focusing on their core product anymore.

All I can say is, it isn't always the case and it doesn't have to be with the right organizational and design decisions.


That story isn't much different from my interactions with the on premise infrastructure team.


Which is why the best tradeoff can often be colocating in existing datacenters and just putting in a few dozen of your own racks.

Or in renting dedicated servers.

I’ve done some checks for the same performance between colocating, dedicated, VPS, and stuff like container engine + cloud databases, and with each level of abstraction the cost per month usually goes up by an order of magnitude.


Absolutely, there's a whole spectrum of options. "Building a datacenter" doesn't have to mean finding an empty field next to a river. Get a dedicated or shared cage, rent servers, do what fits your needs.

It makes sense that each layer of abstraction costs dramatically more, because these services aren't vertically integrated -- they operate as value adds on top of other services. Heroku runs on AWS who buys server hardware and network hardware from vendors and puts it in racks that sit in buildings operated by datacenter providers who supply power from the electric company.

So when you run something at the abstraction layer of a Heroku dyno, you're contributing to the profit margins of at least:

  1. The actual service you use (Heroku, and Salesforce)
  2. Their instance provider (Amazon)
  3a. (maybe) Their server hardware manufacturer (HP / Dell / Supermicro)
  3b. (maybe) Their network hardware manufacturer (Cisco / Juniper / Arista)
  4a. Their server ODM (Quanta / et al.)
  5a. Their server components manufacturer (Intel / Hynix / Samsung / HGST / Mellanox / Adaptec / Avago / etc.)
  5b. Their network ODM or ASIC designer (Avago)
  6. (maybe) Their hosting provider (Peer1 or someone)
  7. (maybe) Their retail datacenter provider (Equinix / CoreSite / etc.)
  8. Their wholesale datacenter provider (DuPont Fabros / Digital Realty / etc.)
  9. Their suppliers and contractors (electrical contractors / Square D / security guards / ...)
  10. The electricity and the taxes paid to the municipality
Each of these companies intends to make a profit on the services they provide to the company up the food chain, and that all gets bundled into what you pay.

Sure, there are economies of scale here that you can't get on your own, and smart companies are trying to eliminate some of these layers -- but that doesn't necessarily mean you are paying less.

The margins increase as you go up the stack. Electric companies make peanuts for margins (8-10%) compared to a company like Mellanox that makes $1200 network cards (20+%).

But who cares about selling you a $1200 network card when you can charge people $100/mo to use a dyno?
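To make the compounding concrete, here's a toy calculation; the per-layer margins are invented round numbers, not real figures for any of the companies above:

  # Invented gross margins per layer, low at the bottom of the stack and high
  # at the top, showing how markups compound into the price at the dyno layer.
  base_cost = 1.00                                       # $1 of underlying cost
  gross_margins = [0.08, 0.20, 0.15, 0.25, 0.30, 0.40]   # utility -> ... -> PaaS
  price = base_cost
  for margin in gross_margins:
      price /= (1.0 - margin)        # each layer prices to hit its gross margin
  print(round(price, 2))             # ~5.1x the underlying cost in this toy stack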


Right, but you're just offloading lots of the work to staff then. So you probably can do it cheaper, but is it really going to be good? Or just a pile of hackery one guy put together?


But, similarly, a lot of people new to the cloud think that it runs itself and that you need zero staff. You still need somebody to set it up for you!


Good points.

On cloud lock-in, it is important to keep in mind that the companies offering it want to lock you in. The article mentions this, but I think it needs to be emphasized. They are not passive agents but more like really sophisticated drug dealers who study addiction and know how to profit from it: "Hey, psst, look, you get a 6-month free trial, just give this a try, it costs nothing...".

But cloud providers have started to compete with each other harder. As part of that, many are lowering costs and open-sourcing cloud orchestration tools that 10 years ago were the super-secret sauce. The article covers this a bit as well, and I've noticed it too: running your own cloud on bare metal is becoming more viable. AWS might be good today, but Google wants your business as well, and Kubernetes plus some bare-metal provider might save serious money in the future.

As for HN-driven development, yeah, I have seen a couple of projects ruined by switching from Python to Go (during the last few years there was a story like that every week on HN). It wasn't that Go was bad; it's just that the switch destabilized an existing product without delivering enough benefit.

> One key emerging type of tool Freedman advises looking into implementing is a ‘distributed tracing’ system, often modeled after Google’s “Dapper” system.

The secret sauce for me is using Erlang (Elixir works as well). Sometimes it feels like cheating, as in "this shouldn't be that easy": distributed tracing, hot-patching to add a log statement while everything is up and running, or restarting small parts of the service. Imagine, say, C++ being able to confidently run gdb on a process, kill a thread, reload new code, and let the thread restart without fear of causing memory corruption or leaving a lock acquired. Like the article said, you can do that with many tools, but having it be solid and built in is a huge advantage. Money-wise it just means having fewer people and less ops pain. Because if there is one thing that's right up there with infrastructure costs, it's people's time.


> On cloud lock-in, it is important to keep in mind that the companies offering it want to lock you in. The article mentions this, but I think it needs to be emphasized.

It's funny that companies trying to lock people into their clouds have to be warned about being locked into a cloud themselves by their service providers. To follow your example, it's like a drug dealer getting addicted by another drug dealer. They should have known better...


I didn't find it so offensive (I use an adblocker), and the advice actually seems pretty spot-on given my experience.

1) Yes, it's easy to prototype in the cloud, and it's also easy to fall into the trap of vendor lock-in. Instead, if you are based in the USA (which sadly I'm not), check eBay; there's plenty of refurbished or liquidated equipment at a fraction of brand-new pricing.

BTW, the latter should also be an indicator that not all is fluffy in cloud-land...!

2) 100% agree on this. Fintech is rearing its ugly head, with hundreds (thousands?) of startups all trying to get a piece of the consumer pie.

I subscribe to the GNU/KISS philosophy. Keep it simple, keep it to a set of known tools which speak a common "API" (whether that's just plain and simple text, XML, JSON, etc.), train your people to understand and use them, and you will achieve far more productivity than jumping every few months to yet another you-beaut toolset guaranteed to solve your CI problems (until, after 6 months, you find out their business strategy is to get acquired, at which point you spend another 6 months adopting another tool... and another...).

3) If anybody has ever seen the power of dtrace, or even what a straightforward systems/network monitoring system like Zabbix can capture, they would definitely agree that monitoring is key to ensuring the health of your systems. Once you get past a whole bunch of scripted alerts on one server, wonder how to scale them, and then bump into Nagios/Zabbix etc., you will kick yourself for not having done so sooner!
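For what it's worth, the "scripted alerts on one server" stage usually looks something like this minimal sketch (the threshold, addresses, and local mail relay are all assumptions):

  # Minimal single-server check of the sort you eventually outgrow and replace
  # with Nagios/Zabbix. Threshold, addresses, and the local MTA are placeholders.
  import shutil
  import smtplib
  from email.message import EmailMessage

  total, used, free = shutil.disk_usage("/")
  pct_used = 100.0 * used / total
  if pct_used > 90:                                  # arbitrary threshold
      msg = EmailMessage()
      msg["Subject"] = "disk at %.0f%% on this box" % pct_used
      msg["From"] = "alerts@example.com"
      msg["To"] = "oncall@example.com"
      msg.set_content("Only %d GB free on /" % (free // 2**30))
      with smtplib.SMTP("localhost") as smtp:        # assumes a local mail relay
          smtp.send_message(msg)

Cron that every few minutes and you have "monitoring" -- right up until you have ten servers and twenty checks, which is exactly when the proper tools pay off.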


Please don't roll your own hybrid cloud or colo if you don't know what you're doing, particularly from the start. It likely is a distraction from your core product and, as the article states, can easily tie up 3 or so solid engineers.

The takeaway really is that you should be aware of the trade-offs and lock-ins you're signing up for, as with anything.


Unless you're doing something stupendously horsepower- or data-intensive, don't use a cloud (roll-your-own or outsourced) at all. Sit a spare PC under a desk somewhere and run your servers/services on that. Once you figure out what you're actually doing and you know how your core technology scales, then you look at what's required to serve it to your target audience.


I don't agree with this strategy. A spare PC under a desk might work perfectly for years without any trouble. Alternatively you might spill coffee over it next week and destroy the server - and your business.

If you don't know how you'll need to scale, then chuck up a `t2.nano` instance on AWS and use that. In return for a tiny monthly cost you get:

- solid network connectivity
- disk snapshots for backups
- geographically redundant storage (S3) for static resources, so you can survive the server hosing itself
- monitoring of CPU load / status / disk usage
- the ability to scale the server up vertically with the click of a mouse and 60 seconds' downtime
- (with a tiny bit more work and cost) automatic scaling, so you cope _automatically_ if the server falls over or load increases
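For reference, spinning one up is only a few lines with boto3; the AMI ID, key pair, and region below are placeholders, and it assumes AWS credentials are already configured in your environment:

  # Minimal boto3 sketch of the "just chuck up a t2.nano" suggestion.
  import boto3

  ec2 = boto3.client("ec2", region_name="us-east-1")   # placeholder region
  resp = ec2.run_instances(
      ImageId="ami-xxxxxxxx",       # placeholder AMI
      InstanceType="t2.nano",
      KeyName="my-keypair",         # placeholder key pair
      MinCount=1,
      MaxCount=1,
  )
  print(resp["Instances"][0]["InstanceId"])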


Or connectivity intensive. AWS (and other cloud providers) provide a network that is difficult/impossible to compare to a "PC under a desk somewhere".

They're managing the daily deluge of DDoS attacks and you're paying for less than 0.0001% of that because a million other customers are sharing the burden.


Amazon absolutely can survive a DDoS attack. But can your wallet? AWS published a white paper on how to survive a DDoS on AWS that amounted to "out-scale the attack." Doing that could very well be a business-ending proposition right there.

Sure, your website never went down, but now you have an infrastructure bill you'll have to do another round of funding just to pay off.


I was more thinking of the DNS amplification and UDP flood type of attacks that are transparently handled by a cloud provider, but even for an application attack, you still have a choice to let your site go down rather than scale up infinitely. (You can cap the scaling.)
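Capping the scaling is straightforward if the site sits behind an Auto Scaling group; a boto3 sketch (group name and sizes are placeholders):

  # One way to "cap the scaling": pin the Auto Scaling group's MaxSize so an
  # application-layer attack can't scale you into a surprise bill.
  import boto3

  autoscaling = boto3.client("autoscaling")
  autoscaling.update_auto_scaling_group(
      AutoScalingGroupName="web-asg",   # placeholder group name
      MinSize=2,
      MaxSize=10,                       # hard ceiling on scale-out
  )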


They do manage those - though mainly to protect themselves, not the specific customer being attacked. The people I've talked to about AWS, recently as well as historically, say you'll get pretty uniformly rate-limited rather than getting actual per-/32 DDoS-mitigation-style limiting. Has your experience been different (for volumetric attacks)?


Our experience has been that the "collateral damage to us" DDoS attacks vanished entirely from the "set of things we think about" which was not at all true in some of the colo's we were in.

In terms of application-specific attacks, we have used proxies in AWS to mitigate attacks against our colocated servers from time to time. AWS handles some of the volume and some of the types of attack traffic, and we scale and cache to handle others. This was much cheaper and easier than some of the Prolexic type solutions.

Agreed that they aren't doing anything specific on a per-host or per-customer basis, but because protecting all of their customers is inherent to what they do, some of the specific problems also go away.


Absolutely agree that collateral damage vs many small-mid-sized hosting providers is 0 in Amazon, though you do still have to deal with the normal 'noisy neighbor' problem by re-creating instances in a different neighborhood.


> https://www.versatel.de/business/kompetenz/glasfaser/ausbau/

At only €7,200/month for a 100 Gbps line, the network won't be an issue for you in case of a DDoS anymore.

DDoSes over 100 Gbps will be irrelevant for you anyway, because at that point your ISP will already be having issues.


At the very beginning, when you're trying to figure out exactly what you're trying to do, there is a lot of value to building in an environment like AWS. After that point there's still some value, but in certain ways it is much easier to start in your own datacenter environment than to try to switch later.

Unless you're careful you'll probably build dependencies on a number of AWS (or Google Cloud or whatever) services that you now have to replicate yourself in order to move out. That's possible to overcome, but the extra burden pushes a lot of companies to delay to the point that they're spending tens of millions a month for something that could cost them a million or less. That could buy a lot of manpower (or give you a lot more flexibility in your business).

It's also totally possible to have vendors, networking, datacenter, server, and OS stuff handled by one person. It can't last forever, but you definitely don't HAVE to have 3-5 people just to turn up a cage and buy hardware in some sane fashion (if you make the right hire and empower that person).


Right. 3-5 engineers could be 50-100% of some people's entire team.


This seems way too biased against the cloud. It doesn't mention things like a solid sales relationship with your cloud provider, which can help you unearth all kinds of breaks and incentives. I've been using AWS in dedicated and hybrid modes since its inception. If you are hitting a pain point cost-wise, they will work with you to try to keep you from leaving or migrating services to on-site.

He also doesn't mention the huge benefit of cost drops that cloud providers will give you that you will not see when you're on a 3-yr lease and a long-term DC/bandwidth commit.


Even those breaks and incentives can't change a two-order-of-magnitude difference in cost between containerized products and bare metal, or the single-order-of-magnitude difference between virtualized offerings and bare metal.

If Amazon gives you the same service for a tenth or a hundredth of the price, sure, but that just doesn’t happen.


Can you please give an example where you can see those cost savings because I will migrate to those services tomorrow.


I’m comparing products like https://www.hetzner.de/us/hosting/produkte_rootserver/ex41 with similar performance at DigitalOcean and AWS’ EC2 cloud to get those numbers. The linked example is an Intel® Core™ i7-6700 with 32 GB DDR4 RAM, 2TB HDD storage (in a RAID, so raw capacity is 4TB, but usable is 2TB) and 1Gbit/s network connection (30TB traffic inclusive), for 40 bucks a month.

Compare with DO: The closest comparable product at DO runs at 320 bucks a month, and you need to buy extra traffic (you get 23TB less traffic).

On EC2, the best comparable model would be the m3.2xlarge, for 280 bucks a month. (plus another 100 bucks for the 24/7 phone support).

Now, let's try getting the same performance with service-as-a-service offerings: at Heroku, 24/7 phone support alone is $1,000 a month. Let's assume a standard workload for that machine: we'll use half of the RAM for PostgreSQL and about 512 GB of storage for the database. With Heroku, that adds $750/month.

If we take the cheapest solution – a single dyno of the most powerful type – we end up with $1,250/month plus support. If we use separate dynos for our services, as many as the original example server could run, we get 17 Performance M dynos at $250/month each, reaching $4,250 overall.

And Google's Firebase and Container Engine are priced at the same level, as is AWS Lambda, etc. And Hetzner isn't especially cheap – any dedicated hosting provider will offer their services at those prices. It's just the nature of virtualization and higher abstractions that they are expensive.

If you can handle servers failing in your infrastructure, go with the cheapest possible option – Kimsufi, Online.net, Scaleway, etc. If you want standard quality and pricing, use Hetzner, OVH, and all the other industry-standard hosting providers.
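Tallying the monthly figures quoted above (support plans excluded) makes the gaps explicit:

  # Monthly prices as quoted in the comment above, support costs excluded.
  monthly = [
      ("Hetzner EX41 (dedicated)",        40),
      ("DigitalOcean closest droplet",   320),
      ("EC2 m3.2xlarge",                 280),
      ("Heroku, 17 Performance M dynos", 4250),
  ]
  base = monthly[0][1]
  for name, price in monthly:
      print("%-32s $%5d  %4.0fx dedicated" % (name, price, price / base))

That works out to roughly 7-8x for the VM offerings and over 100x for the dyno-style product, i.e. the one and two orders of magnitude mentioned earlier in the thread.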


Agreed - LeaseWeb, OVH, Hetzner, and even SoftLayer (if you call and negotiate) can all be great options for dedicated servers and have been very stable for many folks. Generally I recommend that people not make long-term commitments, as that gives you more leverage if there are network hot spots or other issues you need their help resolving.


> If you discovered the tool on Hacker News and it's less than 18 months old — 'Danger, Will Robinson!'

You shouldn't fear new things. Keep it simple, keep it smart, and embrace what works for your team. Don't shun things because they're old, don't shun things because they're new. Shun complexity.

How about some real-world examples? Slack, Google Docs, and ZenHub. All of these added value right out of the gate.

I first read about Slack in February 2014 on Hacker News.

* We Don’t Sell Saddles Here – Medium || https://medium.com/@stewart/we-dont-sell-saddles-here-4c5952...

Started playing with it, then started using it. It's helped my team move much faster. Slack added value. No reason not to use it. Same for ZenHub, same for Zapier, same for Docker, same for a bunch of other tools where I can say, first-hand, that being an early-adopter paid off vs. using something "tried and true."

Oh, and on that note, I freakin love Marker! (=

* Marker - Annotated Screenshots Sent to any Bug Tracking Tool || https://getmarker.io/


From the article, it doesn't seem like he's talking about tools like Slack. If that goes down, maybe it affects your productivity, but probably not your production systems.

What he warns against is betting too early on core technologies like service discovery, deployment systems (Docker, etc.), and database systems (MongoDB). And from reading HN posts, it sure seems like there's no shortage of people being burnt by that.


Yep, sorry if it didn't come through clearly enough.

I was talking about (without trying to pick on any particular projects/vendors) infrastructure glue components - db, deployment/orchestration, storage, discovery, ...

Maybe 'tool' was the wrong word, and 'component' would have been more clear.


How do you feel about PaaSes like Cloud Foundry or OpenShift?

They are, to greater or lesser degrees, able to present a uniform platform to developers across various backends (CF runs on OpenStack, AWS, Azure, GCP, vSphere or raw hardware via RackHD).

So long as you deploy your own services using the same tooling (BOSH), it's possible to hoist and relocate a lot more easily than relying directly on the IaaS's services.

Disclosure: I work for Pivotal, the majority contributor of engineering to Cloud Foundry and BOSH.


I think they can be pretty efficient, and if a handle is kept on what's deployed on top, they don't add that much overhead over the cost of the infra (whether it's owned or cloud). But keeping a handle on things is key - starting w/o a DBA is nice, but if no one is tracking tables or how they're used, things can get pretty expensive :)

Specifically re: CF - I have seen a few companies use CF to go multi-infrastructure, but a lot of the companies we work with have 5-10 roles and just deploy them via config management, or now Docker with or without k8s, and don't use a PaaS at all.


Thanks for coming back, I was worried that a late reply would be overlooked.

> But keeping a handle on things is key - starting w/o a DBA is nice, but if no one is tracking tables or how they're used, things can get pretty expensive :)

At Pivotal we ran into this problem in building service brokers that worked by interacting with single, large, efficient shared services. Most services lack the strength of isolation that you are getting for the apps themselves. So on a shared database, queue, cache etc, the noisy neighbour can really begin to hurt.

In our 2nd generation of these service brokers we changed our approach. Previously asking the service broker for a service returned almost immediately (create account/endpoint/schema/queue/bucket/whatever). Now we actually go and provision an entirely new, isolated service instance. Luckily BOSH makes this relatively easy to do.

Essentially we've recreated the journey that led to containers: realising that while the efficiency of shared instances is nice, it's more important to be able to enforce functional and non-functional isolation. So now the services are on par with the apps in terms of their platform behaviour.

The outcome is the same: ops no longer have to heavily gatekeep against bad developers, because only those developers will be affected by their errors. I have a very long analogy involving sharehouses that I will skip on this occasion.

> but a lot of the companies we work with have 5-10 roles and just deploy them via config management, or now Docker with or without k8s, and don't use a PaaS at all.

Yeah, the jump from nothing to all-the-things is a pretty big one for people who are solving the partial problem they see directly in front of them. Dynamic languages, NoSQL etc are all much more approachable than their alternatives, because you can build in smaller steps.

We're working on it -- PCFDev and BOSH bootloader are two main prongs, and there's more to come. If you want to give any more feedback or kvetching, please feel free to email me (jchester@pivotal.io) and I'll connect you to the right people.


I wouldn't consider Docker a deployment system (at all).


> You shouldn't fear new things. Keep it simple, keep it smart, and embrace what works for your team. Don't shun things because they're old, don't shun things because they're new. Shun complexity.

Oftentimes, new things haven't fully mastered the complexity of the target domain. Also, they might work now, but will they work next year if the project or service is abandoned? Using new things is fine, but we should be aware of these potential drawbacks.

The examples you give are great, but how many other tools were posted in 2014 and since have gone kaput? Gotta watch out for survivorship bias.


I like the idea of innovation tokens: http://mcfunley.com/choose-boring-technology

Basically you get 3 things per project where you can use something that's not tried and true.


> You shouldn't fear new things.

I found one easy test is to just ask the person using the technology why they are using it. As long as they understand the trade-offs and how it works, that's fine. If they say it's because they've seen stories on HN, or someone mentioned it at the latest meetup, or it's "async so therefore faster", then be worried.

I used to be that person out of college who didn't know and just went by whatever seemed more popular, or whichever project had the coolest front page with the most futuristic-looking hexagons and such on it.


This! If they just repeat the marketing blurb from s/Mongo's/<hip new tech>/ website, then they generally don't have a clue what they are doing.



I can't say I agree with the first two points.

For point #1: cloud services are fairly competitively priced with each other, and using the tools they provide will lock you into a vendor but also drastically reduce cost. For example, we used to roll our own MySQL and Postgres; now we use AWS RDS, and it has saved us so much money I can't believe we didn't do it in the first place. Does that mean it will be more work to switch off of AWS? Yes, but it was worth it for us.

For point #2: with that attitude we would never have adopted Docker. And adopting it early put us well ahead of the game. Now almost everyone seems to use Docker or something like it, but if we had waited for it to mature, it would have taken longer to reap the rewards.

I completely agree with #3, though. Although back to #1, taking advantage of cloud provider specific monitoring tools can save a lot of time and money.

Edit: someone is way too downvote happy. Or maybe I'm using downvotes wrong. I use it for "you're an ass" not "I disagree with you"... I'd love to hear the opinion of the person who downvoted me and why they feel that way.


> drastically reduce cost.

How did you reduce the cost by switching to AWS? That’s basically impossible.

Usually renting dedicated boxes and running your own instances on them is the cheapest solution.

I guess that might also be the reason for the downvotes: you're making such an extraordinary claim that it seems like trolling.


Ahh, I see now where the confusion was.

I reread part #1 of the article, and I was arguing something slightly different. My argument is that using something like AWS RDS has a lower startup cost than having a DBA manage a DB server manually, not that it is cheaper than running on bare metal.

I guess what I am saying is that running your DB manually in the cloud protects you from vendor lock-in (Cloud Jail, as he calls it) but at the expense of greater upfront costs. It is the worst of both worlds.

I think startups shouldn't worry about managing their database; it's a distraction in the early/mid stages. If a cloud provider has a managed version, the break-even point where building it out yourself is cheaper is actually surprisingly far out, and if you haven't validated your idea yet, I just don't think the ROI justifies it.
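A sketch of that break-even reasoning; every figure below is invented for illustration, not taken from the article or from any provider's actual pricing:

  # All numbers invented: compare a managed-database bill against the cost of
  # running it yourself once you count the slice of an engineer it consumes.
  managed_db_monthly = 1500.0         # hypothetical RDS-style bill at mid scale
  self_hosted_monthly = 400.0         # hypothetical boxes + backups + monitoring
  ops_fraction_of_engineer = 0.25     # share of one engineer's time on care/feeding
  loaded_cost_per_engineer = 15000.0  # hypothetical fully loaded monthly cost

  self_managed_total = (self_hosted_monthly
                        + ops_fraction_of_engineer * loaded_cost_per_engineer)
  print(managed_db_monthly, self_managed_total)   # 1500.0 vs 4150.0 per month

With numbers in that ballpark, the managed service stays cheaper until the database bill grows several-fold, which is the "surprisingly far out" break-even point.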

Replace database with other infrastructure pieces.

With that said, after re-reading I do agree with the author on the point that you should consider your options and have a plan.


I don't agree with you, but I'm tired of this fight. I'm replying to your point about downvoting in case it helps:

https://news.ycombinator.com/item?id=117171

Speaking of tired fights, and only slightly related, I've always thought removing the vote count display was a mistake.


Popup-free version: http://archive.is/DQJHI


Am I the only one who couldn't get rid of the pop-ups in the bottom third of the article? Every time I scrolled the same pop-up that I had just dismissed popped up again.

It was a very interesting article that made numerous valid points, but I came away thinking a lot less of a VC that couldn't successfully configure a blog. Does someone there proofread the articles?


> If you can get away with it, start out running multi-cloud

That seems like a pretty bad idea to me. You don't need to run multi-cloud at the start. Especially if you just use basic services like EC2 and S3 or something container-based, maybe a database service for something standard like Postgres, you can avoid getting locked in for quite a while. Early on, there's just so much to do, going "multi-cloud" is a waste of your limited time and energy.


People always forget that there is a world between the "cloud" and leasing colocation space where you manage your own servers and routers.

Renting dedicated servers is what everyone did before AWS, and it's still the most affordable hosting.


Exactly. As I showed here [1], there’s almost an order of magnitude between EC2 and dedicated servers, while EC2 provides no benefit (unless you have highly variable load – but in that case, you can just use EC2 in addition to your existing servers).

[1] https://news.ycombinator.com/item?id=12627864


As for hosting costs and hip tools: I see a lot of cases where they go hand in hand.

An example is Magento. It is maybe the hippest webshop with all the bells and whistles you could ever need. But it's also slow as hell. The amount of money some companies throw at it to make it fast is insane.


> First do no harm. Protect your user experience at all costs. Make their trust sacred.

Why isn't this first?


The three mistakes:

1. spending hundreds of thousands of dollars per month on internet services ('They land themselves in Cloud Jail.')

2. choosing technology based on hype rather than maturity ('They get sucked in by “hipster tools.”')

3. not understanding what your computers are actually doing ('They don’t design for monitorability.')


>1. spending hundreds of thousands of dollars per month on internet services

The problem with this is that you don't know whether your business will be popular or not. If it isn't and you spent money on well-thought-out infrastructure, well, you wasted time and money. If your product is successful, you can buy infrastructure later. Also, from a business perspective you don't want PP&E on your balance sheet. That is why you see so many creative leasing schemes, which FASB cracked down on.


Which is why... (paraphrased) Freedman advises startups to watch the following indicators as a measure of whether they may be approaching the danger zone:

- When always-on / constantly-growing workloads cross the $100,000/mo. mark, you may hit the danger zone sooner than you think.

- Keep the number of lock-in services in check.

- Monitor for performance and look for cases where someone else's cloud starts to cause issues.

This definitely matches infrastructure progression I've seen too!


The most invasive popup I have ever encountered.

Also the three points made in the article are, shall we say, dubious advice. I wanted to use more extreme language but I think it's frowned on at HN.

Here's a sample: "If you discovered the tool on Hacker News and it's less than 18 months old — 'Danger, Will Robinson!'"

Ugh - that's what you get for reading advice from venture capitalists.

Don't read this post.


Try it on mobile. It's doubly bad on an iPhone 4s screen when you have the top and bottom fifths covered by an ad.

I also agree the warnings were apt, but the accompanying advice didn't really address the issue well. I deal with other people's infrastructure issues every day, startups and old businesses alike, and rapid adoption isn't the problem the article makes it out to be. My experience is that you can barely get most people to apply a simple critical update to a storage device, much less make major infrastructure changes, without them getting an official guarantee from every vendor in their infrastructure that the change won't disrupt their workflow.

And the advice on cloud jail just seems premature for most startups. It talks about founders answering to boards, when it seems unlikely to me that most people needing this advice would even be far enough along to have a board to answer to. The advice of "don't put all your eggs in one basket" is great, but that second basket costs money. If anything, I'd imagine a board concerned about costs would want to consolidate costs, not spend more. Boards tend to make irrationally fiscally conservative choices.


I agree it's not something people are likely to hit in their first year, but I do think some sensitivity to trade-offs at the beginning is good.

Re: the board dynamic - the interview was about things to watch for if you think you might get into decent growth. At those stages, especially in 2016, many boards are watching gross margin and unit economics. But ours has been very supportive from day 1 of having our SaaS offerings be HA with DR.


I didn't encounter any annoying popup. I don't have JS disabled but I do use uBlock Origin.


That made me want to disable javascript.


ScriptSafe for Chrome. I don't leave my home network without it.


If you are on Firefox, you may want to try NoScript https://noscript.net/


I started reading this article, and it has some good points. Then the giant ad appears when you barely scroll. It's amazing that an article that seems well written is obliterated and made unreadable.

A big point I find myself trying to convey to developers who are starting in devops was well summarized:

When it comes to infrastructure components, keep it as simple as possible. (And have a healthy amount of skepticism.) “When it comes to your infrastructure, especially the core components that glue everything together — storage, load balancing, service discovery — you really need to be using things that are not, themselves, going to cause problems. You probably have enough problems with the rest of your application and components.”

I wish I could have finished and shared the article, but sadly the ad as I scrolled down made it unreadable.


I'd bet lots of money that this is exactly the sort of PR puff piece Paul Graham describes [0], where a PR firm writes a pseudo-advice column that drops Kentik's and Avi Freedman's names a lot. Bland, generic infrastructure advice phrased in an insidery tone that leaves lots of Google trail for the company that paid for this sponsored advertising. Don't waste your time, it's nothing you can't figure out on your own.

[0] http://paulgraham.com/submarine.html


As to the specific advice, sorry you didn't think it was valuable.

But re: it being a PR fluff piece -

Nope... Not in this case, and I would bet against that being the case with First Round Review in general.

I'd bet a reasonable sum that every article in the First Round Review was spoken and/or written by the person being interviewed/quoted.

For this article, Camille prepped me with some questions, then we spent an hour on the phone, and she got me a draft that I made suggestions on (especially the monitoring section, which was much weaker originally).


There's some really, really good stuff here — well worth the read.


Clickbait title.


How is it clickbait? He lays out 3 pretty clear mistakes, and ways to avoid them.



