
There's a wonderfully blunt saying that applies here (too): you are not in the business you think you are, you are in the business your customers think you are.

If you offer data volumes, the low water mark is how EBS behaves. If you offer a really simple way to spin up Postgres databases, you are implicitly promising a fully managed experience.

And $deity forbid, if you want global CRUD with read-your-own-writes semantics, the yardstick people measure you against is Google's Spanner.




Where does the misalignment between what the customer thinks they want and what they actually want fit into your philosophy? Google Spanner is a great example of this, because who doesn't want instantaneous global writes? It's just that, y'know, there are a ton of businesses, especially smaller ones, that don't actually need that. The smarter customers realize this themselves and can judge the premium they'd pay for Spanner over something far less complex. What I'm getting at is that sales is a critical company function: it bridges the gap between what customers want and what they actually need, and it's how you make money.

The first releases of EBS weren't very good, and it took a while to get to where we are now. Some places still avoid EBS due to bad experiences back in 2011 when it was first released.


> who doesn't want instantaneous global writes

I want to gently note since I see a lot of misunderstanding around Spanner and global writes: Global writes need at least one round trip to each data center, and so they're still subject to the speed of light.


Like most things, it's more complex than that, and as a result it can be either faster or slower than 'median(RTT to each DC in quorum)'.

It's a delicate balance based on the locations where rows are being read and written. In the case where a row is being repeatedly written from only one location and not being read from a different location, the writes can be significantly faster than would be naively expected.
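A rough way to see why row/leader placement matters is a simple majority-quorum model. This is an illustrative sketch, not Spanner's actual machinery; the function and the RTT numbers are made up for the example:

```python
# Illustrative sketch: in a Paxos-style group, a write commits once a
# majority of replicas acknowledge it, so commit latency from the leader
# is set by the k-th nearest replica, not by the farthest data center.

def commit_latency_ms(remote_rtts_ms):
    """Commit latency for a group of (leader + remote replicas).

    remote_rtts_ms: round-trip times from the leader to each remote replica.
    The leader acknowledges its own write locally, so only (majority - 1)
    remote acks are needed.
    """
    n = len(remote_rtts_ms) + 1        # total replicas, leader included
    majority = n // 2 + 1
    remote_acks_needed = majority - 1
    return sorted(remote_rtts_ms)[remote_acks_needed - 1]

# Hypothetical 5-replica group, RTTs in ms from the leader's region:
# two nearby replicas and two far away.
print(commit_latency_ms([2, 5, 70, 140]))  # 5: a nearby majority commits fast
```

With the leader (and a couple of replicas) close to where the row is written, the two far-away replicas never sit on the commit path, which is why a single-writer-location row can beat the naive "median RTT to each DC" estimate.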


> Like most things, it's more complex than that,

Sure, no doubt. My point wasn't really about the particularities. It was around the mistaken idea that I see sometimes where people believe that TrueTime allows for synchronized global writes without any need for consensus.


The speed of light in vacuum is a hard upper limit. Most signal paths will be dominated by fibre optics (about 70% of c) and switching (which adds more delay).

But yes, TrueTime will not magically allow data to propagate at faster-than-light speeds.




I think a Microsoft shill might choose a less suggestive name


> so they're still subject to the speed of light.

I giggled. Good witty comment, bravo.


I get the impression that you think "still subject to the speed of light" is some kind of hyperbole or something, like if you were on a freeway and saw a sign that said "end speed limit" and thought to yourself "welp, still can't go faster than c".

But when you're working on distributed systems that span the planet (say multi-master setups where ~every region can read and even write with low latency), you start thinking of the distance between your datacenters not in miles or kilometers but in milliseconds. The east coast and west coast of the US are at least 14 milliseconds apart:

  % units "2680 miles" "c ms"
  2680 miles = 14.386759 c ms
and that's not counting non-optimal routing, switching delays, or the speed of light in fiber (only 70% of c). Half of the circumference of the earth (~12500 miles) is likewise 67 milliseconds away absolute best case (unless you can somehow make fiber go through the earth).
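The back-of-the-envelope numbers above can be reproduced without the `units` tool. A minimal sketch (the constants are approximate, and the function is just for illustration):

```python
# Lower-bound one-way latency from great-circle distance.
C_MILES_PER_MS = 186.282  # speed of light in vacuum, miles per millisecond
FIBER_FACTOR = 0.7        # light in fiber travels at roughly 70% of c

def min_latency_ms(miles, medium_factor=1.0):
    """One-way propagation delay, ignoring routing and switching."""
    return miles / (C_MILES_PER_MS * medium_factor)

print(round(min_latency_ms(2680), 1))                # coast to coast, vacuum: 14.4
print(round(min_latency_ms(2680, FIBER_FACTOR), 1))  # same path in fiber: 20.6
print(round(min_latency_ms(12500), 1))               # half the earth, vacuum: 67.1
```

Even the vacuum numbers are floors; real paths add fiber slowdown, non-great-circle routing, and switching on top.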


In a nutshell, if you offer cloud services you need to be better than the MAG clan, and Digital Ocean too. And people will want it dirt cheap. It's still as hard to be a profitable web host as it always was (MAG has the advantage that none of them started out as web hosts).


I am willing to pay a little extra for a nice dev/ops experience and simple/easy solutions that don't require spending days reading docs and diving into dashboards with thousands of options.

Usually this results in me jumping on new platforms and then abandoning them once they add too much complexity.


I suspect that, in general, customers' tolerance for (or desire for) complexity in a cloud solution and their budget are positively correlated.


The ridiculously overwhelming complexity is stickiness.

Think the technical work of moving your solution off your $CLOUD vendor is bad? Wait until you turn around and realize you have at least one full-time hire whose entire role is "$BIGCLOUD Certified Architect" (or whatever), and your entire dev staff was also at least partially selected for experience with the preferred cloud vendor. At any kind of scale you have massive amounts of tooling, technical debt, and institutional knowledge built around the cloud provider of choice.

Then there's all of the legal work, actually understanding billing (pretty much impossible, but you're probably close by now), and so on elsewhere in the org. At this point you've probably engaged an outside service or consultant or two from the entire cottage industry that has sprung up to plug holes in (or augment) your cloud provider of choice.

After realizing their cloud spend has ballooned well beyond what they ever anticipated, plenty of orgs get far enough to investigate leaving before they realize all of this. Most decide to suck it up and keep paying, or try to negotiate or optimize spend down further.

Cloud platforms are a true masterclass in customer stickiness and retention - to the Oracle and Microsoft level (who also operate clouds).

It's interesting here on HN because while MS and Oracle are bashed for these practices, AWS and GCP are (for the most part) pretty beloved for what are really the same practices.


This is really an oversimplification. MS and Oracle have licensing that is explicit in the way it wants to lock you in, though in different ways. AWS and GCP posting public pricing that applies all the way up to an absurd spend goes a long way, and the ability to turn off a workload tomorrow incentivizes these platforms to provide a high quality of service.

When I worked at AWS, a large part of convincing an MS shop was showing that we could offer a lower price than the 'discounting' that MS provides. Oracle was all about contract expiry.

While there's some complexity in migrating a workload, wherever it lives, many places go into cloud migrations hoping to remain relatively platform agnostic. I've seen many successful migrations to and from different vendors, often at SMB or mid-enterprise scale, in weeks rather than years.


Sadly I think you are likely correct.


MAG?


From context, I'm assuming Microsoft / Amazon / Google, referring to Azure / AWS / Google Cloud respectively.


Yep.

Because when I think reliable cloud infra, I think Azure.


I’m assuming Azure, AWS, Google Cloud, but it’s new to me too


Microsoft (azure) Amazon (aws) Google (gcloud)


They should have gone with GAA


microsoft apple google


If you add Akamai (Linode) or Alibaba Cloud, then it becomes MAAG.


Linode is not the same scale as the top 3. I believe even Digital Ocean is bigger than them (for now).


MAGA?


That's for the top 4 companies by market cap.


MAGA cap?


oh no, not again.


Geez chill.


> if you want global CRUD with read-your-own-writes semantics, the yardstick people measure you against is Google's Spanner.

I’m trying to build more of an intuition around distributed systems. I’ve read DDIA and worked professionally on some very large systems, but I’m wondering what resources are good for getting more on the pulse of what the latest best practices and cutting edge technologies are. Your comment sounds like you have that context so any advice for folks like me?


Not sure it's what you are looking for, but how Spanner mitigated CAP to deliver a relational DB at scale is a really interesting read[1]

[1] https://research.google/pubs/pub45855/


The best practices are built on solid first principles, and you can get a pretty good grip on them from the High Scalability archives. Back when they posted actual tech articles, their content was some of the best available anywhere. Since you've read DDIA you will probably get quite a lot out of the archive. In fact, you should be able to identify at least some of the unstated problems.



