A lot of startups engage in a sort of cargo-cult architecture. Their reasoning goes something like this:
1. Amazon/Facebook/Google have a lot of traffic.
2. Amazon/Facebook/Google use X to scale horizontally.
ergo:
3. My little startup should use X and scale horizontally.
What they fail to realize is that most of these companies would be ecstatic if they could scale machines vertically, if they could focus on great user features instead of having to figure out how to shard in the application layer. You should never forget that Amazon, Facebook, and Twitter all started out as pretty basic LAMP stacks and built the tools only when it was obvious that no other tool would do. I think Google's an exception because their MVP was in fact a web-scale application. So by all means, vet your idea, get some customers, get traction, and scale the cheap way by buying more RAM for as long as you possibly can.
Before Google came along and showed the business people the benefits of horizontal scaling, any software engineer would be automatically considered crazy if they suggested an architecture that wasn't built on a central RDBMS.
So you have to weigh it against the other cargo cult. How many startups along the way have failed due to the inability to scale horizontally?
How many have failed due to too much cost and complexity associated with re-engineering an architecture in which the assumption of fully ACID transactions permeates the entire codebase? (While the phone is ringing off the hook because production systems are falling over under load.)
I realize this may not be a popular opinion on HN, but there's something to be said for planning ahead. I've seen this story before and already know how it ends: you wait until the last possible moment to switch to a more horizontally scalable system and next thing you know you're spending more time and money maintaining the "cheap" solution than it would have taken to switch to something like Cassandra beforehand. To make matters worse, your service is crashing and the short term fix takes a day and requires two or three people to do the replica switch shuffle. The long term fix will take a couple of weeks, if you have the time for it of course.
Long story short, I have to grant that you shouldn't worry about scaling too soon or too quickly. But don't go to the opposite extreme by putting it off until the last possible moment.
Granted, I'm speaking from second-hand experience, but for the last couple of years I worked closely with some former and current Amazonians who worked on their frontend and order-workflow teams. I do know for a fact that much of their frontend is/was in Perl, and that Oracle databases have their place there. But I guess that wasn't what they started with.
He mentions Fusion-io drives getting faster. What's interesting about ioDrives is that you don't have to buy a new one to get more speed: just upgrading your CPU or RAM will speed up an ioDrive.
Even among startups, web scale data requirements are the exception, not the rule. Facebook and Google are ginormous. There are many, many very impressive applications whose database wouldn't tax a single commodity server. (Similarly, there are applications that make terrible businesses but which consume computing resources like losing a byte of information would doom humanity.)
I mean, go through a list of YC companies or other startups you respect, winnow it down to the ones that exited or otherwise achieved some level of success, and play guess-the-size. How many terabytes of storage do you think e.g. Airbnb needs?
There are a ton of high-traffic websites out there that don't need an architecture any more complex than a standalone DB server + PHP + varnish (or the equivalent).
Moreover, if devs spent as much time tuning the performance of their apps as they do fantasizing about "web scale" architectural pivots, they would typically be farther ahead. StackOverflow.com is a perfect example of this. They run on a tiny handful of Windows machines, support gobs and gobs of traffic, and have absolutely fantastic performance. And as much of that is due to paying attention to performance and finding and removing bottlenecks where they exist as it is to using cutting-edge architectures like database sharding, map/reduce, eventual consistency models, etc.
One thing I've found is that scaling rarely means solving difficult problems. Rather, it means putting more time into finding optimal solutions to problems that are trivial at smaller scale. For example, should your startup use Apache, nginx, or HAProxy as a load balancer? If you're just launching, the answer is "Who cares, just ship the fucking thing!". If you reach the point where you start measuring page views in the billions (and yes, there are startups at this point), it matters a great deal. Or should you use Postgres, MySQL, or some shiny NoSQL thing? Again, it probably doesn't matter for small websites. But for larger services, it matters.
Also, don't underestimate how large log files can grow in a data-driven business (like AirBNB seems to be). I could easily believe that they have many terabytes of data just from logging actions their customers have taken.
> Also, don't underestimate how large log files can grow in a data-driven business (like AirBNB seems to be). I could easily believe that they have many terabytes of data just from logging actions their customers have taken.
Logs don't have remotely the same access requirements as the databases used to serve a product.
Indeed, but it's worth pointing out that in this case "different" doesn't necessarily imply "easier". Instead of having to access the data across many concurrent connections, you have to store it efficiently so that it doesn't take up too much space, and you have to be able to run jobs on it that don't take three weeks to complete. And let's not get into how you collect and merge the logs together. There are open source tools to do these things, but you're still looking at a decent amount of infrastructure to make it work.
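To make just the "merge" step concrete, here's a toy sketch in Python. It assumes timestamp-prefixed log lines, so a lexicographic merge is also a chronological one; real pipelines use dedicated tools, but the shape of the problem is the same:

    import gzip
    import heapq

    def merged_logs(paths):
        # Lazily k-way merge many gzip-compressed, time-sorted log files.
        # Assumes each line starts with a sortable timestamp.
        files = [gzip.open(p, "rt") for p in paths]
        return heapq.merge(*files)

And even this ignores the collection and storage sides, which is where most of the real infrastructure effort goes.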
Perhaps for the applications of yesterday (like Basecamp) this is the case, but the real innovation taking place is around collecting massive amounts of data and processing it in interesting ways. These systems are used every day to make quantified business decisions rather than best guesses based on someone's hunch. 37signals builds questionably good UIs on top of a database, something people have been doing for decades now. The future is in augmenting intelligence by gathering massive amounts of data and reducing it for human consumption.
I don't disagree with anything you've stated. Explosive growth and requiring massive amounts of data storage are surely the exception not the rule.
That said, the blog post talks about enormous growth, and it still fits inside Moore's Law's growth. I guess my gut is just saying it's not really that enormous in terms of startup scaling if it's still within those limits. Not to take anything away from 37signals' success, but it feels like nothing of value was really added by this post. As supplementary evidence, I present the post with a picture of 864GB of RAM that is near the top of HN right now.
I think most "other startups" would LOVE to be in the ballpark with Basecamp in terms of usage. No, Basecamp is not big like Facebook, but you've heard of them, right? Most "ordinary" business people who do project planning work have probably heard of Basecamp too. Most people never hear about most startups, fewer try their services, and fewer than that become regular users.
The point of the post is that in many/most cases, it's still easier and cheaper to throw hardware at a performance problem than to devote scarce engineering effort to optimization. And it's only getting more and more so. If you are a startup and you can throw $10,000 of hardware at a problem you can then keep your $100K engineers working on things that hardware alone can't solve.
Most "ordinary" business people who do project planning work have probably heard of BaseCamp too.
I think this might be the tech bubble showing. I've never heard of any non-tech, non-startup companies using Basecamp. I'm not saying that they don't, of course, but I'd be interested to hear some case studies in its use outside of tech-savvy crowds.
Last I checked, I believe something like 20% of our customers were from the tech/startup scene. The vast majority of our customers are regular businesses.
But yes, we're still tiny compared to behemoths like SharePoint. All the more reason to be excited about the next 20 years!
And there was a great press quote for you in the comments: "$50 is peanuts for what you get in Basecamp. Any business doing $1000 a month should find huge leverage from it."
Exactly. As horrible as it is, SharePoint rules this area and probably has an install base that dwarfs Basecamp's. When the average corporate person thinks project and file management, they think SharePoint.
Most programmers probably work at either BigCo enterprises (banks, insurance companies, telcos, etc.) or at "startup"-type tech businesses or freelance agencies.
There are other industries that contain a lot of small businesses and probably employ very few programmers: think restaurants, local shops, small law or accountancy practices, etc.
These guys probably aren't using SharePoint; most of them are probably using Excel spreadsheets combined with paper.
I'm not sure how many of these guys are using things like Basecamp, but a lot of them probably should be.
I thought 37signals' website has videos of "satisfied" users, and some of them don't seem to be "tech" or "startup". Probably small businesses, but I don't think we can lump all "small businesses" in with "startups".
> Most "ordinary" business people who do project planning work have probably heard of BaseCamp too.
I don't think so. Maybe we just have a different opinion on "ordinary", but I come across people all the time who think Excel is the normal thing to use for this (and pretty much any other task...). Using web applications for day-to-day workflow is still alien to a lot of "ordinary" business types.
I did a presentation on a game I made at Geoloqi that talks about the Fusion IO drives and why I think local hardware with SSD is the best solution for persistent data stores right now: http://www.slideshare.net/KyleDrake/building-mapattack
The one warning I want to provide though is that not all SSDs are created equal: Make sure you get one that writes its cache to the disk on power failure, or you're going to be in a world of hurt.
That's what I thought too. Fusion-io drives all flush their write buffer to the NAND flash on a power-cut event; I wasn't aware that some SSDs didn't. People use ioDrives as a caching layer in conjunction with traditional spinning hard drives, so I wondered if he was possibly referring to that kind of setup.
To guarantee on-disk consistency, programs like database servers call fsync for every transaction. By definition, an fsync involves events at the hardware level, and even on an SSD this causes a slowdown. The SSD can use a write cache to speed this up. However, should you lose power when the write cache is not backed by a small battery, you lose your most recent write(s) even though the latest fsync call guaranteed that the changes were on disk. The battery allows the SSD to successfully flush the write cache to flash in the event that the computer shuts down.
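A minimal sketch of that contract, in Python for brevity (illustrative only; a real database server does this in C around its commit path, and "journal.log" is just a hypothetical file name):

    import os

    # Append a commit record and force it to stable storage.
    fd = os.open("journal.log", os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    os.write(fd, b"COMMIT txn 42\n")
    os.fsync(fd)  # must not return until the bytes are durable; a volatile,
                  # non-battery-backed write cache silently breaks this promise
    os.close(fd)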
Pedantry -- that isn't Moore's Law related, aside from the loosest interpretation that computers as a whole get faster.
This is so true, though. Much of the need for horizontal scalability comes from the land of extremely underpowered VPS machines on platforms like AWS. Yet the mind-boggling scale of performance you can inexpensively* (*term used relatively) acquire for database servers is astonishing. SSDs (and plug-in flash drives) and boundless memory have changed everything.
Actually, RAM would be the perfect application of the exact meaning of Moore's law: more RAM is an almost direct function of more transistors, and Moore's law is:
> the number of transistors that can be placed inexpensively on an integrated circuit doubles approximately every two years
Moore's law precisely predicts you can double your amount of RAM every two years for the exact same price. Which is pretty much what TFA is about.
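A rough way to formalize that (my framing, not TFA's): at constant price, capacity after t years is

    C(t) = C_0 * 2^(t/2)

so the RAM you buy today costs the same as twice as much in two years and four times as much in four.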
Given all the problems they've had with AWS, I wonder what the overall cost/benefit would be for someone like Reddit moving from the flexed-EC2-servers model to just one high-powered db server and some caching webservers in front of it.
Reddit's problem was probably trying to use EBS as a backend for a transactional database. The issue is that EBS makes no guarantees on latency, only that they'll never lose your data.
Amazon is not saying it, but EBS is probably lying when doing an fsync().
In this post [1] explaining why Reddit was down, one problem was that the DB slave got ahead of the master, the most probable explanation being that the master flagged the data as safe for replication before committing it to disk. And I trust PostgreSQL more than I trust EBS.
In the forum answer Amazon gives about this potential problem [2], they dodge the question by saying that fsync() guarantees durability for instance failures, but not for volume failures, with the annual failure rate (AFR) given as 0.1% - 0.5% for volumes (how accurate that is remains to be seen).
So EBS is probably lying about fsync's success, especially since the behavior of fsync in virtual environments is always a surprise. You can definitely lose your data more frequently than if you had your own hardware.
> Amazon is not saying it, but EBS is probably lying when doing an fsync().
Given that POSIX says "the nature of the transfer is implementation-defined" and they've defined what fsync does on their implementation: No, they're not lying.
I wouldn't say that they're being misleading, either. On a local physical disk, once fsync returns your data is safely stored unless/until the disk dies. According to the forum post you linked to, semantics on EBS are exactly the same.
fsync does not mean "has been written to spinning magnetic media".
Reddit could easily cut their spend on hosting by like 50-75% just by moving to a dedicated hosting provider. They'd also get bare metal speeds in the process.
Who knows why they've stuck with AWS, though. It's possible Amazon is giving them a discount so they can be used as an example.
What never seems to get mentioned is energy cost. Sure, you can keep throwing hardware at the problem, but eventually that will lead to a substantial power bill.
I agree with what both of you have said. It's possible that, in the not-too-distant future, "green programming" will become a new field. If you can design an architecture (both hardware and software) that demonstrably reduces power costs, that's a valuable skill.
And, I'd argue, on two fronts: lower costs for the company, plus good marketing. Every forward-thinking company (at least from what I've observed) is eager to brand itself as "carbon neutral", "eco-friendly", etc.
This is nothing more than paying the interest on your technical debt instead of paying off the balance. The longer you wait, the more it grows and the bigger the problem becomes down the road. Right now it's $20K for a new system, and for your trouble you still get no redundancy. Yes, RAM is cheap and you can just buy a bigger system next year, but eventually you will hit a limit (growth is like that), and then you will need to scale your now large and complex system. Code will need to be reworked. Interfaces will need to be re-architected. The guy who wrote that bit of code that runs your whole business may not be with you any more; even if he is, he doesn't really remember why he put in that hack at 3:00 AM one morning when stuff got out of hand, back when you had the horsepower to spare. Now you have downtime and huge development costs, because you ignored a problem and instead threw hardware at it.
A similar thing I meet regularly is people grossly underestimating how many bytes per second you can really get from basic hardware such as hard drives, gigabit Ethernet cards, etc. People often still remember that in 1998 the fastest FC 10k RPM drives hardly topped 12 MB/s, and that you needed an 8-CPU Origin2000 to push 80 MB/s through a GigE link (given that you had a terrific RAID array at both ends).
Nowadays, even the most basic PC can saturate a GigE link (115 MB/s), the slowest hard drives do 100 MB/s, any SSD sustains several thousand IOPS, and so on.
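For reference, the 115 MB/s figure is just the line rate minus framing overhead, roughly:

    1 Gbit/s = 125 MB/s raw
    minus Ethernet/IP/TCP headers (~5-8%) ~= 115-118 MB/s of payload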
One of the most unfortunate results is that people often buy hugely powerful hardware when very basic stuff would have done the job just fine. How many people have I seen running puny workloads on $100k 3Par or EMC arrays, where a handful of SSDs in a server would have done as well or better? Pulling 40Gb fibre across the room when Cat 6 cable would have sufficed?
For those with custom hardware setups like this, how much time do you estimate you've spent designing, building, and maintaining custom servers?
I can see how it would be less effort than sharding, but it seems like throwing hardware at the problem is not a completely "free lunch" when you get to the scale of having to custom design a server supercomputer and source specially made SSDs and terabytes of RAM.
You don't need to "custom design a server supercomputer". This stuff is as mainstream as Dell. You can go to dell.com and order a PowerEdge R815 with 64 processor cores and 512GB of RAM for $16,000. If you want SSDs, you can purchase SATA or PCIe ones from Dell as well.
Gah, stop abusing the word punt. The headline is nonsensical.
Punt does not mean avoidance. It has much closer associations with "attempt" (punter, plus the rugby usage).
I totally agree with the rest of the article: why shard or otherwise distribute your database before it's required? Although I think I would build my DB as shard 1 of 1 to allow for the future case.
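By "shard 1 of 1" I mean something like the following (Python for brevity, names hypothetical): route every lookup through a shard function from day one, so that scaling out later means bumping NUM_SHARDS and migrating data rather than touching every call site.

    # Hypothetical sketch: all DB access goes through shard_for(),
    # even while there is only a single shard.
    NUM_SHARDS = 1
    SHARD_DSNS = {0: "postgresql://localhost/app"}

    def shard_for(user_id):
        return SHARD_DSNS[user_id % NUM_SHARDS]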
You're thinking of British English. In America, punt means avoid, and most people here probably think of an American football punt--which is the voluntary end of a failed offensive possession--so it suggests you're giving up for now and will try again later. We probably bastardized it, sure, but that ship has sailed.