Numbers everyone should know (highscalability.com)
142 points by chuhnk on March 19, 2011 | 44 comments



My databases professor at my university once brought up Jim Gray's data latency analogy to illustrate the costs of using different types of storage. Basically it goes something like this:

Using a register is like interacting with information in your own brain, using an L1 cache is like walking to a bookshelf in the room you're in, using an L2 cache is like walking across campus, using RAM is like going to Sacramento (we were in Berkeley), hitting the disk is like going to Pluto, and using tape drives is like going to Andromeda.

I never fully registered the consequences of hitting disk until then.


My company does this, but in reverse: whenever someone is getting something to eat there's a metaphor for it. Cache means you've already got it, hitting memory means you need to go to the kitchen, disk means you need a trip to the store, and registers of course mean you're actively chewing.


I really like this analogy. When the article says "Numbers everyone should know", it's not really the numbers that matter but the general relative scale of performance costs for common operations. IOW, memorizing the nanoseconds of a cache hit isn't useful, but knowing how many times slower disk is than RAM is.
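For anyone who wants the ratio behind that last sentence, here's a tiny sketch using two of the figures from the talk the article summarizes (approximate, circa-2010 hardware):

    # Rough order-of-magnitude figures only; the exact values matter less
    # than the ratio between them.
    MAIN_MEMORY_REF_NS = 100          # one main memory reference: ~100 ns
    DISK_SEEK_NS = 10 * 1000 * 1000   # one disk seek: ~10 ms

    # A random disk access is roughly five orders of magnitude slower
    # than touching RAM.
    print(DISK_SEEK_NS // MAIN_MEMORY_REF_NS)   # -> 100000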


Well, sometimes cache becomes stale.

Core dump and flush need no explanation :)


People should also know that for writes there are a variety of ways to mitigate the seek cost. The first is to write sequentially (no seeking). The second is to install a battery-backed write cache, which will do this for you. If you're OK with losing a little bit of data, you can also just have the OS buffer writes in memory before flushing to disk. BBWC controllers seem to handle write-reorder optimization much better than the OS, so I'd recommend the BBWC over writing to memory.
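A minimal sketch of the first and third options, i.e. sequential, append-only writes with a deferred fsync (the class and its parameters are made up for illustration, not from the comment):

    import os

    class AppendLog:
        # Append-only writer: writes are sequential (no seeks), and fsync is
        # deferred so the OS -- or a battery-backed controller -- can batch
        # and reorder the physical writes. Deferring fsync risks losing the
        # last few records on a crash, which is exactly the trade-off above.
        def __init__(self, path, fsync_every=100):
            self.f = open(path, "ab")        # append mode => sequential writes
            self.fsync_every = fsync_every
            self.pending = 0

        def write(self, record):
            self.f.write(record + b"\n")     # record is bytes
            self.pending += 1
            if self.pending >= self.fsync_every:
                self.flush()

        def flush(self):
            self.f.flush()                   # push Python's buffer to the OS
            os.fsync(self.f.fileno())        # force the OS cache down to disk
            self.pending = 0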

This is a large reason I dislike cloud hosting: most of the hardware is crap, and you could get much better performance from it with a little more expense. It's also mostly pointless to bother making your writes sequential, since performance degrades to random I/O because of all the other instances on the machine.


Well, if anyone wants to learn more from Jeff Dean than just twelve numbers, here's a talk he gave at Stanford about Google's architecture and system building in general:

http://ee380.stanford.edu/cgi-bin/videologger.php?target=101...

The slides for the talk: http://www.stanford.edu/class/ee380/Abstracts/101110-slides....


Needs updating for SSDs. I'm not sure it's really worth the development time and increased time to market to optimise for rotating disks.


You seem to have accidentally traveled back in time to 2011 from whatever golden future you live in before posting this comment.


We spent a lot of time over several years tweaking every setting imaginable trying to improve the performance of an NFS server, none of which really made any difference. Installing SSDs fixed the problem completely. Disk I/O now stays in the single-digit percentage range, where previously it would sit at 100%, sometimes for minutes.

SSDs are about twice the price of 15k RPM drives, hardly out of reach for most. Also, for VPSes, Server Axis's 20mb unmetered SSD plans are cheaper than Slicehost's rotating-disk-based systems. http://serveraxis.com/vps-ssd.php


Yeah, SSDs are going to totally kill 15krpm disks. But killing disks in general is a ways off. In fact, disks might actually be getting cheaper per bit faster than SSDs are.


SSD still costs US$2-US$4 per gigabyte. Spinning rust costs US$0.10-US$0.20 per gigabyte. SSDs are much faster but disks are still twenty times bigger at the same price. A developer hour costs on the order of US$100. If your service uses more than a few gigabytes per developer hour, it's worth some development time to optimize for rotating disks. Whether it's worth increased time to market is harder to calculate.
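To spell out the break-even, here's a back-of-the-envelope calculation using the parent's own figures (all 2011-era prices):

    # Midpoints of the price ranges quoted above.
    ssd_per_gb  = 3.00     # US$/GB (the $2-$4 range)
    disk_per_gb = 0.15     # US$/GB (the $0.10-$0.20 range)
    dev_hour    = 100.00   # US$ per developer hour

    premium_per_gb = ssd_per_gb - disk_per_gb   # extra cost of a GB on SSD

    # One developer hour spent optimizing for spinning disk pays for itself
    # once it avoids buying roughly this many gigabytes of SSD:
    print(dev_hour / premium_per_gb)            # ~35 GB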


Thankfully there's a happy fusion of spinning rust and SSD: flashcache [0] and L2ARC/ZIL [1]. Cheaper win.

[0]: https://github.com/facebook/flashcache [1]: http://blogs.sun.com/brendan/entry/test



I know NoSQL is all the rage with the crazy kids these days, and I know it has its place, but why is half of the literature regarding NoSQL about solving problems that are trivial in an RDBMS?


This particular article is addressing problems that would not be trivial in an RDBMS. If you're worried about concurrency when incrementing a counter, you're at a level of scale few people get to work on.

There's a legitimate rant to be had about using elephant architecture to serve mouse traffic, and I think that's what annoys me most about the NoSQL fad. But that doesn't mean the problems mentioned in this article don't exist.


"If you're worried about concurrency when incrementing a counter, you're at a level of scale few people get to work on."

I'm not sure I understand. If I make an application that has to increment a counter, shouldn't I always be worried about concurrency?

What I mean is, sure, if I'm serving only a few requests, then of course the probability of running into concurrency issues is lower than a site with more requests/sec. But it's still a matter of chance, there is some probability that two requests will come in at exactly the same time and cause a problem.


You are worrying about correctness of concurrency. Yes, this should always be worried about.

The parent poster is worrying about performance of concurrency. This doesn't need to be worried about unless you are one of about 5-10 tech companies whose name is recognisable to people on the street.


I think we are on the same page. In this particular article I was referring to the paragraph regarding paging results which is ancillary to the main point.


Because an RDBMS can only scale as far as your most powerful machine. Unless of course you start sharding to several machines, and then you hit the same issues mentioned in this article.


Stack Overflow runs on a single DB server.

I think people really overestimate their data-store needs, and underestimate how much a big iron DB server can deliver.

It's probably worth thinking about scaling out on your data end, but until you start hitting some sort of limit on an 8-core, 96 GB machine with server-level SSDs, it's probably better to invest your time in other issues.


That's a very valid point. You can get reallllly far with just one really powerful database.

But if you've built a site on the assumption that one database is enough, with so many joins that you can't split tables, and your hardware can't keep up... you're in a really bad place. I say this from experience ;)


Just out of curiosity, what sort of site would that be? I'd be really interested to hear the use cases that cause a single powerful DB server to be insufficient.

Of course, over a certain scale this is normal, so I'm not interested in a site with 50 million users, but in cases where a single DB is not the best solution.


Basically, it was a site with a medium amount of paid users. Probably 10-20k daily uniques. However, the nature and diversity of the data behind this site required thousands of tables and tens of thousands of stored procedures.

Since these queries were built over ~10 years with the expectation of being able to join any of the thousands of tables, it would now be a tremendous undertaking to perform any sort of sharding or other ways of distributing load (other than master-slave replication, which was used widely).

It's been a struggle to keep hardware up to pace with the desired growth of the company. Even with a dedicated data center and top of the line hardware, doubling their user base would probably require a significant architectural undertaking.

Had the site been designed to support an arbitrary amount of distributed database servers from day one, it would now be trivial to grow horizontally.


The scalability you get really depends on the type of queries you fire, which depends on the kind of schema you have. A good schema designed for reads (assuming reads are much more frequent than writes for your use case) rather than writes would definitely work out well with a SQL DB.


I totally agree. If you are not write limited, it is also very easy to scale reads with slave servers or materialized views in some cases.
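As a rough illustration of the slave-server approach, a minimal read/write-splitting sketch; the class name is hypothetical, the connection objects are assumed to be ordinary DB-API connections, and replication lag is ignored:

    import random

    class RoutingConnection:
        # Writes go to the primary, reads are spread across replicas.
        def __init__(self, primary, replicas):
            self.primary = primary       # DB-API connection to the master
            self.replicas = replicas     # DB-API connections to the slaves

        def execute(self, sql, params=()):
            is_read = sql.lstrip().lower().startswith("select")
            conn = random.choice(self.replicas) if is_read else self.primary
            cur = conn.cursor()
            cur.execute(sql, params)
            return cur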


I love the concept of sharding counters to trade a little read performance for write throughput! As so often happens, my understanding advances by looking at something I thought I knew the only possible answer to ("Of course a counter goes in a single variable!") and seeing that other alternatives are possible.
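For reference, a minimal in-process sketch of the idea; the article does it with datastore entities and memcache, whereas the class below is just an illustration with threads standing in for concurrent requests:

    import random
    import threading

    class ShardedCounter:
        # Increments hit one randomly chosen shard, so concurrent writers
        # rarely contend; reads pay for it by summing every shard.
        def __init__(self, num_shards=20):
            self.shards = [0] * num_shards
            self.locks = [threading.Lock() for _ in range(num_shards)]

        def increment(self):
            i = random.randrange(len(self.shards))
            with self.locks[i]:          # contention is spread across shards
                self.shards[i] += 1

        def value(self):
            return sum(self.shards)      # N small reads instead of one

    counter = ShardedCounter()
    counter.increment()
    print(counter.value())   # -> 1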


An article with 'numbers everyone should know' without linking to Peter Norvig?! Here goes: http://norvig.com/21-days.html


Not trying to be mean, but sometimes I wonder how many programmers actually have friends outside the industry. This happens when I read titles like "Numbers everyone should know" and then find out that it's numbers like the time needed for an L2 cache reference or how to optimize for low write contention. Does anyone realize that this stuff is about as esoteric as it gets?


You're discounting the context. If you read the title as "Numbers Everyone [who reads this site] Should Know" then it makes more sense.

If you're complaining about the Hacker News title, then:

a) It's the linked article title, so it would be inappropriate to change it, and

b) HN shows the domain name of the link in parentheses after the title. The article was pretty much in line with my expectations for a High Scalability article with that title.


I can't speak to your specific use cases, but on Hacker News (or a blog about scalability) I think it's completely reasonable to have such a title. It's not link bait. Anyone building a system (e.g., for a startup) or maintaining/extending an existing system should be aware of these numbers -- if for no other reason than to be fully aware of any tradeoffs they are choosing to make ("well, optimizing for seeks on write is premature at the moment, we need some users first, but I can at least add a level of indirection to prepare for future improvements" or even "yes, I know this will require refactoring as soon as we have XXXX users, but f-it, we gotta get to market. Let me notify others of the iceberg I'm building and why.")

Knowledge, properly applied, can be power.


I think the author assumed readers were programmers. That's OK in a trade publication.

Indeed, every programmer should know those numbers.


There's still an argument to be made that it's largely esoteric even for programmers. There's an enormously large cross-section of professional programmers for whom this is really only relevant to satisfy some curiosity rather than being in any way relevant to their day-to-day coding. There's 100% no use in knowing the difference between an L1 and L2 cache hit compared to a branch misprediction if your day job involves, say, writing WordPress plugins in PHP, building iOS apps, or maintaining some corporate Java app.


You're right that you can't really think about this stuff too much sitting in a high-level language. It is useful, though, to have a good idea of the underlying architecture you're taking advantage of at a higher level. Sure, L1 and L2 cache optimization is out of your hands, but if you are writing an app for GAE, knowing the relative speed of writes in relation to reads could influence the decisions you make.


I thought the same thing until my "put()s" were slow as hell on App Engine. "Oh hey this is just like mysql_query()! I just call entity.put() and everything's fine!" Not!

When it comes to parallel operations over multiple machines (most startups at scale will run into this problem), these numbers really can come in handy.


Another way to make unique ordered keys is to use a timestamp and append a random value, such as a random UUID.

key = timestamp + UUID4
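A sketch of that scheme, assuming a fixed-width millisecond timestamp so that string order matches time order:

    import time
    import uuid

    def make_key():
        ts = int(time.time() * 1000)            # millisecond timestamp
        return "%013d-%s" % (ts, uuid.uuid4())  # zero-padded + random UUID

    print(make_key())   # e.g. "1300500000000-9b2f..."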


I didn't like this idea at first, since it struck me that if for some reason you had to create many records in quick succession (e.g. some kind of bulk upload) they would essentially be placed in random order. But thinking about it, the only scenarios where you would need to do this and want to preserve the order exactly would be if all the records were coming from the same source -- in which case you could just choose a single UUID for the batch and append increasing values of a local counter to this.

TL;DR: This would actually work really well I think!
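A sketch of that batch variant (the helper name is made up): one UUID per batch plus a zero-padded local counter, so records from the same source keep their exact order:

    import uuid

    def batch_keys(timestamp_ms, n):
        batch_id = uuid.uuid4()   # one UUID for the whole batch
        return ["%013d-%s-%06d" % (timestamp_ms, batch_id, i) for i in range(n)]

    print(batch_keys(1300500000000, 3))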


Isn't this exactly what the article suggests doing? Time stamp + user id + comment count.


Blogspam repost of a 2yo+ presentation. A very good one, but still...


I'm sorry you feel that way. I was re-reading information on systems architecture and came across the article. I thought it was relevant to HN and thought hey this is something people might like to read. Not everyone here has been here since 09 and not everyone here will have seen the article. Just trying to help educate fellow members of this community.


Well you succeeded as far as I'm concerned. I like to tinker with GAE quite a bit and have ideas about hosting my next (serious) project on it, thanks for posting.


I get connection reset when I try to comment on your blog.


I have a blog?



Excellent tidbits!



