How We Made Github Fast (2009)

Lewisham · on Feb 22, 2011

This blog post blew my mind. I generally thought of GitHub as being in the front-end business, but maybe they're really in the back-end business?

Is this sort of complexity common? I'm not saying I think it's bad, as I am sure it's all there for a good reason, but it just seems like so many services to bounce through before a web page is even served.

nostrademons · on Feb 22, 2011

Pretty much, yeah. The first startup I worked at (corporate remote access) had half a dozen different services that a request touched before a connection could be established. The second (financial software) also had half a dozen to a dozen. The one I founded (casual game creation) had 4 different independent systems interacting, and never got to the point where it was productionized (I killed it because the prototypes indicated it wasn't a compelling business idea).

Google, of course, is in a class way above all those. I'm not sure if anyone actually knows how many services a single search hits now - perhaps some of the old-timers who're now VP level or above in search. The last public figures I was aware of were "hundreds of distinct services spread across tens of thousands of machines".

Most startups that actually do something useful require far more than a web frontend and a database.

msbarnett · on Feb 22, 2011

In a production environment, at scale, this is a relatively uncomplex example, I think. At any rate I've seen significantly more complicated multi-service architectures for what would also have been called a "front-end" business.

ptn · on Feb 22, 2011

Now, that scrambled my brains. Uncomplex? I thought this was pretty complicated and that mojombo was a scalability God (and he may be).

Could you point me to more complex examples?

nostrademons · on Feb 22, 2011

This is why it's helpful to work in a big, successful company before founding your startup. You get a sense of the scale and complexity of problems that actually have real users, and practice dealing with those problems. If you stay long enough, you also get a sense of the trade-offs involved, and which complexities are accidental vs. which are inherent.

Of course, when you actually go to found the company, you should forget all that scalability nonsense and start with the simplest system possible. However, it's worth having an image in your head about how the system will evolve and what the appropriate response to various scalability challenges is.

msbarnett · on Feb 22, 2011

Uncomplex is probably the wrong word. Averagely complex for this sort of thing? Not easy to achieve or trivial to build, but, from a 40,000 foot perspective, probably typical of the kinds of complexity you'd find in most popular web applications.

There's a very long discussion, somewhere, of Amazon's eventually-consistent architecture that I'd say is definitely on the high-complexity side of things. Unfortunately I can't find the link I'm thinking of, at the moment.

alexyim · on Feb 22, 2011

What does this post even mean? What is a front-end business vs. a back-end business?

As far as complexity goes, I believe Amazon touches somewhere over a hundred services before a page is even served. It's no surprise that they decided to make some of that into a business and start charging for it.

steveklabnik · on Feb 23, 2011

To put it slightly crudely, he thought GitHub was in the business of making a pretty UI on top of git repos, not designing Serious Technology that nobody sees.

alexyim · on Feb 23, 2011

Ah, gotcha. Thanks, this makes a little more sense.

Erwin · on Feb 22, 2011

A few years back I saw a figure of 100-150 services accessed to render an amazon.com page, mentioned in this 2006 interview: http://queue.acm.org/detail.cfm?id=1142065

thibaut_barrere · on Feb 22, 2011

Funny - I had the reverse opinion (having used resque etc) :)

d0m · on Feb 22, 2011

Ok! Now make the website fast! I use GitHub for my documentation (wiki) and issue/bug tracker, and those pages take ages to load. I wanted to complain about it, so I wrote a long feedback message; it took 2 minutes to post, and then I got GitHub's error page. How ironic: the feedback system is too slow to post my feedback about the site being slow.

But, to be fair, it's not always like that. I guess they've had a lot of trouble on our server recently and we've just been unlucky.

rtomayko · on Feb 22, 2011

Where are you located (geographically)?

We have a couple developers more or less dedicated to performance at this point, and we're moderately satisfied with server response time for most areas of the site (the issue tracker is a notable exception -- we're working on that).

I ask about your location because we also have some latency issues as you move further away from the US east coast (Wash. DC) datacenter. We're working on that also.

Would you mind submitting the output of `ping github.com`, plus any other pages you find intolerably slow, via http://support.github.com/ ? If that's too slow, just dump stuff here, I guess.

stdbrouw · on Feb 22, 2011

In London, and just about any page loads much slower than I'd want/expect. Not like there's a better alternative out there, but 2-4 second page loads can be a bit frustrating at times.

rtomayko · on Feb 23, 2011

Can you dump some ping times? 2-4s is waaaay outside of what I'd expect the average response time to be.

stdbrouw · on Feb 23, 2011

Ping times seem to be alright — about 120 ms on average. The 2-4s I was referring to are the resource load times I see for the main HTML in the Safari dev pane.
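The two numbers above measure different things: ping is one network round trip, while the Safari figure covers connecting, waiting for the server to build the page, and downloading the HTML. A minimal sketch, assuming Python 3 and only the standard library's `http.client`, that splits one GET into those phases. The `time_request` helper is hypothetical; it follows no redirects and folds DNS lookup into the connect phase. Subtracting one ping round trip from the time-to-first-byte gives a rough estimate of server-side time.

```python
import http.client
import time
from urllib.parse import urlsplit


def time_request(url: str, timeout: float = 10.0) -> dict:
    """Hypothetical helper: time one HTTP GET, split into phases (seconds).

    Sketch only: no redirects are followed, and DNS resolution happens
    inside connect(), so it is counted as part of "connect".
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    path = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    try:
        start = time.perf_counter()
        conn.connect()                  # DNS + TCP (+ TLS for https) handshake
        connected = time.perf_counter()
        conn.request("GET", path)
        resp = conn.getresponse()       # returns once status line and headers arrive
        first_byte = time.perf_counter()
        body = resp.read()              # download the rest of the HTML
        done = time.perf_counter()
    finally:
        conn.close()
    return {
        "status": resp.status,
        "connect": connected - start,
        # Send + one round trip + server processing; minus ping RTT ~ server time.
        "ttfb": first_byte - connected,
        "total": done - start,
        "bytes": len(body),
    }
```

For example, `print(time_request("https://github.com/"))` prints the phase breakdown for one fetch; if `ttfb` dominates while `connect` stays near the ping time, the delay is on the server rather than the network.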

mhartl · on Feb 22, 2011

Nitpick: It's "GitHub", not "Github". Also note the date: 20 Oct 2009.

ptn · on Feb 22, 2011

Added the date to the title. Sorry about that.

unsigner · on Feb 22, 2011

What does he mean by "bare metal"? I've heard of attempts by Oracle to run directly on hardware, without an OS; does the same exist for MySQL and nginx? Or does he simply mean not-a-VM?

rtomayko · on Feb 22, 2011

"Bare metal" as in "not-a-VM", yeah.

mcantelon · on Feb 22, 2011

He prolly means not a VM.

arthurschreiber · on Feb 22, 2011

Not-a-VM. As far as I know, GitHub was hosted on Engine Yard VMs before they moved to Rackspace.

Skywing · on Feb 22, 2011

This is an older post, but it's a high quality one to read.

kstenerud · on Feb 23, 2011

Page did not respond in a timely fashion. Check our status site for alerts.

Either you found a page that took too long to render or we're getting more requests right now than we can handle.

You can try refreshing the page, the problem may be temporary. Learn how to deal with GitHub outages and other access problems.

jjcm · on Feb 22, 2011

What surprised me was that they were using 15k rpm drives rather than SSDs for database lookups. Any idea why?

icefox · on Feb 22, 2011

Asked them about this recently, and the answer was that I/O wasn't a problem. Better bang for their buck elsewhere.

alanh · on Feb 22, 2011

Well, it’s 2009.

m0shen · on Feb 22, 2011

Better $/GB right now? Or maybe they just don't need the performance?

cullenking · on Feb 22, 2011

SSDs can't handle the number of writes GitHub tosses at them.

joshu · on Feb 22, 2011

Can someone explain how you do MySQL replication via DRBD?

wgj · on Feb 22, 2011

(2009)

_4vyi · on Feb 22, 2011

"...The Aristocrats!"