How We Made Github Fast (2009) (github.com/blog)
166 points by ptn on Feb 22, 2011 | 32 comments



This blog post blew my mind. I generally thought of GitHub as being in the front-end business, but maybe they're really in the back-end business?

Is this sort of complexity common? I'm not saying I think it's bad, as I am sure it's all there for a good reason, but it just seems like so many services to bounce through before a web page is even served.


Pretty much, yeah. The first startup I worked at (corporate remote access) had half a dozen different services that a request touched before a connection could be established. The second (financial software) also had half a dozen to a dozen. The one I founded (casual game creation) had 4 different independent systems interacting, and never got to the point where it was productionized (I killed it because the prototypes indicated it wasn't a compelling business idea).

Google, of course, is in a class way above all those. I'm not sure if anyone actually knows how many services a single search hits now - perhaps some of the old-timers who're now VP level or above in search. The last public figures I was aware of were "hundreds of distinct services spread across tens of thousands of machines".

Most startups that actually do something useful require far more than a web frontend and a database.


In a production environment, at scale, this is a relatively uncomplex example, I think. At any rate I've seen significantly more complicated multi-service architectures for what would also have been called a "front-end" business.


Now, that scrambled my brains. Uncomplex? I thought this was pretty complicated and that mojombo was a scalability God (and he may be).

Could you point me to more complex examples?


This is why it's helpful to work in a big, successful company before founding your startup. You get a sense of the scale and complexity of problems that actually have real users, and practice dealing with those problems. If you stay long enough, you also get a sense of the trade-offs involved, and which complexities are accidental vs. which are inherent.

Of course, when you actually go to found the company, you should forget all that scalability nonsense and start with the simplest system possible. However, it's worth having an image in your head about how the system will evolve and what the appropriate response to various scalability challenges is.


Uncomplex is probably the wrong word. Averagely complex for this sort of thing? Not easy to achieve or trivial to build, but, from a 40,000 foot perspective, probably typical of the kinds of complexity you'd find in most popular web applications.

There's a very long discussion, somewhere, of Amazon's eventually-consistent architecture that I'd say is definitely on the high-complexity side of things. Unfortunately I can't find the link I'm thinking of, at the moment.


What does this post even mean? What is a front-end business vs. a back-end business?

As far as complexity goes, I believe Amazon touches a hundred-odd services before a page is even served. It's no surprise that they decided to turn some of that into a business and start charging for it.


To put it slightly crudely, he thought GitHub was in the business of making a pretty UI on top of git repos, not designing Serious Technology that nobody sees.


Ah, gotcha. Thanks, this makes a little more sense.


A few years back I saw 100-150 cited as the number of services accessed to render an amazon.com page, as mentioned in this 2006 interview: http://queue.acm.org/detail.cfm?id=1142065


Funny - I had the reverse opinion (having used resque etc) :)


Ok! Now, make the website fast! I use GitHub for my documentation (wiki) and issue/bug tracker, and it takes ages to load those pages. I wanted to complain about it, so I wrote a long feedback message. It took 2 minutes to post, and then I saw the GitHub error page. How ironic: the feedback system is too slow to post my feedback about the site being slow.

But, to be fair, it's not always like that. I guess they had lots of trouble on our server recently and we've just been unlucky.


Where are you located (geographically)?

We have a couple developers more or less dedicated to performance at this point, and we're moderately satisfied with server response time for most areas of the site (the issue tracker is a notable exception -- we're working on that).

I ask about your location because we also have some latency issues as you move further away from the US east coast (Wash. DC) datacenter. We're working on that also.

Would you mind submitting the output of `ping github.com`, along with any pages you find intolerably slow, at http://support.github.com/? If that's too slow, just dump stuff here I guess.


In London, and just about any page loads much slower than I'd want/expect. Not like there's a better alternative out there, but 2-4 second page loads can be a bit frustrating at times.


Can you dump some ping times? 2-4s is waaaay outside of what I'd expect the average response time to be.


Ping times seem to be alright — about 120 ms on average. The 2-4s I was referring to are the resource load times I see for the main HTML in the Safari dev pane.
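
To put rough numbers on that gap, here is a back-of-the-envelope sketch. It assumes (my assumption, not something measured in this thread) that fetching the main HTML costs about two network round trips, one for the TCP handshake and one for the request/response, and it ignores DNS, TLS, and transfer time. The function name and the `round_trips` parameter are made up for illustration.

```python
def time_outside_round_trips(load_s: float, rtt_s: float, round_trips: int = 2) -> float:
    """Estimate how much of a page load is NOT explained by network round trips.

    round_trips=2 is an assumed figure (TCP handshake + one request/response);
    real connections may need more, e.g. for TLS or redirects.
    """
    return max(load_s - round_trips * rtt_s, 0.0)


rtt = 0.120  # ~120 ms average ping reported above
for load in (2.0, 4.0):  # the 2-4 s load times reported above
    remainder = time_outside_round_trips(load, rtt)
    print(f"{load:.1f}s load: about {remainder:.2f}s not explained by round trips")
```

Even if you double the assumed number of round trips, most of a 2-4 s load is left over, which points at server response or transfer time rather than raw latency.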


Nitpick: It's "GitHub", not "Github". Also note the date: 20 Oct 2009.


Added the date to the title. Sorry about that.


What does he mean by "bare metal"? I've heard of attempts by Oracle to run directly on hardware, without an OS; does the same exist with MySQL and nginx? Or does he simply mean not-a-VM?


"Bare metal" as in "not-a-VM", yeah.


He prolly means not a VM.


Not-a-VM. As far as I know, GitHub was hosted on Engine Yard VMs before they moved to Rackspace.


This is an older post, but it's a high quality one to read.


Page did not respond in a timely fashion. Check our status site for alerts.

Either you found a page that took too long to render or we're getting more requests right now than we can handle.

You can try refreshing the page, the problem may be temporary. Learn how to deal with GitHub outages and other access problems.


What surprised me was that they were using 15k rpm drives rather than SSDs for database lookups. Any idea why?


I asked them about this recently, and the answer was that I/O wasn't a problem. Better bang for their buck elsewhere.


Well, it’s 2009.


Better $/GB right now? Maybe they just don't require the performance?


SSDs can't handle the amount of writes GitHub tosses at them.


Can someone explain how you do MySQL replication via DRBD?


(2009)


"...The Aristocrats!"



