
The systems I work on are pretty sizable in scale; right now we are doing about 120 million calls a day on one of our core clusters. I don't claim to be the end-all authority on scaling, but I do have some observations based on what I've dealt with.

First off, that type of architecture is brittle to logic changes. If you have a completely static architecture that you never need to change, it may be fine, but otherwise deploying changes to every machine in the cluster is problematic.

Second, not all components have the same underlying machine requirements. For instance, our nginx servers don't need much RAM, but the HAProxy load balancers with nginx that terminate SSL need a lot of RAM and a good chunk of CPU. Hadoop works better with JBOD (just a bunch of disks), whereas Cassandra seems to work better with a RAID 0 configuration. Certain layers, like nginx on certain request paths, have real-time requirements, which means ultra-low latency. Other things need to operate against massive data sets and compute answers over a few hours.

So, not every machine in your architecture can have all of the services required by every other part of your architecture. A lot of it depends on workload types and what the underlying requirements are for your system. There are many more reasons, but I'll leave it at those for now. Ultimately the post is right that a single machine can work very well, but it's also misguided in just dismissing the HN commenter. There are many other cases where distributed architectures are required. Guaranteeing robustness and performance in the face of service and machine failures is very difficult, and is essentially impossible on a single machine.

Which approach you apply depends on the unique situation and the unique constraints you have. Using a single model to solve all problems seems to be worse than using no model. Learn multiple models, and learn how and when to apply each (yes, for those of you in the class, I'm taking the model-thinking class :) ).




The kind of programmer I was ranting against is the kind who thinks Redis LRU is faster than `new HashSplayTree<String>()`.

Apart from missing the subtle bit about overriding eviction...
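For anyone puzzled by that bit: the usual JDK route to an in-process LRU, and presumably the eviction override being alluded to, is LinkedHashMap's removeEldestEntry hook (the `HashSplayTree` above looks illustrative rather than a real class). A minimal sketch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-process LRU cache built on the stock JDK LinkedHashMap.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration order becomes least-recently-used first
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // The "overriding eviction" part: drop the LRU entry past capacity.
        return size() > maxEntries;
    }
}
```

No network round-trip, no serialization, just a method call, which is the whole point of the comparison with Redis.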


Yeah, I don't see a reason why the entire stack has to be on one machine. It simplifies configuration a little, but there's a trade-off in the granularity of resource allocation that comes with the tiered approach.

First, web servers are distributable by design. If the web app in question cannot be distributed, then a reread of Fielding's dissertation is in order:

http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm

So if the web tier is designed correctly and is simply a stateless data-transformation engine, then what's left is making your data tier distributable and elastic.
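To make "stateless" concrete, here's a minimal sketch using the JDK's built-in com.sun.net.httpserver classes (the handler, path, and port are illustrative, not from the discussion above): the handler keeps no mutable fields, so any replica behind a load balancer can serve any request.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class StatelessHandler implements HttpHandler {
    // No mutable state: the handler is a pure request -> response
    // transformation, which is what makes horizontal scaling trivial.
    @Override
    public void handle(HttpExchange exchange) throws IOException {
        String query = exchange.getRequestURI().getQuery(); // e.g. "name=ada"
        byte[] body = ("hello " + query).getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(200, body.length);
        try (OutputStream os = exchange.getResponseBody()) {
            os.write(body);
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8000), 0);
        server.createContext("/hello", new StatelessHandler());
        server.start(); // run N copies of this behind a load balancer
    }
}
```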

Even better, we could put an elastic web-service layer between the frontend web tier and the data storage tier to get a cleaner separation of presentation and logic.

In this set-up I believe we have the best of both worlds: the elasticity of the confederated design, since each tier is designed to be fault-tolerant and elastic, combined with the resource-allocation granularity we get with tiered clusters.





