
seiji pretty much nails it. Hadoop seems to have come out of a weird culture. It is a distributed system with a single point of failure (the NameNode) because its designers insisted on avoiding Paxos (distributed systems are too hard, so we'll just make a broken-by-design protocol instead). Another example is that a lot of the database code built on top of Hadoop is designed around one Java HashMap per row, which really limits performance.

There are all sorts of oddities, and you can mostly work around them, but it is... exhausting, and I spend a lot of time thinking "surely there must be a better way".




> surely there must be a better way

http://www.spark-project.org/


Wait, so ZooKeeper (= distributed consensus thingie that I think implements the Paxos algorithm) is a Hadoop project but not actually used in Hadoop MapReduce?


That's correct. I believe they are using it in some new "high availability" stuff coming down the road.



