
So assuming the data is on one machine (as I asked), why would an index not solve this problem? And why does MapReduce solve it?



Indexes do not solve the locality problem (see non-clustered indexes): the index entries are sorted, but the rows they point to can be scattered across the disk. Even for in-memory databases, it is non-trivial to minimize cache misses in irregular data structures like B-trees.
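
To make the locality point concrete, here is a rough simulation (not a benchmark) that counts how many distinct pages a range scan touches when rows are stored in key order versus when a non-clustered index points into an arbitrarily ordered heap. The page size, row counts, and function names are all made up for illustration.

```python
import random

ROWS_PER_PAGE = 100  # assumed page capacity, purely illustrative


def pages_touched(row_positions, rows_per_page=ROWS_PER_PAGE):
    """Count distinct pages hit when fetching rows at the given positions."""
    return len({pos // rows_per_page for pos in row_positions})


def simulate(num_rows=100_000, scan_len=1_000, seed=0):
    """Return (clustered_pages, nonclustered_pages) for a scan of the first keys."""
    rng = random.Random(seed)
    keys = list(range(num_rows))

    # Clustered layout: rows are stored in key order, so key k sits at position k.
    clustered_positions = keys[:scan_len]

    # Non-clustered layout: rows sit in arbitrary (here: shuffled) heap order,
    # and the index maps each key to wherever its row happens to live.
    heap_order = keys[:]
    rng.shuffle(heap_order)
    position_of_key = {key: pos for pos, key in enumerate(heap_order)}
    nonclustered_positions = [position_of_key[k] for k in range(scan_len)]

    return pages_touched(clustered_positions), pages_touched(nonclustered_positions)


if __name__ == "__main__":
    clustered, nonclustered = simulate()
    print("pages touched, clustered scan:    ", clustered)
    print("pages touched, non-clustered scan:", nonclustered)
```

Under these assumptions the clustered scan reads a handful of contiguous pages, while the non-clustered scan lands on a different page for most rows, which is the random I/O the index by itself does nothing to avoid.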

Now, why MapReduce "might" be a better fit even for a problem where the data fits on one disk: consider a program that is embarrassingly parallel. It just reads a tuple and writes a new tuple back to disk. The parallel I/O provided by MapReduce can offer a significant benefit even in this simple case.
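
A minimal sketch of that tuple-in, tuple-out pattern, assuming an in-memory list stands in for the on-disk partitions and a thread pool stands in for the MapReduce workers. The `transform` function and the partitioning scheme are hypothetical; in a real deployment each partition would be read from and written to its own disk or node, which is where the parallel I/O gain would come from.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(record):
    """Hypothetical per-tuple work: read one tuple, emit one new tuple."""
    key, value = record
    return (key, value * 2)


def map_partition(partition):
    """Process one partition independently; no coordination with the others."""
    return [transform(record) for record in partition]


def split(records, num_partitions):
    """Round-robin the records into num_partitions independent chunks."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def parallel_map(records, num_partitions=4):
    """Run map_partition over every partition concurrently and merge the output."""
    partitions = split(records, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = list(pool.map(map_partition, partitions))
    # Output order follows the partitioning, not the original input order.
    return [record for part in results for record in part]


if __name__ == "__main__":
    data = [(i, i) for i in range(10)]
    print(sorted(parallel_map(data, num_partitions=3)))
```

Because no partition depends on any other, the work scales with the number of independent readers and writers you can throw at it. Threads here only illustrate the structure; they are not a claim about how any particular MapReduce implementation schedules tasks.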

Also, NoSQL != parallel processing.



