A DHT is a fairly easy-to-understand technology that can give you a lot of insight into distributed systems. In school we had to build a crawl/index cluster of 8 machines, and we used Chord to split the load. I had no experience with multi-machine computing, but it immediately made sense to split the crawling load by creating a DHT over the URL keyspace. Of course this doesn't balance actual load, but it's a nice approximation to get you off the ground.
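The splitting itself is tiny. A rough sketch of the idea (not our actual code; the node count and the even spacing of node IDs around the ring are assumptions):

```python
# Chord-style split of the crawl frontier: hash each URL into a fixed
# keyspace and hand it to the node that owns that slice of the ring.
import hashlib

NUM_NODES = 8  # the 8-machine cluster mentioned above

def node_for(url: str) -> int:
    # SHA-1 gives a 160-bit key, the same keyspace Chord uses.
    key = int(hashlib.sha1(url.encode()).hexdigest(), 16)
    # With node IDs spaced evenly around the ring, ownership reduces to a modulo.
    return key % NUM_NODES

for u in ("https://example.com/a", "https://example.org/b"):
    print(u, "-> node", node_for(u))
```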
For someone who has never tried to do any Distributed Computing, a DHT can solve many simple scaling problems in a way that is fairly easy to reason about on paper.
I believe the poster was trying to say that load balancing is probabilistic rather than deterministic: load is levelled across the DHT only if the hash function maps keys evenly over the nodes (and assuming each key has an equal amount of work associated with it).
I think the main concern is that in many systems each key doesn't have an equal amount of work associated with it (this sort of thing is usually referred to as a "hotspot").
An example: suppose you have some distributed system storing article metadata and all of a sudden one of your articles becomes very widely shared. The machine that the popular key hashes to gets slammed. Perhaps we'd want to dedicate that particular machine to just that one article, or find some other way to distribute that one article across multiple machines. But we're just using a hash function, so without doing something fancier, we can run into problems when the load suddenly becomes wildly uneven.
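A toy sketch of that failure mode (the article names and the traffic split are made up): the keys spread evenly over the nodes, but the requests don't, so one node ends up with almost all the work.

```python
# Keys hash evenly, but one suddenly-popular key drags ~90% of the
# traffic onto whichever node it happens to live on.
import hashlib
from collections import Counter

NUM_NODES = 8

def node_for(key: str) -> int:
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % NUM_NODES

articles = [f"article-{i}" for i in range(1000)]
requests = ["article-42"] * 9000 + articles  # one viral article + background reads

load = Counter(node_for(a) for a in requests)
print(sorted(load.items()))  # the node owning article-42 carries ~90% of the reads
```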
Solving for read hot-spots is not difficult if you're willing to accept a small read penalty:
Your typical Kademlia DHT keeps k replicas of each piece of data, so you read from nodes near the target node (the node closest to the target key) rather than directly from it. This way, nodes at different points in the network read from many different replicas.
Of course, this depends on your consistency requirements.
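A hedged sketch of that idea (the node IDs, k, and the random replica choice are stand-ins for illustration): with the value stored on the k XOR-closest nodes to the key, a reader can be served by whichever replica it reaches first instead of always hammering the single closest node.

```python
import hashlib
import random

K = 20  # Kademlia's usual replication parameter

def key_of(name: str) -> int:
    return int(hashlib.sha1(name.encode()).hexdigest(), 16)

node_ids = [key_of(f"node-{i}") for i in range(200)]

def replica_nodes(key: int) -> list[int]:
    # The k nodes whose IDs are XOR-closest to the key hold copies of the value.
    return sorted(node_ids, key=lambda n: n ^ key)[:K]

def read(name: str) -> int:
    # A reader can stop at any of the k replicas, not just the closest one.
    return random.choice(replica_nodes(key_of(name)))

served = {read("hot-article") for _ in range(10_000)}
print(len(served), "distinct nodes served the hot key")  # spreads across up to K nodes
```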
Yep, this is what I meant. In my example the URL hostname was the key space, so all of the Wikipedia URLs would go to one node. That probably means that node had more work to do than some other node that got less "interesting" domains.
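Something like this (the URLs are just for illustration): hashing by hostname means every page on the same domain lands on the same node.

```python
import hashlib
from urllib.parse import urlparse

NUM_NODES = 8

def node_for_host(url: str) -> int:
    host = urlparse(url).hostname  # key space is the hostname, not the full URL
    return int(hashlib.sha1(host.encode()).hexdigest(), 16) % NUM_NODES

wiki = [
    "https://en.wikipedia.org/wiki/Chord_(peer-to-peer)",
    "https://en.wikipedia.org/wiki/Kademlia",
    "https://en.wikipedia.org/wiki/Distributed_hash_table",
]
print({node_for_host(u) for u in wiki})  # a single node ID: all the Wikipedia work piles up there
```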