Intro to Distributed Hash Tables (freedomlayer.org)
213 points by realcr on Nov 11, 2014 | 37 comments



Might as well learn an important concept from the original authors (the article does link to this paper): Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications [1]

[1] http://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigco...


The Wikipedia article is pretty good for a short summary: http://en.wikipedia.org/wiki/Chord_(peer-to-peer)


I read this for my distributed systems class a couple of weeks ago; it's good stuff.


Robert Morris? THE Robert Morris?


He's cofounder of YC https://news.ycombinator.com/user?id=rtm

Read a book that had a section on him (and pg) written before they founded Viaweb. http://www.amazon.com/CYBERPUNK-Outlaws-Hackers-Computer-Fro...

Also, there are two Robert Morrises: rtm, and his dad, who also has a Wikipedia article http://en.wikipedia.org/wiki/Robert_Morris_%28cryptographer%...


Not to mention THE Ion Stoica! (https://databricks.com/about-us)


A DHT is a fairly easy-to-understand technology that can give you a lot of insight into distributed systems. In school we had to build a crawl/index cluster of 8 machines, and we used Chord to split the load. I had no experience with multi-machine computing, but it made immediate sense at the time to split the crawling load by creating a DHT over the URL keyspace. Of course this doesn't balance actual load, but it's a nice approximation to get you off the ground.

For someone who has never tried to do any Distributed Computing, a DHT can solve many simple scaling problems in a way that is fairly easy to reason about on paper.
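A minimal sketch of that kind of split, assuming a hypothetical 8-node cluster and plain hash-mod placement rather than a full Chord ring (node names are illustrative only):

    import hashlib
    from urllib.parse import urlparse

    # Hypothetical crawler names; placement is hash-mod, not a real Chord ring.
    NODES = ["crawler-%d" % i for i in range(8)]

    def node_for_url(url: str) -> str:
        """Map a URL's hostname into the keyspace and pick the owning crawler."""
        hostname = urlparse(url).hostname or url
        key = int.from_bytes(hashlib.sha1(hostname.encode()).digest(), "big")
        return NODES[key % len(NODES)]

    print(node_for_url("https://en.wikipedia.org/wiki/Chord_(peer-to-peer)"))
    # All Wikipedia URLs land on the same crawler: simple, but hotspot-prone.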


> this doesn't balance actual load

As someone who doesn't have much experience with distributed computing, this part confused me a little. What do you mean by that?


I believe poster was trying to say that load balancing is probabilistic rather than deterministic: Load is levelled across the DHT only if the hashing function maps keys evenly over the nodes (assuming each key has an equal amount of work associated with it).


I think the main concern is that in many systems each key doesn't have an equal amount of work associated with it (this sort of thing is usually referred to as a "hotspot").

An example: suppose you have some distributed system storing article metadata, and all of a sudden one of your articles becomes very widely shared. The machine that the popular key hashes to gets slammed. Perhaps we'd want that particular machine to be dedicated to just that one article, or find some other way to distribute that one article across multiple machines. But we're just using a hash function, so without doing something fancier, we can run into problems when the load suddenly becomes wildly uneven.


Solving for read hot-spots is not difficult if you're willing to accept a small read penalty:

Your typical Kademlia DHT has k replicas of each piece of data, so you read from nodes near the target node (the node closest to the target key) rather than directly from it. This way, nodes at different points in the network read from many different replicas.

Of course, this depends on your consistency requirements.
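A rough sketch of that idea, assuming 160-bit node IDs, an XOR-style distance, and k = 3 replicas (all illustrative values, not a real Kademlia implementation):

    import hashlib
    import random

    def key_id(key: bytes) -> int:
        """Hash a key into the 160-bit identifier space."""
        return int.from_bytes(hashlib.sha1(key).digest(), "big")

    def k_closest(node_ids, key, k=3):
        """The k nodes nearest the key under the XOR metric (Kademlia-style)."""
        target = key_id(key)
        return sorted(node_ids, key=lambda n: n ^ target)[:k]

    def read_replica(node_ids, key, k=3):
        """Read from any of the k replicas near the key instead of always
        hitting the single closest node, spreading load for hot keys."""
        return random.choice(k_closest(node_ids, key, k))

    nodes = [random.getrandbits(160) for _ in range(50)]
    print(hex(read_replica(nodes, b"popular-article")))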


Yep, this is what I meant. In my example the URL hostname was the key space, so all of the Wikipedia URLs would go to one node. That probably means that node had more work to do than some other node that got relatively less "interesting" domains.


Hi. OP here. The article is part of the Freedom Layer project, where we describe the research of creating a scalable and secure mesh network. (It is currently still an open problem, at least as far as we know.)

The first thing we decided to do is document known results. Then we will move on to documenting our own research. We will try to make it as down to earth as possible, though later articles might get more complicated. Some new results are also going to be discussed.


Scalable secure mesh networks might be the cold fusion of computing: possibly impossible, but with great consequences if they can ever work. I'd be interested to follow your research, especially if it's as easy to follow as this paper. Is there somewhere where you aggregate your research?


I agree with you :) The stuff that I know (hopefully everything that is relevant) will show up on the Freedom Layer project website. It will probably take me a month or two to write everything.

I might add some kind of RSS or a mailing list to make it easier to follow.


Looking forward to joining that mailing list :)


If someone wanted to deploy a DHT today what protocol and/or library would they use?


Kademlia. Certainly the most used (it's the default for BitTorrent) and, from a quick reading of other protocols, also the simplest. Moreover, it has the advantage of naturally not depending on other peers behaving correctly (some others rely on peers forwarding searches and eventually coming back when they have a result).
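For a feel of why Kademlia is simple, here is a sketch of its core routing idea under the usual assumptions (160-bit IDs, XOR distance); the bucket convention shown is the common one, not a quote from any spec:

    def xor_distance(a: int, b: int) -> int:
        """Kademlia's distance metric is just bitwise XOR of the two node IDs."""
        return a ^ b

    def bucket_index(own_id: int, other_id: int) -> int:
        """Which k-bucket a contact belongs in: bucket i holds contacts at
        distance in [2**i, 2**(i+1)), i.e. the position of the highest
        differing bit. (Not meaningful for the node's own ID.)"""
        return xor_distance(own_id, other_id).bit_length() - 1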

There's a working implementation over here [0] in Go, and there will most certainly be something for other languages. Search for "DHT Kademlia Mainline" (Mainline is the name of the network used by BitTorrent peers, as described here: [1]).

[0] https://github.com/nictuku/dht

[1] http://www.bittorrent.org/beps/bep_0005.html


While I'm inclined to agree that Kademlia is the frontrunner for DHTs, I'd like to comment on:

> it has the advantage of naturally not depending on other peers behaving correctly

This is, depending on the variety of pedantic hat you put on, not true. While Kademlia is resistant to some simple malfunctions, it can be attacked by an adversary [1]. Or, to put it formally, Kademlia is not Byzantine Fault Tolerant [2].

It can be strengthened, but at the cost of greater complexity. See [3] for an example. Further, it's arguable that zero-trust systems where anyone is allowed to join can never be BFT because of Sybil attacks [4]. The paper in [3] elaborates on how to mitigate this variety of attack, but it's unlikely that Sybil attacks can ever be "solved".

1. "Attacking the kad network" http://www.p2p.tu-darmstadt.de/fileadmin/user_upload/Group_P...

2. http://en.wikipedia.org/wiki/Byzantine_fault_tolerance

3. "S/Kademlia: A Practicable Approach Towards Secure Key-Based Routing" http://doc.tm.uka.de/SKademlia_2007.pdf

4. http://en.wikipedia.org/wiki/Sybil_attack


Of course, a global attacker can always muck with the system. I was talking about single independent peers behaving badly on their little scale, such as returning wrong or no information, which I believe is more likely to happen.

Also, the guys at BitTorrent are pushing for a restriction that ties the node ID you can pick to your external IP [0], which should make it much more costly to perform a large-scale attack.

Thank you for being pedantic :)

[0] http://blog.libtorrent.org/2012/12/dht-security/


A related question I've had for some time: can you hijack/join an existing DHT? For example, use the BitTorrent DHT (which is large and already bootstrapped) to implement this "phone list"? Would that be parasitic or symbiotic?


Joining is easy. Not sure exactly what you mean by hijack.

The mainline BitTorrent DHT only holds IP:port values, so that's just a few bytes. But it would be quite handy, for instance, for anyone who needs peers to discover each other. Not so handy for using it as an arbitrary key store or distributed filesystem.

I'd say if you are a full member of the DHT (you correctly handle other nodes' queries), you have a symbiotic relationship with it. If you don't maintain your own routing table and ignore other peers' requests, you'd have a parasitic relationship with the mainline DHT.

So you could be a malicious peer, but it wouldn't have a noticeable effect on the DHT unless you ran quite a few of them. Even then most clients just use the DHT as one of several peer discovery methods.

In fact for any given hash you post to the DHT you'll find peers claiming to be members of that torrent, even if said torrent doesn't exist.


Actually, the BitTorrent DHT has been extended to support storing arbitrary data [1]. This extension is supported by uTorrent and libtorrent-rasterbar, which together constitute a majority of BitTorrent DHT nodes.

[1] http://www.libtorrent.org/dht_store.html


Up to 1000 bytes of signed data, actually. Pretty neat IMHO


What is the accepted identifier for nodes? I did a rough implementation of Chord once using the IP address as the identifier and ran into the issue where the node would see itself at a different IP than other nodes did. The obvious solution is just "don't use it through NAT", I'm guessing.


It depends. If you own all the computers, you could just pick random identifiers (from some large space). You could also choose the identifiers to be public keys, or hashes of public keys. Then, when talking to a node claiming to have a certain ID, you can ask it to prove it.

Even when using the public key as the ID, it is still possible to generate IDs as close as one wants to a particular number. (It is a bit harder computationally, but still possible.)

There are more secure things you can do. In short, you would prefer the network itself to choose the ID for a new node, rather than letting the new node choose it. It appears that even this will not give you a secure enough solution. I will write about it in future articles.
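A sketch of the public-key-as-ID idea mentioned above; the `sign`/`verify` callables stand in for whatever signature scheme the network would actually use, so this is only an outline of the check, not a concrete protocol:

    import hashlib
    import os

    def node_id_from_pubkey(pubkey_bytes: bytes) -> int:
        """Derive the DHT identifier by hashing the node's public key."""
        return int.from_bytes(hashlib.sha1(pubkey_bytes).digest(), "big")

    def verify_id_claim(claimed_pubkey: bytes, claimed_id: int, sign, verify) -> bool:
        """The ID must equal the hash of the key, and the node must prove it
        holds the matching private key by signing a fresh random nonce."""
        if node_id_from_pubkey(claimed_pubkey) != claimed_id:
            return False
        nonce = os.urandom(32)
        return verify(claimed_pubkey, nonce, sign(nonce))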


hey!

First of all, awesome explanation, thanks!

What I didn't understand from your explanation (I don't know if it's explained or if I missed it) is how you identify which node has the value for the key K when the key is not in B's domain. How do you know where to stop the search? What if you have gone to a node that is too far?

I'd be very grateful if you can clarify that :)


I get the hunch that skip lists are intimately related to graph search algorithms. This is like A* search where the heuristic is to pick the highest ID that doesn't exceed the target.
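As a toy illustration of that heuristic (assuming plain integer IDs on a ring, nothing specific to the article's construction):

    def next_hop(known_ids, target):
        """Greedy routing step: among the IDs this node knows, pick the
        highest one that does not exceed the target; if none qualify,
        wrap around the ring by taking the highest known ID."""
        candidates = [n for n in known_ids if n <= target]
        return max(candidates) if candidates else max(known_ids)

    print(next_hop([3, 17, 42, 90], target=50))  # -> 42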


Can we get an RSS feed for that blog? I really want to follow it.


This doesn't explain what is to me the hard part: namely, coping with additions and removals of peers, especially unannounced removals.


I was going to write more articles explaining DHTs in more detail, but I can at least give a short answer here.

My general belief about dealing with nodes leaving the network is as follows: you just notice at some point that a node doesn't respond, and then you look for a better one. That is because a node leaving the network could be caused by a failure or by a user killing the program. A node leaving the network is an interesting moment, because the node no longer has any incentive to behave well.

If a node x wants to join the network, it will first contact a representative y already on the network. y will then search for x's ID inside the DHT and find the best location. y will then send x that location on the DHT, and x will join there. You might be wondering how the links are set up at this point.

In the Chord DHT there is an operation called stabilize that is run from time to time. This is where every node updates its finger table. It is described in more detail in the original paper.
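A stripped-down sketch of stabilize/notify in the spirit of the Chord paper, with most details (RPC, finger tables, failure handling) left out:

    class Node:
        def __init__(self, node_id):
            self.id = node_id
            self.successor = self
            self.predecessor = None

        def stabilize(self):
            """Run periodically: ask our successor for its predecessor; if
            that node sits between us and the successor on the ring, adopt
            it as the new successor, then tell the successor about us."""
            x = self.successor.predecessor
            if x is not None and in_interval(x.id, self.id, self.successor.id):
                self.successor = x
            self.successor.notify(self)

        def notify(self, candidate):
            """Successor side of stabilize: accept a closer predecessor."""
            if self.predecessor is None or in_interval(
                    candidate.id, self.predecessor.id, self.id):
                self.predecessor = candidate

    def in_interval(x, a, b):
        """True if x lies on the identifier ring strictly between a and b."""
        return a < x < b if a < b else x > a or x < b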

I didn't deal with these topics in the article I wrote because I will deal with them in the future, when talking about DHT security and stability.

EDIT: One thing that people forget from time to time is that DHTs have a hidden assumption that allows them to work: a DHT assumes that things don't change too fast, at least not so fast that individual nodes cannot keep up.

DHTs can deal with a few nodes joining and leaving, but if lots of nodes become offline in one moment, you might face a problem.

Thanks for reading.


The mainline DHT (BitTorrent's Kademlia) is quite resistant to "lots of nodes becoming offline in a moment". Often 5-10M peers are connected, so each peer tracks 180 peers or so. So unless a very large fraction of the DHT is unreachable, you are likely going to be fine.

Also, even if by some freak chance all 180 of those peers (distributed across the globe) are gone, you'll get incoming queries from the DHT (assuming you were connected long enough to be in their routing tables). Once that happens you can start adding peers again. This is why the DHT is pretty resistant to churn: as long as a reasonable fraction of your 180-node routing table is still around in 10-15 minutes (it's typically refreshed that often), you can add as many peers as you want.

So in practice the DHT is pretty robust compared to other protocols/services.


From the wiki: "Since the successor (or predecessor) of a node may disappear from the network (because of failure or departure), each node records a whole segment of the circle adjacent to it, i.e. the r nodes preceding it and the r nodes following it. This list results in a high probability that a node is able to correctly locate its successor or predecessor, even if the network in question suffers from a high failure rate."

I'm not 100% sure what that means though. Is it that nodes store redundant data to take over the responsibilities of a next-door neighbor node going down? If so, what is the data replication strategy? The "failures and replication" section is blank :(

I'm also somewhat confused why they are strict about going counterclockwise. If they know if the desired node is higher or lower in the ring, they could easily implement searching in both directions and reduce the average search distance. This maybe doesn't matter because of the exponential jumps in "known" node distances...but it seems that it would be a cheap efficiency gain on some level.


> Is it that nodes store redundant data to take over the responsibilities of a next door neighbor node going down?

In my understanding, this extra-successor scheme has little (or nothing) to do with data redundancy. Instead, it deals with what you could call metadata redundancy: instead of a node knowing how to contact only one other node, let it store more information so that it knows how to contact several successors. This way, the probability that all of a node's known neighbors fail at once can be kept low.
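A tiny sketch of how that successor list would get used (the `is_alive` check stands in for whatever liveness probe an implementation has; this is not code from Chord itself):

    def live_successor(successor_list, is_alive):
        """Walk the r recorded successors in ring order and return the first
        one that still responds; only if all r fail at once does the node
        lose its place and need to rejoin via a bootstrap node."""
        for node in successor_list:
            if is_alive(node):
                return node
        return None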

> I'm also somewhat confused why they are strict about going counterclockwise. If they know if the desired node is higher or lower in the ring, they could easily implement searching in both directions and reduce the average search distance. This maybe doesn't matter because of the exponential jumps in "known" node distances...but it seems that it would be a cheap efficiency gain on some level.

I can only speculate that this is because the DHT is usually presented as a logical overlay. So the hops there are logical hops, not necessarily representative of actual network distance.


I'm not done reading yet, but this is very well explained. Thanks a lot.

I found a little typo in the line: "if A and B are very var"


I'll fix it. Thanks :)


Good read. It's explained in layman's terms.



