How does a node join a Distributed Hash Table (DHT) cluster?

I'm trying to learn about the Distributed Hash Table (DHT) paradigm, as it fits into a P2P or fully distributed computing architecture. From a theoretical standpoint, once a cluster is established, it makes a good deal of sense how it manages to swarm data and distribute work.

The most interesting part to me is that the architecture never requires any kind of centralized controller or coordinator (no single point of failure). However, I'm still struggling to understand the practical execution of the concept, particularly how a cluster is formed. If it's a fully distributed system, how does a node know how to 'join' an already established cluster?

In a simplistic example:

  • Say I'm creating a P2P application based on the DHT model
  • The application is distributed across the Internet (i.e. not on the same network), and any public client may connect to the cluster
  • A client connected to the cluster can see some (but not necessarily all) of the other clients in the cluster
  • A client who isn't connected doesn't have any addresses or names of clients in the cluster.

So how would a new client 'connect' if there isn't any centralized server to act as a beacon, or serve the means of introducing the new client to the cluster?

Sturm answered 23/10, 2012 at 15:59 Comment(0)

This is a problem I covered as part of my dissertation, and I never found a solution I was happy with. The problem is that you need some kind of information about just one of the other peers before joining the network; getting that first address is the hard bit.

A few ideas I came up with:

  • Encourage peers to publish their address, that way you get publicly accessible lists of known IPs building up
  • Run several "well known" bootstrap peers
  • Brute-force the address space

The last option is the only truly decentralised approach. A combination of the three is likely to be best.

Once you're bootstrapped into a network, reestablishing a connection after disconnecting is not hard: simply save the addresses of a couple of thousand nodes in the network that have already been long-lived, and at least one of them will still be online next time.
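
To make the combination concrete, here is a rough sketch (not from the dissertation, just an illustration) of a bootstrap routine that first tries peer addresses cached from a previous session and then falls back to a few hardcoded "well known" bootstrap peers. The peers.cache file name, the example hostnames, and the tryJoin handshake are all placeholders for whatever your DHT actually uses.

package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"time"
)

// Hardcoded "well known" bootstrap peers (assumed to be run by the project itself).
var wellKnownPeers = []string{
	"bootstrap1.example.org:4000",
	"bootstrap2.example.org:4000",
}

// tryJoin stands in for the real DHT join handshake; here it only checks
// that a TCP connection can be opened within a short timeout.
func tryJoin(addr string) bool {
	conn, err := net.DialTimeout("tcp", addr, 3*time.Second)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// loadCachedPeers reads one address per line from a cache file saved
// during a previous session (the long-lived nodes mentioned above).
func loadCachedPeers(path string) []string {
	f, err := os.Open(path)
	if err != nil {
		return nil
	}
	defer f.Close()
	var peers []string
	s := bufio.NewScanner(f)
	for s.Scan() {
		if line := s.Text(); line != "" {
			peers = append(peers, line)
		}
	}
	return peers
}

func main() {
	// Cached long-lived peers first, then the well-known bootstrap peers.
	candidates := append(loadCachedPeers("peers.cache"), wellKnownPeers...)
	for _, addr := range candidates {
		if tryJoin(addr) {
			fmt.Println("joined via", addr)
			return
		}
	}
	fmt.Println("no bootstrap peer reachable")
}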

Battue answered 25/10, 2012 at 4:7 Comment(2)
Hmmmm, I was afraid this might be the case. The global-scan (brute-force) methodology is interesting, but doesn't seem terribly practical unless you have a targeted pool to scan. Looks like a central list might be the best practical solution for now. (Thanks!)Sturm
The original Skype implementation had an authentication server, which - upon user authentication - answered with a list of super peers. So you can have a centralized registry with some of the addresses, since serving those addresses hardly involves any load. (I'm saying the original Skype implementation, since I don't know how much Microsoft has modified it in the meantime)Dais

One approach I can think of is to create a proxy server for the network of DHT nodes, with shadow servers for that proxy to provide reliability.

Any new node trying to join the DHT network talks to the proxy, and the proxy lets it into the DHT network, where everything is entirely P2P.

This way only the proxy server has to be public, and all other DHT nodes can keep their IPs private.

This might be a hindrance to you since the application is distributed across the Internet, but you can always talk via the proxy.
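
A minimal sketch of that idea follows, assuming the proxy exposes a plain HTTP endpoint (the "/peers" path and the hostname below are made up) that returns one current DHT member address per line; after this introduction, the new node proceeds with normal peer-to-peer joins.

package main

import (
	"bufio"
	"fmt"
	"net/http"
)

// peersFromProxy asks the (hypothetical) proxy for a list of current DHT members.
func peersFromProxy(proxyURL string) ([]string, error) {
	resp, err := http.Get(proxyURL + "/peers")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var peers []string
	s := bufio.NewScanner(resp.Body)
	for s.Scan() {
		if line := s.Text(); line != "" {
			peers = append(peers, line)
		}
	}
	return peers, s.Err()
}

func main() {
	peers, err := peersFromProxy("http://proxy.example.org:8080")
	if err != nil {
		fmt.Println("proxy unreachable:", err)
		return
	}
	fmt.Println("introduced to", len(peers), "peers; continue with normal DHT joins")
}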

Anatomize answered 29/8, 2019 at 2:41 Comment(0)

It is a fact that the party that maintains the main trunk of the Distributed Hashtable (DHT) source is the GOD of that DHT instance and therefore the main single point of failure. If the DHT bootstraps from anonymizing-network (Tor, GNUnet, Chimera, etc.) nodes (hereafter: anonnodes) whose addresses have been hardcoded into the DHT source, then the chances of that DHT being hijacked by some "No-Such-Agency" should not increase. The classical wget works with Tor network addresses if used in conjunction with torsocks. An example:

torsocks wget http://xmh57jrzrnw6insl.onion/

To mitigate the risk that the DHT is hijacked by hijacking some of its bootstrapping nodes, an automated voting process can be used. The idea is that a booting node gets a list of addresses from each of the hardcoded anonnodes and bootstraps only from nodes that are present on a majority of the lists. If some anonnodes are more "trustable" than others, then instead of using a system where each anonnode has one vote, the "more trustable" anonnodes can have more than one vote.
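
The following is only a sketch of that majority vote under stated assumptions: each hardcoded anonnode has already returned a candidate list, each voter gets a (possibly trust-weighted) vote, and only addresses whose accumulated votes exceed half of the total weight are used for bootstrapping.

package main

import "fmt"

// majorityPeers keeps addresses whose accumulated vote weight exceeds half of
// the total weight of all voters. weights[i] is the trust weight of lists[i].
func majorityPeers(lists [][]string, weights []int) []string {
	votes := make(map[string]int)
	total := 0
	for i, list := range lists {
		total += weights[i]
		seen := make(map[string]bool) // one vote per address per voter
		for _, addr := range list {
			if !seen[addr] {
				seen[addr] = true
				votes[addr] += weights[i]
			}
		}
	}
	var accepted []string
	for addr, v := range votes {
		if v*2 > total {
			accepted = append(accepted, addr)
		}
	}
	return accepted
}

func main() {
	lists := [][]string{
		{"10.0.0.1:4000", "10.0.0.2:4000"},
		{"10.0.0.1:4000", "10.0.0.3:4000"},
		{"10.0.0.1:4000", "10.0.0.2:4000"},
	}
	weights := []int{1, 1, 1} // equal trust; raise a weight for a "more trustable" anonnode
	fmt.Println(majorityPeers(lists, weights)) // only 10.0.0.1 and 10.0.0.2 pass
}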

In the old days, when computers were not reliable, a voting system was used in a form where, instead of different voting computers, the same computer was run multiple times over the same set of assembler commands. Computation results were compared, and the most frequent answer was considered the right one. Maybe in the case of the distributed hashtable the same methodology might be used: ask the different bootstrapping nodes the same question (the list of known DHT nodes) multiple times, through distinct Tor sessions, over some "longer" period of time.

As of 2014_07_xx I have not tested the ideas out yet, but I hope that my current comment helps.

Bullard answered 23/7, 2014 at 21:8 Comment(0)

Details on how peers join a DHT overlay can be hard to find; mostly you get details on how peers are discovered and how information is shared between them. I simplified this by looking at specific implementations. One implementation you can dive into directly is the Kademlia Protocol (Go Implementation).

Functions for creating DHT IDs, creating a bootstrap node, deciding whether a peer runs in client or server mode, etc., are all covered there.
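
This is not the API of that Go implementation, just an illustrative sketch of two Kademlia basics it deals with: a 160-bit node ID derived by hashing, and the XOR distance metric used to decide which peers are "close" to a given ID.

package main

import (
	"crypto/rand"
	"crypto/sha1"
	"fmt"
	"math/big"
)

// newNodeID derives a 160-bit Kademlia-style ID by hashing random bytes.
func newNodeID() [20]byte {
	var buf [32]byte
	if _, err := rand.Read(buf[:]); err != nil {
		panic(err)
	}
	return sha1.Sum(buf[:])
}

// xorDistance is the Kademlia distance between two IDs, interpreted as a big integer.
func xorDistance(a, b [20]byte) *big.Int {
	var d [20]byte
	for i := range d {
		d[i] = a[i] ^ b[i]
	}
	return new(big.Int).SetBytes(d[:])
}

func main() {
	a, b := newNodeID(), newNodeID()
	fmt.Printf("distance(a, b) = %x\n", xorDistance(a, b))
}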

Wade answered 9/5 at 14:15 Comment(0)
