I'm trying to learn about the Distributed Hash Table (DHT) paradigm and how it fits into a P2P or fully distributed computing architecture. From a theoretical standpoint, once a cluster is established, it makes a fair amount of sense how it manages to swarm data and distribute work.
The most interesting part to me is that the architecture never requires a centralized controller or coordinator (no single point of failure). However, I'm still struggling to understand the practical execution of the concept, particularly how a cluster is formed. If it's a fully distributed system, how does a node know how to 'join' an already established cluster?
As a simple example:
- Say I'm creating a P2P application based on the DHT model
- The application is distributed across the Internet (i.e., the clients are not all on the same network), and any public client may connect to the cluster
- A client connected to the cluster can see some (but not necessarily all) of the other clients in the cluster
- A client that isn't yet connected doesn't have the addresses or names of any clients in the cluster
So how would a new client 'connect' if there isn't a centralized server to act as a beacon or otherwise introduce the new client to the cluster?
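To make the question concrete, here is a toy sketch in Python of the situation as I understand it. The `Node` class, the `can_reach` method, and the peer names are all made up for illustration and don't come from any real DHT library. It only models "each client knows some of the other clients"; it doesn't attempt actual DHT routing.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy peer that knows only a subset of the other peers (hypothetical model)."""
    name: str
    known_peers: set[str] = field(default_factory=set)

    def can_reach(self, network: dict[str, "Node"], target: str) -> bool:
        """Hop from peer to peer, starting here; True if target is found."""
        seen = {self.name}
        frontier = [self.name]
        while frontier:
            current = frontier.pop()
            if current == target:
                return True
            for peer in network[current].known_peers:
                if peer not in seen:
                    seen.add(peer)
                    frontier.append(peer)
        return False


# Established cluster: every node has only a partial view of the others.
network = {
    "A": Node("A", {"B"}),
    "B": Node("B", {"C"}),
    "C": Node("C", {"A"}),
}

# New client: it has no addresses at all, so it has nowhere to start.
network["new"] = Node("new")

print(network["A"].can_reach(network, "C"))    # True: A -> B -> C
print(network["new"].can_reach(network, "C"))  # False: empty peer list
```

A node already in the cluster can find any other node by asking the peers it knows, but the new client's peer list is empty, so the walk never gets started. That first address is the piece I can't see how to obtain without something centralized.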