Autodiscovery in P2P Applications

Asked 11/4, 2013 at 19:06 Answered 11/4, 2013 at 21:15

Solved network-programming language-agnostic p2p autodiscovery

I want to create a P2P application on the internet. What is the best way (or, if no best way exists, a good enough way) to auto-discover other nodes in a decentralized network?

Grenoble answered 11/4, 2013 at 19:6 Comment(3)

Btw, as interesting as it is, you'll find that this question will probably get closed as 'too discursive'. Browse the questions in the Related sidebar (-->) and download some open-source stuff to see how they do it. Java has JXTA, but I hear that has been abandoned for some time, and is highly complex - might still be worth a look though, if you don't mind delving into something complicated. – Neela 11/4, 2013 at 19:20

JXTA looks interesting – Grenoble 11/4, 2013 at 19:22

What about security? What if nodes tell me about evil nodes or non-existent ones? – Grenoble 11/4, 2013 at 19:24

Grothoff and GauthierDickey of the GNUnet project (an anonymous, censorship-resistant file-sharing network) researched the question of bootstrapping a P2P network without any central host list.

They found that, for the Gnutella (LimeWire) network, a random IP search needed on average 2500 connection attempts to find a peer.

In the paper they proposed a method which reduced the required connection attempts to 817 for Gnutella and 51 for the E2DK network.

This was achieved by creating a statistical profile of P2P users for every DNS organization. This small (around 100 KB) discovery database has to be created in advance and shipped with the P2P client.
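To illustrate the general idea only (this is not the paper's algorithm), here is a minimal sketch that probes address ranges in order of how likely they are to contain peers. The `profile` table, its CIDR keys, the density numbers and the function name `probe_order` are all made up for the example; the real database is keyed by DNS organization.

```python
import ipaddress
import random

def probe_order(profile, samples_per_net=2, rng=None):
    """Return candidate addresses, densest networks first.

    profile maps a CIDR string to an estimated fraction of hosts
    running the P2P client (hypothetical numbers, shipped in advance).
    """
    rng = rng or random.Random()
    order = []
    # Visit networks from the highest estimated peer density to the lowest.
    for cidr, _density in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
        net = ipaddress.ip_network(cidr)
        for _ in range(samples_per_net):
            # Pick a random host inside the network to try next.
            order.append(net[rng.randrange(net.num_addresses)])
    return order

# Example with documentation-only address ranges and invented densities.
profile = {"192.0.2.0/24": 0.001, "198.51.100.0/24": 0.05}
candidates = probe_order(profile, samples_per_net=2, rng=random.Random(0))
```

A real client would attempt a connection to each candidate in turn and stop at the first peer that answers.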

Reitz answered 11/4, 2013 at 21:15 Comment(0)

This is the holy grail of P2P. There isn't a magic solution really - there's no way a node can discover other nodes without a good known point to act as a reference (well, you can do so on a LAN by using broadcasting, but not on the internet). P2P filesharing tends to work by having known websites distributing 'start points' for discovery, and then further discovery (I would expect) can come from asking nodes what other nodes they know about.
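As a rough sketch of that "start points, then ask around" approach: begin with a few known seed addresses and keep asking each reachable node which other nodes it knows. The function `discover` and the `ask_peers` callback are hypothetical names; `ask_peers` stands in for whatever network request your protocol uses and is assumed to raise `OSError` when a node cannot be reached.

```python
from collections import deque

def discover(seeds, ask_peers, limit=1000):
    """Breadth-first peer discovery starting from known seed nodes.

    ask_peers(node) is assumed to return the addresses that node knows
    about, or raise OSError if the node is unreachable.
    """
    known = set()
    queue = deque(seeds)
    while queue and len(known) < limit:
        node = queue.popleft()
        if node in known:
            continue
        try:
            neighbours = ask_peers(node)
        except OSError:
            continue  # unreachable: do not record it
        known.add(node)
        for other in neighbours:
            if other not in known:
                queue.append(other)
    return known

# In-memory stand-in for the network, just to show the control flow.
network = {"a": ["b", "c"], "b": ["a", "d"], "c": [], "d": ["e"]}

def fake_ask(node):
    if node not in network:
        raise OSError("unreachable")
    return network[node]

found = discover(["a"], fake_ask)
```

Here node "e" is advertised by "d" but never answers, so it does not end up in the result.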

A good place to start on research would be Distributed Hash Tables.

As for security, that topic will be in the literature somewhere, I should think - again I would recommend Wikipedia. Non-existent ones are trivially dealt with: if you can't contact an IP/port, don't keep it on your list, and if a node regularly provides non-existent pointers, consider de-prioritising it or removing it from your list entirely.
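A minimal sketch of that bookkeeping, assuming simple counters and arbitrary thresholds (the class name `PeerList` and both limits are invented for illustration):

```python
class PeerList:
    def __init__(self, max_failures=3, max_bad_referrals=5):
        self.max_failures = max_failures
        self.max_bad_referrals = max_bad_referrals
        self.failures = {}       # peer -> consecutive failed contact attempts
        self.bad_referrals = {}  # peer -> dead addresses it has handed out

    def add(self, peer):
        self.failures.setdefault(peer, 0)
        self.bad_referrals.setdefault(peer, 0)

    def remove(self, peer):
        self.failures.pop(peer, None)
        self.bad_referrals.pop(peer, None)

    def record_success(self, peer):
        if peer in self.failures:
            self.failures[peer] = 0

    def record_failure(self, peer, referred_by=None):
        # Drop a listed peer once it has failed too many times in a row.
        if peer in self.failures:
            self.failures[peer] += 1
            if self.failures[peer] >= self.max_failures:
                self.remove(peer)
        # Blame the node that pointed us at an address that does not answer.
        if referred_by in self.bad_referrals:
            self.bad_referrals[referred_by] += 1
            if self.bad_referrals[referred_by] >= self.max_bad_referrals:
                self.remove(referred_by)

    def ranked(self):
        # Peers with fewer bad referrals (then fewer failures) come first.
        return sorted(self.failures,
                      key=lambda p: (self.bad_referrals[p], self.failures[p]))
```

Calling `ranked()` gives the order in which to try peers, so a node that keeps handing out dead addresses sinks to the bottom before it is removed altogether.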

For evil nodes, it depends on your use case, but let's say you are doing file sharing. If you request a section of a file, check with several nodes what the file section's hash should be, and then request by hash. If the evil node gives you a chunk that has a different hash, then you can again de-prioritise or forget that node.
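Something like the following could implement that check. SHA-256, the strict-majority rule and the function names are assumptions for the sketch; real protocols usually take the expected hashes from a trusted metadata file rather than polling peers.

```python
import hashlib
from collections import Counter

def agreed_hash(reported_hashes, quorum=0.5):
    """Return the hash most nodes report, or None without a clear majority."""
    if not reported_hashes:
        return None
    value, votes = Counter(reported_hashes).most_common(1)[0]
    return value if votes / len(reported_hashes) > quorum else None

def chunk_is_valid(chunk, expected_hex):
    """Check a downloaded chunk against the hash we expect for it."""
    return hashlib.sha256(chunk).hexdigest() == expected_hex

# Example: two honest nodes and one liar report the hash of the same chunk.
good = hashlib.sha256(b"data").hexdigest()
expected = agreed_hash([good, good, "bad"])
```

If `chunk_is_valid` returns False for a chunk, the node that sent it is the one to de-prioritise or forget.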

Distributed processing systems work a little differently: they tend to ask several unrelated nodes to perform the same work, and then they use a voting system (probably using hashing again) to determine whether evilness is at hand. If a node provides consistently bad results, the administrator is contacted or the IP is removed from the known nodes list.
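A hedged sketch of such a vote, assuming each node returns its result as bytes and that at least `min_agree` matching results are required (the function name `vote` and that threshold are made up):

```python
import hashlib
from collections import Counter

def vote(results, min_agree=2):
    """Pick the result most nodes agree on and list the nodes that disagree.

    results maps a node id to the raw bytes that node returned.
    Returns (accepted_result, dissenting_nodes), or (None, set()) when
    too few nodes agree.
    """
    if not results:
        return None, set()
    # Compare hashes rather than full results, as suggested above.
    digests = {node: hashlib.sha256(r).hexdigest() for node, r in results.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes < min_agree:
        return None, set()
    dissenters = {node for node, d in digests.items() if d != winner}
    accepted = next(r for node, r in results.items() if digests[node] == winner)
    return accepted, dissenters

accepted, suspects = vote({"n1": b"42", "n2": b"42", "n3": b"41"})
```

Nodes that keep showing up in the dissenting set are the candidates for reporting or removal.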

Neela answered 11/4, 2013 at 19:31 Comment(2)

Are there many libraries to implement this sort of discovery? – Grenoble 11/4, 2013 at 19:37

I don't really know, to be honest. There are a good number of libraries referenced in the WP article I cited, so they would be an excellent place to start. – Neela 11/4, 2013 at 19:43

OK, for two peers to find each other, they both have to know a common mediator, let's say, through which they exchange IPs once. You can use anything for this first handshake, as long as both peers can WRITE to and READ from that "channel", e.g. DNS (your well-known domains), e-mail, IRC, Twitter, Facebook, Dropbox, etc.

Formless answered 11/4, 2013 at 19:41 Comment(2)

I was thinking of just brute forcing the IPv4 address space. It's actually feasible. Then once a node discovers another they exchange who else they know. The discovered nodes are stored in persistent storage for the future. I guess with IPv6 it becomes a problem. – Grenoble 11/4, 2013 at 20:13

I should think that brute-forcing any address space would start to harm the network it lives on in proportion to its success rate: a highly popular P2P client would create a great deal of wasted traffic. – Neela 11/4, 2013 at 22:44
