How do BitTorrent magnet links work?
C

6

205

I used a magnet link for the first time. Curious about how it works, I looked up the specs and didn't find any answers. The wiki says xt means "exact topic" and is followed by the format (btih in this case) with a SHA-1 hash. I saw base32 mentioned; knowing that base32 carries 5 bits per character and the hash is 32 characters long, I worked out that it holds exactly 160 bits, which is exactly the size of a SHA-1 hash.

There's no room for an IP address or anything; it's just a SHA-1 hash. So how does the BitTorrent client find the actual file? I turned on URL Snooper to see if it visits a page (using TCP) or does a lookup or the like, but nothing happened. I have no idea how the client finds peers. How does this work?

Also, what is the hash of? Is it a hash of an array of all the file hashes together? Maybe it's a hash of the actual torrent file required (stripping certain information)?


In a VM, I tried a magnet link with uTorrent (which was freshly installed) and it managed to find peers. Where did the first peer come from? It was fresh and there were no other torrents.

Changeless answered 2/10, 2010 at 5:27 Comment(2)
Is this even relevant to programming?Subacid
Related: How PEX protocol (Magnetic links) finds it first IP?Absorbent
A
203

A BitTorrent magnet link identifies a torrent using [1] a SHA-1 or truncated SHA-256 hash value known as the "infohash". This is the same value that peers (clients) use to identify torrents when communicating with trackers or other peers. A traditional .torrent file contains a data structure with two top-level keys: announce, identifying the tracker(s) to use for the download, and info, containing the filenames and hashes for the torrent. The "infohash" is the hash of the encoded info data.
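To make "the hash of the encoded info data" concrete, here is a minimal Python sketch of a v1 infohash. It assumes a tiny hand-rolled bencoder; the `bencode` and `info_hash_v1` helpers and the sample `info` dictionary are hypothetical and not taken from any real client or torrent.

```python
import hashlib

def bencode(value):
    """Encode ints, bytes/str, lists and dicts using BitTorrent's bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")

def info_hash_v1(info):
    """SHA-1 digest of the bencoded info dictionary (20 bytes)."""
    return hashlib.sha1(bencode(info)).digest()

# Hypothetical single-file torrent with one piece; the values are made up.
info = {
    "name": "example.txt",
    "length": 11,
    "piece length": 16384,
    "pieces": hashlib.sha1(b"hello world").digest(),
}
print(info_hash_v1(info).hex())
```

Note that a real client hashes the exact bytes of the info dictionary as they appear in the .torrent file rather than re-encoding a parsed copy, so that unusual encodings do not change the hash.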

Some magnet links include trackers or web seeds, but they often don't. Your client may know nothing about the torrent except for its infohash. The first thing it needs to do is find other peers who are downloading the torrent. It does this using a separate peer-to-peer network [2] operating a "distributed hash table" (DHT). A DHT is a big distributed index which maps torrents (identified by infohashes) to lists of peers (identified by IP address and ports) who are participating in a swarm for that torrent (uploading/downloading data or metadata).
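As a rough illustration of what a client can and cannot learn from the link itself, the following Python sketch pulls the fields out of a magnet URI using only the standard library. The `parse_magnet` helper and the example URI (a made-up infohash and a placeholder tracker host) are assumptions for illustration.

```python
from urllib.parse import urlsplit, parse_qs

def parse_magnet(uri):
    """Split a magnet URI into the fields a BitTorrent client cares about."""
    parts = urlsplit(uri)
    if parts.scheme != "magnet":
        raise ValueError("not a magnet URI")
    # parse_qs percent-decodes values and keeps repeated keys as lists.
    params = parse_qs(parts.query)

    btih = None
    for topic in params.get("xt", []):
        if topic.lower().startswith("urn:btih:"):
            btih = topic[len("urn:btih:"):]  # v1 infohash, still in text form
    return {
        "btih": btih,
        "name": params.get("dn", [None])[0],
        "trackers": params.get("tr", []),   # often empty
        "web_seeds": params.get("ws", []),  # often empty
    }

# Made-up infohash and placeholder tracker, for illustration only.
example = ("magnet:?xt=urn:btih:" + "ab" * 20 +
           "&dn=example&tr=udp%3A%2F%2Ftracker.example.org%3A6969")
print(parse_magnet(example))
```

When `trackers` comes back empty, the infohash is all the client has, which is the case the DHT is there to handle.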

The first time a client joins the DHT network it generates a random 160-bit ID from the same space as infohashes. It then bootstraps its connection to the DHT network using either hard-coded addresses of clients controlled by the client developer, or DHT-supporting clients previously encountered in a torrent swarm. When it wants to participate in a swarm for a given torrent, it searches the DHT network for several other clients whose IDs are as close [3] as possible to the infohash. It notifies these clients that it would like to participate in the swarm, and asks them for the connection information of any peers they already know of who are participating in the swarm.
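The notion of "close" can be sketched in a few lines of Python. This is only the distance metric and a local sort, not a real DHT lookup (which queries remote nodes iteratively); `xor_distance`, `closest_nodes`, and the default of 8 results are illustrative choices.

```python
import os

def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia-style distance: XOR of the two IDs, read as an unsigned integer."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def closest_nodes(target: bytes, node_ids, k: int = 8):
    """Return the k known node IDs nearest to the target under XOR distance."""
    return sorted(node_ids, key=lambda node_id: xor_distance(node_id, target))[:k]

# A fresh client picks a random 160-bit ID from the same space as infohashes.
my_id = os.urandom(20)

# Made-up infohash and node IDs, for illustration only.
info_hash = bytes(20)
known = [bytes(19) + bytes([n]) for n in (5, 1, 9, 3)]
print([node.hex() for node in closest_nodes(info_hash, known, k=2)])
```

Because node IDs and infohashes share one 160-bit space, the same function ranks nodes against a torrent and nodes against each other.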

When peers are uploading/downloading a particular torrent, they try to tell each other about all of the other peers they know of that are participating in the same torrent swarm. This lets peers know of each other quickly, without subjecting a tracker or DHT to constant requests. Once you've learned of a few peers from the DHT, your client will be able to ask those peers for the connection information of yet more peers in the torrent swarm, until you have all of the peers you need.

Finally, we can ask these peers for the torrent's info metadata, containing the filenames and hash list. Once we've downloaded this information and verified that it's correct using the known infohash, we're in practically the same position as a client that started with a regular .torrent file and got a list of peers from the included tracker.
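The verification step amounts to hashing the bytes the peers sent and comparing the result with the infohash from the link. A minimal Python sketch for the v1 (SHA-1) case, where `metadata_matches` and the sample bytes are hypothetical:

```python
import hashlib

def metadata_matches(info_bytes: bytes, info_hash: bytes) -> bool:
    """True if the raw bencoded info dictionary hashes to the v1 infohash."""
    return hashlib.sha1(info_bytes).digest() == info_hash

# Stand-in for metadata received from untrusted peers (not a complete info dict).
received = b"d6:lengthi11e4:name11:example.txte"
# Stand-in for the infohash taken from the magnet link.
trusted_hash = hashlib.sha1(received).digest()

print(metadata_matches(received, trusted_hash))
print(metadata_matches(received.replace(b"example", b"exampl3"), trusted_hash))
```

This is why it is safe to fetch the metadata from strangers: anything that does not hash to the value in the link can simply be discarded.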

The download may begin.

[1] The infohash is typically hex-encoded, but some old clients used base 32 instead. v1 (urn:btih:) uses the SHA-1 digest directly, while v2 (urn:btmh:) adds a multihash prefix to identify the hash algorithm and digest length.
[2] There are two primary DHT networks: the simpler "mainline" DHT, and a more complicated protocol used by Azureus.
[3] The distance is measured by XOR.
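For the two text encodings of a v1 infohash mentioned in the first footnote, here is a small Python sketch showing that 40 hex characters and 32 base32 characters both carry the same 160 bits. The `btih_to_bytes` helper and the sample digest are made up for illustration.

```python
import base64
import hashlib

def btih_to_bytes(text: str) -> bytes:
    """Decode a v1 infohash written as 40 hex or 32 base32 characters."""
    if len(text) == 40:
        return bytes.fromhex(text)             # 40 chars * 4 bits = 160 bits
    if len(text) == 32:
        return base64.b32decode(text.upper())  # 32 chars * 5 bits = 160 bits
    raise ValueError("expected 40 hex or 32 base32 characters")

# Stand-in digest; a real one is the SHA-1 of the bencoded info dictionary.
digest = hashlib.sha1(b"stand-in for a bencoded info dictionary").digest()
as_hex = digest.hex()
as_b32 = base64.b32encode(digest).decode("ascii")

print(as_hex, as_b32)
```

Since 20 bytes divide evenly into 5-byte groups, the base32 form needs no `=` padding, which is why it comes out at exactly 32 characters.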


Absorbent answered 7/3, 2014 at 2:59 Comment(5)
Is the bootstrap node, e.g., dht.transmissionbt.com, simply a tracker? The way I understand it, it needs to keep track of the list of peers per info hash, which is exactly what a tracker does.Assort
@Kate Not exactly. A typical DHT node stores peer lists for some torrents that are "near" it in the DHT network "space". A tracker instead tries to store peer lists for every torrent it knows of. Moreover, bootstrap DHT nodes specifically don't store peer lists for any torrents. Instead, they only distribute lists of other DHT nodes, to help you connect to the overall network. You can then find a typical DHT node with the peer list you're interested in.Absorbent
"Some magnet links include trackers or web seeds" - I'm a bit confused. Magnet is being used for downloading the torrent file as you describe. From the Magnet URI spec I see "acceptable source" and "tracker" as information which can be encoded in the URI. Now the tracker is obviously BitTorrent-specific and will most likely be used in addition to the trackers listed in the torrent file. Is the "acceptable source" meant to be used to download the torrent file or (one of) the actual files to be downloaded through the torrent file?Charlenacharlene
@FrederickNord In supporting torrent clients, the ws= parameter points to a BEP-19 web seed URL of the actual data, and the xs= parameter points to a URL with the .torrent file itself. I think this is a bit inconsistent with other uses of the magnet: scheme, but that's how it is. I forget if any clients use as= for anything... maybe just as a fallback for xs=, but not widely-supported, IIRC.Absorbent
Is there a way to implement such a system with zero knowledge proof?Gastrovascular
P
57

Peer discovery and resource discovery (files in your case) are two different things.

I am more familiar with JXTA but all peer to peer networks work on the same basic principles.

The first thing that needs to happen is peer discovery.

Peer Discovery

Most p2p networks are "seeded" networks: when first starting, a peer will connect to a well-known (hard-coded) address to retrieve a list of running peers. It can be direct seeding, like connecting to dht.transmissionbt.com as mentioned in another post, or indirect seeding, as is usually done with JXTA, where the peer connects to an address that only delivers a plain-text list of other peers' network addresses.

Once a connection is established with the first (few) peer(s), the connecting peer performs a discovery of other peers (by sending requests out) and maintains a table of them. Since the number of other peers can be huge, the connecting peer only maintains part of a Distributed Hash Table (DHT) of the peers. The algorithm to determine which part of the table the connecting peer should maintain varies depending on the network. BitTorrent uses Kademlia with 160-bit identifiers/keys.

Resource Discovery

Once a few peers have been discovered by the connecting peer, the latter sends them a few requests for discovery of resources. Magnet links identify those resources and are built in such a way that they are a "signature" for a resource, guaranteeing that they uniquely identify the requested content among all the peers. The connecting peer will then send a discovery request for the magnet link/resource to peers around it. The DHT is built in such a way that it helps determine which peers should be asked first for the resource (read up on Kademlia on Wikipedia for more). If the requested peer does not hold the requested resource, it will usually "pass on" the query to additional peers fetched from its own DHT.

The number of "hops" the query can be passed on is usually limited; 4 is a usual number with JXTA-type networks.

When a peer holds the resource, it replies with its full details. The connecting peer can then connect to the peer holding the resource (directly or via a relay - I won't go into details here) and start fetching it.

Resources/Services in P2P networks are not directly attached to network addresses: they are distributed and that is the beauty of these highly scalable networks.

Problematic answered 13/3, 2014 at 8:9 Comment(1)
This I think is the most succinct answer without a lot of technical jargon. Thanks.Deposal
E
33

I was curious about the same question myself. Reading the code for Transmission, I found the following in libtransmission/tr-dht.c:

bootstrap_from_name( "dht.transmissionbt.com", 6881,
                     bootstrap_af(session) );

It tries that 6 times, waiting 40(!) seconds between tries. I guess you can test it by deleting the config files (~/.config/transmission on unix), and blocking all communication to dht.transmissionbt.com, and see what happens (wait 240 seconds at least).

So it appears the client has a bootstrap node built in to start with. Of course, once it has gotten into the network, it doesn't need that bootstrap node anymore.

Ethiop answered 15/7, 2011 at 19:25 Comment(0)
C
12

I finally found the specification. For the first time, Google didn't help. (The wiki linked to bittorrent.com, which is the main site. I clicked the developers link, noticed the bittorrent.org tab on the right, and then it was easy from there. It's hard to find links when you have no idea what they are labeled and they are many clicks away.)

It seems like all torrents have a network of peers. You find peers from trackers and you keep them between sessions. The network allows you to find peers and other things. I haven't read how it's used with magnet links, but it seems to be undefined how a fresh client finds peers. Perhaps some are baked in, or clients use their home server or known trackers embedded in the client to get the first peer in the network.

Changeless answered 2/10, 2010 at 8:50 Comment(1)
Ah, I guess I was right about it going to DHT to find clients. "If no tracker is specified, the client SHOULD use the DHT (BEP 0005 [3]) to acquire peers."Groschen
G
11

When I started answering your question, I didn't realize you were asking how the magnet scheme works. Just thought you wanted to know how the parts relevant to the bittorrent protocol were generated.


The hash listed in the magnet uri is the torrent's info hash encoded in base32. The info hash is the sha1 hash of the bencoded info block of the torrent.

This python code demonstrates how it can be calculated.

I wrote a (very naive) C# implementation to test this out since I didn't have a bencoder on hand and it matches what is expected from the client.

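using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

static string CalculateInfoHash(string path)
{
    // Naive: assumes the info block is the last entry in the top-level dictionary
    // and that the previous value ends in 'e' (an integer, list or dictionary).
    const string infoKey = "e4:info";
    byte[] data = File.ReadAllBytes(path);
    // ISO-8859-1 maps every byte to one char, so the string index is also a byte offset.
    int keyIndex = Encoding.GetEncoding("ISO-8859-1").GetString(data).IndexOf(infoKey, StringComparison.Ordinal);
    if (keyIndex < 0) throw new InvalidDataException("info key not found");
    int offset = keyIndex + infoKey.Length;
    using (SHA1 sha1 = SHA1.Create())
    {
        // Stop one byte early: the final 'e' closes the outer dictionary, not the info block.
        byte[] hash = sha1.ComputeHash(data, offset, data.Length - offset - 1);
        return BitConverter.ToString(hash).Replace("-", ""); // uppercase hex
    }
}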

As I understand it, this hash does not include any information on how to locate the tracker; the client needs to find this out through other means (the announce URL provided). The hash is just what distinguishes one torrent from another on the tracker.

Everything related to the BitTorrent protocol still revolves around the tracker. It is still the primary means of communication among the swarm. The magnet URI scheme was not designed specifically for use by BitTorrent; it can be used by any P2P protocol as an alternative way of referring to resources. BitTorrent clients adapted to accept magnet links as another way to identify torrents, so that you don't need to download .torrent files anymore. The magnet URI still needs to specify the tracker so that the client can locate it and participate. It can contain information about other protocols, but that is irrelevant to the BitTorrent protocol. The BitTorrent protocol ultimately will not work without the trackers.

Groschen answered 2/10, 2010 at 6:47 Comment(11)
This doesn't help. But are you saying it hashes the entire torrent file, skipping the infokey block? My question was about how it finds the peers.Changeless
That's all coordinated by the tracker. The tracker knows of all the torrents it contains and clients seeding/leeching it. When someone connects with a magnet link, the info hash is used to determine what torrent the user is trying to find and sends a list of peers back.Groschen
@acidzombie24 You're probably thinking about distributed trackers, which use DHT to locate peers. This has nothing to do with magnet links. (en.wikipedia.org/wiki/…)Thimbleweed
I don't know how DHT works technically but can only speculate. All clients keep track of the peers they've connected to. When a new client enters the swarm, it gets all the information it needs from the other peers it initially connects to. It would still need a tracker to find the swarm initially. But once it connects to the swarm, it remembers the peers it connected to the last time and is able to make connections directly with them the next time it wants to continue. After that, it no longer needs the tracker to find the peers.Groschen
@Jeff M: But what 'sends' a list of peers back? A link is just a link; there's no tracker associated with it. I was trying to figure out WHAT sends back peers.Changeless
@Alexander Sagen: My question is about how magnet links work. All I mentioned was that it has a SHA1. @Jeff M: Are you speculating that when someone uses a magnet link, the torrent program connects to random peers it knows of? OK, but I would still like an answer (from someone) about how it works.Changeless
I moved my comment into my answer. I didn't realize you were asking about how the whole magnet URI scheme works with BitTorrent. Hopefully it should be clearer now. In summary, magnet links do not replace the role of the tracker, but replace the role of the .torrent files.Groschen
+1. Also, the magnet link in question does not specify tr(acker), only the SHA1, which left me confused, especially since I am using a fresh install with no torrents running (and not connected to any peers) and the magnet link still finds peers. It's magic; I have no idea how it works. There must be some home server it can ask for peers. But does that mean I send out queries to peers looking for a hash, and the client passes the message on to many peers until one answers my call?Changeless
I'm not sure how to answer that. All magnet uris I've seen always specify the tracker. It might be your client trying a list of public trackers that it knows about and one happens to have it. What trackers does the associated torrent list as being used? How is it displayed? Is there any relation between the tracker it connects to and the source of the magnet link? Maybe it's a torrent that uses DHT? Does the same work for a private torrent? Again, I don't know how DHT works exactly. I'll see if I can find any more information.Groschen
@Jeff M: I did some research and I finally found something. I'm posting the answer now.Changeless
Tracker is no longer essential; Tracker has become but one of three possible ways to find peers, the others being DHT and PEX.Havenot
C
3

The list of peers is probably populated from the torrent that upgrades the client (e.g. there's a torrent for uTorrent that upgrades it). As long as everyone's using the same client, it should be good because you have no choice but to share the upgrade.

Calysta answered 27/10, 2010 at 15:36 Comment(1)
That's a very logical place to search for the hash and other peers. +1Changeless
