Python HTTP/UDP BitTorrent tracker scrape library
T

2

9

I have a list of torrent info_hashes. For each info_hash, I have a list of trackers that correspond with that info_hash.

What I would like to do is scrape each tracker in the list to get the seeder/leecher/completed counts. However, I'd rather not write this myself, as I'm sure it has already been implemented elsewhere.

Does anyone know of a Python library that can scrape http:// and udp:// trackers?

I have been using libtorrent for other parts of this project. However, it can only scrape a tracker from a valid torrent_handle, and I don't want to add these info_hashes to a libtorrent session just to scrape the tracker, because it would start downloading the files.

Tackling answered 10/3, 2013 at 10:12 Comment(0)
T
12

I also didn't want to use libtorrent because it is quite inefficient for this: I want to be able to query a tracker for multiple info_hashes in one request instead of one at a time.

I ended up writing my own Python HTTP/UDP tracker scraping code, see here: https://github.com/erindru/m2t/blob/master/m2t/scraper.py (improvements most welcome!)
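For anyone who wants to see the HTTP side in isolation, here is a minimal sketch. It is not the code from the linked repository; it assumes the usual scrape convention (replace "announce" in the last path segment with "scrape", and repeat the info_hash parameter to ask about several torrents at once) and the usual bencoded response with a "files" dictionary. The names scrape_url, bdecode and parse_scrape are made up for this example, and not every tracker supports scraping or multi-hash requests.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def scrape_url(announce_url, info_hashes):
    """Build a scrape URL for several raw 20-byte info_hashes."""
    parts = urlsplit(announce_url)
    head, _, last = parts.path.rpartition("/")
    if not last.startswith("announce"):
        raise ValueError("tracker does not follow the scrape convention")
    path = head + "/scrape" + last[len("announce"):]
    # Each info_hash is percent-encoded byte by byte.
    query = "&".join("info_hash=" + quote(h, safe="") for h in info_hashes)
    if parts.query:
        query = parts.query + "&" + query
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


def bdecode(data, i=0):
    """Decode one bencoded value at data[i]; return (value, next_index)."""
    c = data[i:i + 1]
    if c == b"i":  # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":  # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":  # dictionary: d<key><value>...e
        i += 1
        d = {}
        while data[i:i + 1] != b"e":
            k, i = bdecode(data, i)
            d[k], i = bdecode(data, i)
        return d, i + 1
    # byte string: <length>:<bytes>
    colon = data.index(b":", i)
    length = int(data[i:colon])
    start = colon + 1
    return data[start:start + length], start + length


def parse_scrape(body):
    """Map hex info_hash -> counts from a bencoded scrape response."""
    decoded, _ = bdecode(body)
    result = {}
    for info_hash, stats in decoded.get(b"files", {}).items():
        result[info_hash.hex()] = {
            "seeders": stats.get(b"complete", 0),
            "leechers": stats.get(b"incomplete", 0),
            "completed": stats.get(b"downloaded", 0),
        }
    return result
```

To use it, fetch scrape_url(tracker, hashes) with whatever HTTP client you prefer (urllib.request.urlopen would do) and pass the response body to parse_scrape.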

Tackling answered 11/3, 2013 at 3:56 Comment(5)
Can this get you the peer list / seeder list of IP addresses? - Ponderable
Nope, it currently doesn't care about that, but it could be extended to do so. - Tackling
OK, thanks. One more question: I see the HTTP scrape expects a bencoded dictionary and reads the data from that, yet the UDP code just reads at offsets into the buffer. How did you know the order of the bytes and what they represent? If I need the IPs of peers, at what offset are they? Is there any documentation? - Ponderable
The UDP tracker protocol is not the same as HTTP, see xbtt.sourceforge.net/udp_tracker_protocol.html - Tackling
Thanks, I was looking at this earlier; the scrape response does not have a peer list. Is it possible to extend your implementation to get the peer list for both HTTP and UDP? Otherwise, how do the torrent clients do that? - Ponderable
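On the byte-offset question in the comments above, here is a rough sketch of the packet layouts as I read them from the UDP tracker protocol document linked there. All fields are big-endian. A scrape response carries only three counters per info_hash; the peer addresses clients use come from the announce response instead, as 6-byte entries (4-byte IPv4 address plus 2-byte port) starting at offset 20. The function names are invented for this example and are not taken from the linked scraper, and there is no socket handling, retry or timeout logic here.

```python
import socket
import struct

PROTOCOL_ID = 0x41727101980  # magic constant sent in the connect request
ACTION_CONNECT, ACTION_ANNOUNCE, ACTION_SCRAPE = 0, 1, 2


def connect_request(transaction_id):
    """16 bytes: protocol id (64 bit), action (32), transaction id (32)."""
    return struct.pack("!QII", PROTOCOL_ID, ACTION_CONNECT, transaction_id)


def parse_connect_response(data, transaction_id):
    """Return the 64-bit connection id the tracker handed back."""
    action, tid, connection_id = struct.unpack_from("!IIQ", data)
    if action != ACTION_CONNECT or tid != transaction_id:
        raise ValueError("unexpected connect response")
    return connection_id


def scrape_request(connection_id, transaction_id, info_hashes):
    """16-byte header followed by the raw 20-byte info_hashes."""
    header = struct.pack("!QII", connection_id, ACTION_SCRAPE, transaction_id)
    return header + b"".join(info_hashes)


def parse_scrape_response(data, transaction_id, info_hashes):
    """8-byte header, then 12 bytes per hash in the order they were requested."""
    action, tid = struct.unpack_from("!II", data)
    if action != ACTION_SCRAPE or tid != transaction_id:
        raise ValueError("unexpected scrape response")
    stats = {}
    for n, info_hash in enumerate(info_hashes):
        seeders, completed, leechers = struct.unpack_from("!III", data, 8 + 12 * n)
        stats[info_hash.hex()] = {
            "seeders": seeders,
            "completed": completed,
            "leechers": leechers,
        }
    return stats


def parse_announce_peers(data):
    """Peers in an IPv4 announce response: 6 bytes each, from offset 20."""
    peers = []
    for off in range(20, len(data) - 5, 6):
        ip, port = struct.unpack_from("!4sH", data, off)
        peers.append((socket.inet_ntoa(ip), port))
    return peers
```

The flow would be: send connect_request over a UDP socket, read the connection id, then send scrape_request (or an announce request, if you want peers) with that id and parse the reply.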
V
1

This is not directly an answer to your question, but a suggestion of how you could use libtorrent.

You can add the info-hash in a paused, non-auto-managed state (controlled by the flags in add_torrent_params). In that case libtorrent won't start downloading it.

Keep in mind that libtorrent does not (yet) support scraping the DHT.

Viyella answered 10/3, 2013 at 23:56 Comment(0)
