How to query a playlist properly and safely
I want to extract the information from a YouTube playlist, but querying the whole playlist at once seems quite unreliable even with the ignoreerrors flag, because it sometimes gets stuck, especially if the internet connection is a bit shaky. Should I instead fetch the playlist entries one by one, setting the playliststart and playlistend values and processing them in a loop?

My current code looks like this:

import youtube_dl

simulate_ydl_opts = {
    'format': '251',
    'playlistend': 50,
    'ignoreerrors': True,
    'simulate': True,
}
youtube_dl_object = youtube_dl.YoutubeDL(simulate_ydl_opts)
test_info = youtube_dl_object.extract_info("https://www.youtube.com/user/Rasenfunk")
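The chunked loop I am considering would look roughly like this (a sketch: the chunk size of 10 and the helper names chunk_bounds/fetch_in_chunks are my own, not anything youtube_dl provides):

```python
def chunk_bounds(total, chunk):
    """Yield (playliststart, playlistend) pairs covering entries 1..total."""
    for start in range(1, total + 1, chunk):
        yield start, min(start + chunk - 1, total)

def fetch_in_chunks(url, total=50, chunk=10):
    """Query the playlist in small slices so a stall only loses one slice."""
    import youtube_dl  # imported here so chunk_bounds stays usable on its own
    entries = []
    for start, end in chunk_bounds(total, chunk):
        opts = {
            'format': '251',
            'ignoreerrors': True,
            'simulate': True,
            'playliststart': start,
            'playlistend': end,
        }
        with youtube_dl.YoutubeDL(opts) as ydl:
            info = ydl.extract_info(url, download=False)
        if info:
            entries.extend(e for e in info.get('entries') or [] if e)
    return entries
```

The idea is that if one slice hangs, only that slice needs to be retried rather than the whole playlist.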
Bonkers answered 1/3, 2019 at 8:10 Comment(6)
I am not sure where it is that you find it getting stuck. Are you seeing this error even after setting playlistend?Hewett
Well, it's hard to reproduce this "getting stuck" part, especially if your connection isn't shaky. Just take it as a fact that it gets stuck from time to time; I would like to have an alternative solution.Bonkers
What kind of information do you want to extract? Do you want to download the videos from the playlist, or just video metadata such as video name, description, video length...?Kumamoto
The kind of info youtube_dl gives with extract_info. For an example file click here.Bonkers
Since you are not interested in actually downloading the videos, have you considered using the YouTube v3 API to extract exactly what you need? developers.google.com/youtube/v3/getting-startedKumamoto
I have thought about another thing: getting active in the youtube_dl project! :) It seems the Python lib would need lots of small tweaks to be really nice. But somehow your idea sounds quite plausible as well...Bonkers
IMHO, I think you can use ratelimit (e.g. 'ratelimit': 50000) as a configuration option, in case the problem depends on your download speed, coupled with playliststart, playlistend, retries and continuedl.

As per their github common module, ratelimit is

Download speed limit, in bytes/sec.

If you already know the speed at which the download needs to be capped, you can set that value directly. It basically helps you throttle the download if your bandwidth can't handle it.

In case you are not sure of the maximum, I suggest using something like speedtest-cli to measure your download speed and apply that for throttling. I hacked together this sample code to try it out:

import speedtest
import youtube_dl

s = speedtest.Speedtest()
s.get_best_server()
s.download()

# Measured download speed, in bits per second
download_bps = s.results.dict()['download']
print(download_bps)

simulate_ydl_opts = {
    'format': "251",
    'playlistend': 50,
    'ignoreerrors': True,
    'verbose': True,  # Might give you a clue on the download speed as it prints
    'ratelimit': download_bps / 8,  # ratelimit is in bytes/sec; speedtest reports bits/sec
    'retries': 10,  # Retry 10 times if there is a failure
    'continuedl': True,  # Try to continue downloads if possible
}

youtube_dl_object = youtube_dl.YoutubeDL(simulate_ydl_opts)
test_info = youtube_dl_object.extract_info("https://www.youtube.com/user/Rasenfunk")
Hewett answered 8/9, 2019 at 3:20 Comment(8)
Does this answer have any connection to my question? I never had any problems with a rate limit...Bonkers
@csabinho, can you clarify once more what you mean by "querying the whole playlist at once seems to be quite unreliable"? Does it fail because the download gets stuck?Hewett
The command gets stuck, for various reasons. The best solution would be to just extract the links, as youtube-dl does in the beginning, but there's no way to stop the whole thing there.Bonkers
Can you please use verbose as mentioned? It will print exactly what is happening at the moment it gets stuck, which might give more of a clue.Hewett
I can't even reproduce it at the moment. I just want an alternative solution.Bonkers
99.9% the problem was my internet connection, not a rate limit or anything like one. But it's really awkward if you have downloaded like 400 video descriptions and it gets stuck and doesn't process anything.Bonkers
If it's the internet connection, then the ratelimit they refer to is a way to match the download speed to your connection and reduce download problems. As for an alternative solution, downloading 1 by 1 isn't going to help for 400 videos. I would still suggest ratelimit + playliststart and playlistend to make it smoother. You can also refer to retries and continuedl, which I have added to the answer.Hewett
Once again: there's no real way I can simulate a shaky internet connection with packet loss etc.Bonkers
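As an aside on the "stop after collecting the links" idea from the comments above: youtube-dl does expose an extract_flat option that returns playlist entries without resolving each video. A sketch (the helper name and the injectable ydl_factory, used here only to keep the sketch testable offline, are my own):

```python
def list_playlist_urls(playlist_url, ydl_factory=None):
    """Return just the entry URLs of a playlist, without resolving each video."""
    if ydl_factory is None:
        import youtube_dl  # imported lazily so the function stays importable without it
        ydl_factory = youtube_dl.YoutubeDL
    opts = {
        'extract_flat': 'in_playlist',  # list entries only; skip per-video extraction
        'ignoreerrors': True,
        'quiet': True,
    }
    with ydl_factory(opts) as ydl:
        info = ydl.extract_info(playlist_url, download=False)
    return [e.get('url') for e in (info or {}).get('entries') or [] if e]
```

Because per-video extraction is skipped entirely, this avoids most of the places where a shaky connection can stall the run.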
Yes, you should probably try doing it one at a time. I also recommend making your program keep track of the last URL it processed, so it can continue from where it left off. A threading-based timeout-and-restart system (for each video, start a new thread with a timeout) would help the process go more smoothly.
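A minimal sketch of such a timeout wrapper, assuming a hypothetical helper name and an arbitrary 30-second default (note that Python threads cannot be killed, only abandoned, so a stuck worker lingers as a daemon thread):

```python
import threading

def run_with_timeout(fn, args=(), timeout=30):
    """Run fn(*args) in a worker thread; return its result, or None on timeout/error."""
    result = {}

    def worker():
        try:
            result['value'] = fn(*args)
        except Exception as exc:  # record the failure instead of crashing the thread
            result['error'] = exc

    t = threading.Thread(target=worker, daemon=True)  # daemon: abandoned on timeout
    t.start()
    t.join(timeout)
    if t.is_alive() or 'value' not in result:
        return None
    return result['value']
```

You would then call something like run_with_timeout(ydl.extract_info, (video_url,), timeout=60) per video, and re-queue the URLs that came back as None.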

Lewendal answered 6/9, 2019 at 18:5 Comment(2)
That's not really what I was asking for, as it already doesn't download the videos because of simulate: True in the object's options. I just want to collect the links without fetching anything else.Bonkers
@Bonkers I edited the answer. Hopefully it's at least a bit helpful.Lewendal

© 2022 - 2024 — McMap. All rights reserved.