Is there a way to use youtube-dl in async

I have an application that uses zmq with asyncio to communicate with clients, who can ask the server to download a video with youtube-dl. I tried adding await to youtube_dl's download function, but it raised an error because it is not a coroutine. My code currently looks like this:

import asyncio
import youtube_dl


async def networking_stuff():
    download = True
    while True:
        if download:
            print("Received a request for download")
            await youtube_to_mp3("https://www.youtube.com/watch?v=u9WgtlgGAgs")
            download = False
        print("Working..")
        await asyncio.sleep(2)


async def youtube_to_mp3(url):
    ydl_opts = {
        'format': 'bestaudio/best',
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192',
        }]
    }

    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        ydl.download([url])


loop = asyncio.get_event_loop()
loop.create_task(networking_stuff())
loop.run_forever()

which gives the following output:

Received a request for download
[youtube] u9WgtlgGAgs: Downloading webpage
[youtube] u9WgtlgGAgs: Downloading video info webpage
[youtube] u9WgtlgGAgs: Extracting video information
[youtube] u9WgtlgGAgs: Downloading MPD manifest
[download] Destination: The Cardigans - My Favourite Game “Stone Version”-u9WgtlgGAgs.webm
[download] 100% of 4.20MiB in 00:03
[ffmpeg] Destination: The Cardigans - My Favourite Game “Stone Version”-u9WgtlgGAgs.mp3
Deleting original file The Cardigans - My Favourite Game “Stone Version”-u9WgtlgGAgs.webm (pass -k to keep)
Working..
Working..
....
Working..
Working..

whereas I would expect the Working.. message to be printed in between youtube-dl's messages as well. Am I missing something here or is this impossible with async/await? Is ffmpeg blocking? If so, can I run the download in async without converting to mp3 or is using threads the only way?

Sauter answered 21/11, 2017 at 17:13 Comment(0)

You are correct that you cannot simply make any function asynchronous.

Your question assumes that youtube-dl requires ffmpeg to work. That's not entirely true: it can download individual streams on its own. AFAIK ffmpeg is used only for muxing these streams (video + audio + maybe subtitles) into one file.

If you do use ffmpeg, there's not much to gain from a performance point of view: if it is run via subprocess (the most likely case), at least one full-blown process is spawned to do the work. Interaction with subprocesses can also be done in a non-blocking way (see https://docs.python.org/3/library/asyncio-subprocess.html), but if your code spawns a process for each task, it will not scale well either way.
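To illustrate the non-blocking subprocess interaction mentioned above, here is a minimal sketch using asyncio.create_subprocess_exec. The helper names run_command and heartbeat are hypothetical, and the child process is just a short Python one-liner standing in for ffmpeg or any other external tool.

```python
import asyncio
import sys


async def run_command(*args):
    """Run a command without blocking the event loop; return (exit code, output)."""
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    # communicate() yields to the event loop while the child is running.
    stdout, _ = await proc.communicate()
    return proc.returncode, stdout.decode()


async def heartbeat(ticks, interval=0.05):
    """Stand-in for the rest of the application; counts how often it ran."""
    count = 0
    for _ in range(ticks):
        await asyncio.sleep(interval)
        count += 1
    return count


async def main():
    # Assumption: a tiny Python child process plays the role of the real tool.
    cmd = [sys.executable, "-c", "import time; time.sleep(0.2); print('done')"]
    (code, output), ticks = await asyncio.gather(run_command(*cmd), heartbeat(3))
    return code, output.strip(), ticks


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The heartbeat keeps ticking while the child process runs, which is the behaviour the question was hoping for. Each call still costs one operating-system process, so the scaling caveat above still applies.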

Otherwise, it might be possible (and make some sense) to fork youtube-dl and make changes so that all network operations are based on asyncio. This is probably quite a lot of refactoring, but it should be doable.

Regarding your code:
First, the function youtube_to_mp3 is not asynchronous at all, because no code path in it can execute an await … expression. The meaning of the code would not change at all if you removed the async keyword from the function definition and the await from await youtube_to_mp3("….
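One common way to give such a function a real await point is to hand the blocking call to a thread via loop.run_in_executor. The sketch below is only an illustration: blocking_download is a hypothetical stand-in that sleeps instead of calling ydl.download([url]), and the URL is a placeholder.

```python
import asyncio
import time


def blocking_download(url):
    """Hypothetical stand-in for ydl.download([url]); blocks the calling thread."""
    time.sleep(0.3)
    return url


async def ticker(log, ticks, interval=0.05):
    """Stand-in for the networking loop that prints "Working..". """
    for _ in range(ticks):
        log.append("Working..")
        await asyncio.sleep(interval)


async def download_in_thread(url, log):
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor; the await here
    # lets other tasks run while the worker thread is busy.
    result = await loop.run_in_executor(None, blocking_download, url)
    log.append("download finished")
    return result


async def main():
    log = []
    ticker_task = asyncio.create_task(ticker(log, 3))
    await download_in_thread("https://example.invalid/video", log)
    await ticker_task
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

This still uses a thread under the hood, so it does not make the download itself asynchronous; it only keeps the event loop responsive while the download runs.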

Second, even if it were asynchronous, you are not using it in a way that would allow "parallel" execution. The await keyword really means just that: the control flow in this task will continue only after the awaited coroutine finishes. If you need to run multiple coroutines in "parallel", you must not directly await them one by one. There are several ways to run coroutines in parallel. For example, if all the tasks are known at the same moment (which doesn't look like your case), you may use https://docs.python.org/3/library/asyncio-task.html#asyncio.gather and await the resulting "combined" awaitable; otherwise, use a fire-and-forget approach (loop.create_task).
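The difference between awaiting one by one, asyncio.gather, and a fire-and-forget task can be sketched as follows. The coroutine job is a hypothetical stand-in for a truly asynchronous download; it only sleeps and records when it finished.

```python
import asyncio


async def job(name, delay, log):
    """Hypothetical stand-in for an asynchronous download."""
    await asyncio.sleep(delay)
    log.append(name)
    return name


async def sequential(log):
    # Awaited one by one: "fast" cannot start before "slow" has finished.
    await job("slow", 0.2, log)
    await job("fast", 0.05, log)


async def concurrent(log):
    # gather runs both at once and returns results in argument order.
    return await asyncio.gather(job("slow", 0.2, log), job("fast", 0.05, log))


async def fire_and_forget(log):
    # create_task schedules the coroutine and returns immediately.
    task = asyncio.create_task(job("background", 0.05, log))
    log.append("caller continues")
    # Keep a reference and await it eventually so the task is not lost.
    await task


async def main():
    seq_log, con_log, bg_log = [], [], []
    await sequential(seq_log)
    results = await concurrent(con_log)
    await fire_and_forget(bg_log)
    return seq_log, con_log, results, bg_log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In the sequential version the jobs finish in the order they were awaited. With gather the faster job finishes first even though it was listed second, and with create_task the caller carries on before the background job completes.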

Zaporozhye answered 23/11, 2017 at 20:53 Comment(2)
Because it looks like I need to refactor many long files from youtube-dl, do you think it would be worth it? I mean, considering async will only help with downloading the audio in parallel, which by the way takes less than a second. - Sauter
@StamKaly I recommend finding where the bottleneck is with the current solution (threads) and starting from there. Depending on what the limiting factor is, making all the code on the Python side async may or may not help. For example, if the limiting factor is network bandwidth, then async won't help because it won't magically increase network bandwidth. If the limiting factor is the amount of RAM or CPU load, then it might help… I am not familiar with the youtube-dl codebase, so it's hard to predict how much. And if you have enough free resources and you just want to run more parallel tasks, you can use more threads. - Zaporozhye
