How to extract the upload dates, titles, URLs and durations of the YouTube videos in a playlist with youtube-dl?

I'm trying to extract the upload dates, titles, URLs and durations of all the YouTube videos in a specific playlist with youtube-dl. I don't need the videos, just those four pieces of data.

So far I've tested the following two approaches, suggested here by Alen Paul Varghese:

Youtube-dl's GitHub Doc Used as reference

The Playlist URL used for testing

APPROACH #1

youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD > example.json

and

APPROACH #2

youtube-dl --get-upload_date https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD > example.txt

APPROACH #1 outputs a whole JSON dump, about 3000 lines per video. That is very inconvenient to handle for playlists with a large number of videos, but it does contain the four pieces of data I need.

APPROACH #2 returns the following error:

youtube-dl: error: no such option: --get-upload_date

I wanted to favor APPROACH #2 to limit the output to just the data I need (upload dates, titles, URLs and durations), following Alen Paul Varghese's second suggestion and after checking that upload_date is a valid youtube-dl option here: Youtube-dl's GitHub Doc Used as reference.

Why isn't the upload_date option accepted?

What alternative is there to limit the output to just the data I need?

I would very much appreciate your advice.

Here's the json dump file: example.json


EDIT (Thanks to @PIERPY's GREAT GUIDANCE - FULLY DOCUMENTED COMPLEMENTARY PROCESS - HELPFUL TO OTHERS):


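I successfully installed Chocolatey NuGet from an Admin CMD prompt so that I could install jq 1.5 with chocolatey install jq, as required by Download jq - Windows (Chocolatey actually fetched jq 1.6, as the installation log further down shows).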

My Chocolatey NuGet Installation Output:

Microsoft Windows [Version 10.0.19042.867]
(c) 2020 Microsoft Corporation. All rights reserved.
C:\WINDOWS\system32>@"%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe" -NoProfile -InputFormat None -ExecutionPolicy Bypass -Command "iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))" && SET "PATH=%PATH%;%ALLUSERSPROFILE%\chocolatey\bin"                                                         
Forcing web requests to allow TLS v1.2 (Required for requests to Chocolatey.org)                                        
Getting latest version of the Chocolatey package for download.                                                          
Not using proxy.
Getting Chocolatey from https://community.chocolatey.org/api/v2/package/chocolatey/0.10.15.
Downloading https://community.chocolatey.org/api/v2/package/chocolatey/0.10.15 to C:\Users\###\AppData\Local\Temp\chocolatey\chocoInstall\chocolatey.zip
Not using proxy.
Extracting C:\Users\###\AppData\Local\Temp\chocolatey\chocoInstall\chocolatey.zip to C:\Users\###\AppData\Local\Temp\chocolatey\chocoInstall
Installing Chocolatey on the local machine
Creating ChocolateyInstall as an environment variable (targeting 'Machine')
  Setting ChocolateyInstall to 'C:\ProgramData\chocolatey'
WARNING: It's very likely you will need to close and reopen your shell
  before you can use choco.
Restricting write permissions to Administrators
We are setting up the Chocolatey package repository.
The packages themselves go to 'C:\ProgramData\chocolatey\lib'
  (i.e. C:\ProgramData\chocolatey\lib\yourPackageName).
A shim file for the command line goes to 'C:\ProgramData\chocolatey\bin'
  and points to an executable in 'C:\ProgramData\chocolatey\lib\yourPackageName'.

Creating Chocolatey folders if they do not already exist.

WARNING: You can safely ignore errors related to missing log files when
  upgrading from a version of Chocolatey less than 0.9.9.
  'Batch file could not be found' is also safe to ignore.
  'The system cannot find the file specified' - also safe.
chocolatey.nupkg file not installed in lib.
 Attempting to locate it from bootstrapper.
PATH environment variable does not have C:\ProgramData\chocolatey\bin in it. Adding...
WARNING: Not setting tab completion: Profile file does not exist at 'C:\Users\###\Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1'.
Chocolatey (choco.exe) is now ready.
You can call choco from anywhere, command line or powershell by typing choco.
Run choco /? for a list of functions.
You may need to shut down and restart powershell and/or consoles
 first prior to using choco.
Ensuring Chocolatey commands are on the path
Ensuring chocolatey.nupkg is in the lib folder

C:\WINDOWS\system32>

Then I ran chocolatey install jq and successfully installed it:

My jq installation output:

C:\WINDOWS\system32>chocolatey install jq
Chocolatey v0.10.15
Installing the following packages:
jq
By installing you accept licenses for the packages.
Progress: Downloading jq 1.6... 100%

jq v1.6 [Approved]
jq package files install completed. Performing other installation steps.
The package jq wants to run 'chocolateyinstall.ps1'.
Note: If you don't run this script, the installation will fail.
Note: To confirm automatically next time, use '-y' or consider:
choco feature enable -n allowGlobalConfirmation
Do you want to run the script?([Y]es/[A]ll - yes to all/[N]o/[P]rint): Y

Downloading jq 64 bit
  from 'https://github.com/stedolan/jq/releases/download/jq-1.6/jq-win64.exe'
Progress: 100% - Completed download of C:\ProgramData\chocolatey\lib\jq\tools\jq.exe (3.36 MB).
Download of jq.exe (3.36 MB) completed.
Hashes match.
C:\ProgramData\chocolatey\lib\jq\tools\jq.exe
 ShimGen has successfully created a shim for jq.exe
 The install of jq was successful.
  Software install location not explicitly set, could be in package or
  default install location if installer.

Chocolatey installed 1/1 packages.
 See the log for details (C:\ProgramData\chocolatey\logs\chocolatey.log).

I then ran @pierpy's youtube-dl command:

youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq '{"date": .upload_date,"title": .title,"URL": .url,"duration": .duration}'

and got a syntax error, with this output:

Microsoft Windows [Version 10.0.19042.867]
(c) 2020 Microsoft Corporation. All rights reserved.

C:\Users\###>cd documents

C:\Users\###\Documents>cd youtube-dl

C:\Users\###\Documents\youtube-dl>youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq '{"date": .upload_date,"title": .title,"URL": .url,"duration": .duration}'
jq: error: syntax error, unexpected INVALID_CHARACTER, expecting $end (Windows cmd shell quoting issues?) at <top-level>, line 1:
'{date:
jq: 1 compile error
Traceback (most recent call last):
  File "__main__.py", line 19, in <module>
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\__init__.py", line 475, in main
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\__init__.py", line 465, in _real_main
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 2060, in download
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 799, in extract_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 806, in wrapper
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 838, in __extract_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 924, in process_ie_result
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 1058, in __process_playlist
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 806, in wrapper
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 1068, in __process_iterable_entry
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 910, in process_ie_result
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 872, in process_ie_result
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 1683, in process_video_result
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 1793, in process_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 1765, in __forced_printings
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 520, in to_stdout
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 509, in _write_string
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\utils.py", line 3180, in write_string
OSError: [Errno 22] Invalid argument

C:\Users\###\Documents\youtube-dl>

I then googled the error:

jq: error: syntax error, unexpected INVALID_CHARACTER, expecting $end (Windows cmd shell quoting issues?)

and found insight from this suggestion:

It's all about the quoting

I then adapted @pierpy's youtube-dl command accordingly, changing the wrapping single quotes to double quotes:

youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq "{"date": .upload_date,"title": .title,"URL": .url,"duration": .duration}"

Now it outputs the upload dates, titles, URLs and durations, just as needed.
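One way to sidestep the cmd quoting problem altogether is to do the filtering in a small script instead of an inline jq filter. The following is only a sketch, assuming youtube-dl's --print-json writes one JSON object per line; the names FIELDS and project, and the sample line, are made up for illustration.

```python
import io
import json

# Output name -> key in youtube-dl's JSON, mirroring the jq filter used here.
FIELDS = {"date": "upload_date", "title": "title", "URL": "url", "duration": "duration"}

def project(lines, fields=FIELDS):
    """Yield one trimmed dict per non-empty JSON line (one line per video)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        video = json.loads(line)
        # .get() gives None for a missing key, much as jq gives null.
        yield {out: video.get(src) for out, src in fields.items()}

# Made-up sample standing in for youtube-dl's output; real lines carry many more keys.
sample = io.StringIO(
    '{"upload_date": "20150717", "title": "Demo", '
    '"url": "https://example.invalid/v", "duration": 944, "extra": 1}\n'
)
for row in project(sample):
    print(json.dumps(row, indent=2))
```

To use it in the pipeline, the sample would be swapped for sys.stdin and the script saved under some name (say filter.py, a hypothetical file name), so that the command becomes youtube-dl --skip-download --print-json <playlist URL> | python filter.py, with no quotes for cmd to mangle.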

The final Output:

C:\Users\###\Documents\youtube-dl>youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq "{"date": .upload_date,"title": .title,"URL": .url,"duration": .duration}"
{
  "date": "20150717",
  "title": "3.1: Flow (setup and draw) - Processing Tutorial",
  "URL": "https://r1---sn-n0ogpnx-b85s.googlevideo.com/videoplayback?expire=1617730292&ei=lEZsYKDoEZmAp-oP3ayk8AI&ip=188.154.162.181&id=o-AHFxnOR5c5xqmgtu1JG4FbL6lJW0gz1pJQN77cr2-27T&itag=22&source=youtube&requiressl=yes&mh=m6&mm=31%2C29&mn=sn-n0ogpnx-b85s%2Csn-1gieen7e&ms=au%2Crdu&mv=m&mvi=1&pl=23&initcwndbps=1578750&vprv=1&mime=video%2Fmp4&ns=r3pR-nwt6FkDQa33iQQu-qgF&ratebypass=yes&dur=944.007&lmt=1607684088067796&mt=1617708538&fvip=5&fexp=24001373%2C24007246&beids=9466585&c=WEB&txp=5432434&n=3P6HQoLfY8ktFLG5&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cns%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRgIhAMiNOv8QDjfsn7yxicEOtSjcEYjZlX3CfrI8D-HGBd63AiEA4E6rKv_kYti6rAeieJzPAdTYjoh05Az_11Kcxt-0jBg%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRAIgD43F71OxMExfQyN9FeNWfZX_aiGAD3SKlKOLNR14NT8CICEuD_Ry0oymKZmFfHuP4F6v9MKCrmRI0x27sLG8fvyG",
  "duration": 944
}
{
  "date": "20150717",
  "title": "3.2: Built-in Variables (mouseX, mouseY) - Processing Tutorial",
  "URL": "https://r4---sn-n0ogpnx-b85l.googlevideo.com/videoplayback?expire=1617730293&ei=lEZsYMO2OczSWaPiueAC&ip=188.154.162.181&id=o-ANuT73vsKQLvQqynOeh00stVP-zqbq3x-iUrdDiYwg8E&itag=22&source=youtube&requiressl=yes&mh=kE&mm=31%2C29&mn=sn-n0ogpnx-b85l%2Csn-1gieen7e&ms=au%2Crdu&mv=m&mvi=4&pl=23&initcwndbps=1617500&vprv=1&mime=video%2Fmp4&ns=tPtC_l82gq-yi-rk_oQXatAF&cnr=14&ratebypass=yes&dur=814.207&lmt=1551720899437893&mt=1617708538&fvip=5&fexp=24001373%2C24007246&beids=9466585&c=WEB&txp=5432432&n=LhJHXWU8TGNOrD9u&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cns%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRAIgSHTlBPN0j49hoB02SYDeF3-9fe1iSz1KRiv9iFy8nj0CIHEafdAOBefsos8kO5FGhDljsKpOV7ZQ9dY1BEzQQ0n0&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRgIhAJkd-9posqapJekca_35YNG0g3nLgxTfW06EqRM-a3wDAiEApSrsS5wPlMPXjlI_bvOh53cjxlrHfNSKD4XbhyDyZ6w%3D",
  "duration": 815
}
{
  "date": "20150717",
  "title": "3.3: Events (mousePressed, keyPressed) - Processing Tutorial",
  "URL": "https://r4---sn-n0ogpnx-b85l.googlevideo.com/videoplayback?expire=1617730293&ei=lUZsYK6WJ4TeWaeflbgF&ip=188.154.162.181&id=o-AD1WgS46WiFogy00v3aHRp6aZXkd_ACN-_m76lPoQvA8&itag=22&source=youtube&requiressl=yes&mh=it&mm=31%2C29&mn=sn-n0ogpnx-b85l%2Csn-1gieen7e&ms=au%2Crdu&mv=m&mvi=4&pl=23&initcwndbps=1617500&vprv=1&mime=video%2Fmp4&ns=AlyS4uv2BH5ENfp_nP53I-sF&cnr=14&ratebypass=yes&dur=441.225&lmt=1472343659978757&mt=1617708538&fvip=4&fexp=24001373%2C24007246&beids=9466585&c=WEB&n=np6rmmeSKhYEvG1K&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cns%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRgIhAIRmvxmY-VidN3LPhnzCNQ2TLsUB_7i1yU0QOMBVUS6AAiEAm9DE-Kk6cCNb8FC0we4c2O8299n2_2jGnQfzYzz0igo%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRQIgZzrGEwMcb0Vrj9FleanW2apPMu_55OdH2SRdw66DQ1QCIQCDsAz7X5RxczKtWzokBhyUNcyXLXeZF-ENufpjA0BP2Q%3D%3D",
  "duration": 442
}

C:\Users\###\Documents\youtube-dl>

LAST ISSUE:


The URLs returned aren't the standard video ones. Why not?

In the Youtube-dl's GitHub Doc Used as reference it states:

url (string): Video URL

How can I retrieve the standard YouTube video URLs?

LAST ISSUE ANSWER:

I just reviewed the example.json file I generated yesterday and found that the standard YouTube video URLs are stored under the webpage_url key, not under url.
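The difference between the two keys can be illustrated with a short sketch: url holds the direct, expiring media stream address, while webpage_url holds the watch page. The helper name watch_url is made up, and the fallback that rebuilds the address from the id key is an assumption based on the watch?v=<id> pattern seen in the output, not something the documentation quoted here states.

```python
def watch_url(video):
    """Return the watch-page address for one video record, or None."""
    if video.get("webpage_url"):
        return video["webpage_url"]
    # Assumption: rebuild the address from the video id, following the
    # https://www.youtube.com/watch?v=<id> pattern visible in the output.
    if video.get("id"):
        return "https://www.youtube.com/watch?v=" + video["id"]
    return None

# Abbreviated, partly made-up record; a real one has many more keys.
record = {
    "id": "o8dffrZ86gs",
    "url": "https://example.invalid/videoplayback?expire=1617730292",
    "webpage_url": "https://www.youtube.com/watch?v=o8dffrZ86gs",
}
print(watch_url(record))
```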


FINAL YOUTUBE-DL OUTPUT:


C:\Users\###\Documents\youtube-dl>youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq "{"date": .upload_date,"title": .title,"URL": .webpage_url,"duration": .duration}"
{
  "date": "20150717",
  "title": "3.1: Flow (setup and draw) - Processing Tutorial",
  "URL": "https://www.youtube.com/watch?v=o8dffrZ86gs",
  "duration": 944
}
{
  "date": "20150717",
  "title": "3.2: Built-in Variables (mouseX, mouseY) - Processing Tutorial",
  "URL": "https://www.youtube.com/watch?v=ibW4oA7-n8I",
  "duration": 815
}
{
  "date": "20150717",
  "title": "3.3: Events (mousePressed, keyPressed) - Processing Tutorial",
  "URL": "https://www.youtube.com/watch?v=UvSjtiW-RH8",
  "duration": 442
}

C:\Users\###\Documents\youtube-dl>

TO GET THE FINAL OUTPUT IN A JSON FILE:

youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq "{"date": .upload_date,"title": .title,"URL": .webpage_url,"duration": .duration}" > example.json
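Note that redirecting the jq output like this stores a sequence of separate JSON objects, one after another, rather than a single JSON array, and some JSON parsers reject such a file. If one array is preferred, the collection step could be sketched as follows, assuming one JSON object per input line; the function name to_json_array and the demo data are made up.

```python
import json
import os
import tempfile

def to_json_array(lines, path):
    """Collect one trimmed record per JSON line and write them as one array."""
    records = []
    for line in lines:
        if not line.strip():
            continue
        video = json.loads(line)
        records.append({
            "date": video.get("upload_date"),
            "title": video.get("title"),
            "URL": video.get("webpage_url"),
            "duration": video.get("duration"),
        })
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2, ensure_ascii=False)
    return len(records)

# Demo on two made-up lines, written to a temporary directory.
demo_lines = [
    '{"upload_date": "20150717", "title": "A", "webpage_url": "https://www.youtube.com/watch?v=a", "duration": 1}',
    '{"upload_date": "20150717", "title": "B", "webpage_url": "https://www.youtube.com/watch?v=b", "duration": 2}',
]
demo_path = os.path.join(tempfile.mkdtemp(), "example.json")
print(to_json_array(demo_lines, demo_path), "records written")
```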
Tachograph asked 5/4, 2021 at 22:27 Comment(2)
Unblinking: --get-upload_date isn't a valid option. upload_date is used for naming the output file.
Tachograph: OK. It would be great to have those added as valid options.

You need to filter the output with a convenient tool such as jq.
Run this command line:
youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq '{"date": .upload_date,"title": .title,"URL": .url,"duration": .duration}'
You can obtain jq from https://stedolan.github.io/jq/download/
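Since youtube-dl is itself a Python package (youtube_dl), the same four fields could also be pulled without jq. This is a rough sketch rather than a tested recipe: it relies on youtube_dl.YoutubeDL and extract_info(url, download=False), and the helper names summarize and fetch_playlist, as well as the canned demo data, are made up for illustration.

```python
def summarize(info):
    """Reduce an extract_info() result for a playlist to the four fields."""
    rows = []
    for entry in info.get("entries") or []:
        if entry is None:  # a video that was skipped, e.g. unavailable
            continue
        rows.append({
            "date": entry.get("upload_date"),
            "title": entry.get("title"),
            "URL": entry.get("webpage_url"),
            "duration": entry.get("duration"),
        })
    return rows

def fetch_playlist(url):
    """Fetch playlist metadata only; needs network access and youtube_dl."""
    import youtube_dl  # imported here so summarize() works without the package
    options = {"quiet": True, "ignoreerrors": True}
    with youtube_dl.YoutubeDL(options) as ydl:
        return ydl.extract_info(url, download=False)

# Offline demo on a canned, made-up result shaped like a playlist dict.
canned = {"entries": [
    {"upload_date": "20150717", "title": "Demo", "duration": 944,
     "webpage_url": "https://www.youtube.com/watch?v=o8dffrZ86gs", "formats": []},
]}
print(summarize(canned))
```

With network access, summarize(fetch_playlist(<playlist URL>)) would be the intended call.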

UPDATE:

The key "webpage_url" holds the standard YouTube URLs, if they are needed. For a full listing of the available keys, run:
youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq keys
This prints all the key names found in the original JSON of each video.
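For comparison, roughly the same listing can be sketched in Python: jq's keys returns the key names in sorted order, and sorted() over the parsed object does likewise. The function name list_keys and the sample line are made up.

```python
import json

def list_keys(line):
    """Return the sorted top-level key names of one JSON object."""
    return sorted(json.loads(line))

# Made-up, heavily shortened sample; a real video object has far more keys.
print(list_keys('{"webpage_url": "w", "title": "t", "id": "i", "duration": 1}'))
```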

Unblinking answered 6/4, 2021 at 5:33 Comment(5)
Tachograph: Thanks a lot for that great help. Just one last issue (also mentioned in my question edit above): the URLs aren't the standard YouTube video ones. What would be the appropriate option to get the standard URLs?
Tachograph: I just reviewed the example.json file I generated yesterday and found that the standard YouTube video URLs are stored under webpage_url rather than url. It works! Again, thank you very much for your awesome guidance; I very much appreciate your helping hand. Be well!
Unblinking: @Tachograph I was writing another answer for that, but you beat me to it :-D
Tachograph: Thanks a lot for the follow-up on possible keys, very helpful. Be well!
Unblinking: @Tachograph glad to help!
