I'm trying to extract the upload dates, titles, URLs and durations of all the YouTube videos in a specific playlist with youtube-dl. I don't need the videos, just those four pieces of data.
So far I've tested the following two approaches, suggested here by Alen Paul Varghese:
Youtube-dl's GitHub Doc Used as reference
The Playlist URL used for testing
APPROACH #1
youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD > example.json
and
APPROACH #2
youtube-dl --get-upload_date https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD > example.txt
APPROACH #1 outputs a whole JSON dump, about 3000 lines per video, which is very inconvenient to handle for playlists with a large number of videos, but it does contain the four pieces of data I need.
APPROACH #2 returns the following error:
youtube-dl: error: no such option: --get-upload_date
I wanted to favor APPROACH #2 to limit the output to just the data needed (upload dates, titles, URLs and durations), following Alen Paul Varghese's 2nd suggestion and after checking that upload_date is a valid youtube-dl option here: Youtube-dl's GitHub Doc Used as reference.
Why isn't the upload_date option accepted?
What alternative is there to limit the output to just these four fields?
I would appreciate very much your helpful advice.
Here's the json dump file: example.json
EDIT (Thanks to @PIERPY'S GREAT GUIDANCE - FULLY DOCUMENTED COMPLEMENTARY PROCESS - HELPFUL TO OTHERS):
I've successfully installed Chocolatey NuGet from an Admin CMD in order to install jq 1.5 with chocolatey install jq, as required by Download jq - Windows.
My Chocolatey NuGet installation output:
Microsoft Windows [Version 10.0.19042.867]
(c) 2020 Microsoft Corporation. All rights reserved.
C:\WINDOWS\system32>@"%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe" -NoProfile -InputFormat None -ExecutionPolicy Bypass -Command "iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))" && SET "PATH=%PATH%;%ALLUSERSPROFILE%\chocolatey\bin"
Forcing web requests to allow TLS v1.2 (Required for requests to Chocolatey.org)
Getting latest version of the Chocolatey package for download.
Not using proxy.
Getting Chocolatey from https://community.chocolatey.org/api/v2/package/chocolatey/0.10.15.
Downloading https://community.chocolatey.org/api/v2/package/chocolatey/0.10.15 to C:\Users\###\AppData\Local\Temp\chocolatey\chocoInstall\chocolatey.zip
Not using proxy.
Extracting C:\Users\###\AppData\Local\Temp\chocolatey\chocoInstall\chocolatey.zip to C:\Users\###\AppData\Local\Temp\chocolatey\chocoInstall
Installing Chocolatey on the local machine
Creating ChocolateyInstall as an environment variable (targeting 'Machine')
Setting ChocolateyInstall to 'C:\ProgramData\chocolatey'
WARNING: It's very likely you will need to close and reopen your shell
before you can use choco.
Restricting write permissions to Administrators
We are setting up the Chocolatey package repository.
The packages themselves go to 'C:\ProgramData\chocolatey\lib'
(i.e. C:\ProgramData\chocolatey\lib\yourPackageName).
A shim file for the command line goes to 'C:\ProgramData\chocolatey\bin'
and points to an executable in 'C:\ProgramData\chocolatey\lib\yourPackageName'.
Creating Chocolatey folders if they do not already exist.
WARNING: You can safely ignore errors related to missing log files when
upgrading from a version of Chocolatey less than 0.9.9.
'Batch file could not be found' is also safe to ignore.
'The system cannot find the file specified' - also safe.
chocolatey.nupkg file not installed in lib.
Attempting to locate it from bootstrapper.
PATH environment variable does not have C:\ProgramData\chocolatey\bin in it. Adding...
WARNING: Not setting tab completion: Profile file does not exist at 'C:\Users\###\Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1'.
Chocolatey (choco.exe) is now ready.
You can call choco from anywhere, command line or powershell by typing choco.
Run choco /? for a list of functions.
You may need to shut down and restart powershell and/or consoles
first prior to using choco.
Ensuring Chocolatey commands are on the path
Ensuring chocolatey.nupkg is in the lib folder
C:\WINDOWS\system32>
Then I ran chocolatey install jq and successfully installed it.
My jq installation output:
C:\WINDOWS\system32>chocolatey install jq
Chocolatey v0.10.15
Installing the following packages:
jq
By installing you accept licenses for the packages.
Progress: Downloading jq 1.6... 100%
jq v1.6 [Approved]
jq package files install completed. Performing other installation steps.
The package jq wants to run 'chocolateyinstall.ps1'.
Note: If you don't run this script, the installation will fail.
Note: To confirm automatically next time, use '-y' or consider:
choco feature enable -n allowGlobalConfirmation
Do you want to run the script?([Y]es/[A]ll - yes to all/[N]o/[P]rint): Y
Downloading jq 64 bit
from 'https://github.com/stedolan/jq/releases/download/jq-1.6/jq-win64.exe'
Progress: 100% - Completed download of C:\ProgramData\chocolatey\lib\jq\tools\jq.exe (3.36 MB).
Download of jq.exe (3.36 MB) completed.
Hashes match.
C:\ProgramData\chocolatey\lib\jq\tools\jq.exe
ShimGen has successfully created a shim for jq.exe
The install of jq was successful.
Software install location not explicitly set, could be in package or
default install location if installer.
Chocolatey installed 1/1 packages.
See the log for details (C:\ProgramData\chocolatey\logs\chocolatey.log).
I then ran your youtube-dl command, @pierpy:
youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq '{"date": .upload_date,"title": .title,"URL": .url,"duration": .duration}'
and got a syntax error, with this output:
Microsoft Windows [Version 10.0.19042.867]
(c) 2020 Microsoft Corporation. All rights reserved.
C:\Users\###>cd documents
C:\Users\###\Documents>cd youtube-dl
C:\Users\###\Documents\youtube-dl>youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq '{"date": .upload_date,"title": .title,"URL": .url,"duration": .duration}'
jq: error: syntax error, unexpected INVALID_CHARACTER, expecting $end (Windows cmd shell quoting issues?) at <top-level>, line 1:
'{date:
jq: 1 compile error
Traceback (most recent call last):
File "__main__.py", line 19, in <module>
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\__init__.py", line 475, in main
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\__init__.py", line 465, in _real_main
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 2060, in download
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 799, in extract_info
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 806, in wrapper
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 838, in __extract_info
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 924, in process_ie_result
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 1058, in __process_playlist
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 806, in wrapper
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 1068, in __process_iterable_entry
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 910, in process_ie_result
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 872, in process_ie_result
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 1683, in process_video_result
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 1793, in process_info
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 1765, in __forced_printings
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 520, in to_stdout
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 509, in _write_string
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\utils.py", line 3180, in write_string
OSError: [Errno 22] Invalid argument
C:\Users\###\Documents\youtube-dl>
I then googled the error
jq: error: syntax error, unexpected INVALID_CHARACTER, expecting $end (Windows cmd shell quoting issues?)
and found insight in this suggestion.
Since cmd does not treat single quotes as quoting characters, I changed the single quotes wrapping the jq filter in your youtube-dl command, @pierpy, to double quotes:
youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq "{"date": .upload_date,"title": .title,"URL": .url,"duration": .duration}"
It now outputs the upload dates, titles, URLs and durations, just as needed.
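For anyone who would rather avoid cmd quoting altogether, the same pipeline could be driven from a script. This is only an untested sketch, assuming Python 3 is available and that both youtube-dl and jq are on the PATH; the names build_commands and run_pipeline are made up for illustration. Each argument is passed as its own list item, so the jq filter is never re-parsed by the shell:

```python
import subprocess

PLAYLIST_URL = "https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD"

# Same jq filter as in the command above, written once as a plain Python string.
JQ_FILTER = '{"date": .upload_date, "title": .title, "URL": .url, "duration": .duration}'


def build_commands(playlist_url, jq_filter=JQ_FILTER):
    """Return the two argument lists (youtube-dl, jq) for the pipeline."""
    ytdl_cmd = ["youtube-dl", "--skip-download", "--print-json", playlist_url]
    jq_cmd = ["jq", jq_filter]
    return ytdl_cmd, jq_cmd


def run_pipeline(playlist_url):
    """Pipe youtube-dl's JSON output into jq and return jq's output as text."""
    ytdl_cmd, jq_cmd = build_commands(playlist_url)
    ytdl = subprocess.Popen(ytdl_cmd, stdout=subprocess.PIPE)
    jq = subprocess.Popen(jq_cmd, stdin=ytdl.stdout, stdout=subprocess.PIPE)
    ytdl.stdout.close()  # let youtube-dl see a broken pipe if jq exits early
    output, _ = jq.communicate()
    ytdl.wait()
    return output.decode("utf-8", errors="replace")
```

Calling print(run_pipeline(PLAYLIST_URL)) would be expected to print the same kind of objects as the jq command, provided both tools are installed.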
The final Output:
C:\Users\###\Documents\youtube-dl>youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq "{"date": .upload_date,"title": .title,"URL": .url,"duration": .duration}"
{
"date": "20150717",
"title": "3.1: Flow (setup and draw) - Processing Tutorial",
"URL": "https://r1---sn-n0ogpnx-b85s.googlevideo.com/videoplayback?expire=1617730292&ei=lEZsYKDoEZmAp-oP3ayk8AI&ip=188.154.162.181&id=o-AHFxnOR5c5xqmgtu1JG4FbL6lJW0gz1pJQN77cr2-27T&itag=22&source=youtube&requiressl=yes&mh=m6&mm=31%2C29&mn=sn-n0ogpnx-b85s%2Csn-1gieen7e&ms=au%2Crdu&mv=m&mvi=1&pl=23&initcwndbps=1578750&vprv=1&mime=video%2Fmp4&ns=r3pR-nwt6FkDQa33iQQu-qgF&ratebypass=yes&dur=944.007&lmt=1607684088067796&mt=1617708538&fvip=5&fexp=24001373%2C24007246&beids=9466585&c=WEB&txp=5432434&n=3P6HQoLfY8ktFLG5&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cns%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRgIhAMiNOv8QDjfsn7yxicEOtSjcEYjZlX3CfrI8D-HGBd63AiEA4E6rKv_kYti6rAeieJzPAdTYjoh05Az_11Kcxt-0jBg%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRAIgD43F71OxMExfQyN9FeNWfZX_aiGAD3SKlKOLNR14NT8CICEuD_Ry0oymKZmFfHuP4F6v9MKCrmRI0x27sLG8fvyG",
"duration": 944
}
{
"date": "20150717",
"title": "3.2: Built-in Variables (mouseX, mouseY) - Processing Tutorial",
"URL": "https://r4---sn-n0ogpnx-b85l.googlevideo.com/videoplayback?expire=1617730293&ei=lEZsYMO2OczSWaPiueAC&ip=188.154.162.181&id=o-ANuT73vsKQLvQqynOeh00stVP-zqbq3x-iUrdDiYwg8E&itag=22&source=youtube&requiressl=yes&mh=kE&mm=31%2C29&mn=sn-n0ogpnx-b85l%2Csn-1gieen7e&ms=au%2Crdu&mv=m&mvi=4&pl=23&initcwndbps=1617500&vprv=1&mime=video%2Fmp4&ns=tPtC_l82gq-yi-rk_oQXatAF&cnr=14&ratebypass=yes&dur=814.207&lmt=1551720899437893&mt=1617708538&fvip=5&fexp=24001373%2C24007246&beids=9466585&c=WEB&txp=5432432&n=LhJHXWU8TGNOrD9u&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cns%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRAIgSHTlBPN0j49hoB02SYDeF3-9fe1iSz1KRiv9iFy8nj0CIHEafdAOBefsos8kO5FGhDljsKpOV7ZQ9dY1BEzQQ0n0&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRgIhAJkd-9posqapJekca_35YNG0g3nLgxTfW06EqRM-a3wDAiEApSrsS5wPlMPXjlI_bvOh53cjxlrHfNSKD4XbhyDyZ6w%3D",
"duration": 815
}
{
"date": "20150717",
"title": "3.3: Events (mousePressed, keyPressed) - Processing Tutorial",
"URL": "https://r4---sn-n0ogpnx-b85l.googlevideo.com/videoplayback?expire=1617730293&ei=lUZsYK6WJ4TeWaeflbgF&ip=188.154.162.181&id=o-AD1WgS46WiFogy00v3aHRp6aZXkd_ACN-_m76lPoQvA8&itag=22&source=youtube&requiressl=yes&mh=it&mm=31%2C29&mn=sn-n0ogpnx-b85l%2Csn-1gieen7e&ms=au%2Crdu&mv=m&mvi=4&pl=23&initcwndbps=1617500&vprv=1&mime=video%2Fmp4&ns=AlyS4uv2BH5ENfp_nP53I-sF&cnr=14&ratebypass=yes&dur=441.225&lmt=1472343659978757&mt=1617708538&fvip=4&fexp=24001373%2C24007246&beids=9466585&c=WEB&n=np6rmmeSKhYEvG1K&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cns%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRgIhAIRmvxmY-VidN3LPhnzCNQ2TLsUB_7i1yU0QOMBVUS6AAiEAm9DE-Kk6cCNb8FC0we4c2O8299n2_2jGnQfzYzz0igo%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRQIgZzrGEwMcb0Vrj9FleanW2apPMu_55OdH2SRdw66DQ1QCIQCDsAz7X5RxczKtWzokBhyUNcyXLXeZF-ENufpjA0BP2Q%3D%3D",
"duration": 442
}
C:\Users\###\Documents\youtube-dl>
LAST ISSUE:
The URLs returned are not the standard YouTube video URLs.
Why not?
Youtube-dl's GitHub Doc Used as reference states:
url (string): Video URL
How can I retrieve the standard YouTube video URLs?
LAST ISSUE ANSWER:
I just reviewed the example.json file I generated yesterday and found that the standard YouTube video URLs are stored in the webpage_url field rather than in url.
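In case it helps someone else track down the right field, here is a rough Python sketch of that kind of lookup. It assumes one video's record from --print-json has been loaded as a dict; find_keys_with_value is a hypothetical helper name, and the sample record is a trimmed-down stand-in (with a placeholder instead of the long googlevideo URL), not the real dump:

```python
import json


def find_keys_with_value(record, needle):
    """Return the top-level keys whose string value contains `needle`."""
    return sorted(
        key for key, value in record.items()
        if isinstance(value, str) and needle in value
    )


# Trimmed-down stand-in for one video's JSON line (placeholder values).
sample_line = json.dumps({
    "id": "o8dffrZ86gs",
    "title": "3.1: Flow (setup and draw) - Processing Tutorial",
    "url": "https://r1---sn-example.googlevideo.com/videoplayback?placeholder",
    "webpage_url": "https://www.youtube.com/watch?v=o8dffrZ86gs",
})

record = json.loads(sample_line)
print(find_keys_with_value(record, "youtube.com/watch"))  # ['webpage_url']
```

Only top-level string values are searched here; nested fields such as the per-format entries are deliberately ignored in this sketch.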
FINAL YOUTUBE-DL OUTPUT:
C:\Users\###\Documents\youtube-dl>youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq "{"date": .upload_date,"title": .title,"URL": .webpage_url,"duration": .duration}"
{
"date": "20150717",
"title": "3.1: Flow (setup and draw) - Processing Tutorial",
"URL": "https://www.youtube.com/watch?v=o8dffrZ86gs",
"duration": 944
}
{
"date": "20150717",
"title": "3.2: Built-in Variables (mouseX, mouseY) - Processing Tutorial",
"URL": "https://www.youtube.com/watch?v=ibW4oA7-n8I",
"duration": 815
}
{
"date": "20150717",
"title": "3.3: Events (mousePressed, keyPressed) - Processing Tutorial",
"URL": "https://www.youtube.com/watch?v=UvSjtiW-RH8",
"duration": 442
}
C:\Users\###\Documents\youtube-dl>
TO GET THE FINAL OUTPUT IN A JSON FILE:
youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq "{"date": .upload_date,"title": .title,"URL": .webpage_url,"duration": .duration}" > example.json
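The same trimming can also be done without jq. This is a minimal Python sketch, assuming youtube-dl writes one JSON object per line (one per video) and that the untrimmed dump was saved to a file first, as in APPROACH #1; trim_record, trim_dump and trim_file are illustrative names:

```python
import json

# Output key -> field name in youtube-dl's JSON, mirroring the jq filter.
FIELDS = {
    "date": "upload_date",
    "title": "title",
    "URL": "webpage_url",
    "duration": "duration",
}


def trim_record(record):
    """Keep only the four wanted fields; missing ones become None (like jq's null)."""
    return {out_key: record.get(src_key) for out_key, src_key in FIELDS.items()}


def trim_dump(lines):
    """Parse an iterable of JSON lines and return the trimmed records."""
    return [trim_record(json.loads(line)) for line in lines if line.strip()]


def trim_file(src_path, dst_path):
    """Read a JSON-lines dump and write the trimmed records as one JSON array."""
    with open(src_path, encoding="utf-8") as src:
        records = trim_dump(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(records, dst, indent=2, ensure_ascii=False)
    return records
```

For example, trim_file("full_dump.json", "trimmed.json") (both file names hypothetical) would write the trimmed records. Unlike the jq command, which emits a stream of separate objects, this writes a single JSON array, which tends to be easier to load back in one go.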
--get-upload_date isn't a valid option. upload_date is used for naming the output file. – Unblinking