Extract automatic captions from YouTube video

I'm having problems extracting automatic captions from YouTube videos.

I tried using the http://video.google.com/timedtext?type=track&v=3wszM2SA12E&name=Automatic&lang=en method, but it only works for videos that have named tracks. For example, this video has no named tracks (only automatic captions), and the request returns nothing: rrkrvAUbU9Y

There are several web applications out there that can do it (like http://www.serpsite.com/youtube-subtitles-download-tool/ and http://mo.dbxdb.com/), but I need a script because I want to use it for my research.

Does anyone know the correct way to get this? YouTube's API has something about captions, but only for registered users, while the apps above work for all videos, and I doubt they just scrape the HTML of the page (although that's possible too). There must be a way... please help!

Ced answered 23/12, 2012 at 18:16 Comment(2)
What is the reason you won't use the API as a registered user? - Seyler
@Drifter: You’re asking this as if having a YouTube account were everyone’s birthright. Just because you have one doesn’t mean everyone can have one. Many people are legally banned from using a YouTube account for life. Others are arbitrarily denied registration without reason. Others already have enough accounts elsewhere and couldn’t manage more; keeping credentials and having an account is both a burden and a responsibility. Others don’t feel like being monitored. - Orlina

You need to call another API first: http://video.google.com/timedtext?type=list&v=3wszM2SA12E

This will give you the list of the tracks available. In your case only one track can be obtained: id="0" name="Automatic" lang_code="en" lang_original="English" lang_translated="English" lang_default="true"
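A minimal sketch of that first step, assuming the legacy `type=list` endpoint still responds (it may not) and that its response is XML with one `<track>` element per caption track carrying the attributes quoted above. The root element name `transcript_list` in the sample and the helper names `build_list_url` / `parse_track_list` are hypothetical, used only for illustration:

```python
from __future__ import annotations

import urllib.parse
import xml.etree.ElementTree as ET

# Legacy endpoint quoted in the answer; it may no longer respond.
LIST_ENDPOINT = "http://video.google.com/timedtext"


def build_list_url(video_id: str) -> str:
    """Build the type=list request URL for one video (hypothetical helper)."""
    query = urllib.parse.urlencode({"type": "list", "v": video_id})
    return f"{LIST_ENDPOINT}?{query}"


def parse_track_list(xml_text: str) -> list[dict[str, str]]:
    """Return the attributes of every <track> element in a type=list response."""
    root = ET.fromstring(xml_text)
    return [dict(track.attrib) for track in root.iter("track")]


if __name__ == "__main__":
    # Sample response ASSUMED from the attributes quoted in the answer;
    # the root element name is a guess.
    sample = (
        '<transcript_list>'
        '<track id="0" name="Automatic" lang_code="en" lang_original="English" '
        'lang_translated="English" lang_default="true"/>'
        '</transcript_list>'
    )
    print(build_list_url("3wszM2SA12E"))
    for track in parse_track_list(sample):
        print(track["id"], track["name"], track["lang_code"])
```

Fetching the URL itself (e.g. with `urllib.request.urlopen`) is left out so the parsing can be tried offline.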

For this particular video, I could get the track by name (name=Automatic):

https://video.google.com/timedtext?type=track&v=3wszM2SA12E&name=Automatic&lang=en

For another video, however, selecting the track by id (id=0) worked:

http://video.google.com/timedtext?type=track&v=zenMEj0cAC4&id=0&lang=en
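Since one video answered to `name=` and another to `id=`, a sketch like the following builds both `type=track` variants from a track entry returned by the list call. The helper names are hypothetical, the track dict is assumed to hold the `id`, `name` and `lang_code` attributes from the list response, and both URLs use `https` (the second example above used `http`):

```python
from __future__ import annotations

import urllib.parse

TRACK_ENDPOINT = "https://video.google.com/timedtext"


def build_track_url(video_id: str, lang: str, *, name: str | None = None,
                    track_id: str | None = None) -> str:
    """Build a type=track URL selecting the track either by name or by id."""
    if (name is None) == (track_id is None):
        raise ValueError("pass exactly one of name or track_id")
    params = {"type": "track", "v": video_id}
    if name is not None:
        params["name"] = name
    else:
        params["id"] = track_id
    params["lang"] = lang
    return f"{TRACK_ENDPOINT}?{urllib.parse.urlencode(params)}"


def candidate_track_urls(video_id: str, track: dict[str, str]) -> list[str]:
    """Both URL variants for one track entry (hypothetical helper)."""
    lang = track["lang_code"]
    urls = []
    if track.get("name"):  # some tracks may have an empty name
        urls.append(build_track_url(video_id, lang, name=track["name"]))
    urls.append(build_track_url(video_id, lang, track_id=track["id"]))
    return urls


if __name__ == "__main__":
    track = {"id": "0", "name": "Automatic", "lang_code": "en"}
    for url in candidate_track_urls("3wszM2SA12E", track):
        print(url)
```

A caller would try the returned URLs in order and keep the first one that yields a non-empty response.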

Hent answered 14/8, 2017 at 13:25 Comment(1)
Is this API down? - Fausta

Here are my suggestions after spending some time on this:

Janiculum answered 24/5, 2018 at 18:18 Comment(1)
The linked "youtube-captions-scraper" repo (a fork) appears outdated. The source repo appears to work fine: github.com/algolia/youtube-captions-scraper (the parsing part of the code, anyway; I haven't tested the axios.get calls yet, but I assume they could be replaced with fetch). - Odel

A good way to get data from a page is file_get_contents. However, this only works if the video has a 'CC' (captions) button. When it does, you can get all the text elements from the XML file. Unfortunately, as one of the 'YouTube to caption' services documents, the uploader must enable captions for them to be retrievable, so you cannot get captions from videos without 'CC' enabled. When captions are available, you can use file_get_contents on the XML file, find all the 'text' tags, and turn those into captions.
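The answer uses PHP's file_get_contents; the same "find all the `text` tags" step can be sketched in Python. This assumes the caption XML looks like `<transcript><text start="…" dur="…">…</text></transcript>`, with start and duration in seconds (an assumption about the format, not confirmed above), and converts the result to SRT as one possible caption format. All function names are hypothetical:

```python
from __future__ import annotations

import html
import xml.etree.ElementTree as ET


def parse_caption_xml(xml_text: str) -> list[tuple[float, float, str]]:
    """Extract (start, duration, text) from every <text> element (assumed format)."""
    root = ET.fromstring(xml_text)
    captions = []
    for node in root.iter("text"):
        start = float(node.get("start", "0"))
        dur = float(node.get("dur", "0"))
        # ElementTree decodes one level of entities; if the payload was
        # escaped twice (e.g. &amp;#39;), html.unescape decodes the rest.
        text = html.unescape(node.text or "").strip()
        captions.append((start, dur, text))
    return captions


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(captions: list[tuple[float, float, str]]) -> str:
    """Number each caption and emit it as an SRT block."""
    blocks = []
    for i, (start, dur, text) in enumerate(captions, 1):
        span = f"{srt_timestamp(start)} --> {srt_timestamp(start + dur)}"
        blocks.append(f"{i}\n{span}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = ('<transcript><text start="0.5" dur="2.25">Hello</text>'
              '<text start="2.75" dur="1">Bye</text></transcript>')
    print(to_srt(parse_caption_xml(sample)))
```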

Lasko answered 30/1, 2015 at 12:15 Comment(0)

I was looking at downsub.com and found the following API call, which seems to work for automatically generated captions:

https://www.youtube.com/api/timedtext?expire=1491547251&v=YD1tc8lRsdQ&sparams=asr_langs%2Ccaps%2Cv%2Cexpire&hl=en_US&signature=6241BAB9F7E9DB164AFE496B40B4DA4B58B463FD.D7FEC5B2CC81721AF9928215343509E280FEF6BD&asr_langs=pt%2Cit%2Ces%2Cru%2Cfr%2Cko%2Cde%2Cja%2Cnl%2Cen&key=yttt1&caps=asr&kind=asr&lang=en
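The URL above carries an `expire` parameter and a `signature`. As a sketch, assuming `expire` is a Unix timestamp in seconds (1491547251 falls on 7 April 2017 UTC, the day this answer was posted, which fits that reading), one can parse the query string and check whether a copied URL is still usable. The signature presumably cannot be produced client-side, so this only tells you when such a URL stops working. Helper names are hypothetical:

```python
from __future__ import annotations

import time
import urllib.parse

EXAMPLE_URL = (
    "https://www.youtube.com/api/timedtext?expire=1491547251&v=YD1tc8lRsdQ"
    "&sparams=asr_langs%2Ccaps%2Cv%2Cexpire&hl=en_US"
    "&signature=6241BAB9F7E9DB164AFE496B40B4DA4B58B463FD."
    "D7FEC5B2CC81721AF9928215343509E280FEF6BD"
    "&asr_langs=pt%2Cit%2Ces%2Cru%2Cfr%2Cko%2Cde%2Cja%2Cnl%2Cen"
    "&key=yttt1&caps=asr&kind=asr&lang=en"
)


def signed_url_info(url: str) -> dict[str, str]:
    """Flatten the URL's query string into a dict (first value per key)."""
    query = urllib.parse.urlparse(url).query
    return {k: v[0] for k, v in urllib.parse.parse_qs(query).items()}


def is_expired(url: str, now: float | None = None) -> bool:
    """ASSUMES `expire` is a Unix timestamp in seconds."""
    expire = int(signed_url_info(url)["expire"])
    current = time.time() if now is None else now
    return current >= expire


if __name__ == "__main__":
    info = signed_url_info(EXAMPLE_URL)
    print(info["kind"], info["asr_langs"].split(","))
    print("expired:", is_expired(EXAMPLE_URL))
```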

Fromma answered 7/4, 2017 at 0:0 Comment(1)
My guess is they're using the YT partners API. I suppose you noticed the expire parameter and the signature. - Emancipator
