How to find if a YouTube channel is currently live streaming without using search?
F

6

16

I'm working on a website that loads live streams from multiple YouTube channels. At first I tried to find a way to do this without YouTube's API, but I have decided to give in.

To find whether a channel is live streaming and to get the live stream links I've been using:

https://www.googleapis.com/youtube/v3/search?part=snippet&channelId={CHANNEL_ID}&eventType=live&maxResults=10&type=video&key={API_KEY}

However, with the default daily quota being 10,000 units and each search costing 100, I can only make about 100 searches before I exceed my quota, which doesn't help at all. I ended up exceeding the quota in about 10 minutes. :(
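To make the arithmetic concrete, here is a rough sketch of the quota budget. It assumes the figures quoted above (10,000 units per day, 100 units per search request); the function names are made up for illustration.

```python
# Quota arithmetic, assuming the figures quoted in the question.
DAILY_QUOTA = 10_000   # units per day (figure from the question)
SEARCH_COST = 100      # units per search request (figure from the question)


def searches_per_day(quota: int = DAILY_QUOTA, cost: int = SEARCH_COST) -> int:
    """How many search calls fit into one day's quota."""
    return quota // cost


def daily_cost(channels: int, interval_minutes: int, cost: int = SEARCH_COST) -> int:
    """Units needed to poll every channel once per interval for 24 hours."""
    polls_per_day = (24 * 60) // interval_minutes
    return channels * polls_per_day * cost


if __name__ == "__main__":
    print(searches_per_day())   # 100
    print(daily_cost(1, 3))     # 48000
```

Polling even a single channel every 3 minutes through search would cost several times the daily budget, which is why the search endpoint is a poor fit for this kind of polling.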

Does anyone know of a better way to figure out whether a channel is currently live streaming, and what the live stream links are, using as few quota points as possible?

I want to reload the YouTube data for each user every 3 minutes, save it in a database, and serve the information through my own API to save server resources as well as quota points.
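The refresh-and-store idea could be sketched roughly as follows. This is only an illustration: an in-memory dict stands in for the database, and `LiveStatusCache` and the `fetch` callable are hypothetical names, not part of any YouTube library.

```python
import time
from typing import Callable, Dict, Tuple


class LiveStatusCache:
    """Remembers each channel's last status and refetches only after `ttl` seconds."""

    def __init__(self, fetch: Callable[[str], bool], ttl: float = 180.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # hypothetical function: channel ID -> is live?
        self._ttl = ttl          # 180 s = the 3-minute refresh from the question
        self._clock = clock      # injectable clock, handy for testing
        self._entries: Dict[str, Tuple[float, bool]] = {}

    def is_live(self, channel_id: str) -> bool:
        now = self._clock()
        entry = self._entries.get(channel_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # still fresh: no outbound request is made
        status = self._fetch(channel_id)
        self._entries[channel_id] = (now, status)
        return status
```

With this shape, visitors to the site only ever hit the cache, so the number of outbound lookups depends on the number of channels and the refresh interval, not on traffic.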

Hopefully someone has a good solution to this problem!

If nothing can be done about the links, just determining whether the user is live without spending 100 quota points each time would be a big help.

Foochow answered 30/5, 2019 at 20:57 Comment(3)
"I want to reload youtube data for each user every 3 minutes, save it into a database, and display the information using my own api to save server resources as well as quota points." Yep, that's just basic caching, and exactly what I'd recommend you do. You could even have your script do the lookup each time, and put a proxy in front (such as Nginx) and have it cache on its own... offloading this problem to another layer. - Yseult
What language are you using to grab this data every 3 minutes and database it? - Sierrasiesser
Just so everyone is aware, I finished working on this project some time ago and was able to collect the information without YouTube's API. I basically set up a cron job using PHP HTML DOM parser. The only detail the cron job needed was the channel's ID. Once I had the page, I just used a simple find function to search through the HTML and collect the desired info. The method did begin using a lot of bandwidth once the database was collecting info for more than 1,000 channels. I didn't have to, but if YouTube notices, you may eventually need a proxy. - Foochow
S
8

Since the question only specified that Search API quotas should not be used in finding out if the channel is streaming, I thought I would share a sort of work-around method. It might require a bit more work than a simple API call, but it reduces API quota use to practically nothing:

I used a simple Perl GET request to retrieve a YouTube channel's main page. Several unique elements are found in the HTML of a channel page that is streaming live:

  • The number-of-live-viewers tag, e.g. <li>753 watching</li>.
  • The LIVE NOW badge tag: <span class="yt-badge yt-badge-live" >Live now</span>.

To determine whether a channel is currently streaming live, simply check whether the unique HTML tag appears in the GET response, with something like if ($get_results =~ /$unique_html/) in Perl. An API call then needs to be made only for a channel ID that is actually streaming, in order to obtain the video ID of the stream.

The advantage of this is that you already know the channel is streaming, instead of using thousands of quota points to find out. My test script successfully identifies whether a channel is streaming by looking in the HTML code for: <span class="yt-badge yt-badge-live" > (note the weird extra spaces in the code from YouTube).

I don't know what language OP is using, or I would help with a basic GET request in that language. I used Perl, and included browser headers, User Agent and cookies, to look like a normal computer visit.
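For readers not using Perl, a request with browser-like headers might look roughly like this in Python. The header values are assumptions (any realistic browser strings would do), the helper names are made up, and cookies are left out for brevity.

```python
import urllib.request

# Assumed header values: any realistic browser strings would do.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_channel_request(channel_id: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for a channel's main page."""
    url = f"https://www.youtube.com/channel/{channel_id}"
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def fetch_channel_html(channel_id: str, timeout: float = 10.0) -> str:
    """Send the request and return the page body as text."""
    request = build_channel_request(channel_id)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Keeping the request construction separate from the network call makes it easy to inspect the headers without hitting YouTube.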

YouTube's robots.txt doesn't seem to forbid crawling a channel's main page, only the community page of a channel.

Let me know what you think about the pros and cons of this method, and please comment with what might be improved rather than disliking if you find a flaw. Thanks, happy coding!

2020 UPDATE: The yt-badge-live badge seems to have been deprecated; it no longer reliably shows whether the channel is streaming. Instead, I now check the HTML for this string:

{"text":" watching"}

If I get a match, it means the channel is streaming. (Pages of non-streaming channels don't contain this string.) Again, note the weird extra whitespace. I also escape all the quotation marks since I'm using Perl.
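The same substring check can be sketched in Python. The marker is the one quoted above; the function name is hypothetical, and since YouTube's markup changes over time the marker list would likely need maintenance.

```python
# Marker strings reported in this answer; YouTube's markup changes over time.
LIVE_MARKERS = (
    '{"text":" watching"}',   # from the 2020 update, including the leading space
)


def looks_live(html: str, markers=LIVE_MARKERS) -> bool:
    """Return True if any known live marker occurs in the page source."""
    return any(marker in html for marker in markers)
```

A plain substring test like this avoids parsing the HTML at all, which keeps the check cheap when many channels are polled.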

Sierrasiesser answered 31/5, 2019 at 2:38 Comment(3)
I wrote a similar script in PHP using DOMDocument but ran into an issue: it used an insane amount of resources and took a long time to complete. The file_get_contents function also seemed to load the CSS associated with that page, which is entirely unneeded. If I could keep it from doing that, it would be a potential solution. What is your reasoning behind using Perl over PHP to make this request? - Foochow
It was using too much bandwidth? Hmm, I guess I don't have the volume of requests to find out. I'm using a DigitalOcean $5 droplet for mine, which has about 1TB outbound and free inbound. I already had a bunch of Perl scripts, so I just modified one for this. Here's an example of a full PHP request with browser headers though: beamtic.com/setting-request-headers-curl - Sierrasiesser
@Sierrasiesser What if the channel is streaming more than one video? How can I select the one I want? - Swayder
P
6

Here are my two suggestions:

  • Check my answer where I explain how you can retrieve videos from channels that are livestreaming.
  • Another option is to request the following URL each time you want to check whether there is a livestream.

https://www.youtube.com/channel/<CHANNEL_ID>/live

Where CHANNEL_ID is the ID of the channel you want to check. Note that this URL may not work for every channel; that depends on the channel itself.

For example, the channel ID UC7_YxT-KID8kRbqZo7MyscQ shows the live stream when the channel is livestreaming, but the channel ID UC4nprx9Vd84-ly7N-1Ce6Og (https://www.youtube.com/channel/UC4nprx9Vd84-ly7N-1Ce6Og/live) shows the channel's main page instead.
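As a small sketch, the /live URL can be built like this. The `live_url` helper is hypothetical; the "c" and "user" forms, and the removal of whitespace from names, come from the comments on this answer rather than from any official documentation.

```python
from urllib.parse import quote


def live_url(identifier: str, kind: str = "channel") -> str:
    """Build a /live URL. `kind` is "channel" (ID), "c" (custom name) or "user"."""
    if kind not in ("channel", "c", "user"):
        raise ValueError(f"unknown kind: {kind!r}")
    # Names are used with whitespace removed, per the comments on this answer.
    cleaned = "".join(identifier.split())
    return f"https://www.youtube.com/{kind}/{quote(cleaned, safe='')}/live"
```

For instance, `live_url("UC4nprx9Vd84-ly7N-1Ce6Og")` gives the channel-ID form shown above, and `live_url("NASAtelevision", "user")` gives the user-name form.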

Petaloid answered 8/6, 2019 at 23:35 Comment(4)
Using the channel name also works, e.g. https://www.youtube.com/c/<CHANNEL_NAME>/live and https://www.youtube.com/user/<CHANNEL_NAME>/live, after removing any whitespace from the channel name. - Ison
@Ison Thank you. I tested with "Microsoft" like this: https://www.youtube.com/c/Microsoft/live and https://www.youtube.com/user/Microsoft/live, but this may not work with all YouTube channels. For example, "NASAtelevision" works with https://www.youtube.com/user/NASAtelevision/live, but not with https://www.youtube.com/c/NASAtelevision/live. - Petaloid
The 'user' and 'c' names can differ for some channels: NASAtelevision is the user name, while NASA is the channel name (not all channels use the same name for both), so https://www.youtube.com/c/NASA/live works, in both upper and lower case. Although using 'c' with the channel name indeed doesn't work for some channels, the ID and user name do. The catch with these approaches is that you don't get to choose which stream it redirects to for a channel running multiple live streams. But it's ideal for verifying that a channel is live streaming and then using the API to retrieve the video IDs of the live streams. - Ison
Better than the accepted answer for most cases. - Senior
S
3

Adding to the answer by Bman70, I tried to eliminate the need for a costly search request once you know that the channel is streaming live. I did this using two indicators in the HTML response from the pages of channels that are streaming live.

function findLiveStreamVideoId(channelId, cb){
  $.ajax({
    url: 'https://www.youtube.com/channel/'+channelId,
    type: "GET",
    headers: {
      // Access-Control-Allow-Origin is a response header, so it is not sent here;
      // adding it to a request does not bypass CORS.
      'Accept-Language': 'en-US, en;q=0.5'
  }}).done(function(resp) {
      
      //one method to find live video
      let n = resp.search(/\{"videoId[\sA-Za-z0-9:"\{\}\]\[,\-_]+BADGE_STYLE_TYPE_LIVE_NOW/i);

      //If found
      if(n>=0){
        let videoId = resp.slice(n+1, resp.indexOf("}",n)-1).split("\":\"")[1]
        return cb(videoId);
      }

      //If not found, then try another method to find live video
      n = resp.search(/https:\/\/i\.ytimg\.com\/vi\/[A-Za-z0-9\-_]+\/hqdefault_live\.jpg/i);
      if (n >= 0){
        let videoId = resp.slice(n,resp.indexOf(".jpg",n)-1).split("/")[4]
        return cb(videoId);
      }

      //No streams found
      return cb(null, "No live streams found");
  }).fail(function() {
    return cb(null, "CORS Request blocked");
  });
}

However, there's a tradeoff. This method confuses a recently ended stream with currently live streams. A workaround for this issue is to get the status of the returned videoId from the YouTube API (costs a single unit from your quota).
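A rough Python sketch of that two-step flow: pull the video ID out of the live thumbnail URL (the second indicator in the function above), then confirm it with a videos.list call. The helper names are hypothetical; the sketch assumes the `snippet.liveBroadcastContent` field of the video resource, whose value is "live" while a broadcast is running.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Same thumbnail pattern as the second check in the JavaScript function.
LIVE_THUMBNAIL = re.compile(
    r"https://i\.ytimg\.com/vi/([A-Za-z0-9_-]+)/hqdefault_live\.jpg", re.IGNORECASE
)


def video_id_from_live_thumbnail(html: str) -> Optional[str]:
    """Return the video ID embedded in the first live thumbnail URL, if any."""
    match = LIVE_THUMBNAIL.search(html)
    return match.group(1) if match else None


def videos_list_url(video_id: str, api_key: str) -> str:
    """URL for a videos.list call that returns the video's snippet."""
    query = urlencode({"part": "snippet", "id": video_id, "key": api_key})
    return f"https://www.googleapis.com/youtube/v3/videos?{query}"


def is_currently_live(api_response: dict) -> bool:
    """True only if the API reports the video as live right now."""
    items = api_response.get("items", [])
    if not items:
        return False
    return items[0].get("snippet", {}).get("liveBroadcastContent") == "live"
```

The decoded JSON body of the videos.list response would be passed to `is_currently_live`; a recently ended stream would then be rejected because it is no longer reported as "live".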

Sit answered 31/10, 2020 at 15:57 Comment(1)
A complete solution is posted in this gist: gist.github.com/MMujtabaRoohani/… - Sit
C
1

Adding onto the other answers here, I use a GET request to https://www.youtube.com/c/<CHANNEL_NAME>/live and then search for "isLive":true (rather than {"text":" watching"})
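A minimal Python sketch of this check follows. The helper names are made up, and the "two instances means live, one means scheduled" rule is a heuristic taken from the comment on this answer, so it should be treated as unverified.

```python
import re

# Allow optional whitespace around the colon, since both spellings appear in this thread.
IS_LIVE_PATTERN = re.compile(r'"isLive"\s*:\s*true')


def count_is_live(html: str) -> int:
    """Count occurrences of the isLive flag in the page source."""
    return len(IS_LIVE_PATTERN.findall(html))


def classify(html: str) -> str:
    """Rough classification based on how often the flag occurs."""
    hits = count_is_live(html)
    if hits >= 2:
        return "live"
    if hits == 1:
        return "scheduled"   # per the comment: a scheduled stream shows one instance
    return "offline"
```

Counting occurrences rather than testing for a single match is what separates a scheduled stream from a running one under that heuristic.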

Coquillage answered 5/11, 2022 at 17:33 Comment(1)
Noticed a problem with this. If a stream is scheduled but not currently live, you'll spot one instance of "isLive": true. If the stream is actually live, the page should include two instances of "isLive": true. - Tension
D
0

I found the YouTube API to be very restrictive given the cost of the search operation. The accepted answer did not work for me, as I found the string on non-live streams as well. Web scraping with aiohttp and BeautifulSoup was not an option, since the better indicators required JavaScript support. Hence I turned to Selenium. I look up the CSS selector #info-text and then search its text for the string "Started streaming" or "watching now".

To reduce the load on my tiny server, which would otherwise have required a lot more resources, I moved this check to a Heroku dyno running a small Flask app.

# import flask dependencies
import os
from flask import Flask, request, make_response, jsonify
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

base = "https://www.youtube.com/watch?v={0}"
delay = 3
# initialize the flask app
app = Flask(__name__)

# default route
@app.route("/")
def index():
    return "Hello World!"

# create a route for webhook
@app.route("/islive", methods=["GET", "POST"])
def is_live():
    chrome_options = Options()
    chrome_options.binary_location = os.environ.get('GOOGLE_CHROME_BIN')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--remote-debugging-port=9222')
    driver = webdriver.Chrome(executable_path=os.environ.get('CHROMEDRIVER_PATH'), chrome_options=chrome_options)
    url = request.args.get("url")
    if "youtube.com" in url:
        video_id = url.split("?v=")[-1]
    else:
        video_id = url
        url = base.format(url)
    print(url)
    response = { "url": url, "is_live": False, "ok": False, "video_id": video_id }
    driver.get(url)
    try:
        element = WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#info-text")))
        # Either phrase in the info text indicates a live stream.
        text = element.text.lower()
        if "started streaming" in text or "watching now" in text:
            response["is_live"] = True
        response["ok"] = True
        return jsonify(response)
    except Exception as e:
        print(e)
        return jsonify(response)
    finally:
        # quit() ends the whole browser session; close() only closes the current window.
        driver.quit()

# run the app
if __name__ == "__main__":
    app.run()

However, you'll need to add the following buildpacks in the app settings:

https://github.com/heroku/heroku-buildpack-google-chrome
https://github.com/heroku/heroku-buildpack-chromedriver
https://github.com/heroku/heroku-buildpack-python

Set the following Config Vars in settings

CHROMEDRIVER_PATH=/app/.chromedriver/bin/chromedriver
GOOGLE_CHROME_BIN=/app/.apt/usr/bin/google-chrome

You can find the supported Python runtimes here, but anything below Python 3.9 should be fine, since Selenium had problems with improper use of the is operator.

I hope YouTube will provide better alternatives than workarounds.

Duncan answered 8/4, 2021 at 8:39 Comment(0)
K
0

I know this is an old thread, but I thought I'd share my way of checking, for example to grab the status code for use in an app.

This is for a single channel, but you could easily do a foreach with it.

<?php
    // Channel to check.
    $ytchannelID = "UCd0BTXriKLvOs1ANx3puZ3Q";

    $ytliveurl = "https://www.youtube.com/channel/" . $ytchannelID . "/live";
    // Marker that appears in the page source while the channel is live.
    $ytchannelLIVE = '{"text":" watching now"}';
    $contents = file_get_contents($ytliveurl);

    // Respond with 200 if the channel is live, 201 otherwise (or if the fetch failed).
    if ($contents !== false && strpos($contents, $ytchannelLIVE) !== false) {
        http_response_code(200);
    } else {
        http_response_code(201);
    }
?>
Kopaz answered 30/10, 2022 at 8:34 Comment(0)
