How to get all comments on YouTube with Selenium?
The webpage shows that there are 702 Comments.
I wrote a function, get_total_youtube_comments(url), with much of the code copied from a project on GitHub.

def get_total_youtube_comments(url):
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
    import time
    options = webdriver.ChromeOptions()
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options,executable_path='/usr/bin/chromedriver')
    wait = WebDriverWait(driver,60)
    driver.get(url)
    SCROLL_PAUSE_TIME = 2
    CYCLES = 7
    html = driver.find_element_by_tag_name('html')
    html.send_keys(Keys.PAGE_DOWN)   
    html.send_keys(Keys.PAGE_DOWN)   
    time.sleep(SCROLL_PAUSE_TIME * 3)
    for i in range(CYCLES):
        html.send_keys(Keys.END)
        time.sleep(SCROLL_PAUSE_TIME)
    comment_elems = driver.find_elements_by_xpath('//*[@id="content-text"]')
    all_comments = [elem.text for elem in comment_elems]
    return  all_comments

I tried it on a sample page, https://www.youtube.com/watch?v=N0lxfilGfak, to parse all of its comments.

url='https://www.youtube.com/watch?v=N0lxfilGfak'
list = get_total_youtube_comments(url)

It returns some comments, but only a small part of the total.

len(list)
60

60 is far fewer than 702. How can I get all the comments on a YouTube page with Selenium?

Update: @supputuri, I can extract the comments with your code.

comments_list = driver.find_elements_by_xpath("//*[@id='content-text']")
len(comments_list)
709
print(driver.find_element_by_xpath("//h2[@id='count']").text)
717 Comments
comments_list[-1].text
'mistake at 23:11 \nin NOT it should return false if x is true.'
comments_list[0].text
'Got a question on the topic? Please share it in the comment section below and our experts will answer it for you. For Edureka Python Course curriculum, Visit our Website:  Use code "YOUTUBE20" to get Flat 20% off on this training.'

Why is the number of comments 709 instead of the 717 shown on the page?

Kettledrum answered 5/7, 2020 at 13:49 Comment(0)
P
7

You are getting a limited number of comments because YouTube loads more comments only as you keep scrolling down. There are around 394 comments left on that video. You first have to make sure all the comments are loaded, and then expand every "View replies" link, so that you reach the maximum comment count.

Note: I was able to get 700 comments using the below lines of code.

# get the last comment
lastEle = driver.find_element_by_xpath("(//*[@id='content-text'])[last()]")
# scroll to the last comment currently loaded
lastEle.location_once_scrolled_into_view
# wait until the comments loading is done
WebDriverWait(driver,30).until(EC.invisibility_of_element((By.CSS_SELECTOR,"div.active.style-scope.paper-spinner")))

# load all comments
while lastEle != driver.find_element_by_xpath("(//*[@id='content-text'])[last()]"):
    lastEle = driver.find_element_by_xpath("(//*[@id='content-text'])[last()]")
    driver.find_element_by_xpath("(//*[@id='content-text'])[last()]").location_once_scrolled_into_view
    time.sleep(2)
    WebDriverWait(driver,30).until(EC.invisibility_of_element((By.CSS_SELECTOR,"div.active.style-scope.paper-spinner")))

# open all replies
for reply in driver.find_elements_by_xpath("//*[@id='replies']//paper-button[@class='style-scope ytd-button-renderer'][contains(.,'View')]"):
    reply.location_once_scrolled_into_view
    driver.execute_script("arguments[0].click()",reply)
time.sleep(5)
WebDriverWait(driver, 30).until(
        EC.invisibility_of_element((By.CSS_SELECTOR, "div.active.style-scope.paper-spinner")))
# print the total number of comments
print(len(driver.find_elements_by_xpath("//*[@id='content-text']")))
Profile answered 8/7, 2020 at 5:0 Comment(1)
Almost done. Why is the number of comments extracted with your code 709, when the number shown on the page at the top of the comment list is 717? Please see my updated post. Kettledrum
K
5

There are a couple of things:

  • The WebElements within the website https://www.youtube.com/ are dynamic, and the comments are rendered dynamically as well.
  • Within the webpage https://www.youtube.com/watch?v=N0lxfilGfak the comments do not render unless the user scrolls the following element into the viewport.

[Image: edureka]

  • The comments are within:

    <!--css-build:shady-->

    This implies that Polymer CSS Builder is used to apply Polymer's CSS Mixin shim and ShadyDOM scoping, so some runtime work is still done to convert CSS selectors under the default settings.


Considering the factors above, here is a solution to retrieve all the comments:

Code Block:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException, ElementClickInterceptedException, WebDriverException
import time

options = webdriver.ChromeOptions() 
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://www.youtube.com/watch?v=N0lxfilGfak')
driver.execute_script("return scrollBy(0, 400);")
subscribe = WebDriverWait(driver, 60).until(EC.visibility_of_element_located((By.XPATH, "//yt-formatted-string[text()='Subscribe']")))
driver.execute_script("arguments[0].scrollIntoView(true);",subscribe)
comments = []
my_length = len(WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//yt-formatted-string[@class='style-scope ytd-comment-renderer' and @id='content-text'][@slot='content']"))))
while True:
    try:
        driver.execute_script("window.scrollBy(0,800)")
        time.sleep(5)
        comments.append([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//yt-formatted-string[@class='style-scope ytd-comment-renderer' and @id='content-text'][@slot='content']")))])
    except TimeoutException:
        driver.quit()
        break
print(comments)
Kaoliang answered 9/7, 2020 at 23:43 Comment(4)
It takes more time to extract the comments this way than with supputuri's approach. Kettledrum
Well :) I agree. That's what Selenium is meant for, to mimic user actions, right? You should be able to watch how Selenium mimics the user scrolling to the end. However, tomorrow morning I will try to improve the answer if possible. Kaoliang
Please have a look at my updated post; there is a new issue there. Kettledrum
@Kettledrum Can you check the updated code now? You shouldn't face the previous NoSuchElement error. Kaoliang
B
4

If you don't have to use Selenium, I would recommend looking at the YouTube Data API.

https://developers.google.com/youtube/v3/getting-started

Example:

https://www.googleapis.com/youtube/v3/commentThreads?key=YourAPIKey&textFormat=plainText&part=snippet&videoId=N0lxfilGfak&maxResults=100

This returns the first 100 results along with a nextPageToken value, which you pass as the pageToken parameter on the next request to get the next 100 results.
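
The paging described above can be sketched as a loop that keeps requesting pages until the response no longer carries a nextPageToken. This is only a sketch: iter_comment_texts and the injectable fetch parameter are illustrative names, not part of any library. The request parameters and the response fields (items, nextPageToken, snippet.topLevelComment.snippet.textDisplay) follow the commentThreads.list format of the YouTube Data API v3. It yields top-level comments only, not replies.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def fetch_page(params):
    """Request one page of comment threads and return the decoded JSON."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_comment_texts(video_id, api_key, fetch=fetch_page):
    """Yield the text of every top-level comment, following page tokens.

    `fetch` is a hypothetical hook so the loop can be exercised without
    network access; by default it calls the real endpoint.
    """
    params = {
        "key": api_key,
        "part": "snippet",
        "textFormat": "plainText",
        "videoId": video_id,
        "maxResults": 100,
    }
    while True:
        page = fetch(params)
        for item in page.get("items", []):
            yield item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        token = page.get("nextPageToken")
        if not token:
            break  # last page reached
        # Ask for the following page on the next request.
        params = dict(params, pageToken=token)
```

With a valid key, list(iter_comment_texts('N0lxfilGfak', 'YourAPIKey')) would collect the top-level comments of the sample video.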

Biradial answered 8/7, 2020 at 13:45 Comment(0)
C
4

I'm not familiar with Python, but I'll describe the steps I would take to get all the comments. First of all, I think the main issue in your code is this line:

CYCLES = 7

With this setting you only scroll 7 times, pausing 2 seconds after each scroll. Since you are already grabbing 60 comments successfully, fixing this condition should solve your issue.

I assume you don't have any issue in finding elements on a website using locators.

  1. Read the total comment count into a variable as an int (in your case, let's say COMMENTS = 715).

  2. Define another variable called VISIBLECOUNTS = 0.

  3. Then use a while loop to keep scrolling while COMMENTS > VISIBLECOUNTS.

  4. The code might look like this (really sorry if there are syntax issues):

    # Python + Selenium: scroll until all comments are loaded.
    # Assumes driver, html, Keys and time from the question's code.
    # 715 is just a sample value; it depends on the video's total comment count.
    COMMENTS = 715
    VISIBLECOUNTS = 0
    SCROLL_PAUSE_TIME = 2

    while VISIBLECOUNTS < COMMENTS:
        html.send_keys(Keys.END)
        time.sleep(SCROLL_PAUSE_TIME)
        VISIBLECOUNTS = len(driver.find_elements_by_xpath('//ytm-comment-thread-renderer'))
    

    With this, you keep scrolling down until VISIBLECOUNTS reaches COMMENTS. Then you can grab all the comments, since they all share the same element, ytm-comment-thread-renderer.

    Since I'm not familiar with Python, I'll show how to get the comment count in JavaScript. You can try this in your browser and convert it into the equivalent Python command.
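
A minimal Python sketch of the loop described in the steps above, with one addition that is not in the answer: it also stops when the visible count stops growing, because the total shown on the page may never be reached (the question reports 709 loaded against 717 shown). scroll_until_loaded, count_visible, scroll_once and max_idle_rounds are hypothetical names; the two callables are assumed to wrap the Selenium calls.

```python
import time


def scroll_until_loaded(count_visible, scroll_once, target,
                        pause=2.0, max_idle_rounds=3):
    """Scroll until `target` items are visible or the count stops growing.

    Returns the number of visible items when the loop ends.
    """
    visible = count_visible()
    idle_rounds = 0
    while visible < target and idle_rounds < max_idle_rounds:
        scroll_once()
        time.sleep(pause)  # give the page time to load more comments
        current = count_visible()
        if current > visible:
            idle_rounds = 0  # progress was made, reset the stall counter
        else:
            idle_rounds += 1
        visible = current
    return visible


# Example wiring with the objects from the question (Selenium 3 style,
# as used elsewhere in this thread):
# scroll_until_loaded(
#     count_visible=lambda: len(
#         driver.find_elements_by_xpath('//ytm-comment-thread-renderer')),
#     scroll_once=lambda: html.send_keys(Keys.END),
#     target=715)
```

Passing the scrolling and counting steps in as callables keeps the stopping logic separate from the locators, which tend to change whenever YouTube updates its markup.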

Run the queries below in your browser console and check.

To get the total comment count:
var comments = document.querySelector(".comment-section-header-text").innerText.split(" ")
// The text value is "Comments • 715"; split it by spaces and take the last value

Number(comments[comments.length - 1])
// Then convert the string "715" to an int; you just need to do the same in Python with Selenium
To get the count of currently loaded comments:
$x("//ytm-comment-thread-renderer").length

Note: if it's hard to extract the values you still can use the selenium js executor and do the scrolling with js until all the comments are visible. But I guess it's not hard to do it in python since the logic is the same.
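
The JavaScript above translates to a few lines of Python. This sketch assumes the header text looks like "Comments • 715" (as in this answer) or "717 Comments" (as printed in the question), and that the number is written out in full rather than abbreviated. parse_comment_count is a hypothetical helper name. It takes the first number in the text instead of the last space-separated value, so it handles both layouts and tolerates thousands separators.

```python
import re


def parse_comment_count(header_text):
    """Return the first integer found in a comment-section header."""
    match = re.search(r"\d[\d,.]*", header_text)
    if match is None:
        raise ValueError(f"no number found in {header_text!r}")
    # Drop thousands separators such as "," or "." before converting.
    digits = re.sub(r"\D", "", match.group())
    return int(digits)
```

The header text itself would come from Selenium, for example from the .text of the element located with "//h2[@id='count']" in the question.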

I'm really sorry about not being able to add the solution in python. But hope this helped. cheers.

Capuchin answered 9/7, 2020 at 18:50 Comment(0)
C
0

The first thing you need to do is scroll down the video page to load all comments:

$actualHeight = 0;
$nextHeight = 0;
while (true) {
    try {
        
        $nextHeight += 10;      
        $actualHeight =  $this->driver->executeScript('return document.documentElement.scrollHeight;');
        
        if ($nextHeight >= ($actualHeight - 50 ) ) break;
        $this->driver->executeScript("window.scrollTo(0, $nextHeight);");
        $this->driver->manage()->timeouts()->implicitlyWait(10);
    } catch (Exception $e) {
        break;
    }
}
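
For readers following the Python code elsewhere in this thread, here is a rough Python counterpart of the same idea. It is a sketch rather than a line-by-line translation: instead of advancing 10 pixels at a time, it jumps to the current bottom of the page and stops once document.documentElement.scrollHeight stops growing. scroll_to_bottom and max_rounds are hypothetical names, and driver is assumed to be any object that offers Selenium's execute_script method.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=200):
    """Scroll down until the page height stops growing; return that height."""
    get_height = "return document.documentElement.scrollHeight;"
    height = driver.execute_script(get_height)
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, arguments[0]);", height)
        time.sleep(pause)  # give lazily loaded comments time to arrive
        new_height = driver.execute_script(get_height)
        if new_height <= height:
            break  # nothing new was appended, assume the end was reached
        height = new_height
    return height
```

A single round without growth ends the loop, so on a slow connection a larger pause may be needed before concluding that all comments are loaded.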
Compressive answered 15/5, 2022 at 13:55 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.