programmatically access google search watchlist

Google Search lets you track, like, and rate movies and TV shows. These options appear alongside a movie's information when you search for it, and you can view the saved data later by searching for "my watchlist".

I've been trying, without success, to find out whether I can access that data programmatically (e.g. through an API): the movies in my watchlist, my ratings, and so on. Is it possible?

Many thanks, Joel

Conductivity answered 25/9, 2020 at 8:58
Just for reference, you can access the list here. So maybe a scraper built with a library like Beautiful Soup could help? - Garrettgarrick
The same question has been asked here in the Google forum. - Garrettgarrick

There is still no API for Google's watchlists, but you can use an HTTP client like Axios to fetch the watchlist page and then jsdom to parse it, walk the document tree, and scrape the data.

The trick is finding a page where Google shows your entire watchlist: when you open it from a search, Google truncates long lists no matter what. To get a link to the full list:

  1. Find and open your Google watchlist at this page: https://www.google.com/interests/saved

  2. Click the "Share" button, turn on sharing, choose "View only link", and click "Continue". Copy the resulting link.
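To request a specific page of the shared list, the pageNumber parameter (the same one the scraping code in this answer relies on) can be set with Node's built-in URL class, which also copes with a share link that already carries a query string. This is a minimal sketch: `pageUrl` is a hypothetical helper, and the link below is a placeholder, not the real format of a Google share link.

```javascript
// Build the URL for one page of a shared watchlist.
// Assumption: the share link accepts a pageNumber query parameter.
function pageUrl(shareLink, pageNumber) {
    const url = new URL(shareLink); // throws TypeError on an invalid link
    // set() adds the parameter, or replaces it if the link already has one
    url.searchParams.set('pageNumber', String(pageNumber));
    return url.toString();
}

// Placeholder link for illustration only, not a real share URL
console.log(pageUrl('https://www.google.com/example/list?hl=en', 2));
// prints "https://www.google.com/example/list?hl=en&pageNumber=2"
```

Plain string concatenation with '?pageNumber=' would produce a malformed URL if the copied link already contained a '?', which is why the sketch leans on URLSearchParams instead.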

Here's some example code (Node.js, using the axios and jsdom packages) that returns your watchlist as an array:

// Paste your watchlist share link here:
const GOOGLE_WATCHLIST_URL = 'https://';

const { JSDOM } = require('jsdom');
const axios = require('axios');

// Scrape Google's "My watchlist" and return the titles as a flat array
async function scrape() {
    const items = [];
    let prevFirstItem = null;

    console.log('Scraping new data.');

    for (let i = 0; i <= 5; i++) {
        let response;
        try {
            // Request page i + 1 via the pageNumber URL parameter
            const url = new URL(GOOGLE_WATCHLIST_URL);
            url.searchParams.set('pageNumber', String(i + 1));
            response = await axios.get(url.toString(), {
                headers: {
                    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36'
                }
            });
        } catch (error) {
            console.error(error);
            break;
        }

        const document = new JSDOM(response.data).window.document;

        // Important: This selector could change in future, but it works as of July 29, 2024
        // Find the selector that grabs the name of each movie on your watchlist.
        const elements = document.querySelectorAll('[data-hveid] a[aria-label]');

        // Collect this page's titles, stopping if we reach the previous page's first item
        const pageItems = [];
        for (const el of elements) {
            const label = el.getAttribute('aria-label');
            if (label === prevFirstItem) {
                break;
            }
            pageItems.push(label);
        }

        if (pageItems.length === 0) {
            // No new items: Google is repeating the previous page (or the page is empty)
            console.log('Stopping: page ' + (i + 1) + ' has no new items.');
            break;
        }

        // Remember this page's first item to detect a repeat on the next page
        prevFirstItem = pageItems[0];
        console.log((i + 1) + ': ' + prevFirstItem);
        items.push(...pageItems);
    }

    return items;
}

(async () => {
    const watchlist = await scrape();
    console.log(watchlist);
})();

This code fetches your watchlist page, reads the titles, then steps through pages using the pageNumber URL parameter until a page yields no new items. The loop is capped at six pages (pageNumber 1 through 6); to scrape more, raise the loop's upper bound:

for (let i = 0; i <= 10; i++)

Because the loop exits as soon as a page has nothing new, a higher cap adds no extra requests for shorter lists.
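The stop rule described above (keep requesting pages until there are no more unique items) can be isolated as a pure function and tested without touching the network. This is a minimal sketch, assuming each page has already been reduced to an array of title strings. It swaps the first-item comparison for a Set of every title seen so far, and `mergePages` is a hypothetical helper, not part of Axios or jsdom.

```javascript
// Merge per-page title arrays, stopping at the first page that adds
// nothing new (assumption: Google repeats the last page past the end).
function mergePages(pages) {
    const seen = new Set();
    const titles = [];
    for (const page of pages) {
        let added = 0;
        for (const title of page) {
            if (!seen.has(title)) {
                seen.add(title);
                titles.push(title);
                added++;
            }
        }
        if (added === 0) {
            break; // no new titles: treat this as the end of the list
        }
    }
    return titles;
}

// Hypothetical data: page 3 repeats page 2, so page 4 is never reached
const pages = [
    ['Alien', 'Heat'],
    ['Ran', 'Up'],
    ['Ran', 'Up'],
    ['Jaws'],
];
console.log(mergePages(pages)); // [ 'Alien', 'Heat', 'Ran', 'Up' ]
```

In a real scraper the same check would run after each request, so pages past the end are never fetched; the Set also drops a title that happens to appear twice.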

Underworld answered 29/7, 2024 at 15:5
