Puppeteer - Error: Protocol error (Network.getResponseBody): No resource with given identifier found
I'm using the following code to try to get response bodies from a website with Puppeteer.

#!/usr/bin/env node

require('dotenv').config();
const puppeteer = require('puppeteer');
const readline = require('readline').createInterface({
    input: process.stdin,
    output: process.stdout
});
const path = require('path');
const fs = require('fs');

//
console.log('Starting Puppeteer...');

let responseBody = [];

(async () => {
    const browser = await puppeteer.launch({
        headless: false,
        executablePath: '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'
    });
    const page = await browser.newPage();
    
    await page.setRequestInterception(true);

    page.on('request', (request) => {
        request.continue();
    });

    //
    page.on('requestfinished', async (request) => {
        const response =  await request.response();
        const url = response.url();
        // store chunks url
        if( url.startsWith('https://audio-akp-quic-control-examplecdn-com.akamaized.net/audio/') ){
            console.log(await response.buffer());
            //responseBody.push(response.buffer());
        }
    });

    //
    await page.goto('https://accounts.examplecdn.com/login', {
        waitUntil: ['load', 'networkidle2']
    });

    const emailField = await page.waitForSelector('#login-username', {timeout: 3000});
    await emailField.type(process.env.EMAIL, {delay: 100});

    const passwordField = await page.waitForSelector('#login-password', {timeout: 3000});
    await passwordField.type(process.env.PASSWORD, {delay: 100});

    const submitButton = await page.waitForSelector('#login-button', {timeout: 3000});
    await submitButton.click();
    
    //
    const navigation = await page.waitForNavigation({ waitUntil: ['load', 'networkidle2'] });
    
    //if( navigation.url().endsWith('status') ){
    await page.goto('https://example.cdn.com/search', { 
        waitUntil: ['load', 'networkidle2'] 
    }).then( async (response) => {
        //console.log(response);
        const cookieButton = await page.$('#onetrust-accept-btn-handler');
        await cookieButton.click();
        const searchField = await page.$('[data-testid="search-input"]');
        await readline.question('What track do you want to search for?', (input) => {
            console.log('answer:', input);
            searchField.type(input).then( async () => {
                await page.waitForXPath('//*[@id="searchPage"]/div/div/section[1]/div[2]/div/div/div/div[4]').then( async (element) => {
                    element.focus().then( async () => {
                        // //*[@id="searchPage"]/div/div/section[1]/div[2]/div/div/div/div[3]/button
                        const playButton = await page.waitForXPath('//*[@id="searchPage"]/div/div/section[1]/div[2]/div/div/div/div[3]/button');
                        await playButton.click();
                    });
                });
            });
        });
    });
    
    
    //}

})();

When I run it, the following error is logged and the script terminates.

/Users/dev/Desktop/test/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:208
            this._callbacks.set(id, { resolve, reject, error: new Error(), method });
                                                              ^

Error: Protocol error (Network.getResponseBody): No resource with given identifier found
    at /Users/dev/Desktop/test/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:208:63
    at new Promise (<anonymous>)
    at CDPSession.send (/Users/dev/Desktop/test/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:207:16)
    at /Users/dev/Desktop/test/node_modules/puppeteer/lib/cjs/puppeteer/common/HTTPResponse.js:99:53
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:93:5)
    at async /Users/dev/Desktop/test/index.js:40:25

I need to collect the response body of every request to a certain URL, then use ffmpeg to convert the collected chunks back into a full-length track. How can I solve this problem? Is it possible to get the response body of each request and then join them all together?

Skinflint answered 26/2, 2021 at 17:45 Comment(0)

The error "No resource with given identifier found" occurs when the page navigates to another URL before you have finished reading the content of a network response. The navigation is typically caused by a redirect, the JS History API, and so on.

Thus, you can do either of the following:

  • Prevent the browser from navigating to another page before the response is processed.
  • Use Firefox. Firefox doesn't have this issue, and the Chrome team won't fix it.
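
One way to approximate the first option is to track every body read that is still in flight and wait for all of them before triggering the next navigation. This is a minimal sketch, not a definitive fix: `collectBodies`, `matches`, and `settle` are hypothetical names, and `page` is assumed to be a Puppeteer Page (any object with the same `on('requestfinished', ...)` interface behaves the same way here).

```javascript
// Hypothetical helper (not part of Puppeteer): collects the bodies of matching
// responses and exposes settle() to wait for reads that are still in flight.
function collectBodies(page, matches) {
  const pending = new Set();
  const bodies = [];

  const readBody = async (response) => {
    try {
      bodies.push(await response.buffer());
    } catch (err) {
      // The body can still be lost if a navigation wins the race.
      console.warn(`Could not read ${response.url()}: ${err.message}`);
    }
  };

  page.on('requestfinished', (request) => {
    const response = request.response(); // may be null
    if (!response || !matches(response.url())) return;
    const read = readBody(response).finally(() => pending.delete(read));
    pending.add(read);
  });

  return {
    bodies, // Buffers, in the order the reads completed
    settle: () => Promise.all([...pending]),
  };
}
```

Calling `await collector.settle()` before each `page.goto(...)` or navigating click keeps the reads on the same navigation state, and `Buffer.concat(collector.bodies)` then joins the chunks. Note that the buffers are stored in completion order, which may differ from request order.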

ref(in Japanese): https://happy-nap.hatenablog.com/entry/2023/04/15/081747

Limekiln answered 24/10, 2023 at 16:35 Comment(1)
This gives the cause, but not the solution as it applies to OP. OP needs to await their promises to remove race conditions, which will "Stop browser to move to other pages before the response is processed". Using FF isn't a good solution, even if it happens to work, since OP's underlying code is still racy, which is a basic correctness problem that likely won't hold up as the browser API changes. – Chipboard

"No resource with given identifier found" (and in recent Puppeteer versions, including ^21.2.1, "ProtocolError: Could not load body for this request. This might happen if the request is a preflight request.") is caused by a race condition, which typically occurs when you forget to await a promise, resulting in a navigation interleaving with response handling.

There are many issues and antipatterns here, some of which cause race conditions. A couple of your .then callbacks never return anything. For example:

element.focus().then(...

should be

return element.focus().then(...
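
To see why the missing `return` matters, here is a small sketch that uses timers instead of Puppeteer. The names `delay`, `withoutReturn`, and `withReturn` are made up for illustration. Without the `return`, the outer chain settles before the inner work has finished, and that gap is where a navigation can slip in.

```javascript
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Inner promise is NOT returned: the outer chain does not wait for it.
async function withoutReturn(log) {
  await Promise.resolve().then(() => {
    delay(20).then(() => log.push('inner done')); // abandoned promise
  });
  log.push('outer done'); // runs before 'inner done'
}

// Inner promise IS returned: the outer chain waits for it.
async function withReturn(log) {
  await Promise.resolve().then(() => {
    return delay(20).then(() => log.push('inner done'));
  });
  log.push('outer done'); // runs after 'inner done'
}
```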

The following pattern is incorrect:

await readline.question('What track do you want to search for?', (input) => {

Asynchronous functions typically either return a promise or accept a callback, not both. The await tricks you into thinking you're keeping this in the promise chain, when you're actually awaiting undefined. The actual "promise" is the callback.
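
A small sketch of that trap, assuming a made-up callback-style function `askLater` that stands in for `readline.question` (it is not a real API): the `await` resolves immediately with `undefined`, and the callback fires later, outside the awaited chain.

```javascript
// Hypothetical callback-style API; like readline.question, it returns nothing.
function askLater(prompt, callback) {
  setTimeout(() => callback('some answer'), 10);
}

async function demo() {
  const events = [];
  // Awaiting a function that returns undefined does not wait for the callback.
  const result = await askLater('?', (input) => events.push(`callback: ${input}`));
  events.push(`after await: ${result}`);
  // Wait long enough for the callback to fire so both events are recorded.
  await new Promise((resolve) => setTimeout(resolve, 50));
  return events; // ['after await: undefined', 'callback: some answer']
}
```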

As a rule, don't mix await and then. The point of promises is to flatten out code so you can write it in a synchronous style. If you find you have many layers of nested callbacks or .then(async () => ..., a red flag should go up: the chances that you've failed to handle an error or abandoned a promise chain increase.

If you need to promisify a callback, you can:

const question = prompt =>
  new Promise(resolve =>
    readline.question(prompt, response => resolve(response))
  );

Now you can use it in your code "synchronously" like:

const input = await question("What track do you want to search for?");

There's also Node's util.promisify, which does more or less the same operation mechanically.
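
A minimal sketch of the `util.promisify` route follows. `createQuestion` is a hypothetical wrapper introduced here so the input stream can be swapped out; it defaults to `process.stdin` and `process.stdout` as in the question.

```javascript
const readline = require('readline');
const { promisify } = require('util');

// Hypothetical helper: builds a promise-returning question() for a stream pair.
function createQuestion(input = process.stdin, output = process.stdout) {
  const rl = readline.createInterface({ input, output });
  // readline's question() ships a custom promisified form, so the promise
  // resolves with the answer instead of treating it as an error argument.
  const question = promisify(rl.question).bind(rl);
  return { question, close: () => rl.close() };
}

// Usage inside an async function:
//   const { question, close } = createQuestion();
//   const input = await question('What track do you want to search for? ');
//   close();
```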

I can't run your code without the username and password, but if you remove all thens (yes, every last one!), await everything in a single promise chain and promisify any callback-based asynchronous functions, you should be able to avoid this error.
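
As a rough illustration of what "no thens" could look like for the search step, here is an untested sketch. `searchAndPlay` is a hypothetical function name, `page` is assumed to be a Puppeteer Page, `question` is assumed to be a promise-returning prompt like the one above, and the selectors and XPaths are copied from the question as-is.

```javascript
// XPaths copied verbatim from the question (brittle; shown only for continuity).
const ROW_XPATH =
  '//*[@id="searchPage"]/div/div/section[1]/div[2]/div/div/div/div[4]';
const PLAY_XPATH =
  '//*[@id="searchPage"]/div/div/section[1]/div[2]/div/div/div/div[3]/button';

// Hypothetical flattened version of the search step: one chain, every step awaited.
async function searchAndPlay(page, question) {
  await page.goto('https://example.cdn.com/search', {
    waitUntil: ['load', 'networkidle2'],
  });

  const cookieButton = await page.waitForSelector('#onetrust-accept-btn-handler');
  await cookieButton.click();

  const searchField = await page.waitForSelector('[data-testid="search-input"]');
  const input = await question('What track do you want to search for? ');
  await searchField.type(input);

  // waitForXPath is the call the question uses; it is kept here unchanged.
  const row = await page.waitForXPath(ROW_XPATH);
  await row.focus();
  const playButton = await page.waitForXPath(PLAY_XPATH);
  await playButton.click();
  return input;
}
```

Because every step is awaited in order, nothing navigates or clicks while an earlier step is still pending.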

I also suggest avoiding those long, rigid, browser-generated XPaths. They make too many assumptions about the structure that can easily fail, and there are almost always more robust selectors or paths you can use.

Taking a step back, I suggest coding slowly and running the code at each step so you can verify each assumption along the way. In doing so, you can minimize problems and tackle them immediately and avoid a chaotic, complex situation with multiple issues that are difficult to debug all at once.

See this answer for a minimal, reproducible example of working code that avoids this error (the question it's attached to is also non-reproducible, unfortunately). The linked answer is in Playwright, but the same promise issue and solution applies equally to Puppeteer.

Chipboard answered 19/3, 2023 at 16:27 Comment(8)
This comment does not solve the original issue. Sounds like useless ChatGPT response. – Limekiln
@KazuyaGosho It's not ChatGPT. ChatGPT doesn't link its resources and writes in a pandering tone. It won't say things like "never mix async and await". Check my other answers and you'll see it's quite consistent with them. How does it not answer the question? OP's not awaiting their readline promises. If they do, they shouldn't have any problems, assuming their scrape is otherwise correct. There's no way to write a more complete answer than this, since OP hasn't provided their site (example.cdn.com/search was given). If they provide that, I can provide full, working code. – Chipboard
Other things ChatGPT would never say because it always uses "soft" language that isn't too forceful or strong: "(yes, every last one!)", "abandoned a promise chain", "await tricks you into thinking", "chaotic, complex situation", etc. Check the linked blog post (written by me, published before ChatGPT was released), and you'll see it has the same tone. See also What should I do if I suspect that a question or answer is written by ChatGPT?. – Chipboard
If you have the same problem as this error, please post a question with a minimal reproducible example, link me up, and I'm pretty sure I'll be able to show you the race condition that's causing it. Hopefully then it'll be apparent how this answers the question. I'd like to have a better, reproducible canonical than this question for the error, which is clearly a common issue (7k views as of Oct '23). – Chipboard
@KazuyaGosho See also my posts on meta like this one, as well as my SO profile. I went on strike against ChatGPT answers for 3 months and spent an inordinate amount of time fighting to keep SO GPT-free on both meta SE and meta SO. Please do your research before making baseless accusations. – Chipboard
First of all, I'm sorry for my comment above. I should have not accused you without any evidence. What I wanted to say is that while your answer is well describing best practice to write clean JavaScript code, it still does not solve the original issue ( Network.getResponseBody ). I faced the same issue as OP while writing code for Puppeteer, so I'll write my answer on my own. Again, sorry for my impoliteness. – Limekiln
@KazuyaGosho No problem, but I don't think you're correct in your assessment of this post not answering the question. Not awaiting the promises causes the error, awaiting them fixes the problem. It's not a matter of writing clean JS code, it's a matter of avoiding race conditions that cause Puppeteer errors. Feel free to provide another answer if you want, but it'll likely be the same thing: making sure all promises are awaited so that response bodies are read on the same navigation state. I edited the answer to clarify this and provide more references and reproducible examples. – Chipboard
@KazuyaGosho Also, if you want to retract your ChatGPT comment, you can delete it so I can delete my comments and clean up the thread, removing noise for future visitors. – Chipboard