Puppeteer page.evaluate querySelectorAll return empty objects
A

4

40

I am trying out Puppeteer. Here is some sample code that you can run at: https://try-puppeteer.appspot.com/

The problem is this code is returning an array of empty objects:

[{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{}]

Am I making a mistake?

const browser = await puppeteer.launch();

const page = await browser.newPage();
await page.goto('https://reddit.com/');

let list = await page.evaluate(() => {
  return Promise.resolve(Array.from(document.querySelectorAll('.title')));
});

console.log(JSON.stringify(list))

await browser.close();
Angular answered 23/9, 2017 at 9:13 Comment(1)
Promise.resolve isn't doing anything here, in addition to the DOM nodes not being JSON serializable. - Thermography
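The empty objects can be reproduced without a browser. The sketch below is only an analogy: it assumes, as is the case for real DOM elements, that properties such as href are exposed through getters on the prototype rather than as own enumerable properties. FakeElement is a hypothetical stand-in, not a Puppeteer or DOM API.

```javascript
// Hypothetical stand-in for a DOM element: its data is exposed only through
// a prototype getter, so the instance has no own enumerable properties.
class FakeElement {
  #href;
  constructor(href) {
    this.#href = href;
  }
  get href() {
    return this.#href;
  }
}

const nodes = [new FakeElement("/a"), new FakeElement("/b")];

// JSON only sees own enumerable properties, so each node becomes {}.
const asJson = JSON.stringify(nodes);
console.log(asJson); // [{},{}]

// Extracting plain strings first keeps the data.
const hrefs = JSON.stringify(nodes.map(node => node.href));
console.log(hrefs); // ["/a","/b"]
```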
A
60

The value returned from the evaluate function must be JSON-serializable. https://github.com/GoogleChrome/puppeteer/issues/303#issuecomment-322919968

The solution is to extract the href values from the elements and return them.

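// sel is a CSS selector string such as '.title', defined in Node.js
// and passed into the page as the second argument to evaluate
const links = await page.evaluate((sel) => {
  const elements = Array.from(document.querySelectorAll(sel));
  // Return plain strings (serializable) instead of DOM nodes
  return elements.map(element => element.href);
}, sel);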
Angular answered 24/9, 2017 at 2:51 Comment(3)
The docs are unclear to me because their link to Serializable goes to the JSON.stringify definition, which clearly states objects as serializable (and they obviously are). Nevertheless, a simple await page.evaluate(_ => { a: 1 }) will return undefined. - Battlefield
Not sure if you mistyped, but if you're trying to return that object using the shorthand notation, you need to wrap the returned object in parentheses: await page.evaluate(_ => ({ a: 1 })). That could definitely be the cause of getting undefined. - Repugnance
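The difference between the two forms is plain JavaScript and can be checked in Node.js without Puppeteer. A minimal sketch; the names withoutParens and withParens are made up for illustration:

```javascript
// Braces directly after => start a function body. `a: 1` is then parsed as
// a labeled statement, and the function returns undefined.
const withoutParens = _ => { a: 1 };

// Wrapping the braces in parentheses makes them an object literal.
const withParens = _ => ({ a: 1 });

console.log(withoutParens()); // undefined
console.log(withParens());    // { a: 1 }
```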
Old comment, but @Battlefield, not all objects are serializable, particularly those with circular references (e.g. a parent has a property holding the child, and the child has a property pointing back to the parent object). This definitely applies to objects representing document elements. - Exhibit
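The circular-reference case is easy to see with JSON.stringify alone. This is a sketch using two made-up plain objects, not real DOM nodes:

```javascript
const parent = { name: "parent" };
const child = { name: "child", parent };
parent.child = child; // parent -> child -> parent forms a cycle

let errorName = null;
try {
  JSON.stringify(parent);
} catch (err) {
  // JSON.stringify refuses to serialize cyclic structures.
  errorName = err.name;
}

console.log(errorName); // TypeError
```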
H
19

Problem:

The return value for page.evaluate() must be serializable.

The Puppeteer documentation says:

If the function passed to the page.evaluate returns a non-Serializable value, then page.evaluate resolves to undefined. DevTools Protocol also supports transferring some additional values that are not serializable by JSON: -0, NaN, Infinity, -Infinity, and bigint literals.

In other words, you cannot return an element from the page DOM environment back to the Node.js environment because they are separate.
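To see why the documentation singles out those extra values, here is a small sketch of what plain JSON does to them. It only demonstrates JSON.stringify in Node.js and does not go through Puppeteer or the DevTools Protocol:

```javascript
const special = [-0, NaN, Infinity, -Infinity];

// JSON has no representation for these, so a round trip loses them:
// -0 comes back as 0, and the other three come back as null.
const roundTripped = JSON.parse(JSON.stringify(special));
console.log(roundTripped); // [ 0, null, null, null ]

// BigInt values are rejected outright by JSON.stringify.
let bigintError = null;
try {
  JSON.stringify(10n);
} catch (err) {
  bigintError = err.name;
}
console.log(bigintError); // TypeError
```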

Solution:

You can return an ElementHandle, which is a representation of an in-page DOM element, back to the Node.js environment.

Use page.$$() to obtain an ElementHandle array:

let list = await page.$$('.title');

Otherwise, if you want to extract the href values from the elements and return them, you can use page.$$eval():

let list = await page.$$eval('.title', a => a.href);
Hollister answered 13/3, 2020 at 0:9 Comment(1)
let list = await page.$$eval('.title', a => a.href); is incorrect. It would need to be const list = await page.$$eval('.title', a => a.map(e => e.href)); - Thermography
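The reason is that the $$eval callback receives the whole array of matched elements, not one element at a time. A sketch in plain JavaScript, with a made-up array standing in for the matched elements:

```javascript
// Hypothetical stand-ins for the elements matched by the selector.
const matched = [{ href: "/a" }, { href: "/b" }];

// Treating the array as if it were a single element: arrays have no href.
const wrong = (a => a.href)(matched);

// Mapping over the array yields one href per element.
const right = (a => a.map(e => e.href))(matched);

console.log(wrong); // undefined
console.log(right); // [ '/a', '/b' ]
```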
E
14

I faced a similar problem and solved it like this:

await page.evaluate(() =>
  Array.from(document.querySelectorAll('.title'), e => e.href)
);
Eugeniaeugenics answered 7/11, 2019 at 14:33 Comment(2)
TIL Array.from takes a map callback function. - Synsepalous
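For reference, the optional second argument of Array.from is applied to each item while the array is built, so it gives the same result as calling map afterwards. A small sketch with a made-up Set in place of a NodeList:

```javascript
// Any iterable works; a Set stands in for the NodeList here.
const titles = new Set(["first", "second"]);

// Map while converting, in a single step.
const viaMapFn = Array.from(titles, t => t.toUpperCase());

// Equivalent two-step version: convert, then map.
const viaMap = Array.from(titles).map(t => t.toUpperCase());

console.log(viaMapFn); // [ 'FIRST', 'SECOND' ]
console.log(viaMap);   // [ 'FIRST', 'SECOND' ]
```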
Consider using page.$$eval(".title", els => els.map(el => el.href)). $$eval is provided as a convenience to avoid the commonplace pattern of having to run document.querySelectorAll() as the first line of a browser function. - Thermography
T
1

Existing answers are reasonable, and identify the fundamental issue with OP's code, which is that you can't return DOM nodes from evaluate blocks, only serializable data. But these answers could be improved in a few respects.

The answer that suggests page.$$ and page.$$eval is the strongest so far, because $eval and $$eval are shorthand for the common pattern of an evaluate that immediately calls querySelector/querySelectorAll, but it has an incorrect line at the time of writing. The correct way to use $$eval is the following:

const textContents = await page.$$eval(
  ".title",
  els => els.map(el => el.textContent)
);

or, applying the same pattern to issue a series of untrusted clicks:

await page.$$eval("button", els => els.forEach(el => el.click()));

Most of the time, $$eval is the best way to work with the DOM.

If you only have one node, you can use:

const textContent = await page.$eval(".title", el => el.textContent);

If you need to trigger trusted events on multiple nodes, then use page.$, page.$$ or, on rare occasions, page.evaluateHandle.

For example, this pattern is a fairly common way to click a series of buttons:

const btns = await page.$$("button");

for (const btn of btns) {
  await btn.click();
}

The same pattern can be used to retrieve text contents, but is not recommended relative to $$eval because it involves many network calls rather than one, which can lead to flakiness:

const titleHandles = await page.$$(".title");
const textContents = [];

for (const el of titleHandles) {
  textContents.push(await el.evaluate(el => el.textContent));
}

As always, don't forget to use waitForSelector or waitForFunction to make sure the elements are on the page before you query them, if they're not in the static HTML. Puppeteer also has a new auto-waiting locator API, but it's currently experimental and I haven't used it as much as the Playwright locator API. If Puppeteer's locator API gains the same level of support as Playwright's locator API, it'll probably render most other selection methods described above obsolete, as it's done in Playwright.
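As a rough sketch of combining the waiting advice with $$eval, the helper below waits for a selector and then extracts the text in a single call. getTextContents is a hypothetical name, the page argument is assumed to be a Puppeteer Page, and the 5000 ms timeout is an arbitrary illustrative value:

```javascript
// Hypothetical helper: wait for the selector, then extract text in one call.
async function getTextContents(page, selector, timeout = 5000) {
  // Rejects if no matching element appears within the timeout.
  await page.waitForSelector(selector, { timeout });
  // The callback runs in the browser and must return serializable data.
  return page.$$eval(selector, els => els.map(el => el.textContent.trim()));
}
```

With a real page, usage might look like const titles = await getTextContents(page, ".title");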

Thermography answered 4/12, 2023 at 3:26 Comment(0)
