Be sure to await all promises, and avoid combining .then() with await.
//vvv
await page.waitForSelector('div#rso h3')
//^^^
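On the .then()/await point, the usual pitfall with mixing styles is that a braced .then callback silently drops the inner promise unless you remember to return it; keeping everything in await style avoids that. A minimal sketch (the goto line is a generic placeholder, not your exact code):

// Mixed style: the braced arrow body never returns the waitForSelector
// promise, so nothing actually waits for the selector to appear
//await page.goto(url).then(() => { page.waitForSelector("div#rso h3"); });

// Consistent await style: each step completes before the next begins
await page.goto(url);
await page.waitForSelector("div#rso h3");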
Note that await page.waitForNavigation(); can cause a race condition if it's called after the event that triggers the navigation. I generally avoid waitForNavigation in favor of waiting for a selector or condition that appears on the next page, which typically results in faster, shorter, more reliable code. If you do use waitForNavigation, start it inside Promise.all alongside the trigger, or before the event that causes the navigation (press, in this case).
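For reference, the race-free pattern looks roughly like this (the selector and key below are placeholders, not taken from your code):

// Start waiting for the navigation before triggering it, so the
// navigation can't slip in between the two calls
await Promise.all([
  page.waitForNavigation(),
  page.press("input[name=q]", "Enter"), // the event that triggers the navigation
]);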
After these adjustments, if your goal is to get the data as quickly and reliably as possible rather than test the steps along the way, there's room for improvement.
It's often unnecessary to navigate to a landing page, then type into a box in order to run a search. It's typically faster and less error-prone to navigate directly to the results page with your query encoded into the URL. In this case, your code can be reduced to:
const url = "https://www.google.com/search?q=cheese";
await page.goto(url, {waitUntil: "networkidle"});
console.log(await page.textContent(".fP1Qef h3"));
If you notice that the text you want is in the static HTML, as is the case here, you can go a step further and block JS and external resources:
const playwright = require("playwright"); // ^1.30.1

let browser;
let context;
(async () => {
  browser = await playwright.chromium.launch();
  context = await browser.newContext({javaScriptEnabled: false});
  const page = await context.newPage();
  const url = "https://www.google.com/search?q=cheese";
  await page.route("**", route => {
    if (route.request().url().startsWith(url)) {
      route.continue();
    }
    else {
      route.abort();
    }
  });

  // networkidle is a suboptimal way to handle redirection
  await page.goto(url, {waitUntil: "networkidle"});
  console.log(await page.locator(".fP1Qef h3").allTextContents());
})()
  .catch(err => console.error(err))
  .finally(async () => {
    await context?.close();
    await browser?.close();
  });
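If you'd rather not key the filter on a URL prefix, the same blocking idea can be expressed by resource type; this variant is just a suggestion of mine, not part of the code above:

// Allow only top-level document requests and abort everything else
await page.route("**", route =>
  route.request().resourceType() === "document"
    ? route.continue()
    : route.abort()
);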
Once you block JS and all external resources, you can often go all the way to the holy grail of web scraping: skip browser automation entirely and use an HTTP request and a lightweight HTML parser instead:
const cheerio = require("cheerio"); // 1.0.0-rc.12

const query = "cheese";
const url = `https://www.google.com/search?q=${encodeURIComponent(query)}`;
fetch(url, { // Node 18 or install node-fetch
  headers: {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
  },
})
  .then(res => res.text())
  .then(html => {
    const $ = cheerio.load(html);
    console.log($(".fP1Qef h3").first().text()); // first result
    console.log([...$(".fP1Qef h3")].map(e => $(e).text())); // all results
  });