Puppeteer doesn't close browser
I'm running puppeteer on express/node/ubuntu as follow:

var puppeteer = require('puppeteer');
var express = require('express');
var router = express.Router();

/* GET home page. */
router.get('/', function(req, res, next) {
    (async () => {
        headless = true;
        const browser = await puppeteer.launch({headless: true, args:['--no-sandbox']});
        const page = await browser.newPage();
        url = req.query.url;
        await page.goto(url);
        let bodyHTML = await page.evaluate(() => document.body.innerHTML);
        res.send(bodyHTML)
        await browser.close();
    })();
});

Running this script multiple times leaves hundreds of zombie processes:

$ pgrep chrome | wc -l
133

which clogs the server.

How do I fix this?

Would running kill from the Express script solve it?

Is there a better way to get the same result other than puppeteer and headless chrome?

Selfwinding asked 27/12, 2018 at 3:43 Comment(4)
did you try await page.close(); before closing the browser?Scutiform
Which Puppeteer version are you using?Coda
An additional info for detecting Chromium is still running or not: #52046100Jeannettajeannette
@Insensibility Could you share an exact URL you send off to your Express app / Puppetteer to open and evaluate?Rousseau
6

I solved it with https://www.npmjs.com/package/shelljs

var shell = require('shelljs');
shell.exec('pkill chrome')
Selfwinding answered 27/12, 2018 at 8:47 Comment(7)
This works for me; somehow browser.close doesn't work, but this command does.Tragedian
Will it kill all Chrome instances running on the system, or only the ones the Puppeteer script started?Lucilelucilia
It's supposed to kill all Chrome processes.Selfwinding
can't believe this is the accepted answer. Not for the answer itself, but for the fact that we are almost forced to do this to simply close puppeteer processesPledge
This answer is terrible, because it will kill all chrome instances. In the original code, it was clear that puppeteer was running in an API, meaning that if there are more than 1 api request at the same time, the other requests will fail.Gradygrae
@Pledge you're not forced to do this to close Puppeteer processes. Using a finally block and giving proper attention to control flow, resource management and error handling is the correct way. Most folks who have memory leaks also have sloppy design that makes it hard for them to tell where their open resources are (or aren't) being closed (if they're even aware of the need to close resources).Sufferable
This solution is too radical. Closing the pages and the browser in a finally block is sufficient, no need to go to such extremes.Cliffordclift
31

Ahhh! This is a simple oversight: if an error occurs, your await browser.close() never executes, leaving you with zombies.

Using shell.js seems to be a hacky way of solving this issue.

The better practice is to use try..catch..finally, because you want the browser to be closed regardless of whether the flow completes happily or an error is thrown. And unlike the other code snippet, you don't have to close the browser in both the catch block and the finally block: the finally block is always executed, whether an error is thrown or not.
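To see why the finally block alone suffices, here is a minimal mock (no Puppeteer involved; the browser object below is just a stand-in) demonstrating that cleanup runs on both the happy path and the error path:

```javascript
// Mock demonstrating finally-based cleanup; `browser` stands in for a real Browser.
async function withResource(shouldThrow) {
  const events = [];
  const browser = { close: async () => { events.push('closed'); } };
  try {
    events.push('opened');
    if (shouldThrow) throw new Error('boom');
    events.push('work done');
  } catch (e) {
    events.push(`caught: ${e.message}`);
  } finally {
    await browser.close(); // runs whether or not an error was thrown
  }
  return events;
}

(async () => {
  console.log(await withResource(false)); // [ 'opened', 'work done', 'closed' ]
  console.log(await withResource(true));  // [ 'opened', 'caught: boom', 'closed' ]
})();
```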

So, your code should look like,

const puppeteer = require('puppeteer');
const express = require('express');

const router = express.Router();

/* GET home page. */
router.get('/', function(req, res, next) {
  (async () => {
    const browser = await puppeteer.launch({
      headless: true,
      args: ['--no-sandbox'],
    });

    try {
      const page = await browser.newPage();
      url = req.query.url;
      await page.goto(url);
      const bodyHTML = await page.evaluate(() => document.body.innerHTML);
      res.send(bodyHTML);
    } catch (e) {
      console.log(e);
    } finally {
      await browser.close();
    }
  })();
});

Hope this helps!

Drily answered 19/7, 2019 at 10:51 Comment(3)
Can you explain what's happening here? Your code sample looks the same to me as the original one. It's only in a try-catch-finally block, but the order of execution is the same, right? Also, what do you mean by "execution time stops"? Execution doesn't stop unless you use return. You can do any amount of work after calling res.send().Insensibility
browser needs to be declared in an outer scope so it's in scope for the finally block, otherwise you'll get ReferenceError: browser is not defined. This code also makes various globals, headless and url. Always use const or let to scope variables correctly.Sufferable
Also, if puppeteer.launch() throws, it's uncaught here and url = req.query.url; should be const to avoid a global. await page.evaluate(() => document.body.innerHTML); can be await page.content().Sufferable
22

Wrap your code in try-catch-finally like this and see if it helps:

headless = true;
const browser = await puppeteer.launch({headless: true, args:['--no-sandbox']});
try {
  const page = await browser.newPage();
  url = req.query.url;
  await page.goto(url);
  let bodyHTML = await page.evaluate(() => document.body.innerHTML);
  res.send(bodyHTML);
  await browser.close();
} catch (error) {
  console.log(error);
} finally {
  await browser.close();
}
Bac answered 27/12, 2018 at 11:17 Comment(5)
Good point, thanks. However, I think the try should come after const browser = ... so the browser can be used in the catch/finally, shouldn't it?Selfwinding
UPDATE: I tried both solutions; I have significantly fewer zombies, but I still have about 20 left every day.Selfwinding
await browser.close(); in the try and catch blocks is redundant. finally will run after both blocks regardless of whether an error is thrown. Use const url and const headless to avoid unnecessary, error-prone globals.Sufferable
Yes, it was redundantBac
It's still there. If the code doesn't throw, browser.close() runs twice in this code. I suggest removing it from the try block. await page.evaluate(() => document.body.innerHTML); is cleaner as await page.content(), url = req.query.url; should be const, if the launch throws, it's uncaught and headless = true; is never used and should be const. Many problems here.Sufferable
13

From my experience, the browser process may take some time to exit after close is called. You can check the browser's process property to see whether it is still running, and force-kill it if so:

if (browser && browser.process() != null) browser.process().kill('SIGINT');

I'm also posting the full code of my Puppeteer resource manager below. Take a look at the bw.on('disconnected', ...) handler:

const puppeteer = require('puppeteer-extra')
const randomUseragent = require('random-useragent');
const StealthPlugin = require('puppeteer-extra-plugin-stealth')

const USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36';
puppeteer.use(StealthPlugin())

function ResourceManager(loadImages) {
    let browser = null;
    const _this = this;
    let retries = 0;
    let isReleased = false;

    this.init = async () => {
        isReleased = false;
        retries = 0;
        browser = await runBrowser();
    };

    this.release = async () => {
        isReleased = true;
        if (browser) await browser.close();
    }

    this.createPage = async (url) => {
        if (!browser) browser = await runBrowser();
        return await createPage(browser,url);
    }

    async function runBrowser () {
        const bw = await puppeteer.launch({
            headless: true,
            devtools: false,
            ignoreHTTPSErrors: true,
            slowMo: 0,
            args: ['--disable-gpu','--no-sandbox','--no-zygote','--disable-setuid-sandbox','--disable-accelerated-2d-canvas','--disable-dev-shm-usage', "--proxy-server='direct://'", "--proxy-bypass-list=*"]
        });

        bw.on('disconnected', async () => {
            if (isReleased) return;
            console.log("BROWSER CRASH");
            if (retries <= 3) {
                retries += 1;
                if (browser && browser.process() != null) browser.process().kill('SIGINT');
                await _this.init();
            } else {
                throw new Error("===================== BROWSER crashed more than 3 times");
            }
        });

        return bw;
    }

    async function createPage (browser,url) {
        const userAgent = randomUseragent.getRandom();
        const UA = userAgent || USER_AGENT;
        const page = await browser.newPage();
        await page.setViewport({
            width: 1920 + Math.floor(Math.random() * 100),
            height: 3000 + Math.floor(Math.random() * 100),
            deviceScaleFactor: 1,
            hasTouch: false,
            isLandscape: false,
            isMobile: false,
        });
        await page.setUserAgent(UA);
        await page.setJavaScriptEnabled(true);
        await page.setDefaultNavigationTimeout(0);
        if (!loadImages) {
            await page.setRequestInterception(true);
            page.on('request', (req) => {
                if(req.resourceType() == 'stylesheet' || req.resourceType() == 'font' || req.resourceType() == 'image'){
                    req.abort();
                } else {
                    req.continue();
                }
            });
        }

        await page.evaluateOnNewDocument(() => {
            //pass webdriver check
            Object.defineProperty(navigator, 'webdriver', {
                get: () => false,
            });
        });

        await page.evaluateOnNewDocument(() => {
            //pass chrome check
            window.chrome = {
                runtime: {},
                // etc.
            };
        });

        await page.evaluateOnNewDocument(() => {
            //pass plugins check
            const originalQuery = window.navigator.permissions.query;
            return window.navigator.permissions.query = (parameters) => (
                parameters.name === 'notifications' ?
                    Promise.resolve({ state: Notification.permission }) :
                    originalQuery(parameters)
            );
        });

        await page.evaluateOnNewDocument(() => {
            // Overwrite the `plugins` property to use a custom getter.
            Object.defineProperty(navigator, 'plugins', {
                // This just needs to have `length > 0` for the current test,
                // but we could mock the plugins too if necessary.
                get: () => [1, 2, 3, 4, 5],
            });
        });

        await page.evaluateOnNewDocument(() => {
            // Overwrite the `plugins` property to use a custom getter.
            Object.defineProperty(navigator, 'languages', {
                get: () => ['en-US', 'en'],
            });
        });

        await page.goto(url, { waitUntil: 'networkidle2',timeout: 0 } );
        return page;
    }
}

module.exports = {ResourceManager}
Weeds answered 3/10, 2020 at 21:55 Comment(4)
This seems very overengineered. You don't need this much boilerplate just to ensure the browser closes. For all the effort, there are no try/catch/finally blocks here, so if anything throws, it still hangs. Also, await page.setDefaultNavigationTimeout(0) introduces the potential for an infinite hang, and timeout: 0 does the same thing (besides being redundant here). networkidle2 is a last resort, not a default navigation strategy, and it also introduces hanging potential on some sites.Sufferable
@Sufferable It shows what to do if the browser is hung, and it shows the example context with working code (from my context; it may not be relevant in your case). Without killing it, I experienced memory leakage that led to Node resource overload.Weeds
SIGINT-ing the browser process is a last-resort hack, usually used by folks who set up their Puppeteer code incorrectly. I pointed out the many potential hang points in this code that might make you think the only way out is a SIGINT. It's much better to fix the problems at the source.Sufferable
@Sufferable Got it, I will look at the state of my code and make some updates. But as for now, I have a cluster that has been running for 2 years, constantly scraping, and all works well without any intervention.Weeds
9

I ran into this issue today myself and found a solution. It seems that the issue with Chromium not closing is due to unclosed pages. Close all the pages before calling browser.close() and everything should be fine:

const pages = await browser.pages();
for (let i = 0; i < pages.length; i++) {
    await pages[i].close();
}
await browser.close()

Hopefully that helps someone!

Guilty answered 19/6, 2023 at 10:27 Comment(3)
It's true, for some reason pages are not being closed. I did this and it worked great for me.Pushcart
in one line: await Promise.all((await browser.pages()).map((page) => page.close()));Invalidity
Closing the browser should destroy all pages, so this shouldn't be necessary. Pages are tabs within the browser, so unless there's a bug in Puppeteer that leaks memory, all resources managed within the browser, including pages, should be cleaned up with browser.close(), and the process should be able to exit cleanly. If this was actually necessary in a particular situation, I'd like to see a reproducible example where not doing it leaves the process hanging, preferably with a particular Puppeteer version.Sufferable
3

I use the following basic setup for running Puppeteer:

const puppeteer = require("puppeteer");

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();

  /* use the page */
  
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());

Here, the finally block guarantees the browser will close correctly regardless of whether an error was thrown. Errors are logged (if desired). I like .catch and .finally as chained calls because the mainline Puppeteer code is one level flatter, but this accomplishes the same thing:

const puppeteer = require("puppeteer");

(async () => {
  let browser;

  try {
    browser = await puppeteer.launch();
    const [page] = await browser.pages();

    /* use the page */
  }
  catch (err) {
    console.error(err);
  }
  finally {
    await browser?.close();
  }
})();

There's no reason to call newPage because Puppeteer starts with a page open.


As for Express, you need only place the entire code above, including let browser; and excluding require("puppeteer"), into your route, and you're good to go, although you might want to use an async middleware error handler.

You ask:

Is there a better way to get the same result other than puppeteer and headless chrome?

That depends on what you're doing and what you mean by "better". If your goal is to get document.body.innerHTML and the page content you're interested in is baked into the static HTML, you can dump Puppeteer entirely and just make a request to get the resource, then use Cheerio to extract the desired information.
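A rough, dependency-free sketch of that approach (assuming Node 18+ for the global fetch; getBodyHTML and extractBody are hypothetical helper names, and the regex extraction is for illustration only — Cheerio's selectors are the robust replacement):

```javascript
// Sketch: skip the browser entirely when the page content is static HTML.
// extractBody is a naive stand-in for what Cheerio would do properly.
function extractBody(html) {
  const m = html.match(/<body[^>]*>([\s\S]*)<\/body>/i);
  return m ? m[1].trim() : html;
}

async function getBodyHTML(url) {
  const res = await fetch(url); // global fetch, Node 18+
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return extractBody(await res.text());
}
```

In a real route you would call getBodyHTML(req.query.url) and send the result, with no browser process to clean up at all.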

Another consideration is that you may not need to load and close a whole browser per request. If you can use one new page per request, consider the following strategy:

const express = require("express");
const puppeteer = require("puppeteer");

const asyncHandler = fn => (req, res, next) =>
  Promise.resolve(fn(req, res, next)).catch(next);

const browserReady = puppeteer.launch({
  args: ["--no-sandbox", "--disable-setuid-sandbox"]
});

const app = express();
app
  .set("port", process.env.PORT || 5000)
  .get("/", asyncHandler(async (req, res) => {
    const browser = await browserReady;
    const page = await browser.newPage();

    try {
      await page.goto(req.query.url || "http://www.example.com");
      return res.send(await page.content());
    }
    catch (err) {
      return res.status(400).send(err.message);
    }
    finally {
      await page.close();
    }
  }))
  .use((err, req, res, next) => res.sendStatus(500))
  .listen(app.get("port"), () =>
    console.log("listening on port", app.get("port"))
  );

Finally, make sure to never set any timeouts to 0 (for example, page.setDefaultNavigationTimeout(0);), which introduces the potential for the script to hang forever. If you need a generous timeout, at most set it to a few minutes--long enough not to trigger false positives.


Sufferable answered 9/6, 2021 at 19:3 Comment(0)
1

I encountered this issue using the Chromium browser (@sparticuz/chromium). Following the issue forum, closing all the pages made the difference. It looks like Chromium opens some extra page or tab, and you need to make sure you close them all.

const pages = await browser.pages();
await Promise.all(pages.map((page) => page.close()));
await browser.close();
Vial answered 27/8, 2023 at 13:47 Comment(0)
0

Try closing the browser before sending the response:

var puppeteer = require('puppeteer');
var express = require('express');
var router = express.Router();

router.get('/', function(req, res, next) {
    (async () => {
        headless = true;
        const browser = await puppeteer.launch({headless: true});
        const page = await browser.newPage();
        url = req.query.url;
        await page.goto(url);
        let bodyHTML = await page.evaluate(() => document.body.innerHTML);
        await browser.close();
        res.send(bodyHTML);
    })();
});
Headsman answered 4/3, 2019 at 14:31 Comment(2)
Putting browser.close() above res.send() slows down sending the response for no reason and doesn't have any impact on closing the browser. If anything throws, the browser will leak and the request will hang. Use finally and always handle errors.Sufferable
headless = true; is unused and should be const, url = req.query.url; should be const and await page.evaluate(() => document.body.innerHTML); should be await page.content().Sufferable
0

I ran into the same issue, and while your shelljs solution did work, it kills all Chrome processes, which might interrupt one that is still handling a request. Here is a better solution that should work.

var puppeteer = require('puppeteer');
var express = require('express');
var router = express.Router();

router.get('/', function (req, res, next) {
    (async () => {
        await puppeteer.launch({ headless: true }).then(async browser => {
            const page = await browser.newPage();
            url = req.query.url;
            await page.goto(url);
            let bodyHTML = await page.evaluate(() => document.body.innerHTML);
            await browser.close();
            res.send(bodyHTML);
        });
    })();
});
Pledgee answered 17/8, 2019 at 2:14 Comment(2)
I adapted this code from one of the samples found on the Puppeteer web site. To be honest, I'm kind of new to async/await and Promises, so I don't fully understand why executing the browser object code in a then() is any different from a series of statements preceded by await. All I know is that my code was very similar and changing it to this fixed the issue.Pledgee
You're correct in mentioning that shelljs is a dirty hack, but as I've mentioned on other answers, this code doesn't work. If anything throws, the browser hangs and the request never completes. Mixing async/await with .then() is really confusing and makes it even harder to plug the holes in the code here. I won't re-mention the other issues I've raised in other comments.Sufferable
0

Use

 (await browser).close()

This happens because the browser variable holds a promise that you have to resolve first. I suffered a lot over this; I hope it helps.
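The situation this answer describes can be reproduced with a plain mock (launch below is a stand-in for puppeteer.launch(), which returns a Promise): if the variable holds the pending promise rather than the resolved browser, .close simply isn't there until you await:

```javascript
// `launch` stands in for puppeteer.launch(), which returns a Promise<Browser>.
const launch = async () => ({ close: async () => 'closed' });

const browser = launch(); // note: no await, so `browser` is a Promise

(async () => {
  console.log(typeof browser.close);         // 'undefined' — a Promise has no close()
  console.log(typeof (await browser).close); // 'function'  — the resolved object does
})();
```

As the comments note, the real fix is to await puppeteer.launch() when assigning the variable, so that the usual await browser.close() works.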

Zeuxis answered 31/1, 2022 at 22:51 Comment(5)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.Saturated
This is wrong. browser is an object of type Browser, as you can see in the documentation: puppeteer.github.io/puppeteer/docs/puppeteer.browserBonne
I don't know how, but this actually worked for me. I was using a Puppeteer instance outside of my router function, then calling it inside the router function. I was getting errors like browser.newPage() is not a function.Dicker
Then your "browser" object probably was a Promise<Browser> in your case and thus didn't have the "close" method, but that's not how it usually is.Bonne
This code doesn't make sense. browser isn't generally a promise, it's an object. .close() is the method that returns a promise, so await browser.close() is the correct approach. If OP has used the variable browser as a promise, that's nonstandard, but even then, the correct solution would be await (await browser).close() so the .close() promise is awaited (but don't actually do this pattern in the first place).Sufferable
0

The try-catch-finally approach did not work for me, and going with shelljs' shell.exec('pkill chrome') feels like a desperate move.

In my case, the problem was that I had Redis' await cache.set('key', 'value') somewhere in my code; the Redis client needs to be closed first, so I had to call await cache.quit() before await browser.close(). This solved my issue.

I suggest checking the libraries or modules you use for anything that requires closing or quitting first, especially those that run continuously without throwing any errors; try-catch won't help with those, and they can prevent the browser from closing.
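The shutdown order described above can be sketched with stand-ins (cache mimics a Redis client and browser a Puppeteer browser; both are mocks, not real clients):

```javascript
// Quit long-lived clients first, then close the browser.
const closed = [];
const cache = { quit: async () => closed.push('cache') };      // stands in for a Redis client
const browser = { close: async () => closed.push('browser') }; // stands in for a Puppeteer browser

async function shutdown() {
  await cache.quit();    // release handles that keep the event loop alive...
  await browser.close(); // ...then close the browser
  return closed;
}
```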

Disconcerted answered 22/6, 2023 at 17:1 Comment(0)
0

I run Puppeteer inside Docker (with docker-compose). What worked for me to reap the zombie processes was adding init: true in the docker-compose.yml file, in the service where Puppeteer runs.

services:
  web:
    image: alpine:latest
    init: true

References

  1. https://pptr.dev/troubleshooting#running-puppeteer-in-docker
  2. https://mcmap.net/q/245512/-what-39-s-the-docker-compose-equivalent-of-docker-run-init
  3. https://docs.docker.com/compose/compose-file/compose-file-v2/#init
Schiro answered 13/9, 2023 at 18:31 Comment(0)
