Parallelism of Puppeteer with Express Router in Node.js: how to pass a page between routes while maintaining concurrency

import express from 'express';
import puppeteer from 'puppeteer';

const app = express();

app.post('/api/auth/check', async (req, res) => {
  try {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://www.google.com');
    res.json({ message: 'Success' });
  } catch (e) {
    console.log(e);
    res.status(500).json({ message: 'Error' });
  }
});

app.post('/api/auth/register', async (req, res) => {
  console.log('register');
  // Here I need to get the current user's session (page and browser) from
  // the route above and keep working on the same page.
  // `page` and `browser` are not in scope here, so this route fails as written.
  await page.waitForTimeout(1000);
  await browser.close();
});

Is it possible to pass the page and browser from one route to another while maintaining Puppeteer concurrency? If I store them in global variables, each new request overwrites the page and browser, so multiple users can't work at the same time.

Donelson answered 3/4, 2021 at 21:7
Welcome to SO! I'm not sure if the terms "parallelism", "concurrency" and "multitasking" are what you're looking for here. Node is single-threaded, async event-driven. I think you mean you want to maintain individual browser instances across routes without blocking the event loop and I answered under this assumption. Feel free to clarify if your intent is something else. – Brainsick
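To make the point about the single-threaded event loop concrete, here is a minimal, dependency-free sketch. All names (sleep, fakeOpenPage, handleWithGlobal, handleWithMap) are hypothetical, and timers stand in for real I/O such as puppeteer.launch() or page.goto(). Two simulated requests share one thread; while one awaits, the event loop runs the other. With a single module-level page variable the later request overwrites the earlier one's page, while a map keyed by user keeps each request's page separate.

```javascript
// Stand-in for I/O such as puppeteer.launch() or page.goto().
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Hypothetical stand-in for "launch a browser and open a page".
const fakeOpenPage = async owner => {
  await sleep(10);
  return { owner };
};

// Anti-pattern: one module-level variable shared by every request.
let globalPage = null;

async function handleWithGlobal(user) {
  globalPage = await fakeOpenPage(user);
  await sleep(20);          // the event loop runs other requests meanwhile...
  return globalPage.owner;  // ...so this may now be someone else's page
}

// Alternative: keep each request's page under its own key.
const pages = new Map();

async function handleWithMap(user) {
  pages.set(user, await fakeOpenPage(user));
  await sleep(20);
  return pages.get(user).owner; // still this user's page
}
```

Running both versions for "alice" and "bob" with Promise.all, the global version reports the same owner for both requests, while the map version returns each user's own page.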

One approach is to keep a lookup table that maps each client to its own Puppeteer page; the browser can always be recovered from the page with page.browser(). Since HTTP is stateless, I assume you have some session/authentication management system that associates a user's session with a Puppeteer browser instance.

I've simplified your routes a bit and added a naive token management system to associate a user with a session in the interest of making a complete, runnable example, but I don't think you'll have problems adapting it to your use case.
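A variant worth considering, sketched under assumptions: store the promise returned by the launcher rather than the resolved page. A second request for the same token that arrives while the browser is still starting then awaits the same instance instead of launching a duplicate. SessionRegistry, launch and close are hypothetical names, not Express or Puppeteer APIs. With Puppeteer, launch could start a browser and resolve to its first page, and close could be page => page.browser().close().

```javascript
// Hypothetical per-token registry that caches the launch *promise*.
class SessionRegistry {
  constructor(launch, close) {
    this.launch = launch;      // async () => page   (assumed signature)
    this.close = close;        // async (page) => void (assumed signature)
    this.sessions = new Map(); // token -> Promise<page>
  }

  // Return the page for `token`, launching it on first use. Concurrent
  // callers with the same token share one pending launch.
  get(token) {
    if (!this.sessions.has(token)) {
      const pending = this.launch().catch(err => {
        // Don't cache a failed launch; the next call can retry.
        if (this.sessions.get(token) === pending) this.sessions.delete(token);
        throw err;
      });
      this.sessions.set(token, pending);
    }
    return this.sessions.get(token);
  }

  // Forget the session first (so new requests can't grab it), then close it.
  // Resolves to false if the token had no session.
  async destroy(token) {
    const pending = this.sessions.get(token);
    if (!pending) return false;
    this.sessions.delete(token);
    await this.close(await pending);
    return true;
  }
}
```

Because the Map holds promises, a route handler would write const page = await registry.get(req.query.token) and never needs to know whether the browser was already running.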

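import express from "express"; // ^4.19.2
import puppeteer from "puppeteer"; // ^22.10.0

// Wrap an async handler so a rejected promise reaches Express's error
// middleware (Express 4 does not forward async errors on its own).
// https://stackoverflow.com/a/51391081
const asyncHandler = fn => (req, res, next) =>
  Promise.resolve(fn(req, res, next)).catch(next);

// Launch a dedicated browser for one client and return its initial tab.
// headless: false opens a visible window: handy for a demo, but it needs a display.
const startPuppeteerSession = async () => {
  const browser = await puppeteer.launch({headless: false});
  return (await browser.pages())[0];
};

// token -> Puppeteer page; page.browser() recovers the owning browser.
const sessions = {};

express()
  // Every route needs a token to know which session it operates on.
  .use((req, res, next) =>
    req.query.token === undefined ? res.sendStatus(401) : next()
  )
  .get("/start", asyncHandler(async (req, res) => {
    // Close any browser this token already owns so repeated calls don't leak it.
    await sessions[req.query.token]?.browser().close();
    sessions[req.query.token] = await startPuppeteerSession();
    res.sendStatus(200);
  }))
  .get("/navigate", asyncHandler(async (req, res) => {
    // If the token has no session, page is undefined and page.goto throws,
    // which the error middleware turns into a 500.
    const page = sessions[req.query.token];
    await page.goto(req.query.to || "http://www.example.com");
    res.sendStatus(200);
  }))
  .get("/content", asyncHandler(async (req, res) => {
    const page = sessions[req.query.token];
    res.send(await page.content());
  }))
  .get("/kill", asyncHandler(async (req, res) => {
    await sessions[req.query.token].browser().close();
    delete sessions[req.query.token];
    res.sendStatus(200);
  }))
  // Errors forwarded by asyncHandler end up here.
  .use((err, req, res, next) => res.sendStatus(500))
  .listen(8000, () => console.log("listening on port 8000"));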

Sample usage from the client's perspective:

$ curl localhost:8000/start?token=1
OK
$ curl 'localhost:8000/navigate?to=https://stackoverflow.com/questions/66935883&token=1'
OK
$ curl localhost:8000/content?token=1 | grep 'apsenT'
        <a href="/users/15547056/apsent">apsenT</a><span class="d-none" itemprop="name">apsenT</span>
            <a href="/users/15547056/apsent">apsenT</a> is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                        <a href="/users/15547056/apsent">apsenT</a> is a new contributor. Be nice, and check out our <a href="/conduct">Code of Conduct</a>.
$ curl localhost:8000/kill?token=1
OK

You can see that the client associated with token 1 kept a single browser session across multiple routes. Other clients can launch their own browser sessions and manipulate them concurrently.
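One caveat, sketched under assumptions: different tokens proceed independently, but nothing stops two requests with the same token (say /navigate and /content at once) from driving one page at the same time. A per-token promise chain can serialize them while leaving other tokens free to overlap. runExclusive and queues are hypothetical names, not part of Express or Puppeteer.

```javascript
// token -> tail of that token's chain of pending operations
const queues = new Map();

// Run `task` after every earlier task for the same token has settled.
// Tasks for different tokens are not chained, so they still overlap.
function runExclusive(token, task) {
  const previous = queues.get(token) ?? Promise.resolve();
  // `previous` never rejects (see `tail` below), so this always runs `task`.
  const result = previous.then(() => task());
  // The next caller waits on this task but must not inherit its error.
  const tail = result.catch(() => {});
  queues.set(token, tail);
  // Drop the entry once this token's chain drains so the Map doesn't grow.
  tail.then(() => {
    if (queues.get(token) === tail) queues.delete(token);
  });
  return result;
}
```

In a route you would wrap only the page work, for example runExclusive(req.query.token, () => page.content()), and await the result as before; a failed task rejects for its own caller without blocking later ones.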

To reiterate, this is only a proof of concept of sharing a Puppeteer browser instance across routes. With the code above, a user can simply spam the start route with new tokens and create browsers until the server crashes, so it is unfit for production without real authentication, session management, and error handling.
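As a hedged starting point for two of those concerns, here is a hypothetical sketch of guard-rails for the token-to-page table: a hard cap on open sessions, and eviction of sessions idle longer than a timeout. BoundedSessions and its options are invented for illustration; the clock is injectable so the logic can be checked without waiting. It does not replace real authentication.

```javascript
// Hypothetical guard-rails for a token -> page table: a cap on open sessions
// and eviction of sessions that have been idle for longer than idleMs.
class BoundedSessions {
  constructor({ maxSessions, idleMs, close, now = Date.now }) {
    this.maxSessions = maxSessions;
    this.idleMs = idleMs;
    this.close = close;         // async (page) => void (assumed signature)
    this.now = now;             // injectable clock, in milliseconds
    this.entries = new Map();   // token -> { page, lastUsed }
  }

  // Register a session, refusing duplicates and anything past the cap.
  add(token, page) {
    if (this.entries.has(token)) throw new Error(`session ${token} already exists`);
    if (this.entries.size >= this.maxSessions) throw new Error("too many sessions");
    this.entries.set(token, { page, lastUsed: this.now() });
  }

  // Look up a page and record the access; undefined if there is no session.
  get(token) {
    const entry = this.entries.get(token);
    if (!entry) return undefined;
    entry.lastUsed = this.now();
    return entry.page;
  }

  // Close and forget every session idle for longer than idleMs; returns the
  // evicted tokens. Meant to be called periodically, e.g. from setInterval.
  async evictIdle() {
    const cutoff = this.now() - this.idleMs;
    const stale = [...this.entries].filter(([, entry]) => entry.lastUsed < cutoff);
    for (const [token] of stale) this.entries.delete(token);
    await Promise.all(stale.map(([, entry]) => this.close(entry.page)));
    return stale.map(([token]) => token);
  }
}
```

When the stored values are Puppeteer pages, close could be page => page.browser().close(), and the /start route would answer with an error status instead of launching once add throws.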

Brainsick answered 5/4, 2021 at 0:6

© 2022 - 2024 — McMap. All rights reserved.