Opening local HTML file using Puppeteer

Is it possible to open a local HTML file with headless Chrome using Puppeteer (without a web server)? I could only get it to work against a local server.

I found setContent() and goto() in the Puppeteer API documentation, but:

  1. page.goto: did not work with a local file or file://.
  2. page.setContent: takes an HTML string, not a file path.
Hookup answered 1/12, 2017 at 5:48

I just tested this locally (on Windows, as you can see from the path), and Puppeteer happily opened my local HTML file using page.goto with a full file URL and saved it as a PDF:

'use strict';

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Full file URL: three slashes, then the Windows drive letter
  await page.goto('file:///C:/Users/compoundeye/test.html');
  await page.pdf({
    path: 'test.pdf',
    format: 'A4',
    margin: {
      top: '20px',
      left: '20px',
      right: '20px',
      bottom: '20px',
    },
  });
  await browser.close();
})();

If you need to use a relative path, you might want to look at this question about relative file paths: File Uri Scheme and Relative Files

Clarabelle answered 5/12, 2017 at 0:53
It looks much nicer with await page.goto(`file:${path.join(__dirname, 'test.html')}`); – Thole
If you're a Node.js noob like me, don't forget to define path first: const path = require('path'); – Febrifacient
If you're a Node.js noob like me but are using ES6 modules in your project: import * as path from 'path' [trying to be fun and make the world happier :)] – Robinett

If the file is local, using setContent will be better than goto:

const fs = require('fs');
// inside an async function, with `page` from browser.newPage()
const contentHtml = fs.readFileSync('C:/Users/compoundeye/test.html', 'utf8');
await page.setContent(contentHtml);

You can compare the performance of setContent and goto here.
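To measure the difference on your own file, a rough timing sketch could look like this. It assumes `page` is an already-open Puppeteer page; `compareLoadTimes` and `timed` are hypothetical helper names. Both calls get the same `waitUntil` value so the comparison is like for like.

```javascript
const fs = require('fs');
const path = require('path');
const { pathToFileURL } = require('url');
const { performance } = require('perf_hooks');

// Run one async step and return how long it took, in milliseconds
async function timed(fn) {
  const start = performance.now();
  await fn();
  return performance.now() - start;
}

// Load the same local file via page.goto (file:// URL) and via
// page.setContent (file contents), with identical waitUntil options.
async function compareLoadTimes(page, filePath, waitUntil = 'networkidle2') {
  const absPath = path.resolve(filePath);
  const gotoMs = await timed(() =>
    page.goto(pathToFileURL(absPath).href, { waitUntil }));
  const html = fs.readFileSync(absPath, 'utf8');
  const setContentMs = await timed(() => page.setContent(html, { waitUntil }));
  return { gotoMs, setContentMs };
}
```

A single run is noisy, so repeat it a few times (and try different `waitUntil` values) before drawing conclusions.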

Silo answered 13/10, 2018 at 5:24
While setContent is faster than goto, I find that passing waitUntil: 'networkidle2' makes setContent take twice as long as goto with the same waitUntil: 'networkidle2' option. – Catechism

Let's take a screenshot of an element from a local HTML file as an example.

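import path from 'path';
import { fileURLToPath, pathToFileURL } from 'url';
import puppeteer from 'puppeteer';

// __dirname only exists in CommonJS; in an ES module (this file uses
// `import`), derive it from import.meta.url instead
const __dirname = path.dirname(fileURLToPath(import.meta.url));

(async () => {

    const browser = await puppeteer.launch();

    const page = await browser.newPage();

    // Absolute file:// URL for pages/test.html next to this script
    await page.goto(pathToFileURL(path.join(__dirname, 'pages', 'test.html')).href);

    const element = await page.$('.myElement');

    if (element) {
        // The ./out directory must already exist
        await element.screenshot({
            path: './out/screenshot.png',
            omitBackground: true,
        });
    }

    await browser.close();
})();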
Kalman answered 18/8, 2020 at 17:58
I had to remove omitBackground: true. – Glutton

Navigation to local files only works if you also pass a referer of file://; otherwise, security restrictions prevent it from succeeding.
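A minimal sketch of passing that referer, assuming `page` is an open Puppeteer page; `gotoLocalFile` is a hypothetical helper. `referer` is a documented option of page.goto, but whether it is actually required may depend on your Chrome version and launch settings.

```javascript
const path = require('path');
const { pathToFileURL } = require('url');

// Navigate `page` (a Puppeteer Page) to a local file, sending a file://
// referer along with the navigation as this answer suggests.
async function gotoLocalFile(page, filePath) {
  const url = pathToFileURL(path.resolve(filePath)).href;
  return page.goto(url, { referer: 'file://' });
}
```

Usage: await gotoLocalFile(page, 'test.html');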

Glamorous answered 11/10, 2019 at 15:16

Why not open the HTML file, read its content, and then call setContent?

Grizzled answered 9/8, 2018 at 14:5
Another way: await page.goto(`data:text/html,${pageHtml}`, { waitUntil: 'networkidle0' }); – Grizzled
Using page.goto(`data:text/html,${html}`) led to some issues with special characters for us; setContent is definitely a better solution (see this post). – Robinett
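If you do want the data: URL route, the special-character problems typically come from interpolating raw HTML into the URL: a '#' starts the fragment and a '%' starts an escape sequence. A sketch that percent-encodes the HTML first (`htmlToDataUrl` is a hypothetical helper name):

```javascript
// Build a data: URL from an HTML string. encodeURIComponent escapes '#',
// '%', spaces and non-ASCII characters so the whole document survives.
function htmlToDataUrl(html) {
  return 'data:text/html;charset=utf-8,' + encodeURIComponent(html);
}
```

Usage: await page.goto(htmlToDataUrl(pageHtml), { waitUntil: 'networkidle0' }); Relative resources still have no file-based location to resolve against, and very large documents may run into URL length limits, which is another reason setContent is usually the simpler choice.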

You can use file-url to prepare the URL to pass to page.goto:

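const fileUrl = require('file-url');
const puppeteer = require('puppeteer');

// `await` needs an async function in a CommonJS script
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // file-url resolves 'file.html' against the current working directory
  await page.goto(fileUrl('file.html'));

  await browser.close();
})();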
Dumbwaiter answered 17/10, 2020 at 4:38

I opened the file I wanted to load in the browser and copied the URL from the address bar to make sure all the slashes were correct.

await page.goto(`file:///C:/pup_scrapper/testpage/TM.html`);
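To build the same URL from a Windows path in code instead of copying it from the browser, here is a hand-rolled sketch. `windowsPathToFileUrl` is a hypothetical helper that only handles drive-letter paths (not UNC paths such as \\server\share); when the script itself runs on Windows, Node's built-in url.pathToFileURL performs this conversion for you.

```javascript
// Convert a drive-letter Windows path (C:\dir\file.html) to a file:/// URL.
function windowsPathToFileUrl(winPath) {
  // Normalise separators: C:\a\b.html -> C:/a/b.html
  const segments = winPath.replace(/\\/g, '/').split('/');
  // Percent-encode every segment except the leading drive letter ("C:")
  const encoded = segments.map((seg, i) =>
    i === 0 && /^[A-Za-z]:$/.test(seg) ? seg : encodeURIComponent(seg));
  return 'file:///' + encoded.join('/');
}
```

For example, windowsPathToFileUrl('C:\\pup_scrapper\\testpage\\TM.html') yields the URL used above.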
Wooldridge answered 19/10, 2018 at 7:37

tl;dr: there are caveats to using page.setContent() on a blank page.

As noted by other answers, you can read the file using a Node API and then call page.setContent() for more flexibility than page.goto(). However, there are some limitations while the default about:blank page is displayed, such as relative resources not being loaded (more info here).

A workaround is to create an empty file, empty.html, navigate to it, and then call page.setContent():

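import path from 'path';
import { pathToFileURL } from 'url';
// would typically load from a file
const html = '<!DOCTYPE html><title>Hello</title><p>World</p>';
// absolute URL; 'file://empty.html' would parse empty.html as a host name
await page.goto(pathToFileURL(path.resolve('empty.html')).href, { waitUntil: 'load' });
await page.setContent(html, { waitUntil: 'networkidle0' });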
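The workaround assumes empty.html already exists on disk. A sketch that creates one in a fresh temporary directory and returns its file:// URL (`createEmptyPageUrl` is a hypothetical name); relative URLs in the HTML passed to setContent would then resolve against that temporary directory.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { pathToFileURL } = require('url');

// Write a blank HTML document into a new temp directory and return its
// file:// URL, suitable as a "parking" page before calling setContent.
function createEmptyPageUrl() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'puppeteer-'));
  const file = path.join(dir, 'empty.html');
  fs.writeFileSync(file, '<!DOCTYPE html><title></title>');
  return pathToFileURL(file).href;
}
```

Usage: await page.goto(createEmptyPageUrl(), { waitUntil: 'load' }); followed by await page.setContent(html, { waitUntil: 'networkidle0' });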

If you want to load other resources locally which are not available using file://, you can take advantage of page.setRequestInterception():

import path from 'path';

// In-memory resources, keyed by file name
const resources = {
    'style.css': {
        content: Buffer.from('p {color: navy;}'),
        mimetype: 'text/css',
    },
};

// Enable interception (before navigating) so 'request' events can be answered
await page.setRequestInterception(true);

page.on('request', interceptedRequest => {
    const url = new URL(interceptedRequest.url());
    const resourceName = path.basename(url.pathname); // Extract the file name

    // Serve file:// requests other than empty.html itself from `resources`
    if (url.protocol === 'file:' && resourceName !== 'empty.html') {
        const resource = resources[resourceName];
        if (resource) {
            interceptedRequest.respond({
                status: 200,
                contentType: resource.mimetype,
                body: resource.content,
            });
        } else {
            interceptedRequest.abort();
        }
    } else {
        interceptedRequest.continue();
    }
});
Johnnyjohnnycake answered 16/4 at 21:15

© 2022 - 2024 — McMap. All rights reserved.