Convert HTML to PDF or PNG without a headless browser instance in NodeJS

TL;DR:

  1. Are there any suggestions for converting HTML to PDF or PNG in Node.js without a headless browser instance?
  2. Does anyone use Puppeteer in a production environment? I would like to know about the resource utilisation and performance of running a headless browser in production.

Longer version:

On a Node.js server, we need to convert an HTML string to a PDF or PNG depending on the request parameters. We use Puppeteer to generate the PDF and the PNG (screenshot), deployed as a Google Cloud Function. Locally, I run the application in Docker with memory restricted to 100MB, and that seems to work. But in the Cloud Function it throws a memory limit exception when we set the function's memory to 250MB. As a temporary solution, we upgraded the Cloud Function to 1GB.
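For reference, the Puppeteer flow described above usually boils down to something like the following minimal sketch. It assumes the puppeteer package; the helper name renderHtml, the A4 format and the launch flags are illustrative choices, not the asker's actual code. It takes an already-launched browser so that one Chromium instance can be reused across invocations instead of being launched for every request, which is one common way to keep memory usage down.

```javascript
// Minimal sketch, assuming the `puppeteer` package is installed.
// `renderHtml` is a hypothetical helper; it expects an already launched
// browser so a single Chromium instance can serve many requests.
async function renderHtml(browser, html, format = 'pdf') {
  const page = await browser.newPage();
  try {
    // Load the HTML string and wait until network activity settles
    // (useful when the HTML references images, fonts or stylesheets).
    await page.setContent(html, { waitUntil: 'networkidle0' });
    if (format === 'png') {
      // Full-page screenshot; returns binary image data
      // (a Buffer or Uint8Array depending on the Puppeteer version).
      return await page.screenshot({ type: 'png', fullPage: true });
    }
    // PDF output; printBackground keeps CSS background colours/images.
    return await page.pdf({ format: 'A4', printBackground: true });
  } finally {
    // Always close the page so its memory is released between requests.
    await page.close();
  }
}

// Example wiring (not executed here; flags are illustrative):
//   const puppeteer = require('puppeteer');
//   const browserPromise = puppeteer.launch({
//     args: ['--no-sandbox', '--disable-dev-shm-usage'],
//   });
//   const pdf = await renderHtml(await browserPromise, '<h1>Hello</h1>', 'pdf');

module.exports = { renderHtml };
```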

We would like to try alternatives to Puppeteer that don't use a headless browser. Another library, PDFKit, looks good, but its input is a canvas-like drawing API; we can't feed it HTML directly.

Any thoughts or input on this?

Shanahan answered 27/1, 2021 at 6:49 Comment(0)

If you can use Docker, then a great solution for you may be Gotenberg.

It's an incredible service that can convert a lot of formats (HTML, Markdown, Word, Excel, etc.) into PDF.

If your page's rendering depends on JavaScript, no problem: Gotenberg will run it and wait (you can even configure the maximum wait time) until the page is completely rendered before generating your PDF.

We are using it for an application that generates 3000 PDFs per day and never had any issue with it.

Demo:

Take a look at this sample HTML invoice: https://sparksuite.github.io/simple-html-invoice-template/

Now let's convert it to PDF. It takes a single POST request with three parts:

1. The Gotenberg URL (here a demo endpoint provided by the Gotenberg team, with some limitations such as 2 requests per second per IP and a 5MB body limit).
2. A url form field containing the URL of the webpage you want to convert.
3. The response: you get the PDF back as the HTTP response body, with Content-Type application/pdf.

Curl version:

curl --location --request POST 'https://demo.gotenberg.dev/forms/chromium/convert/url' \
--form 'url="https://sparksuite.github.io/simple-html-invoice-template/"' \
-o myfile.pdf

Node.JS version:

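// Requires node-fetch v2 (v3 is ESM-only, so require() won't work) and form-data.
const fetch = require('node-fetch');
const FormData = require('form-data');
const fs = require('fs');

async function main() {
  const formData = new FormData();
  formData.append('url', 'https://sparksuite.github.io/simple-html-invoice-template/');
  const res = await fetch('https://demo.gotenberg.dev/forms/chromium/convert/url', {
    method: 'POST',
    body: formData,
  });
  if (!res.ok) {
    // Rate limits or conversion errors come back as non-2xx responses.
    throw new Error(`Gotenberg returned HTTP ${res.status}`);
  }
  const pdfBuffer = await res.buffer();
  // You can do whatever you like with the pdfBuffer, such as writing it to disk:
  fs.writeFileSync('myfile.pdf', pdfBuffer);
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});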

To use your own Docker instance instead of the demo endpoint, here is what you need to do:

1. Create the Gotenberg Docker container:

docker run -p 3333:3000 gotenberg/gotenberg:7 gotenberg

2. Call the http://localhost:3333/forms/chromium/convert/url endpoint:


Curl version:

curl --location --request POST 'http://localhost:3333/forms/chromium/convert/url' \
--form 'url="https://sparksuite.github.io/simple-html-invoice-template/"' \
-o myfile.pdf

Node.JS version:

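// Requires node-fetch v2 (v3 is ESM-only, so require() won't work) and form-data.
const fetch = require('node-fetch');
const FormData = require('form-data');
const fs = require('fs');

async function main() {
  const formData = new FormData();
  formData.append('url', 'https://sparksuite.github.io/simple-html-invoice-template/');
  const res = await fetch('http://localhost:3333/forms/chromium/convert/url', {
    method: 'POST',
    body: formData,
  });
  if (!res.ok) {
    // Conversion errors come back as non-2xx responses.
    throw new Error(`Gotenberg returned HTTP ${res.status}`);
  }
  const pdfBuffer = await res.buffer();
  // You can do whatever you like with the pdfBuffer, such as writing it to disk:
  fs.writeFileSync('myfile.pdf', pdfBuffer);
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});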
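The question is about converting an HTML string rather than a public URL. Gotenberg's Chromium module also has an HTML route for that, which expects the page to be uploaded as a file named index.html. A minimal sketch, assuming Node.js 18+ (built-in fetch, FormData and Blob, so neither node-fetch nor form-data is needed) and the local container started above on port 3333; buildHtmlForm and htmlToPdf are hypothetical helper names:

```javascript
// Minimal sketch, assuming Node.js 18+ (global fetch, FormData, Blob)
// and a Gotenberg 7 container reachable at http://localhost:3333.
// buildHtmlForm and htmlToPdf are hypothetical helper names.

// The HTML route expects the page as a file named "index.html".
function buildHtmlForm(html) {
  const form = new FormData();
  form.append('files', new Blob([html], { type: 'text/html' }), 'index.html');
  return form;
}

async function htmlToPdf(html, baseUrl = 'http://localhost:3333') {
  const res = await fetch(`${baseUrl}/forms/chromium/convert/html`, {
    method: 'POST',
    body: buildHtmlForm(html), // fetch sets the multipart boundary itself
  });
  if (!res.ok) {
    throw new Error(`Gotenberg returned HTTP ${res.status}`);
  }
  // Convert the response body into a Node.js Buffer holding the PDF bytes.
  return Buffer.from(await res.arrayBuffer());
}

// Usage (not executed here):
//   htmlToPdf('<h1>Invoice</h1>')
//     .then((pdf) => require('fs').writeFileSync('invoice.pdf', pdf));

module.exports = { buildHtmlForm, htmlToPdf };
```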

Gotenberg homepage: https://gotenberg.dev/

Movement answered 4/11, 2022 at 9:24 Comment(3)
Doesn't Gotenberg use Chromium under the hood? The question asked for a non-headless-browser solution. - Illuminant
@cdimitroulas, you're right, it does use Chromium. I interpreted "I want to convert HTML to PDF without headless browser instance" as "Please suggest a solution where I don't need to write code to create/navigate a headless browser because it's a pain", so I suggested this approach, which just works without the headless browser hassle. But my interpretation might indeed be wrong. - Movement
This is Chromium, so I have voted the answer down. It is a nice find though, so thank you. I am specifically looking for HTML/React -> PDF that is suitable for a serverless function environment. Headless Chromium is 50-60MB, which is too much. Building the PDF programmatically instead of from HTML is a poor option, as constructing the PDF takes effort. - Drawstring

Any suggestions in NodeJS to convert an HTML to PDF or PNG without any headless browser instances.

Yes, you can try jsPDF. I have never used it myself, but the syntax looks simple.
Under the hood, it does not appear to use any headless browser library; it seems to be a 100% pure JavaScript implementation.
You can feed the library directly with an HTML string.
However, there is no PNG option. For images, there are many solutions that could be combined with jsPDF (i.e. HTML to PDF to PNG), as well as direct HTML-to-PNG solutions. Take a look here.

Also anyone uses puppeteer in any production environment. I would like to know how the resource utilisations and performance of running headless browser in prod.

If you want to use Puppeteer, I suggest splitting it into separate services: a simple HTTP server that only handles HTTP communication with your clients, and a separate Puppeteer service. Both services must be scalable, but of course the second will require more resources to run. To optimize resources, I suggest using puppeteer-cluster to create a cluster of Puppeteer workers. You can better handle errors, flow and concurrency, and at the same time save memory by using a single instance of Chromium (with the CONCURRENCY_PAGE or CONCURRENCY_CONTEXT model).
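To illustrate the concurrency part: what puppeteer-cluster's maxConcurrency option buys you is a cap on how many pages render at the same time, with extra jobs queued until a slot frees up. The sketch below is a hand-rolled limiter showing that idea without dependencies; it is not puppeteer-cluster's API, and createLimiter is a hypothetical name.

```javascript
// Hand-rolled concurrency limiter: at most `max` tasks run at once and the
// rest wait in a FIFO queue. Illustrates the idea only; this is NOT
// puppeteer-cluster's API.
function createLimiter(max) {
  let active = 0;
  const queue = [];

  // Start the next queued task if a slot is free.
  const next = () => {
    if (active >= max || queue.length === 0) return;
    active += 1;
    const { task, resolve, reject } = queue.shift();
    Promise.resolve()
      .then(task) // also turns a synchronous throw into a rejection
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        next(); // a slot was freed, so try to start another task
      });
  };

  // Returns a promise that settles with the task's result once it has run.
  return (task) =>
    new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
}

module.exports = { createLimiter };
```

In a render service, each incoming request would then call something like limit(() => renderPage(html)), where limit = createLimiter(2) and renderPage is a hypothetical function that opens a page on the shared browser; tuning the cap trades throughput against peak memory.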

Alcaraz answered 30/10, 2022 at 14:55 Comment(5)
It looks like it's client-side; I'm more interested in the server side. - Mussman
That's not true! You can use the library both in browsers and in Node.js-based services. Take a better look at the documentation. - Alcaraz
github.com/parallax/jsPDF#running-in-nodejs - Alcaraz
Taking a closer look, jsPDF uses html2canvas, whose README clearly says it is meant to be used in the browser. That makes sense: imagine building a tool that understands all of HTML/CSS and stays up to date with the spec. You'd be making a proto-browser. - Moonshot
"You can feed the library directly with an HTML string." Not from Node.js. That function only works in a browser. - Eighteenth

If you have access to the wkhtmltopdf command, I recommend it.

We use it successfully on our production website to generate PDFs.

First write the HTML to a file ({file_name}), then run:

wkhtmltopdf --encoding utf8 --disable-smart-shrinking --dpi 100 -s {paper_size} -O {orientation} '{file_name}' '{output_file}'
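From Node.js, the same flags can be passed via child_process.spawn. The sketch below assumes wkhtmltopdf is installed and on the PATH, and passes "-" as both the input and the output file so the HTML is piped through stdin/stdout instead of going through a temporary file. buildArgs and htmlToPdf are hypothetical names, and the A4/Portrait defaults are placeholders.

```javascript
const { spawn } = require('child_process');

// Same flags as the shell command above; '-' means stdin (input) / stdout (output).
function buildArgs({ paperSize = 'A4', orientation = 'Portrait' } = {}) {
  return [
    '--encoding', 'utf8',
    '--disable-smart-shrinking',
    '--dpi', '100',
    '-s', paperSize,
    '-O', orientation,
    '-', // read the HTML from stdin
    '-', // write the PDF to stdout
  ];
}

// Pipe an HTML string through wkhtmltopdf and resolve with the PDF bytes.
function htmlToPdf(html, options) {
  return new Promise((resolve, reject) => {
    const child = spawn('wkhtmltopdf', buildArgs(options));
    const chunks = [];
    let stderr = '';
    child.stdout.on('data', (chunk) => chunks.push(chunk));
    child.stderr.on('data', (chunk) => { stderr += chunk; });
    child.on('error', reject); // e.g. the binary is not installed
    child.stdin.on('error', reject); // e.g. the process exited before reading stdin
    child.on('close', (code) => {
      if (code === 0) resolve(Buffer.concat(chunks));
      else reject(new Error(`wkhtmltopdf exited with code ${code}: ${stderr}`));
    });
    child.stdin.end(html, 'utf8');
  });
}

module.exports = { buildArgs, htmlToPdf };
```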
Militate answered 4/11, 2022 at 10:27 Comment(2)
wkhtmltopdf.org is in fact headless WebKit under the hood, which the OP ruled out, but it may still be less demanding than their current approach. - Florida
I tried refactoring to avoid wkhtmltopdf, but the solutions I found were too slow at generating PDFs from HTML. The other option was to create those PDFs manually with a server-side PDF library, but I haven't had time to explore that. - Militate

© 2022 - 2024 — McMap. All rights reserved.