Embedding all the external resources of an HTML page into a single file using javascript in the browser
P

2

15

As you probably know, external resources such as images can be embedded directly into an HTML file as base64-encoded data URIs:

<img src="data:image/png;base64,iVBORw0KGgoAAAANS..." />

I'm looking for a pure browser-based JavaScript way to traverse an HTML page and embed all of its external resources into the document, so that $("html").html() returns the page's entire contents, external resources included.

Just so it makes sense, I'm trying to download web pages into single files using a headless browser on my server.
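To make the goal concrete, here is a minimal sketch of the kind of traversal I have in mind, limited to images. The function names toDataUri and inlineImages are hypothetical, and the sketch assumes a modern browser engine with fetch and async functions (older engines such as PhantomJS may not provide them) and images that are same-origin or served with CORS headers. Stylesheets, scripts and fonts would need similar handling and are not covered.

```javascript
// Encode raw bytes as a data: URI. Pure helper, no DOM required.
function toDataUri(mimeType, bytes) {
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  // btoa exists in browsers; the Buffer fallback is only for running under Node.js.
  const base64 = typeof btoa === "function"
    ? btoa(binary)
    : Buffer.from(bytes).toString("base64");
  return "data:" + mimeType + ";base64," + base64;
}

// Replace the src of every <img> in the document with an inline data URI
// and return the resulting markup. Images that fail to load keep their URL.
async function inlineImages(doc) {
  const images = Array.from(doc.querySelectorAll("img[src]"));
  await Promise.all(images.map(async (img) => {
    if (img.getAttribute("src").startsWith("data:")) return; // already inline
    try {
      const response = await fetch(img.src); // img.src is the resolved absolute URL
      if (!response.ok) return;
      const type = response.headers.get("content-type") || "application/octet-stream";
      const bytes = new Uint8Array(await response.arrayBuffer());
      img.setAttribute("src", toDataUri(type.split(";")[0].trim(), bytes));
    } catch (err) {
      // Network or CORS error: leave the original URL in place.
    }
  }));
  return doc.documentElement.outerHTML;
}
```

In a page context this would be called as inlineImages(document).then(html => ...), and the resolved string is what would be written to the single output file.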

Probability answered 27/10, 2014 at 19:32 Comment(2)
If you're using JS, why encode the images? – Cull
Because JS can easily traverse all the HTML elements. Otherwise I'll need a parser to read the tags and turn them into DOM objects before I can query them for external resources. – Probability
S
13

There are tools out there to do that. Examples:

While there are benefits to this approach, remember that it gives up browser-side caching: a page visited more than once, or a site whose pages share the same JS/CSS files, would otherwise benefit from the client's cache.

Squeamish answered 27/10, 2014 at 19:42 Comment(4)
I'm sorry, I forgot to mention that by JavaScript I mean browser-based JavaScript. I'm looking for a non-Node.js solution. – Probability
The tools I suggest run once on the server to generate the client-side JS/CSS. They are not a server-side solution, just tools. – Squeamish
I know, but I'm looking for a solution that uses a web browser. I find this much more stable than Node.js, as a web browser's parser is much more powerful than any other. I intend to use PhantomJS with JavaScript. – Probability
Has anyone had any experience with those tools and could recommend one? – Asbury
S
1

Browser extensions

There is the Save Page WE extension, available for both Firefox and Chrome.

This extension can scroll or zoom out the page in order to allow fetching lazy-loading resources before saving.

Command line tools

There is also the inliner npm module, which exposes the inliner command-line utility. It works with some URLs but throws an error with others. It writes its output to stdout, so redirect it to a file, e.g. inliner https://http.cat > cats.html.

It can be installed with (assuming you have Node.js and npm):

npm install -g inliner
Sancho answered 1/3, 2022 at 14:31 Comment(0)
