Capture or save a current state (HTML elements) of a dynamic page from a SPA

Is it possible to retrieve the current state of a SPA (built with a framework like Angular, React, etc.)?

By current state I mean a snapshot/save/export of all current HTML elements, as well as the styles/images and the data being shown at the moment, so that it can be consumed [statically] afterwards in a browser via file://.

Example:

<html>
    <head />
    <body>
        <h1>Welcome, <span ng-model="userController.username">John Doe</span></h1>
        <!-- etc -->
    </body>
</html>

Here, John Doe is the current data shown by that controller.

I tried Save As... in several browsers and it does not work: it saves the HTML with the CSS, but the markup is full of {{variableName}} placeholders. I also assume that, depending on how the SPA was developed, it may not even save the desired page and instead saves the master/root/main page of the SPA.

There are other tools, such as HTTrack Website Copier, but from my past experience they work best for static pages, I think.

Any suggestions for tools, browser extensions, or even techniques that would let me develop a tool or extension to achieve this?

Attorneyatlaw answered 11/9, 2019 at 10:53 Comment(2)
I am not sure how to achieve this (did you try the developer tools built into browsers like Chrome?), but what you want is the HTML representation of the DOM that has been built by JavaScript. If you search for that you might find an answer. - Acrolith
@Acrolith I have done that; I have already searched for it without luck. That's why I turned to this community. - Attorneyatlaw

The "Save As" functionality saves the page's source code, which for a SPA is typically a nearly blank page.

I think your best option is to run copy(document.documentElement.innerHTML) in the browser's developer console (copy() is a console utility, not a page API) and then paste the result into an empty HTML file.
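
A small variation on the command above, as a sketch only: innerHTML of the root element leaves out the <html> tag itself (with its attributes) and the doctype, so the pasted file may render in quirks mode. The helper name serializeDocument is hypothetical; it relies only on the standard document.doctype and outerHTML properties.

```javascript
// Hypothetical helper: serialize a document including its doctype and <html> tag.
function serializeDocument(doc) {
    const dt = doc.doctype;
    // doc.doctype is null when the page has no <!DOCTYPE ...> declaration
    const doctype = dt ? `<!DOCTYPE ${dt.name}>\n` : '';
    // outerHTML includes the <html> element and its attributes, unlike innerHTML
    return doctype + doc.documentElement.outerHTML;
}

// In the developer console:
// copy(serializeDocument(document));
```

Like innerHTML, this serializes attributes rather than live properties, so text typed into form fields is not captured.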

The problem is that external resources (those loaded via a src attribute, for example) are not embedded, so you have to inline them yourself with a script before running the copy command. Start from a clone, const documentClone = document.documentElement.cloneNode(true);, and then:

  • Parse all of the <style> tags and build a single inline <style> tag
  • Remove <script> tags?
  • For other tags (<img>, <video>, etc.), fetch the resources and replace them with inline data URLs
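
The script and image steps above could be sketched roughly as follows. This is a minimal illustration, not a complete tool: the names bytesToDataUrl and snapshotElement are hypothetical, only <img> elements are handled, and images from other origins that do not allow CORS will fail to fetch and are left untouched.

```javascript
// Encode raw bytes as a base64 data URL (btoa exists in browsers and Node 16+).
function bytesToDataUrl(bytes, mimeType) {
    let binary = '';
    for (const byte of bytes) {
        binary += String.fromCharCode(byte);
    }
    return `data:${mimeType};base64,${btoa(binary)}`;
}

// Hypothetical helper: returns a detached clone of `root` with <script> tags
// removed and <img> sources inlined. `fetchFn` is injectable and defaults to
// the global fetch.
async function snapshotElement(root, fetchFn = fetch) {
    const clone = root.cloneNode(true);
    // Scripts would bootstrap the SPA again when the saved file is opened
    clone.querySelectorAll('script').forEach((script) => script.remove());
    for (const img of clone.querySelectorAll('img[src]')) {
        try {
            // img.src is the absolute URL, resolved against the current page
            const response = await fetchFn(img.src);
            const bytes = new Uint8Array(await response.arrayBuffer());
            const mime = response.headers.get('content-type') || 'application/octet-stream';
            // srcset would otherwise take precedence over the inlined src
            img.removeAttribute('srcset');
            img.setAttribute('src', bytesToDataUrl(bytes, mime));
        } catch (err) {
            // Assumption: a failed fetch should not abort the whole snapshot
            console.warn('Could not inline', img.src, err);
        }
    }
    return clone;
}

// In the developer console:
// const clone = await snapshotElement(document.documentElement);
// clone.outerHTML is then the string to save.
```

Working on a clone keeps the live page intact, so the SPA keeps running while the snapshot is built.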
Gaslight answered 16/10, 2019 at 11:56 Comment(0)

As others mentioned, "Save As" does not always produce the desired result. In my case it always modified the data to some extent, or saved no data at all because the app is a SPA.

So I came up with this solution, based on multiple sources:

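// Relies on showSaveFilePicker (File System Access API), which is only
// available in some browsers, e.g. Chromium-based ones.
async function saveToFile() {
    // Prompt the user with a "save as" dialog and a default filename
    const handle = await showSaveFilePicker({
        suggestedName: 'index.html',
        types: [{
            description: 'HTML',
            accept: {'text/html': ['.html']},
        }]
    });
    const writable = await handle.createWritable();
    // Serialize the DOM as it is right now (everything inside <html>)
    await writable.write(document.body.parentNode.innerHTML);
    // close() returns a promise; the data is only committed to disk once it resolves
    await writable.close();
}
saveToFile();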

It uses a browser API to prompt the user with a "save as" dialog and even sets the default filename. You can modify it as needed.

Paste it into the browser console. Alternatively, you could probably install some kind of Chrome extension and put the script in there for easy access.

Peat answered 29/5, 2022 at 8:56 Comment(0)

document.body.parentNode.innerHTML should do the trick; then (using code from another answer) you can download the string directly as a file.

Here is a working snippet:

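// Serialize the DOM as it is right now (everything inside <html>)
var pageSource = document.body.parentNode.innerHTML;
var downloadLink = document.createElement('a');
// encodeURIComponent percent-encodes the markup (including non-ASCII text) for use in a data URL
downloadLink.href = 'data:text/html;charset=utf-8,' + encodeURIComponent(pageSource);
downloadLink.target = '_blank';
// The download attribute makes the browser save the link target instead of navigating to it
downloadLink.download = 'page.html';
downloadLink.click();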
Duffel answered 11/10, 2019 at 10:18 Comment(2)
This only retrieves the plain HTML, without any CSS (not even CSS embedded in the HTML). Is there a way to add the images and styles/CSS to page.html? - Attorneyatlaw
I think we could get a list of linked files by analyzing the HTML string and copy all of their contents inside the single HTML file (CSS and JS as text, images as base64 data) to finally generate a stand-alone copy of the current state of the page, but it would be way more complex. - Duffel
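
For the CSS part of this idea, one possible sketch reads the rules the browser has already parsed from document.styleSheets instead of re-downloading the linked files. The function name collectCss is hypothetical; cssRules and cssText are standard CSSOM properties. Stylesheets from another origin served without CORS headers throw when cssRules is read, so they are skipped here, and relative url(...) references inside the CSS are not rewritten.

```javascript
// Hypothetical helper: gather the text of every readable stylesheet into one CSS string.
function collectCss(styleSheets) {
    const chunks = [];
    for (const sheet of Array.from(styleSheets)) {
        try {
            // Reading cssRules throws a SecurityError for cross-origin sheets without CORS
            const rules = Array.from(sheet.cssRules);
            chunks.push(rules.map((rule) => rule.cssText).join('\n'));
        } catch (err) {
            // Assumption: unreadable sheets are skipped rather than aborting the export
        }
    }
    return chunks.join('\n');
}

// In the developer console, before serializing the page:
// const style = document.createElement('style');
// style.textContent = collectCss(document.styleSheets);
// document.head.appendChild(style);
```

With the collected rules in a single <style> element, the saved file no longer depends on the <link rel="stylesheet"> files for the sheets that could be read.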
