Chrome extension: DOMParser is not defined with Manifest v3
I have developed an extension that scrapes content from web pages. It worked fine until I switched to Manifest V3; since then, the parsing no longer works.

I use the following script to read the source code:

chrome.scripting.executeScript( 
  {
    target: {tabId: tab.id, allFrames: true},
    files: ['GetSource.js'],
  }, async function(results) 
  {
    // GETTING HTML
    parser = new DOMParser();
    content = parser.parseFromString(results, "text/html");

... ETC ...

This code used to work fine, but now I get the following message in my console:

Uncaught (in promise) ReferenceError: DOMParser is not defined

The code is part of a promise but I don't think the promise is the problem here. I basically need to load the source code into a variable so that I can parse it afterwards.
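A side note on "loading the source code into a variable": chrome.scripting.executeScript delivers an array with one entry per injected frame (each shaped like { frameId, result }), not a single string, while DOMParser expects a string. The sketch below assumes GetSource.js evaluates to the page's HTML as a string (that file is not shown in the question); collectFrameHtml is a hypothetical helper name.

```javascript
// Sketch: pull the HTML strings out of the array that
// chrome.scripting.executeScript hands to its callback.
// Assumption: each frame's injected script evaluates to an HTML string.
function collectFrameHtml(injectionResults) {
  return (injectionResults ?? [])
    .map((entry) => entry.result)
    .filter((html) => typeof html === 'string' && html.length > 0);
}

// Made-up data in the shape executeScript would deliver.
const sample = [
  { frameId: 0, result: '<html><body>main frame</body></html>' },
  { frameId: 7, result: null }, // e.g. a frame whose script returned nothing
];
console.log(collectFrameHtml(sample).length);
```

Each string returned this way can then be parsed separately, in whichever context actually has a DOM.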

I've checked the documentation but haven't found anything mentioning that DOMParser would not work with v3.

Any idea?

Thanks

Circassian answered 28/8, 2021 at 12:52 Comment(2)
The background script is a service worker now, so it doesn't have any DOM stuff. You'll have to load a JavaScript library to parse HTML or use DOMParser in a visible page of your extension, e.g. in the popup. - Severity
Ah, that explains the problem, thanks. That's very annoying :( My popup contains a search field where I can enter a keyword (e.g. a product) that will be searched across multiple sites. Can I simply move my background.js scripts to popup.js? The benefit of background.js is that it was not annoying for the end user. - Circassian
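To make the comment above concrete: whether DOMParser exists depends on the context the code runs in. The following is a minimal sketch of a feature check, not an official API; hasDomParser and parseHtml are hypothetical helper names, and the fallback is whatever the caller supplies (for example, messaging a page that has a DOM).

```javascript
// Sketch: detect whether the current context exposes DOMParser.
// In an MV3 service worker this is expected to be false; in a popup,
// options page, or offscreen document it should be true.
function hasDomParser(scope = globalThis) {
  return typeof scope.DOMParser === 'function';
}

// Hypothetical helper: parse locally when possible, otherwise delegate
// to a caller-supplied fallback function.
function parseHtml(html, fallback, scope = globalThis) {
  if (hasDomParser(scope)) {
    return new scope.DOMParser().parseFromString(html, 'text/html');
  }
  return fallback(html);
}
```

The scope parameter exists only so the check can be exercised outside a browser; in an extension it can be left at its default.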

From the docs:

Since service workers don't have access to DOM, it's not possible for an extension's service worker to access the DOMParser API or create an <iframe> to parse and traverse documents.

Pulling in an external library just to do what DOMParser already does seems too heavy.

To work around this, we can use an offscreen document. It is simply an invisible web page in which you can run fetch, audio, DOMParser, and so on, and which communicates with the background (service worker) via chrome.runtime.

See an example below:

background.js

// create (load) the offscreen document (xam.html)
chrome.offscreen.createDocument({
    url: chrome.runtime.getURL('xam.html'),
    reasons: [chrome.offscreen.Reason.DOM_PARSER],
    justification: 'reason for needing the document',
});

// This is simply a test.
// It represents a scenario where, after three seconds, you want to fetch a webpage and extract HTML.
// Once the three seconds have elapsed, we send a 'broadcast' to the listeners of our extension.
// The listener in the offscreen document handles the job and sends the resulting data back to us.

setTimeout(() => {
    const onDone = (result) => {
        console.log(result);
        chrome.runtime.onMessage.removeListener(onDone);
    };
    chrome.runtime.onMessage.addListener(onDone);
    chrome.runtime.sendMessage('from-background-page');
}, 3000);

xam.html

<html lang="en">

<head>
  <meta charset="UTF-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Document</title>
</head>

<body>
  <script src="xam.js">
  </script>
</body>

</html>

xam.js

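// Fetch a page, parse it with DOMParser, and send the extracted data back.
async function main() {
    const html = await fetch('https://......dev/').then((res) => res.text());
    const doc = new DOMParser().parseFromString(html, 'text/html');
    // Guard against pages that have no <select> element.
    const select = doc.querySelector('select');
    const options = Array.from(select ? select.options : [])
        .map((opt) => `${opt.value}|${opt.text}`)
        .join('\n');
    chrome.runtime.sendMessage(options);
}

// Run the job whenever the service worker sends a message.
chrome.runtime.onMessage.addListener((msg) => {
    console.log(msg);
    main();
});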

manifest.json

  "permissions": [
    // ...
    "offscreen"
  ]

https://developer.chrome.com/docs/extensions/reference/offscreen/

The extension's permissions carry over to offscreen documents, but extension API access is heavily limited. Currently, an offscreen document can only use the chrome.runtime APIs to send and receive messages; all other extension APIs are not exposed.

Notes:

  • I haven't tested how long this offscreen document stays alive.
  • This is just sample code, but it should work. Customize it for your own case.
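Related to the first note: since the document's lifetime is uncertain, it may be safer to check for an existing offscreen document before each use and create one only when needed. The sketch below assumes Chrome 116 or later, where chrome.runtime.getContexts is available; ensureOffscreenDocument is a hypothetical helper name, and the api parameter (defaulting to the global chrome object) is there only so the function can be tried outside the browser.

```javascript
// Sketch: create the offscreen document only if one is not already open.
// Assumption: Chrome 116+ (chrome.runtime.getContexts).
async function ensureOffscreenDocument(path, api = chrome) {
  const url = api.runtime.getURL(path);
  const existing = await api.runtime.getContexts({
    contextTypes: ['OFFSCREEN_DOCUMENT'],
    documentUrls: [url],
  });
  if (existing.length > 0) return false; // already open, nothing to do

  await api.offscreen.createDocument({
    url: path,
    reasons: ['DOM_PARSER'],
    justification: 'Parse HTML strings with DOMParser',
  });
  return true; // a new document was created
}
```

In background.js this could be awaited as ensureOffscreenDocument('xam.html') right before sending a message, so the listener in xam.js is present when the message goes out.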
Biogeography answered 10/2, 2023 at 16:58 Comment(5)
Let's say I have the HTML as a string. Doesn't this method require saving the content as a file (i.e. xam.html) first? How do I parse directly from a string? - Microscopic
@ishandutta2007, I don't think we can. According to the chrome.offscreen.createDocument documentation, it requires us to specify a URL for the document we want to create. - Biogeography
Then we are back to what @yidoon suggested. If I already know the static HTML up front, I don't think I would need all this hassle anyway: I would do the required extraction offline, so that I wouldn't even need to add the HTML to the repo. Can you describe a use case where your method is useful? Maybe I am missing something. - Microscopic
To clarify, the xam.html file serves as a host for running xam.js; it is not the content you want to parse. You can send your HTML string to xam.js via Chrome messaging (chrome.runtime.sendMessage or port.postMessage). DOMParser should be available inside xam.js for parsing your content. - Biogeography
Can you add that piece of code as well? I can see only one *.html file in your entire code. So my understanding is that dom-parser is required in xam.js, is that right? If so, it defeats the purpose of taking this convoluted path. - Microscopic
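The exchange above (sending an HTML string to xam.js and parsing it there with the built-in DOMParser, no library needed) could look roughly like the following. This is only a sketch under assumptions: the message shape ({ target, type, html }), the function names, and the fields returned are all made up for illustration. A parsed Document cannot be sent through extension messaging, so the offscreen side replies with plain, JSON-serializable data.

```javascript
// --- xam.js (offscreen document, DOMParser is available here) ---
// Hypothetical handler: parses the HTML string carried by the message and
// replies with plain data. The Parser parameter only exists so the function
// can be exercised outside a browser.
function handleParseRequest(message, _sender, sendResponse, Parser = globalThis.DOMParser) {
  if (!message || message.target !== 'offscreen' || message.type !== 'parse-html') {
    return false; // not for us, let other listeners handle it
  }
  const doc = new Parser().parseFromString(message.html, 'text/html');
  sendResponse({
    title: doc.title,
    linkCount: doc.querySelectorAll('a').length,
  });
  return false; // the reply was sent synchronously
}

// --- background.js (service worker, no DOMParser) ---
// Hypothetical helper: sends the HTML string and resolves with the reply.
function requestParse(html, api = chrome) {
  return api.runtime.sendMessage({ target: 'offscreen', type: 'parse-html', html });
}

// Wiring, for xam.js only (guarded so the sketch also loads outside a browser).
if (typeof chrome !== 'undefined' && typeof DOMParser !== 'undefined') {
  chrome.runtime.onMessage.addListener(handleParseRequest);
}
```

With this in place, the service worker could call requestParse(htmlString) after making sure the offscreen document exists, and would get back an object such as { title, linkCount } instead of a DOM.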

Since service workers don't have access to DOM, it's not possible for an extension's service worker to access the DOMParser API or create an <iframe> to parse and traverse documents.

More detail

I solved the problem by using the dom-parser library. The code could look like this:

import DomParser from "dom-parser";
const parser = new DomParser();
const dom = parser.parseFromString('your html string');
Lydell answered 4/12, 2022 at 12:24 Comment(0)
