Getting unparsed (raw) HTML with JavaScript
S

3

12

I need to get the actual HTML code of an element in a web page.

For example, if the actual HTML code inside the element is "How to&nbsp;fix"

Running this JavaScript:

document.getElementById('myE').innerHTML

Gives me "How to fix" (with a literal non-breaking space character), which is the parsed HTML.

How can I get the unparsed "How to&nbsp;fix" using JavaScript?

Sophister answered 11/10, 2010 at 10:8 Comment(2)
The correct JavaScript property is innerHTML, not innerHtml. - Immune
Make sure that when you're displaying the string from getElementById('myE').innerHTML, it's not being re-interpreted as HTML, which would hide the non-breaking space code. - Milburt
A
6

What you have should work:

Element test:

<div id="myE">How to&nbsp;fix</div>

JavaScript test:

alert(document.getElementById("myE").innerHTML); // alerts "How to&nbsp;fix"

Make sure that wherever you're using the result isn't displaying &nbsp; as a space, which is likely the case. If you want to show it somewhere that's designed for HTML, you'll need to escape it.
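The escaping step mentioned above could look something like the following. This is only a minimal sketch: `escapeHtml` is a hypothetical helper name, not something from the answer, and it covers just the characters that HTML treats specially in text and double-quoted attribute values.

```javascript
// Hypothetical helper (not from the answer): escape the characters that
// HTML treats specially so that a string is displayed literally.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')   // run first so the entities added below are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// In a page, assuming an element with id "out" exists:
//   document.getElementById('out').innerHTML =
//       escapeHtml(document.getElementById('myE').innerHTML);
console.log(escapeHtml('How to&nbsp;fix')); // prints "How to&amp;nbsp;fix"
```

Written into an HTML context, the escaped string shows the text `&nbsp;` itself rather than rendering it as a space.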

Arrogance answered 11/10, 2010 at 10:13 Comment(1)
This works for some entities only. Entity references like &eacute; do not appear in innerHTML; instead the character denoted, such as é, appears there. - Macfarlane
S
30

You cannot get the actual HTML source of part of your web page.

When you give a web browser an HTML page, it parses the HTML into some DOM nodes that are the definitive version of your document as far as the browser is concerned. The DOM keeps the significant information from the HTML (for example, that you used the Unicode character U+00A0 Non-Breaking Space before the word fix) but not the irrelevant detail that you wrote it as an entity reference (&nbsp;) rather than typing the raw character.

When you ask the browser for an element node's innerHTML, it doesn't give you the original HTML source that was parsed to produce that node, because it no longer has that information. Instead, it generates new HTML from the data stored in the DOM. The browser decides on how to format that HTML serialisation; different browsers produce different HTML, and chances are it won't be the same way you formatted it originally.

In particular,

  • element names may be upper- or lower-cased;

  • attributes may not be in the same order as you stated them in the HTML;

  • attribute quoting may not be the same as in your source. IE often generates unquoted attributes that aren't even valid HTML; all you can be sure of is that the innerHTML generated will be safe to use in the same browser by writing it to another element's innerHTML;

  • it may not use entity references for anything but characters that would otherwise be impossible to include directly in text content: ampersands, less-thans and attribute-value-quotes. Instead of returning &nbsp; it may simply give you the raw   character.

You may not be able to see that that's a non-breaking space, but it still is one and if you insert that HTML into another element it will act as one. You shouldn't need to rely anywhere on a non-breaking space character being entity-escaped to &nbsp;... if you do, for some reason, you can get that by doing:

var x = el.innerHTML.replace(/\xA0/g, '&nbsp;');

but that's only escaping U+00A0 and not any of the other thousands of possible Unicode characters, so it's a bit questionable.
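If escaping every non-ASCII character is really wanted, one possible generalisation is sketched below. `escapeNonAscii` is a hypothetical name, and note that it emits numeric character references (such as `&#xA0;`) rather than named ones like `&nbsp;`, since there is no built-in table of entity names to draw on. It also assumes an engine that supports the regex `u` flag and `codePointAt`.

```javascript
// Hypothetical helper: rewrite every non-ASCII character in an HTML string
// as a hexadecimal numeric character reference, e.g. U+00A0 -> &#xA0;
function escapeNonAscii(html) {
  // The "u" flag makes the regex match whole code points, so characters
  // outside the Basic Multilingual Plane are not split into surrogate halves.
  return html.replace(/[^\x00-\x7F]/gu, function (ch) {
    return '&#x' + ch.codePointAt(0).toString(16).toUpperCase() + ';';
  });
}

console.log(escapeNonAscii('How to\u00A0fix caf\u00E9'));
// prints "How to&#xA0;fix caf&#xE9;"
```

This still does not recover the original source: it has no way of knowing which characters were written as entity references and which were typed directly.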

If you really really need to get your page's actual source HTML, you can make an XMLHttpRequest to your own URL (location.href) and get the full, unparsed HTML source in the responseText. There is almost never a good reason to do this.
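A rough sketch of that XMLHttpRequest approach follows. `fetchRawSource` and its parameters are hypothetical names; the optional third argument exists only so the function can be exercised outside a browser and can be omitted in a page. Bear in mind that the server may not return the same bytes it sent when the page was first loaded (dynamic pages, or pages reached via POST).

```javascript
// Sketch: fetch the raw source of a URL and hand it to a callback.
// XhrCtor defaults to the browser's XMLHttpRequest.
function fetchRawSource(url, onDone, XhrCtor) {
  var Ctor = XhrCtor || XMLHttpRequest;
  var xhr = new Ctor();
  xhr.open('GET', url, true);
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4) return;   // 4 means the request is DONE
    if (xhr.status === 200) {
      onDone(null, xhr.responseText);   // unparsed source as served
    } else {
      onDone(new Error('HTTP status ' + xhr.status));
    }
  };
  xhr.send(null);
}

// In a page:
//   fetchRawSource(location.href, function (err, source) {
//     if (!err) console.log(source);
//   });
```

The string passed to the callback is the whole document, so finding the markup of one particular element in it is left to the caller.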

Succession answered 11/10, 2010 at 10:50 Comment(1)
Good stuff @bobince. I found a ridiculous use case for your XMLHttpRequest idea. I'm working with a <picture> polyfill and IE9 is helpfully stripping the <source> child elements from the DOM. Getting the unparsed HTML is doing the trick. - Brotherhood
C
0

You can put the markup in a script tag instead; the browser does not parse a script element's contents as HTML. This is most relevant when the markup contains angle brackets, as when loading a lodash or underscore template.

document.getElementById("asDiv").value = document.getElementById("myDiv").innerHTML;
document.getElementById("asScript").value = document.getElementById("myScript").innerHTML;
<div id="myDiv">
<h1>
<%= ${var} %>
How to&nbsp;fix
</h1>
</div>

<script id="myScript" type="text/template">
<h1>
<%= ${var} %>
How to&nbsp;fix
</h1>
</script>

<textarea rows="10" cols="40" id="asDiv"></textarea>
<textarea rows="10" cols="40" id="asScript"></textarea>

Because the HTML in a div is parsed, the angle brackets in its innerHTML come back escaped as &lt; and &gt;, but in a script element they do not.

Curator answered 9/10, 2021 at 13:36 Comment(0)
