Unescape HTML entities in JavaScript?

Asked 16/12, 2009 at 5:27 Answered 18/7, 2023 at 3:29

303

I have some JavaScript code that communicates with an XML-RPC backend. The XML-RPC returns strings of the form:

&lt;img src='myimage.jpg'&gt;

However, when I use JavaScript to insert the strings into HTML, they render literally. I don't see an image, I see the string:

<img src='myimage.jpg'>

I guess that the HTML is being escaped over the XML-RPC channel.

How can I unescape the string in JavaScript? I tried the techniques on this page, unsuccessfully: http://paulschreiber.com/blog/2008/09/20/javascript-how-to-unescape-html-entities/

What are other ways to diagnose the issue?

Godderd answered 16/12, 2009 at 5:27 Comment(5)

The huge function included in this article seems to work fine: blogs.msdn.com/b/aoakley/archive/2003/11/12/49645.aspx I don't think that's the most clever solution but works. – Lazarus 13/9, 2010 at 12:52

As strings containing HTML entities are something different than escaped or URI encoded strings, those functions won't work. – Wismar 13/9, 2010 at 13:15

@Matias note that new named entities have been added to HTML (e.g. via the HTML 5 spec) since that function was authored in 2003 - for instance, it doesn't recognise &zopf;. This is a problem with an evolving spec; as such, you should pick a tool that's actually being maintained to solve it with. – Alleviation 19/2, 2017 at 15:3

Possible duplicate of How to decode HTML entities using jQuery? – Darr 13/11, 2018 at 19:23

I've just realized how easy it is to confuse this question with encoding HTML entities. I've just realized I accidentally posted an answer for the wrong question on this question! I've deleted it, though. – Slovak 25/9, 2020 at 16:59

200

EDIT: You should use the DOMParser API as Wladimir suggests. I edited my previous answer because the function originally posted introduced a security vulnerability.

The following snippet is the old answer's code with a small modification: using a textarea instead of a div reduces the XSS vulnerability, but it is still problematic in IE9 and Firefox.

function htmlDecode(input){
  var e = document.createElement('textarea');
  e.innerHTML = input;
  // handle case of empty input
  return e.childNodes.length === 0 ? "" : e.childNodes[0].nodeValue;
}

htmlDecode("&lt;img src='myimage.jpg'&gt;"); 
// returns "<img src='myimage.jpg'>"

Basically I create a DOM element programmatically, assign the encoded HTML to its innerHTML and retrieve the nodeValue from the text node created on the innerHTML insertion. Since it just creates an element but never adds it, no site HTML is modified.

It will work cross-browser (including older browsers) and accept all the HTML Character Entities.

EDIT: The old version of this code did not work on IE with blank inputs, as evidenced here on jsFiddle (view in IE). The version above works with all inputs.

UPDATE: It appears this doesn't work with large strings, and it also introduces a security vulnerability; see the comments.

Nashner answered 16/12, 2009 at 5:33 Comment(17)

Got it, you changed to ', so let me delete my comment back, thx, its working great, +1 – Aurthur 16/12, 2009 at 5:41

@S.Mark: &apos; doesn't belong to the HTML 4 entities, that's why! w3.org/TR/html4/sgml/entities.html fishbowl.pastiche.org/2003/07/01/the_curse_of_apos – Fayola 16/12, 2009 at 5:48

See also @kender's note about the poor security of this approach. – Godderd 16/12, 2009 at 20:52

See my note to @kender about the poor testing he did ;) – Rosauraroscius 16/12, 2009 at 21:8

See this related post in SO: #1090556 ... looks like using innerHTML is not the way to go for security reasons. – Hypnoanalysis 22/12, 2010 at 23:13

@CMS how do I do the opposite of this? – Planet 7/6, 2011 at 15:51

@CMS Nevermind. Sorted my problem by encoding to HTML entities in PHP and then using your function in JS to decode – Planet 8/6, 2011 at 14:51

Some jsperf tests: jsperf.com/decodehtmlclone if you are decoding strings in a loop you might consider creating only once the "div" outside of the decode function. – Sapajou 7/2, 2013 at 10:39

Regarding @TomAuger security comment - the above code does not add the div to the DOM, so nothing is rendered. It safely un-escapes HTML elements. – Stab 24/7, 2013 at 15:56

This actually doesn't work for very long strings, above 65536 chars in Chrome v39. Then Chrome splits the contents into many e.childNodes[*], so one needs to iterate over them. I added an answer that does that, see: https://mcmap.net/q/37318/-unescape-html-entities-in-javascript – Avelinaaveline 18/12, 2014 at 12:31

I think this will cause a slow memory leak. You probably want to store the result, remove the element you create, and then return the stored result. – Emelyemelyne 30/3, 2015 at 18:19

ok, I know that SO doesn't like "thanks!" and "me too!", but you saved my day: your code also works for reading javascript inside a <pre></pre> element and evaluating by inserting a script element with the javascript as code. Which is what I spent hours trying to do... – Scaly 6/8, 2015 at 16:19

This function is a security hazard, JavaScript code will run even despite the element not being added to the DOM. So this is only something to use if the input string is trusted. I added my own answer explaining the issue and providing a secure solution. As a side-effect, the result isn't being cut off if multiple text nodes exist. – Thermomotor 3/12, 2015 at 11:13

This does not work if the string is already unescaped. In my use case, sometimes the string is escaped, and sometimes it is not. So I would want a method that can take any string and decode it. Is that possible? – Fidler 5/7, 2016 at 3:59

For angular, you can wrap this in a filter, like in #31413051. – Superb 4/8, 2016 at 13:49

@CMS would it be possible for you to either update the answer so that it does not propose unsafe code, or delete it so that the next answer becomes the top answer? I hope I'm not sounding rude -- I'd like to minimize the risk that potentially risky code gets copy-pasted by someone without reading the fine print. I find it scary to link to this thread as it is now. – Shipwreck 25/2, 2018 at 14:34

This doesn't work if JS is not running in the browser, i.e. with Node. – Talkingto 31/3, 2021 at 8:34

697

Most answers given here have a huge disadvantage: if the string you are trying to convert isn't trusted then you will end up with a Cross-Site Scripting (XSS) vulnerability. For the function in the accepted answer, consider the following:

htmlDecode("<img src='dummy' onerror='alert(/xss/)'>");

The string here contains an unescaped HTML tag, so instead of decoding anything the htmlDecode function will actually run JavaScript code specified inside the string.

This can be avoided by using DOMParser which is supported in all modern browsers:

function htmlDecode(input) {
  var doc = new DOMParser().parseFromString(input, "text/html");
  return doc.documentElement.textContent;
}

console.log(  htmlDecode("&lt;img src='myimage.jpg'&gt;")  )    
// "<img src='myimage.jpg'>"

console.log(  htmlDecode("<img src='dummy' onerror='alert(/xss/)'>")  )  
// ""

This function is guaranteed to not run any JavaScript code as a side-effect. Any HTML tags will be ignored, only text content will be returned.

Compatibility note: Parsing HTML with DOMParser requires at least Chrome 30, Firefox 12, Opera 17, Internet Explorer 10, Safari 7.1 or Microsoft Edge. So all browsers without support are way past their EOL and as of 2017 the only ones that can still be seen in the wild occasionally are older Internet Explorer and Safari versions (usually these still aren't numerous enough to bother).
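The compatibility note can be turned into a feature test, as one of the comments below suggests. This is only a sketch: the name htmlDecodeSafe is made up for illustration, and it assumes that failing loudly is preferable to silently falling back to innerHTML. It checks only that the DOMParser constructor exists, not that "text/html" parsing is supported.

```javascript
// Hypothetical wrapper around the DOMParser approach shown above.
function htmlDecodeSafe(input) {
  // Feature test: Node and very old browsers have no global DOMParser.
  if (typeof DOMParser === 'undefined') {
    throw new Error('DOMParser is not available; refusing to fall back to innerHTML');
  }
  var doc = new DOMParser().parseFromString(input, 'text/html');
  // Tags are dropped; only the decoded text content is returned.
  return doc.documentElement.textContent || '';
}
```

In a browser, htmlDecodeSafe("&lt;img src='myimage.jpg'&gt;") should behave like htmlDecode above; where DOMParser is missing, the caller gets an exception and can choose a DOM-free decoder instead.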

Thermomotor answered 3/12, 2015 at 11:9 Comment(21)

I think this answer is the best because it mentioned the XSS vulnerability. – Electrocute 30/12, 2015 at 18:4

Note that (according to your reference) DOMParser did not support "text/html" before Firefox 12.0, and there are still some latest versions of browsers that do not even support DOMParser.prototype.parseFromString(). According to your reference, DOMParser is still an experimental technology, and the stand-ins use the innerHTML property which, as you also pointed out in response to my approach, has this XSS vulnerability (which ought to be fixed by browser vendors). – Anisole 28/2, 2016 at 8:53

@PointedEars: Who cares about Firefox 12 in 2016? The problematic ones are Internet Explorer up to 9.0 and Safari up to 7.0. If one can afford not supporting them (which will hopefully be everybody soon) then DOMParser is the best choice. If not - yes, processing entities only would be an option. – Thermomotor 28/2, 2016 at 12:43

1. Please read my entire comment. 2. You do not have to use either one or the other, you can do feature tests. 3. That does not change the fact that if DOMParser is not available, it does not suffice to process “only entities”. – Anisole 28/2, 2016 at 13:12

@Anisole browser vendors cannot "fix" innerHTML, because it's working exactly as expected: you give it some HTML and the browser renders it. The problem, as they say, is between keyboard and chair: namely giving it pieces of HTML that don't come from the same website or from another trusted source. – Nepali 4/8, 2016 at 10:39

@Nepali You’re wrong. If “script elements inserted using innerHTML do not execute when they are inserted” (see reference), then at least certain event-handler values should not, too. – Anisole 7/8, 2016 at 12:24

@PointedEars: <script> tags not being executed isn't a security mechanism, this rule merely avoids the tricky timing issues if setting innerHTML could run synchronous scripts as a side-effect. Sanitizing HTML code is a tricky affair and innerHTML doesn't even try - already because the web page might actually intend to set inline event handlers. This simply isn't a mechanism intended for unsafe data, full stop. – Thermomotor 7/8, 2016 at 14:48

Since your code is most likely to reuse this many times, avoid using "new" with new DOMParser(). Just create it once and reference a member instance. – Valeryvalerye 27/7, 2017 at 5:45

@AndroidDev: Premature optimization is the root of all evil. I don't want to make assumptions about whether and how this code will be used, and I don't mean to encourage cargo cult programming either. – Thermomotor 29/7, 2017 at 14:45

do you have any idea about why the example newElement.innerHTML = "<img src='dummy' onerror='alert(/xss/)'>"; does not work? I tried it in several browsers, including IE6, tried to add an invocation of body.appendChild(newElement), but still was not able to see the alert. – Nunuance 14/11, 2017 at 10:41

@user907860: What is newElement? If it is something like <textarea> or <script>, element contents are interpreted differently for those (not safe either however). Also, there will be no alert if you run this code on about:blank rather than a regular webpage - has something to do with the way relative URLs are resolved, you'd have to use an absolute URL rather than dummy. – Thermomotor 14/11, 2017 at 12:57

the newElement is the newly created div, from the accepted answer, where it was named e : var e = document.createElement('div');. No, no "about:blank", I'm using a regular webpage on a local server. Actually, thank you for the response, by now it's enough for me, since if it is something unexpected, I'll probably ask a dedicated question a bit later – Nunuance 14/11, 2017 at 14:6

This is the solution that does work when evaluating a string within an SVG document that is encoded, in IE11. The <div> solution does NOT work, as no child nodes are created when the inner text is set. So I'm for this solution as it works more broadly. The solution needs to work without any outside frameworks - natively with what the browsers provide; this fits the bill. – Blague 2/5, 2018 at 0:1

Thanks for sharing this, but using this answer didn't help convert an escaped SVG string. Would you mind taking a peek? Thanks so much: #54003823 – Towland 2/1, 2019 at 8:35

@Crashalot: DOMParser can parse XML code as well, you merely need to change the MIME type. Somebody already pointed that out to you. – Thermomotor 2/1, 2019 at 9:37

This code is extremely slow! See my answer where I provided proofs. – Estey 13/3, 2019 at 12:52

@ИльяЗеленько: Do you plan to use this code in a tight loop or why does the performance matter? Your answer is again vulnerable to XSS, was it really worth it? – Thermomotor 13/3, 2019 at 19:39

Thank you Wladimir Palant! I've been looking for this, appreciate the example & explanation. – Gambrell 10/4, 2019 at 22:33

Worked for me :) – Drugge 18/9, 2020 at 16:22

Note: This answer also removes HTML tags. In case you want to decode entities only and keep the tags you can use return doc.body.innerHTML (instead of return doc.documentElement.textContent). – Postnasal 3/8, 2021 at 6:53

It's worth mentioning that htmlDecode('1<script>alert();</script>2') or htmlDecode('1<a>alert();</b>2') currently outputs 1alert();2, so be aware of that. The following function variant currently seems to give the most expected output: https://mcmap.net/q/37318/-unescape-html-entities-in-javascript (...document.createElement('textarea');...) ; Related: https://mcmap.net/q/37419/-how-to-decode-html-entities-using-jquery (...Security issues in similar approaches...) – Dermis 15/11, 2022 at 14:32

316

Do you need to decode all encoded HTML entities or just &amp; itself?

If you only need to handle &amp; then you can do this:

var decoded = encoded.replace(/&amp;/g, '&');
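If the same idea is stretched to a handful of entities, chaining several replace calls can decode twice (for example, "&amp;lt;" would end up as "<"). A minimal sketch that avoids this with a single pass, assuming only the five entities listed need handling; the names BASIC_ENTITIES and decodeBasicEntities are made up for illustration:

```javascript
// Assumption: only these five entities matter; anything else is left untouched.
var BASIC_ENTITIES = {
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#39;': "'"
};

function decodeBasicEntities(encoded) {
  // One pass over the input, so text produced by a replacement is never re-scanned.
  return encoded.replace(/&(?:amp|lt|gt|quot|#39);/g, function (match) {
    return BASIC_ENTITIES[match];
  });
}

console.log(decodeBasicEntities("&lt;img src='myimage.jpg'&gt;"));
// <img src='myimage.jpg'>
console.log(decodeBasicEntities("&amp;lt;"));
// &lt;
```

This uses no DOM at all, so it never parses the input as HTML and cannot trigger event handlers.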

If you need to decode all HTML entities then you can do it without jQuery:

var elem = document.createElement('textarea');
elem.innerHTML = encoded;
var decoded = elem.value;

Please take note of Mark's comments below which highlight security holes in an earlier version of this answer and recommend using textarea rather than div to mitigate against potential XSS vulnerabilities. These vulnerabilities exist whether you use jQuery or plain JavaScript.
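Where no DOM exists at all (for instance on a Node server, as asked in the comments), numeric character references can be decoded with plain string handling. The following is a rough sketch, not a full decoder: decodeNumericEntities is a made-up name, named entities such as &eacute; are deliberately left alone (a maintained library is the better tool for those), and out-of-range references are kept as-is instead of being replaced the way an HTML parser would.

```javascript
function decodeNumericEntities(encoded) {
  // Matches decimal (&#60;) and hexadecimal (&#x3C;) character references.
  return encoded.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, function (match, hex, dec) {
    var codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Simplification: leave NUL and values beyond U+10FFFF untouched.
    if (codePoint === 0 || codePoint > 0x10FFFF) {
      return match;
    }
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericEntities("&#60;img src=&#x27;myimage.jpg&#x27;&#62;"));
// <img src='myimage.jpg'>
```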

Jeremiad answered 13/9, 2010 at 12:31 Comment(8)

Beware! This is potentially insecure. If encoded='<img src="bla" onerror="alert(1)">' then the snippet above will show an alert. This means if your encoded text is coming from user input, decoding it with this snippet may present an XSS vulnerability. – Alleviation 10/7, 2015 at 20:39

@MarkAmery I'm not a security expert, but it looks like if you immediately set the div to null after getting the text, the alert in the img isn't fired - jsfiddle.net/Mottie/gaBeb/128 – Viable 17/7, 2015 at 16:53

@Viable not sure which browser that worked for you in, but the alert(1) still fires for me on Chrome on OS X. If you want a safe variant of this hack, try using a textarea. – Alleviation 17/7, 2015 at 16:58

+1 for the simple regexp replace alternative for just one kind of html entity. Do use this if you are expecting html data being interpolated from, say, a python flask app to a template. – Henrieta 1/3, 2017 at 21:18

How to do this on Node server? – Evite 27/6, 2018 at 10:51

@MohammadKermani: he, entities and html-entities, but this question is a duplicate of #1913001 – Blackshear 1/7, 2020 at 1:15

Please note that using a <textarea> here doesn’t prevent XSS vulnerabilities in all browsers. The input </textarea><img src="bla" onerror="alert(1)"> is still problematic. – Thermomotor 22/10, 2021 at 18:7

This fails on Firefox if there is an inline style with the font-family set, because the font's name is put in quotation marks, which are escaped, so the resulting string will look like this: style="font-family: &quot;Roboto&quot;;" – Dallis 18/4, 2023 at 6:30
