Align the WMD editor's preview HTML with server-side HTML validation (e.g. no embedded JavaScript code)

Asked 14/5, 2010 at 20:59 Answered 14/5, 2010 at 21:2

There are many Stack Overflow questions (e.g. "Whitelisting, preventing XSS with WMD control in C#" and "WMD Markdown and server-side") about how to do server-side scrubbing of Markdown produced by the WMD editor, to ensure the generated HTML doesn't contain malicious script like this:

<img onload="alert('haha');" 
   src="http://www.google.com/intl/en_ALL/images/srpr/logo1w.png" />

But I didn't find a good way to plug the hole on the client side too. Of course, client validation isn't a replacement for scrubbing on the server, since anyone can pretend to be a client and POST you nasty Markdown. And if you're scrubbing the HTML on the server, an attacker can't save the bad HTML, so no one else will see it later and have their cookies stolen or sessions hijacked by the bad script. So there's a valid case to be made that it may not be worth enforcing no-script rules in the WMD preview pane too.

But imagine an attacker found a way to get malicious Markdown onto the server (e.g. a compromised feed from another site, or content added before an XSS bug was fixed). Your server-side whitelist applied when translating markdown to HTML would normally prevent that bad Markdown from being shown to users. But if the attacker could get someone to edit the page (e.g. by posting another entry saying the malicious entry had a broken link and asking someone to fix it), then anyone who edits the page gets their cookies hijacked. This is admittedly a corner case, but it still may be worth defending against.

Also, it's probably a bad idea for the client preview window to accept different HTML than your server will accept.

The Stack Overflow team has plugged this hole by making changes to WMD. How did they do it?

[NOTE: I already figured this out, but it required some tricky JavaScript debugging, so I'm answering my own question here to help others who may want to do the same thing.]

Flocculate asked 14/5, 2010 at 20:59

One possible fix is in wmd.js, in the pushPreviewHtml() method. Here's the original code from the Stack Overflow version of WMD on GitHub:

if (wmd.panels.preview) {
    wmd.panels.preview.innerHTML = text; 
}

You can replace it with some scrubbing code. Here's an adaptation of the code that Stack Overflow uses in response to this post. It restricts the preview to a whitelist of tags and, for IMG and A elements, to a whitelist of attributes (in a specific order, too!). See the Meta Stack Overflow post "What HTML tags are allowed on Stack Overflow, Server Fault, and Super User?" for more info on the whitelist.

Note: this code can certainly be improved, e.g. to allow whitelisted attributes in any order. It also disallows mailto: URLs, which is probably a good thing on Internet sites but may not be the best approach on your own intranet site.

if (wmd.panels.preview) {

    // Original WMD code allowed JavaScript injection, like this:
    //    <img src="http://www.google.com/intl/en_ALL/images/srpr/logo1w.png" onload="alert('haha');"/>
    // Now we first check that elements (and the attributes of IMG and A elements) are in a whitelist;
    // anything not in the whitelist is stripped from the preview, to prevent XSS attacks
    // when editing malicious Markdown.
    var okTags = /^(<\/?(b|blockquote|code|del|dd|dl|dt|em|h1|h2|h3|i|kbd|li|ol|p|pre|s|sup|sub|strong|strike|ul)>|<(br|hr)\s?\/?>)$/i;
    var okLinks = /^(<a\shref="(\#\d+|(https?|ftp):\/\/[-A-Za-z0-9+&@#\/%?=~_|!:,.;\(\)]+)"(\stitle="[^"<>]+")?\s?>|<\/a>)$/i;
    var okImg = /^(<img\ssrc="https?:(\/\/[-A-Za-z0-9+&@#\/%?=~_|!:,.;\(\)]+)"(\swidth="\d{1,3}")?(\sheight="\d{1,3}")?(\salt="[^"<>]*")?(\stitle="[^"<>]*")?\s?\/?>)$/i;
    // Match every tag-like run, including an unterminated one such as "<img src=x onerror=..."
    text = text.replace(/<[^<>]*>?/gi, function (tag) {
        return (tag.match(okTags) || tag.match(okLinks) || tag.match(okImg)) ? tag : "";
    });

    wmd.panels.preview.innerHTML = text;  // original assignment, now applied to the scrubbed text
}
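To try the whitelist outside the editor, here is the same scrubbing logic pulled into a standalone function. The name `scrubPreviewHtml` is hypothetical (it isn't part of WMD), and the sample inputs are only illustrations. It shows the malicious IMG from the question being dropped, a valid IMG surviving only when its attributes are in the expected order, and an unterminated tag being stripped:

```javascript
// Same whitelist regexes as the pushPreviewHtml() change above.
var okTags = /^(<\/?(b|blockquote|code|del|dd|dl|dt|em|h1|h2|h3|i|kbd|li|ol|p|pre|s|sup|sub|strong|strike|ul)>|<(br|hr)\s?\/?>)$/i;
var okLinks = /^(<a\shref="(\#\d+|(https?|ftp):\/\/[-A-Za-z0-9+&@#\/%?=~_|!:,.;\(\)]+)"(\stitle="[^"<>]+")?\s?>|<\/a>)$/i;
var okImg = /^(<img\ssrc="https?:(\/\/[-A-Za-z0-9+&@#\/%?=~_|!:,.;\(\)]+)"(\swidth="\d{1,3}")?(\sheight="\d{1,3}")?(\salt="[^"<>]*")?(\stitle="[^"<>]*")?\s?\/?>)$/i;

// Hypothetical standalone wrapper: keeps a tag only if one of the whitelists
// matches the whole tag; every other tag-like run is removed.
function scrubPreviewHtml(html) {
  return html.replace(/<[^<>]*>?/g, function (tag) {
    return (okTags.test(tag) || okLinks.test(tag) || okImg.test(tag)) ? tag : "";
  });
}

var evil = '<p><img onload="alert(\'haha\');" src="http://www.google.com/intl/en_ALL/images/srpr/logo1w.png" /></p>';
console.log(scrubPreviewHtml(evil)); // "<p></p>"

// Order matters: width must come before alt, so this IMG is kept...
console.log(scrubPreviewHtml('<img src="http://example.com/a.png" width="10" alt="a" />'));
// ...and this one is removed entirely.
console.log(scrubPreviewHtml('<img src="http://example.com/a.png" alt="a" width="10" />'));
```

Note that a rejected opening `<a>` leaves its closing `</a>` behind, since `</a>` is whitelisted on its own; that stray tag is harmless in the preview.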

Also note that this fix is not in the Stack Overflow version of WMD on GitHub, so the change was clearly made later and not checked back into GitHub.
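The scrubber above only accepts IMG and A attributes in one fixed order. One possible way to relax that for IMG is sketched here under some assumptions: the names `imgAttrRules` and `isAllowedImgTag` are hypothetical, only double-quoted attribute values are accepted, and this is not what Stack Overflow ships. The idea is to check the tag's overall shape first, then validate each name="value" pair against a per-attribute rule table:

```javascript
// Hypothetical per-attribute rules for IMG; the value patterns mirror okImg.
var imgAttrRules = {
  src: /^https?:\/\/[-A-Za-z0-9+&@#\/%?=~_|!:,.;\(\)]+$/,
  width: /^\d{1,3}$/,
  height: /^\d{1,3}$/,
  alt: /^[^"<>]*$/,
  title: /^[^"<>]*$/
};

// Returns true if tag is an IMG whose attributes are all whitelisted,
// in any order, each at most once, with a mandatory src.
function isAllowedImgTag(tag) {
  // Overall shape: <img name="value" name="value" ... > or ending in />
  // (double-quoted values only; anything else is rejected).
  var shape = /^<img((?:\s+[a-z]+="[^"<>]*")*)\s*\/?>$/i.exec(tag);
  if (!shape) return false;

  var attrPattern = /\s+([a-z]+)="([^"<>]*)"/gi;
  var seen = {};
  var pair;
  while ((pair = attrPattern.exec(shape[1])) !== null) {
    var name = pair[1].toLowerCase();
    // hasOwnProperty keeps inherited names such as "constructor" from looking like rules.
    if (!Object.prototype.hasOwnProperty.call(imgAttrRules, name)) return false; // e.g. onload, style
    if (seen[name] || !imgAttrRules[name].test(pair[2])) return false; // duplicate or bad value
    seen[name] = true;
  }
  return seen.src === true;
}

console.log(isAllowedImgTag('<img alt="a" width="10" src="http://example.com/a.png" />')); // true
console.log(isAllowedImgTag('<img src="http://example.com/a.png" onload="alert(1)" />')); // false
```

A check like this could stand in for the `tag.match(okImg)` test, and the same table-driven approach would work for the A element's `href` and `title`.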

UPDATE: to avoid breaking the feature that auto-creates hyperlinks when you type in a URL, you'll also need to make changes to showdown.js, like below:

Original code:

var _DoAutoLinks = function(text) {

    text = text.replace(/<((https?|ftp|dict):[^'">\s]+)>/gi,"<a href=\"$1\">$1</a>");

    // Email addresses: <address@domain.foo>

    /*
        text = text.replace(/
            <
            (?:mailto:)?
            (
                [-.\w]+
                \@
                [-a-z0-9]+(\.[-a-z0-9]+)*\.[a-z]+
            )
            >
        /gi, _DoAutoLinks_callback());
    */
    text = text.replace(/<(?:mailto:)?([-.\w]+\@[-a-z0-9]+(\.[-a-z0-9]+)*\.[a-z]+)>/gi,
        function(wholeMatch,m1) {
            return _EncodeEmailAddress( _UnescapeSpecialChars(m1) );
        }
    );

    return text;
}

Fixed code:

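var _DoAutoLinks = function(text) {
    // Use the simplified <url> format for bare URLs, to enable whitelisting link attributes.
    // (?=$|\W) is a lookahead, so the delimiter after a URL is not consumed and
    // two URLs separated by a single space are both converted.
    text = text.replace(/(^|\s)(https?|ftp)(:\/\/[-A-Z0-9+&@#\/%?=~_|\[\]\(\)!:,\.;]*[-A-Z0-9+&@#\/%=~_|\[\]])(?=$|\W)/gi, "$1<$2$3>");
    text = text.replace(/<((https?|ftp):[^'">\s]+)>/gi, '<a href="$1">$1</a>');
    // Email autolinks are not converted here; the preview whitelist rejects mailto: links anyway.
    return text;
};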
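To see what the fixed `_DoAutoLinks` produces, here is a standalone copy that runs outside showdown.js. The name `doAutoLinks` is hypothetical, and this copy uses a lookahead `(?=$|\W)` for the character after the URL, so that delimiter isn't consumed and two URLs separated by a single space are both linked:

```javascript
// Hypothetical standalone copy of the fixed _DoAutoLinks.
function doAutoLinks(text) {
  // Step 1: wrap bare URLs (preceded by start-of-text or whitespace) in <...>.
  // Trailing punctuation such as "." is left outside, because the last URL
  // character must come from the narrower final character class.
  text = text.replace(/(^|\s)(https?|ftp)(:\/\/[-A-Z0-9+&@#\/%?=~_|\[\]\(\)!:,\.;]*[-A-Z0-9+&@#\/%=~_|\[\]])(?=$|\W)/gi, "$1<$2$3>");
  // Step 2: turn <url> into a bare <a href="url">url</a> with no other attributes.
  text = text.replace(/<((https?|ftp):[^'">\s]+)>/gi, '<a href="$1">$1</a>');
  return text;
}

console.log(doAutoLinks("see http://example.com/page."));
// see <a href="http://example.com/page">http://example.com/page</a>.
```

Because the output is a bare `<a href="...">`, it passes the `okLinks` whitelist in the preview scrubber. A URL already inside Markdown link syntax such as `[x](http://example.com)` is left alone, since it isn't preceded by whitespace.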

Flocculate answered 14/5, 2010 at 20:59

I'm not convinced that this is something that needs fixing. It looks like a solution in search of a problem. Maybe the reason you don't see this code in the StackOverflow version of WMD is because it doesn't exist, because it is not needed. – Pyrrolidine 14/5, 2010 at 21:3

Yep, I'm not convinced it's needed either. That said, the StackOverflow.com guys implemented this in order to ensure that the previewer never generated HTML that their server-side validator wouldn't accept. Seems reasonable, although I agree not a terribly high priority. See meta.stackexchange.com/questions/1227/… for more details about why SO did it. BTW I just edited my question to align with the actual reason that SO wanted to do this. – Flocculate 14/5, 2010 at 21:13

You could AJAX post the HTML and get back the sanitised HTML from the server, to get a perfect preview. – Nucleon 15/5, 2010 at 3:8
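The comment above suggests having the server produce the preview instead. A minimal client-side sketch of that idea, under stated assumptions: `makeServerPreview`, the transport `postMarkdown(markdown, callback)`, and the `/markdown-preview` endpoint are all hypothetical names; the sketch sends the Markdown source rather than HTML; and the server is assumed to return HTML that has already passed its own whitelist. Since responses can arrive out of order while the user types, each request gets a sequence number and stale responses are ignored:

```javascript
// Hypothetical: returns an update(markdown) function that asks the server for
// sanitized HTML via postMarkdown(markdown, callback) and shows only the newest result.
function makeServerPreview(postMarkdown, previewPanel) {
  var latestRequest = 0;
  return function update(markdown) {
    var requestId = ++latestRequest;
    postMarkdown(markdown, function (err, sanitizedHtml) {
      // A slower, older response must not overwrite a newer preview.
      if (requestId !== latestRequest) return;
      // On error, show nothing rather than unsanitized client-side HTML.
      previewPanel.innerHTML = err ? "" : sanitizedHtml;
    });
  };
}

// Example transport for a browser (hypothetical endpoint "/markdown-preview").
function postMarkdownViaXhr(markdown, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open("POST", "/markdown-preview");
  xhr.setRequestHeader("Content-Type", "text/plain; charset=utf-8");
  xhr.onload = function () {
    if (xhr.status === 200) callback(null, xhr.responseText);
    else callback(new Error("HTTP " + xhr.status));
  };
  xhr.onerror = function () { callback(new Error("network error")); };
  xhr.send(markdown);
}
```

In practice you would probably also debounce `update` so that not every keystroke triggers a request.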

@Justin Grant: Do you have an open source repo anywhere with the changes applied? I'm considering forking WMD at Github. (Thanks for posting this question/answer!) – Derogate 5/1, 2011 at 21:54

Here is a presumably good HTML sanitizer: code.google.com/p/google-caja/wiki/JsHtmlSanitizer. – Derogate 6/1, 2011 at 3:36

@LeoMaheo - check out justingrant.net/editpage for the source code. I haven't put this up into GitHub, but feel free to take my edits from the source code you can download from that page. The readme explains which WMD fork I started from. – Flocculate 13/1, 2011 at 21:15

It is not a security issue to allow the local user to execute scripts in the page context as long as it's impossible for any third party to provide the script. Without the editor doing it, the user could always enter a javascript: url while on your page or use Firebug or something similar.

Koal answered 14/5, 2010 at 21:2

Initially I agreed with you, but I did find an interesting case: if there were another way for an attacker to get compromised Markdown onto the server, then this WMD preview issue would be dangerous. If an attacker could get a site moderator to edit a bad page (e.g. to clear out a broken link), then he could potentially take control of the whole site. True, it'd be easy to defeat by scrubbing Markdown on the server before sending it down to the edit page, but output validation is often ignored. This is admittedly a corner case, but may be worth defending against for a secure site. – Flocculate 15/5, 2010 at 5:31
