How do I protect against XSS attacks in attributes such as src?

Asked 7/3, 2013 at 20:54 Answered 31/3, 2023 at 14:47

So I've been building a C# HTML sanitizer using HTML Agility Pack with a whitelist. It works fine, except for cases like these:

<img src="javascript:alert('BadStuff');" />
<img src="jav&#x09;ascript:alert('BadStuff');">

I DO want to allow the src attribute, just not malicious content within it, obviously. Everything I've looked up has just recommended a whitelist for tags and their attributes. How would you handle something like this, though? I know this won't work in any newer browser, but I'm not very familiar with security and I'm sure there are other clever things attackers could do.

Recur answered 7/3, 2013 at 20:54 Comment(2)

You could parse every input or maybe URL-encode the URL. But I'm not sure if this would prevent XSS. – Hammon 7/3, 2013 at 20:58

I just noticed after I posted my answer that you are building an HTML sanitizer. You should never do that. Use a reputable library for that and, even then, it's still something you should avoid if you can (e.g. work with plain text or markdown). – Papotto 31/3, 2023 at 14:59

Something like "must be a valid Uri, either relative or absolute with an http/https scheme" is a good starting point.
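A minimal sketch of that rule, assuming the raw attribute text comes straight from the parser. SrcUrlCheck and TryGetSafeSrc are hypothetical names, not part of HTML Agility Pack or .NET. It decodes entities first so that the jav&#x09;ascript: case from the question is inspected in decoded form, rejects whitespace and control characters, and then accepts only http/https absolute URLs or relative references.

```csharp
using System;
using System.Linq;
using System.Net;

public static class SrcUrlCheck
{
    // Hypothetical helper, not a library API. On success, "safe" holds the
    // entity-decoded value; re-encode it when writing the attribute back out.
    public static bool TryGetSafeSrc(string rawAttributeValue, out string safe)
    {
        safe = string.Empty;
        if (string.IsNullOrWhiteSpace(rawAttributeValue)) return false;

        // Decode entities so "jav&#x09;ascript:" is inspected in decoded form.
        string value = WebUtility.HtmlDecode(rawAttributeValue).Trim();
        if (value.Length == 0) return false;

        // Browsers drop tabs and newlines inside URLs, so reject them outright.
        if (value.Any(c => char.IsControl(c) || char.IsWhiteSpace(c))) return false;

        // A ':' before any '/', '?' or '#' means the value carries a scheme.
        int firstDelimiter = value.IndexOfAny(new[] { ':', '/', '?', '#' });
        bool hasScheme = firstDelimiter >= 0 && value[firstDelimiter] == ':';

        if (hasScheme)
        {
            // Absolute URL: let Uri parse it and allow only http and https.
            if (!Uri.TryCreate(value, UriKind.Absolute, out var parsed)) return false;
            if (parsed.Scheme != Uri.UriSchemeHttp && parsed.Scheme != Uri.UriSchemeHttps)
                return false;
        }
        else if (!Uri.TryCreate(value, UriKind.Relative, out _))
        {
            return false;
        }

        safe = value;
        return true;
    }
}
```

The decoder here only knows a subset of the entities a browser understands, so the sanitizer should write back the decoded value in re-encoded form rather than the original attribute text. Otherwise an entity the decoder missed could still turn into a scheme in the browser.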

Yippee answered 7/3, 2013 at 21:1 Comment(1)

@Recur Not sure what you mean by "starts with XXXX". Uri.Scheme and Uri.TryCreate are the ones I'd use. I strongly suspect you are talking about using string.StartsWith, which I wouldn't do without first learning the Uri spec (RFC 3986) by heart. – Yippee 8/3, 2013 at 17:25

You can safely allow the src attribute, provided that you sanitize and handle the input properly. To do this you should first sanitize it through a whitelist of valid URL characters, canonicalize it, and then verify that it points to a valid image.

The whitelist you mentioned is the first step (and an important one at that). To implement it, strip out every character that isn't valid in a URL. Also verify that the URL is well formed and that it points to a resource the user should be able to access. For example, the user shouldn't be able to reach a local file on the server by passing in file://sensitive.txt or similar. If http and https are the only protocols that should be used, check that the URL's scheme is one of those. If you are extra paranoid, you may reject the request altogether, since a malformed value is a strong sign of tampering. Whitelisting is important, but whitelisting alone will not keep the feature secure.

Canonicalization is important because many attacks rely on URLs that are written in an unusual form but eventually resolve to a location they shouldn't reach, exploiting the fact that the computer follows them literally. It also helps eliminate duplicate paths to the same resource, which may improve performance, or at least lets you skip rechecking a known file that hasn't changed since you last checked it. Be careful with that optimization, though: a last-modified date can be spoofed, so an attacker could swap in a malicious file after you've already "checked and trusted" it.
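One possible way to canonicalize in .NET is to lean on System.Uri, which lowercases the scheme and host, drops the default port, and collapses /./ and /../ segments when it parses an http URL. The sketch below assumes only absolute http/https URLs are of interest; UrlCanonicalizer is a hypothetical name.

```csharp
using System;

public static class UrlCanonicalizer
{
    // Hypothetical helper: produces the normalized absolute form of an
    // http/https URL, or returns false for anything else.
    public static bool TryCanonicalize(string url, out string canonical)
    {
        canonical = string.Empty;
        if (!Uri.TryCreate(url, UriKind.Absolute, out var parsed)) return false;
        if (parsed.Scheme != Uri.UriSchemeHttp && parsed.Scheme != Uri.UriSchemeHttps)
            return false;

        // AbsoluteUri is the form produced after Uri's own normalization.
        canonical = parsed.AbsoluteUri;
        return true;
    }
}
```

Comparing the canonical strings, rather than the raw input, is what lets two spellings such as HTTP://Example.COM:80/img/./a/../cat.png and http://example.com/img/cat.png be treated as the same resource. Run any allow/deny checks on the canonical form, not on the original text.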

To verify that you are pointing to a valid image, open the file and read in the first few bytes. Do not simply trust the file extension, though do check it first before opening the file (for performance and for security). Every image format has a certain pattern of bytes that you can check. A good one to look at first is JPEG. It may still be possible for a malicious user to put shellcode or other attack code in an image file that contains the proper headers, but it is much more difficult to do. This will be a performance bottleneck so plan appropriately if you implement this.
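As a rough sketch of the byte check, assuming you already have the file open as a Stream: the code below compares the first few bytes against the well-known signatures for JPEG (FF D8 FF), PNG (89 50 4E 47 0D 0A 1A 0A) and GIF ("GIF87a" / "GIF89a"). ImageSniffer and ImageKind are hypothetical names, and a matching header only shows the file starts like an image, not that the rest of it is harmless.

```csharp
using System;
using System.IO;

public enum ImageKind { Unknown, Jpeg, Png, Gif }

public static class ImageSniffer
{
    private static readonly byte[] Jpeg = { 0xFF, 0xD8, 0xFF };
    private static readonly byte[] Png = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
    private static readonly byte[] Gif87 = { 0x47, 0x49, 0x46, 0x38, 0x37, 0x61 }; // "GIF87a"
    private static readonly byte[] Gif89 = { 0x47, 0x49, 0x46, 0x38, 0x39, 0x61 }; // "GIF89a"

    // Reads up to 8 bytes from the stream and matches them against known signatures.
    public static ImageKind Detect(Stream stream)
    {
        var header = new byte[8];
        int read = 0;
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break; // end of stream before 8 bytes were available
            read += n;
        }

        if (StartsWith(header, read, Jpeg)) return ImageKind.Jpeg;
        if (StartsWith(header, read, Png)) return ImageKind.Png;
        if (StartsWith(header, read, Gif87) || StartsWith(header, read, Gif89)) return ImageKind.Gif;
        return ImageKind.Unknown;
    }

    private static bool StartsWith(byte[] data, int length, byte[] signature)
    {
        if (length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }
}
```

Only the first 8 bytes are read, so the cost is mostly in opening or fetching the file rather than in the comparison itself.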

Pandybat answered 12/3, 2013 at 19:27 Comment(0)

The best way to avoid XSS is to not embed user provided code inside HTML or JavaScript. You (the developer) should be in charge of code written in all the files that you serve through your web application.

There are cases, however, where you need to use user-provided content (not code) in your code or in your HTML page. In those cases, you need to make sure that you are aware of the exact context where this content will be embedded and use the appropriate encoding (sometimes referred to as "escaping"). Properly encoding strings means that users' browsers will always interpret them as strings and never as code.

For HTML attributes in particular, take a look at this OWASP cheatsheet: Cross Site Scripting Prevention Cheat Sheet. In the section Output Encoding for “HTML Attribute Contexts” you will find all the ways you can ensure HTML attribute values are properly encoded:

Always use " or ' to surround the value (I recommend using " only, because there are security implications for ').
Encode all non-alphanumeric characters using HTML entities. In .NET you should use something like the HttpUtility.HtmlAttributeEncode method on the entire string and not try to do this by hand.
If you are using JavaScript to set the attribute value, use the appropriate API methods that handle encoding automatically.
Never inject user content in places that you are not sure if they are "Safe Sinks" (e.g. the onclick handler of an element).
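To illustrate the first two points, here is a small sketch that builds an img tag with a double-quoted, encoded src value using HttpUtility.HtmlAttributeEncode. AttributeOutput and BuildImgTag are hypothetical names. Note that encoding only stops the value from breaking out of the attribute; it does nothing about a javascript: URL, so the scheme of a src value still has to be validated separately.

```csharp
using System;
using System.Web;

public static class AttributeOutput
{
    // Hypothetical helper: wraps an already-validated URL in a double-quoted
    // src attribute, encoding it so it cannot terminate the attribute early.
    public static string BuildImgTag(string src)
    {
        string encoded = HttpUtility.HtmlAttributeEncode(src ?? string.Empty);
        return "<img src=\"" + encoded + "\" />";
    }
}
```

With an input such as a.png" onerror="alert(1) the double quotes are emitted as &quot;, so the whole string stays inside the src value instead of starting a new onerror attribute.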

Finally, keep in mind that there are contexts other than that of HTML attributes. For example, HTML text, HTML script tags and style tags are different contexts, each requiring its own kind of encoding. And then there is the context of your server-side template engine (if you use one), the context where SQL executes, the context where shell scripts execute, etc.

These are often mixed with each other. For example your template engine dynamically produces an HTML snippet that has a script tag inside where you might embed a user-provided id string. Or the user provides an image and some parameters, which you then pass to a shell script in order to process the image with ImageMagick.

In all those cases you should take care to properly encode or escape the user provided input, using the encoding method that is appropriate for the specific context. Or, if possible, avoid passing user input directly inside execution contexts and use whitelists to pass only strings that you control.
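For the whitelist option, one possible shape (a sketch only, with made-up option names and sizes) is to let the user pick a key and have the server map it to a string the developer wrote. In the ImageMagick example above, the user would send "small" or "large", and only the mapped geometry would ever reach the command line.

```csharp
using System;
using System.Collections.Generic;

public static class ResizeOptions
{
    // Values are fixed strings written by the developer; the keys are the
    // only thing a user is allowed to choose. Names and sizes are illustrative.
    private static readonly Dictionary<string, string> AllowedSizes =
        new Dictionary<string, string>(StringComparer.Ordinal)
        {
            ["small"] = "128x128",
            ["medium"] = "512x512",
            ["large"] = "1024x1024",
        };

    // Returns true and the developer-controlled geometry when the choice is known.
    public static bool TryGetGeometry(string userChoice, out string geometry)
    {
        geometry = string.Empty;
        if (userChoice == null) return false;
        if (!AllowedSizes.TryGetValue(userChoice, out var value)) return false;
        geometry = value;
        return true;
    }
}
```

Anything that is not an exact key, including input like "small; rm -rf /", is simply rejected, so there is nothing left to escape.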

Papotto answered 31/3, 2023 at 14:47 Comment(0)
