The best way to avoid XSS is to not embed user-provided code inside HTML or JavaScript. You (the developer) should be in charge of the code in every file that you serve through your web application.
There are cases, however, where you need to use user-provided content (not code) in your code or in your HTML page. In those cases, you need to know the exact context in which this content will be embedded and use the appropriate encoding (sometimes referred to as "escaping"). Proper encoding means that the user's browser will always interpret the content as a string and never as code.
For HTML attributes in particular, take a look at this OWASP cheat sheet: Cross Site Scripting Prevention Cheat Sheet. In the section Output Encoding for “HTML Attribute Contexts” you will find the ways you can ensure HTML attribute values are properly encoded:
- Always use `"` or `'` to surround the value (I recommend using `"` only, because there are security implications for `'`).
- Encode all non-alphanumeric characters using HTML entities. In .NET you should use something like the HttpUtility.HtmlAttributeEncode method on the entire string and not try to do this by hand.
- If you are using JavaScript to set the attribute value, use the appropriate API methods that handle encoding automatically.
- Never inject user content into a place unless you are sure it is a "Safe Sink" (the `onclick` handler of an element, for example, is not one).
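
To make the first two points concrete, here is a minimal sketch in Python (not the .NET method mentioned above). It assumes the standard library's `html.escape` is an acceptable encoder for your case; the helper name `render_title_attribute` and the payload are hypothetical. Note that `html.escape` only encodes `&`, `<`, `>`, `"` and `'`, which is narrower than "all non-alphanumeric characters", so it is only appropriate when the attribute value is always quoted.

```python
from html import escape


def render_title_attribute(user_value: str) -> str:
    """Hypothetical helper: build a <span> whose title attribute holds user content."""
    # quote=True also encodes " and ', so the value cannot close the attribute early.
    encoded = escape(user_value, quote=True)
    # The attribute value is always wrapped in double quotes.
    return f'<span title="{encoded}">hover me</span>'


# A payload that tries to break out of the attribute and add an event handler.
payload = '" onmouseover="alert(1)'
print(render_title_attribute(payload))
# <span title="&quot; onmouseover=&quot;alert(1)">hover me</span>
```

The browser decodes `&quot;` back to a literal quote character when it reads the attribute value, so the user still sees their original text, but it never terminates the attribute.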
Finally, keep in mind that there are contexts other than that of HTML attributes. For example, HTML text, HTML `script` tags and `style` tags are different contexts, each requiring its own kind of encoding. And then there is the context of your server-side template engine (if you use one), the context where SQL executes, the context where shell scripts execute, etc.
These contexts are often mixed with each other. For example, your template engine might dynamically produce an HTML snippet containing a `script` tag in which you embed a user-provided id string. Or the user provides an image and some parameters, which you then pass to a shell script in order to process the image with ImageMagick.
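
As a rough illustration of the first case (a user-provided id inside a `script` tag), one common approach is to serialize the value as a JSON string and additionally escape the characters that matter to the HTML parser. This is only a sketch under that assumption; `js_string_literal` and `render_script` are hypothetical names, and your template engine may already provide an equivalent filter.

```python
import json


def js_string_literal(value: str) -> str:
    """Hypothetical helper: encode a string for use as a JS literal inside <script>."""
    # json.dumps produces a quoted, backslash-escaped string literal
    # (non-ASCII characters become \uXXXX by default).
    literal = json.dumps(value)
    # Also escape <, > and & so the HTML parser can never see "</script>"
    # or an entity inside the literal.
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


def render_script(user_id: str) -> str:
    # The encoded literal already includes its surrounding double quotes.
    return f"<script>var userId = {js_string_literal(user_id)};</script>"


print(render_script('</script><script>alert(1)</script>'))
```

Two layers are handled here: the JavaScript string context (quotes and backslashes) and the surrounding HTML context (the closing `script` tag), which is why HTML attribute encoding alone would be the wrong tool.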
In all those cases you should take care to properly encode or escape the user-provided input, using the encoding method that is appropriate for the specific context. Better still, where possible, avoid passing user input directly into execution contexts at all, and use whitelists so that only strings you control are passed.
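
For the ImageMagick case, the whitelist idea could look something like the sketch below. It assumes ImageMagick's `convert` command with its `-resize` option is what you run, and that the file paths are generated by your server rather than taken from the user; `ALLOWED_SIZES` and `build_resize_command` are hypothetical names.

```python
# Whitelist: the user picks a key, and only values you wrote reach the command.
ALLOWED_SIZES = {"thumb": "128x128", "medium": "640x640"}


def build_resize_command(input_path: str, output_path: str, size_key: str) -> list[str]:
    """Hypothetical helper: build an argument list for ImageMagick's convert."""
    if size_key not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size_key!r}")
    # A list of arguments (instead of one shell string) means no shell parses
    # the values, so characters like ; or | have no special meaning.
    return ["convert", input_path, "-resize", ALLOWED_SIZES[size_key], output_path]


cmd = build_resize_command("/tmp/upload-123.png", "/tmp/upload-123-thumb.png", "thumb")
print(cmd)
# To execute it: subprocess.run(cmd, check=True), without shell=True.
```

Anything that is not in the whitelist is rejected outright, so there is nothing left to escape for the shell context.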