Sanitize HTML before storing in the DB or before rendering? (AntiXSS library in ASP.NET)

Asked 13/1, 2010 at 22:53 Answered 25/5, 2010 at 21:16

Solved asp.net xss antixsslibrary html-sanitizing

I have an editor that lets users add HTML that is stored in the database and rendered on a web page. Since this is untrusted input, I plan to use Microsoft.Security.Application.AntiXsSS.GetSafeHtmlFragment to sanitize the HTML.

Should I santiize before saving to the database or before rendering the untrusted input into the webpage?
Is there an advantage in including the AntiXSS source code in my project instead of just the DLL? (Maybe I can customize the white list?)
Which class file should I look in for actual implementation of the GetSafeHtmlFragment

Yulma answered 13/1, 2010 at 22:53 Comment(0)

I disagree with the selected answer for two reasons

If you stored encoded data, you have to pick an encoder before you store. What happens if you have stored something as HTML but also want to push it out in another format, for example as a JSON response, or as part of an XML document? You now have a an HTML encoded format you must decode, then encode in the correct format.
What if we discover a bug in the encoders and push a new version out? Now, because you're not encoding at the point of output all your old data may contain things that have been incorrectly encoded. You can encode again, but then you hit double encoding issues which can be painful to filter correctly.

Generally you encode at the point of output and treat any data coming from a data store as untrusted by default - after all, what if someone manages to edit your database directly or via SQL injection?

Fruitful answered 24/2, 2010 at 14:23 Comment(4)

I came back and voted you up because I agree with your arguments, and your suggestion matches with the OWASP recommendation. Hopefully, @user102533 will switch and accept yours. – Former 26/2, 2010 at 17:51

He's not storing encoded data, he's storing sanitized data - big difference (it's still unencoded HTML) – Tremulant 5/5, 2010 at 12:59

Point 2 still stands then (and yes, I didn't pay enough attention in reading!) What if there's a bug in the sanitizer? Or you want to change the sanitzation rules? – Fruitful 5/5, 2010 at 16:28

Microsoft.Security.Application.AntiXsSS.GetSafeHtmlFragment - During the sanitizing process, it also encodes it. It's what actually caused me to search for a solution. I don't want to store encoded stuff in the database, but just the sanitized piece. Womp womp... Now I don't know what to store... – Wohlert 17/2, 2015 at 22:26

Have a listen to the OWASP podcast 67 with Jeff Williams on XSS. He talks about not sanitising or encoding before storage. The primary reason is that if (when) libraries evolve in response to new vulnerabilities your data is going to be stuck back in the old version. Of course this doesn’t stop you from running any input against a whitelist at the entry point and rejecting anything outside acceptable range.

Freitag answered 25/5, 2010 at 21:16 Comment(0)

Both
Only if you plan on changing it, which I would not do personally
The AntiXss class (since it's called as AntiXss.GetSafeHtmlFragment())

Former answered 13/1, 2010 at 22:55 Comment(3)

There is already HTML code in the database, if I sanitize it before adding it to the database, then I need to sanitize the data that already exists (for consistency). But is there any value in sanitizing before adding it in the database? – Yulma 14/1, 2010 at 3:24

To me, the primary value is that it may be some newbie who makes the next web app that reads from your DB. You never know who will take over after you've moved on to greener pastures. Also, it's just good to get in the habit of doing it so it's second nature. – Former 14/1, 2010 at 3:37

+1 for sanitizing every time, just in case. The only thing to be aware of is a possible performance hit, in which case you can sanitize just on saving to the DB but you then need to be careful that all the info in the DB is really sanitized. – Tremulant 5/5, 2010 at 13:1

-1

You can use the in the page directive the parameter ValidateRequest="true". In this way all the Request data is validated and if there's a validation problem you can always catch the error. It also prevents sql injection threads and others not only possible XSS.

With numeric data, you can validate integer overflow or misuse of data types with Int32.TryParse() or any other of the TryParse family (Byte.TryParse Int16.TryParse...)

No need to use any other class or additional sanitizer method.

Condensate answered 13/1, 2010 at 23:6 Comment(1)

Request validation is on by default but you're putting a bit too much faith in it. You need to treat it as one layer of security which is supplemented by whitelist validation at the point of input (you sort of touched on this re the int parsing) and output encoding. It also won’t do much for you in terms of SQL Injection; the statement “' or 1=1” will go right through. This might help explain it for you a little better: troyhunt.com/2010/05/… – Freitag 25/5, 2010 at 21:24

Recommended topics

Hot tags