Is it good to use htmlspecialchars() before Inserting into MySQL?
Asked Answered
L

1

13

I am a little confused on this. I have been reading about htmlspecialchars() and I am planning to use this for the textareas POST to prevent XSS attack. I understand that usually htmlspecialchars() are used to generate the HTML output that is sent to the browser. But what I am not sure is:

1) Is it a safe practice to use htmlspecialchars() to the user input data before I insert it into MySQL? I am already using PDO prepared statement with parameterized values to prevent SQL Injection.

2) Or, I really dont need to worry about using htmlspecialchars() to inserted values (provided they are parameterized) and only use htmlspecialchars() when I fetch results from MySQL and display it to users?

Laity answered 6/1, 2014 at 20:58 Comment(7)
#2 is the correct answerMahout
HTML escaping (1) is not really relevant to your database. Commonly (2) is preferred, as premature escaping might interfer with output channels other than a web page later. For newcomers early HTML escaping might be advisable still; better too early than later forgotten.Plourde
Suppose you wanted to produce some text logs. Then you would have to undo your escaping. Instead escape at the last possible moment so that you know it's appropriate.Glycolysis
Apply business rules before inserting data (the might include restricting what characters are allowed), but apply escaping at the usage site (as this is just an implementation detail of using the data in a given context).Mammillary
Thank you all for your input. It makes sense to me now. @Plourde - If I do take your newcomer advise and do early escaping before inserting and if I forget I did that and later escape again when outputting, will the output text will then have special character since it is escaped twice? Say for example this is the text: <a href='test'>Test</a>. When this is escaped during inserting and escaped again during output, will it then it show as &lt;a href=&#039;test&#039;&gt;Test&lt;/a&gt; instead of showing it as <a href='test'>Test</a> in the browser?Laity
@blackops_programmer If your stored text is <a href='test'>Test</a> with the intent of transforming it into a real working link, you would have to decode it before output. If you wanted it to show literally as HTML markup, one encoding would display in the browser as <a href='test'>Test</a> and double-encoding would display as &lt;a href=&#039;test&#039;&gt;Test&lt;/a&gt; and neither would get you a working link.Freestanding
Thank you Michael Berknowshi for clearing that for me. I will then using the htmlspecialchars() at the end & make sure I make it a habit to encode it everytime I display any fetched results to the end user. This practice will perhaps make it a good habit. I was confused about this since most articles I read about XSS said that I need to be validate & Sanitize all user inputs. So I wasnt sure what to do with textarea when I dont have anything to validate that with such as expecting a set numbers or set values, etc before insert. So all of your answers have given me a clearer understanding. :)Laity
L
18

As others have pointed out, #2 is the correct answer. Leave it "raw" until you need it, then escape appropriately.

To elaborate on why (and I will repeat/summarise the other posts), let's take scenario 1 to its logical extreme.

What happens when someone enters " ' OR 1=1 <other SQL injection> -- ". Now maybe you decide that because you use SQL you should encode for SQL (maybe because you didn't use parameterised statements). So now you have to mix (or decide on) SQL & HTML encoding.

Suddenly your boss decides he wants an XML output too. Now to keep your pattern consistent you need to encode for that as well.

Next CSV - oh no! What if there are quotes and commas in the text? More escaping!

Hey - how about a nice interactive, AJAX interface? Now you probably want to start sending JSON back to the browser so now {, [ etc. all need to be taken into consideration. HELP!!

So clearly, store the data as given (subject to domain constraints of course) and encode appropriate to your output at the time you need it. Your output is not the same as your data.

I hope this answer is not too patronising. Credit to the other respondents.

Lumbering answered 7/1, 2014 at 2:13 Comment(1)
Thank you so much for summarizing it for me and explaining why #2 will be the best practice. :)Laity

© 2022 - 2024 — McMap. All rights reserved.