Do NOT trust USERS!
The first thing you should keep in mind is "Do NOT trust your USERS".
If your HTML is rendered by your server and can't be modified by users, it's totally OK.
That HTML is generated and managed by you, and once it is assured to be "SAFE" HTML, putting it into the DOM is not a problem at all.
But the problem is that most WYSIWYG editors, like draft.js, produce HTML, not plain TEXT. I think your worry comes from here too.
Yes, it's dangerous. What we can do is NOT render the HTML directly, but render it "selectively".
Dangerous tags: <script>, <img>, <link>, etc.
You can remove those tags, but it is much safer to decide which tags you will allow, like this:
Safe tags: <H1> - <H6> / span / div / p / ol, ul, li / table ...
You SHOULD also remove attributes from those HTML elements, like onclick="", etc., because they can be abused by users too.
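The allowlist idea above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a production sanitizer: the names `ALLOWED_TAGS`, `DROP_CONTENT_TAGS`, `AllowlistSanitizer` and `sanitize` are made up for this example, the tag list is simply copied from the "safe tags" above, and every attribute is dropped. For real projects a maintained library such as sanitize-html is the safer choice.

```python
from html import escape
from html.parser import HTMLParser

# Assumed allowlist, copied from the "safe tags" above; adjust to your needs.
ALLOWED_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6",
                "span", "div", "p", "ol", "ul", "li", "table"}
# For these tags the content is dropped too, not only the tag itself.
DROP_CONTENT_TAGS = {"script", "style"}


class AllowlistSanitizer(HTMLParser):
    """Re-emits only allowlisted tags, without any attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            # attrs is ignored on purpose, so onclick="" etc. never survive
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Text is re-escaped so it cannot turn back into markup
            self.out.append(escape(data))


def sanitize(html_text):
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


dirty = '<p onclick="evil()">Hi <script>alert(1)</script><img src=x onerror=y>there</p>'
clean = sanitize(dirty)
print(clean)
```

Dropping all attributes is the strictest option. If you need some of them (for example `href` on links), allow them per tag and validate their values as well.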
Conclusion:
So what can we do when we use WYSIWYG editors?
There are 2 big strategies:
- Generate "Safe" HTML when saving to the database.
- Generate "Safe" HTML before putting it into the DOM.
(3. Generate "Safe" HTML before sending the HTML to the client, but that does not apply in your case!)
Choose the first one if you want to be sure the text in your database is totally safe.
The first one must be processed on your server (not in the browser/client!), and you can use many solutions, like BeautifulSoup in Python or sanitize-html in Node.js.
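As a rough sketch of the first strategy, the server cleans the value before it ever reaches the database, so only the cleaned text is stored. Everything here is hypothetical: the `posts` table, the `save_post` function, and the in-memory SQLite database are just for illustration. `html.escape` is used as a stand-in that escapes all markup (the strictest possible policy); in practice you would pass an allowlist sanitizer built on a maintained library instead.

```python
import sqlite3
from html import escape


def save_post(conn, body_html, sanitize=escape):
    """Sanitize on the server, then store only the cleaned value."""
    clean = sanitize(body_html)
    # Parameterized query: the value is never spliced into the SQL string
    conn.execute("INSERT INTO posts (body) VALUES (?)", (clean,))
    conn.commit()
    return clean


# Hypothetical in-memory database, only for this example
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")

save_post(conn, "<script>alert(1)</script>")
stored = conn.execute("SELECT body FROM posts").fetchone()[0]
print(stored)
```

Because the sanitizer is passed in as a parameter, the same save path works with a stricter or looser policy without touching the database code.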
Choose the second one if your web app is complicated and most of your service's business logic runs on the front end.
The second one uses an HTML sanitizing package just before mounting the HTML into the DOM, and sanitize-html can still be a good solution. (Surely there are more great solutions!)
You can decide which tags, attributes, and values are allowed in the HTML.
Links
https://github.com/punkave/sanitize-html
getCurrentContent()
now: draftjs.org/docs/api-reference-editor-state#getcurrentcontent – Doubtless