Fixing malformed HTML in PHP?

I am constructing a large HTML document from fragments supplied by users, and those fragments are often malformed in various ways. Browsers are robust and forgiving enough, but I want to validate and, ideally, fix any malformed HTML where possible. For example:

<td><b>Title</td>

can be reasonably fixed to:

<td><b>Title</b></td>

Is there a way of doing this easily in PHP?

Cacique answered 1/1, 2009 at 1:14 Comment(0)

You can use HTML Tidy; see its man pages for details.
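As a rough illustration, repairing a fragment with PHP's tidy extension might look like the sketch below. It assumes the extension (ext-tidy) is installed and enabled; the sample fragment and the option choices are illustrative assumptions, not part of the original answer.

```php
<?php
// Minimal sketch: repair an HTML fragment with the tidy extension.
// Assumption: ext-tidy is available on the host.
$fragment = '<div><b>Title</div>'; // hypothetical input with an unclosed <b>

$config = [
    'show-body-only' => true,  // return only the body content, not a full document
    'indent'         => false, // do not re-indent the markup
    'wrap'           => 0,     // disable line wrapping
];

// tidy_repair_string() returns the repaired markup as a string.
$repaired = tidy_repair_string($fragment, $config, 'utf8');

echo $repaired, PHP_EOL;
```

Without 'show-body-only', Tidy wraps the result in a complete document (doctype, head, body), which is usually not what you want when stitching fragments together.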

Sanjuana answered 1/1, 2009 at 1:16 Comment(2)
However, there are still a few problems with it. It used to remove my intentional whitespace, causing some JS problems. It also handles <script> tags in a way that IE6 sometimes fails to recognize, if you still want to support IE6 on your site. - Plethoric
Tidy should not be used on untrusted input: htmlpurifier.org/comparison#Tidy - Cooky

I highly recommend HTML Purifier. From their site:

HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications. Tired of using BBCode due to the current landscape of deficient or insecure HTML filters? Have a WYSIWYG editor but never been able to use it? Looking for high-quality, standards-compliant, open-source components for that application you're building? HTML Purifier is for you!
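For reference, basic usage could look roughly like the following sketch. It assumes HTML Purifier was installed through Composer (package ezyang/htmlpurifier) so that vendor/autoload.php exists; the sample fragment is a made-up input, and the default configuration is used only for brevity.

```php
<?php
// Minimal sketch: clean and repair a fragment with HTML Purifier.
// Assumption: installed via Composer, autoloader at vendor/autoload.php.
require_once 'vendor/autoload.php';

$config   = HTMLPurifier_Config::createDefault(); // default whitelist and settings
$purifier = new HTMLPurifier($config);

$dirty = '<div><b>Title</div>'; // hypothetical input with an unclosed <b>
$clean = $purifier->purify($dirty);

echo $clean, PHP_EOL;
```

Because the default whitelist also strips markup it considers unsafe, the output may drop more than just the malformed parts; the configuration object is the place to adjust that.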

Preclude answered 4/2, 2010 at 13:41 Comment(0)

If you can't use Tidy (some hosting services do not enable this PHP module), you can use this PHP class: http://www.barattalo.it/html-fixer/

Nathanaelnathanial answered 4/2, 2010 at 13:36 Comment(1)
Note that the project is no longer maintained; its last update was on 06/07/2010. Still, it is easy to use and consists of a single file, unlike HTML Purifier. - Expletive
