Prevent PHP Tidy from converting style tag data to CDATA

3

9

I am using PHP Tidy to clean a user-generated HTML page that contains a style tag:

<style type="text/css">
    body {
        padding-top: 60px;
        padding-bottom: 40px;
    }
</style>

But once I run Tidy, the contents of the style tag are wrapped in CDATA. My main purpose in using Tidy is to repair the file and indent it properly.

<style type="text/css">
/*<![CDATA[*/
    body {
            padding-top: 60px;
            padding-bottom: 40px;
    }
/*]]>*/
</style>

My Tidy config options are:

$options = array(
    'preserve-entities' => true,
    'hide-comments' => true,
    'tidy-mark' => false,
    'indent' => true,
    'indent-spaces' => 4,
    'new-blocklevel-tags' => 'article,header,footer,section,nav',
    'new-inline-tags' => 'video,audio,canvas,ruby,rt,rp',
    'doctype' => 'omit',
    'sort-attributes' => 'alpha',
    'vertical-space' => false,
    'output-xhtml' => true,
    'wrap' => 180,
    'wrap-attributes' => false,
    'break-before-br' => false,
);

$buffer = tidy_parse_string($buffer, $options, 'utf8');
tidy_clean_repair($buffer);

I tried searching a lot but the PHP Tidy library is not exactly a "well documented" one! So it came down to removing the CDATA manually after Tidy cleans/repairs the code.

$buffer = str_replace("/*<![CDATA[*/","",$buffer);
$buffer = str_replace("/*]]>*/","",$buffer);

My problem with this approach is that the indentation of the style tag's contents is still off (not aligned with the rest of the page):

<style type="text/css">
    body {
        padding-top: 60px;
        padding-bottom: 40px;
    }
</style>

So again, how do I prevent Tidy from creating CDATA on the page?

Thanks a lot!

Thermocouple answered 12/3, 2013 at 15:36 Comment(0)
A
10

Turn off the output-xhtml option. The CDATA wrapping is required for XHTML, as CSS can contain unescaped > characters.
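
As a rough sketch of what that change might look like (not a tested drop-in for the question's full config): the options below are a trimmed version of the ones in the question, with 'output-xhtml' switched off. The 'output-html' => true line follows the comment below about HTML Tidy 5.4.0, and may be unnecessary on older versions. The $html sample string is made up for illustration, and the extension_loaded() guard is only there so the snippet does nothing if the tidy extension is missing.

```php
<?php
// Sketch only: ask Tidy for HTML rather than XHTML output, so that the
// contents of <style> should no longer be wrapped in CDATA markers.
$options = array(
    'output-xhtml'  => false, // XHTML output is what triggers the CDATA wrapper
    'output-html'   => true,  // reportedly also needed with HTML Tidy 5.4.0
    'tidy-mark'     => false,
    'indent'        => true,
    'indent-spaces' => 4,
    'doctype'       => 'omit',
    'wrap'          => 180,
);

// Hypothetical sample input, standing in for the user-generated page.
$html = '<html><head><title>t</title>'
      . '<style type="text/css">body { padding-top: 60px; }</style>'
      . '</head><body><p>hi</p></body></html>';

$clean = null;
if (extension_loaded('tidy')) {
    $tidy = tidy_parse_string($html, $options, 'utf8');
    tidy_clean_repair($tidy);
    $clean = tidy_get_output($tidy); // repaired markup as a string
    echo $clean;
}
```

If the output still contains the CDATA markers, it is worth checking which Tidy version the PHP extension is linked against, since the behaviour of these two options appears to differ between versions.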

Asthenic answered 12/3, 2013 at 18:8 Comment(2)
Ahh, I can't believe I did that... spent hours on this one! Thanks a lot :) Thermocouple
As of today, the OP should both deactivate output-xhtml and activate output-html, i.e., have 'output-xhtml' => false and 'output-html' => true (tested with HTML Tidy for Linux version 5.4.0). Sciatica
A
2

The CDATA tags are added to tell the browser that it should parse characters like '<' and '&' as literal characters instead of HTML syntax. Tidy does not appear to have any documented configuration that would prevent it from generating them for inline CSS/JavaScript. The only option would be to move the CSS to a separate file, in which case it doesn't need the CDATA tag.

See http://tidy.sourceforge.net/docs/quickref.html and https://en.wikipedia.org/wiki/CDATA for more information.

Adhibit answered 12/3, 2013 at 17:25 Comment(1)
Yeah, I am aware of what CDATA is used for. My only concern is that it is not much use inside a style tag; the browser is smart enough to decode that and use it appropriately. So is there a way to avoid this "unnecessary" add-on? Thermocouple
C
0

One way to handle it is to use a link to an external stylesheet.

<link rel="stylesheet" type="text/css" media="screen, print" href="site.css">
Carpo answered 12/3, 2013 at 15:38 Comment(1)
As I mentioned, the page is user generated, so I was looking for a solution that would just prevent the CDATA tag from being created. Moving to an external stylesheet is obviously an option, but I was hoping to keep it as a last resort... Thermocouple
