I'd like to use PHP Tidy to ensure my XML is well-formed before I load it into a DOMDocument.
However, I don't want Tidy to change anything about my formatting. I only want it to repair problems such as unbalanced tags.
An example of the problem can be seen at this page: http://www.tek-tips.com/viewthread.cfm?qid=1654452
My own example is the following.
Input: <ex><context>собр<stress>а</stress>ние</context> акцион<stress>е</stress>ров — <stress>aa</stress>ndeelhoudersvergadering</ex>
(which is already well-formed XML)
Expected output: <ex><context>собр<stress>а</stress>ние</context> акцион<stress>е</stress>ров — <stress>aa</stress>ndeelhoudersvergadering</ex>
(note the space between </context> and акцион, which must be preserved)
Actual output:
<ex>
<context>собр
<stress>а</stress>ние</context>акцион
<stress>е</stress>ров —
<stress>aa</stress>ndeelhoudersvergadering</ex>
(Tidy removed the space between </context> and акцион, which makes the text unreadable, and it inserted newlines before the opening tags)
My code is:
    function TidyXml($inputXml)
    {
        $config = array(
            'indent' => false,
            'output-xml' => true,
            'input-xml' => true,
        );
        $tidy = new tidy();
        $tidy->parseString($inputXml, $config, 'utf8');
        $tidy->cleanRepair();
        $cleanXml = tidy_get_output($tidy);
        return $cleanXml;
    }
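To back up the claim that my input is already well-formed, here is a minimal sketch of how that can be checked with DOMDocument alone, without Tidy. The helper name isWellFormedXml is hypothetical and is not part of my actual code. It assumes the dom/libxml extensions are enabled, which they normally are by default.

```php
<?php
// Hypothetical helper (not part of TidyXml): reports whether $xml parses
// as well-formed XML, without emitting PHP warnings for parse errors.
function isWellFormedXml(string $xml): bool
{
    // Collect libxml errors internally instead of raising warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);

    // Discard the collected errors and restore the earlier setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}

// The input from this question.
$input = '<ex><context>собр<stress>а</stress>ние</context> акцион<stress>е</stress>ров — <stress>aa</stress>ndeelhoudersvergadering</ex>';

var_dump(isWellFormedXml($input));

// An unbalanced fragment, the kind of thing I do want repaired.
var_dump(isWellFormedXml('<context><abbr>geog.</abbr>'));
```

A check like this could be used to skip Tidy entirely for input that already parses, but that only sidesteps the problem: input that does need repair would still lose its whitespace.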
I tried changing several options, but without success. I set 'input-xml' => true (needed because otherwise Tidy outputs a complete HTML document), but it didn't help. I also tried setting 'output-xml' => false, but this didn't help either.
Can anything be done to prevent stripping / trimming and formatting? – Maxie
<context><abbr>geog.</abbr> should be fixed to <context><abbr>geog.</abbr></context> so that the tags are balanced. – Maxie