PHP SimpleXML get innerXML
S

11

11

I need to get the HTML contents of answer in this bit of XML:

<qa>
 <question>Who are you?</question>
 <answer>Who who, <strong>who who</strong>, <em>me</em></answer>
</qa>

So I want to get the string "Who who, <strong>who who</strong>, <em>me</em>".

If I have the answer as a SimpleXMLElement, I can call asXML() to get "<answer>Who who, <strong>who who</strong>, <em>me</em></answer>", but how to get the inner XML of an element without the element itself wrapped around it?

I'd prefer ways that don't involve string functions, but if that's the only way, so be it.

Smallsword answered 20/12, 2009 at 21:10 Comment(0)
S
5

To the best of my knowledge, there is no built-in way to get that. I'd recommend trying SimpleDOM, a PHP class that extends SimpleXMLElement and offers convenience methods for most of the common problems.

include 'SimpleDOM.php';

$qa = simpledom_load_string(
    '<qa>
       <question>Who are you?</question>
       <answer>Who who, <strong>who who</strong>, <em>me</em></answer>
    </qa>'
);
echo $qa->answer->innerXML();

Otherwise, I see two ways of doing that. The first would be to convert your SimpleXMLElement to a DOMNode, then loop over its childNodes to build the XML. The other would be to call asXML(), then use string functions to remove the root node. Be careful, though: asXML() may sometimes return markup that is actually outside of the node it was called from, such as the XML prolog or processing instructions.
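To illustrate the asXML() caveat, here is a minimal sketch. The shortened XML string is only a stand-in for the question's document, and the variable names are made up for the example.

```php
<?php
$qa = simplexml_load_string('<qa><answer>Who who, <em>me</em></answer></qa>');

// Called on the document element, asXML() serializes the whole document,
// so the result starts with the XML declaration.
$whole = $qa->asXML();
var_dump(strpos($whole, '<?xml') === 0);    // bool(true)

// Called on a child element, it returns only that element's markup.
$child = $qa->answer->asXML();
var_dump(strpos($child, '<answer>') === 0); // bool(true)
```

So any string-based stripping has to cope with the declaration when it is applied to the root element.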

Siesta answered 21/12, 2009 at 3:22 Comment(0)
A
14
function SimpleXMLElement_innerXML($xml)
{
    $innerXML = '';
    foreach (dom_import_simplexml($xml)->childNodes as $child)
    {
        $innerXML .= $child->ownerDocument->saveXML($child);
    }
    return $innerXML;
}
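A usage sketch for this approach, assuming PHP 7 or later for the type declarations. The function is repeated under the hypothetical name inner_xml so that the snippet runs on its own, and the "&amp; you" suffix is added to the question's sample only to show that entities stay escaped.

```php
<?php
// Same DOM-based approach as above, repeated so the snippet is self-contained.
function inner_xml(SimpleXMLElement $element): string
{
    $out = '';
    foreach (dom_import_simplexml($element)->childNodes as $child) {
        // saveXML($node) serializes a single node: text, element, CDATA, etc.
        $out .= $child->ownerDocument->saveXML($child);
    }
    return $out;
}

$qa = simplexml_load_string(
    '<qa><answer>Who who, <strong>who who</strong>, <em>me</em> &amp; you</answer></qa>'
);

echo inner_xml($qa->answer), "\n";
// Who who, <strong>who who</strong>, <em>me</em> &amp; you
```

Because the loop covers every child node, text between the child elements is kept as well.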
Anaphylaxis answered 20/8, 2011 at 1:33 Comment(1)
Great, simple solution! – Manvil
P
6

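This works when the element contains only text (although it seems really lame). Note that casting to string returns just the element's own text nodes, so the markup of child elements such as <strong> is dropped: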

echo (string)$qa->answer;
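A quick check of what the cast returns for the question's XML may be useful here. This is a sketch; the variable names and the second, text-only document are made up for illustration.

```php
<?php
$qa = simplexml_load_string(
    '<qa><answer>Who who, <strong>who who</strong>, <em>me</em></answer></qa>'
);

// The cast concatenates only the text nodes that sit directly inside <answer>;
// the <strong> and <em> elements and their contents are skipped.
$mixed = (string) $qa->answer;
var_dump($mixed); // string(11) "Who who, , "

// For an element that holds nothing but text, the cast returns everything.
$plain = simplexml_load_string('<qa><answer>Just text</answer></qa>');
var_dump((string) $plain->answer); // string(9) "Just text"
```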
Prestidigitation answered 2/9, 2010 at 4:5 Comment(1)
Not lame at all! Saved me from juggling XML into several variables. I've seen lamer ;) – Fury
W
4

The most straightforward solution is to implement a custom innerXML function with SimpleXML:

function simplexml_innerXML($node)
{
    $content="";
    foreach($node->children() as $child)
        $content .= $child->asXml();
    return $content;
}

In your code, replace $body_content = $el->asXml(); with $body_content = simplexml_innerXML($el);
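One thing worth checking with this approach: children() iterates over child elements only, so text that sits between them is not part of the result. A minimal sketch against the question's XML, with the loop inlined and made-up variable names:

```php
<?php
$qa = simplexml_load_string(
    '<qa><answer>Who who, <strong>who who</strong>, <em>me</em></answer></qa>'
);

$content = '';
foreach ($qa->answer->children() as $child) {
    // Only <strong> and <em> are visited; the surrounding text nodes are not.
    $content .= $child->asXML();
}

echo $content, "\n";
// <strong>who who</strong><em>me</em>
```

For mixed content like this, iterating over DOM childNodes keeps the text as well.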

However, you could also switch to another API that distinguishes between innerXML (what you are looking for) and outerXML (what you get for now). Microsoft's DOM library offers this distinction, but unfortunately PHP DOM doesn't.

I found that the PHP XMLReader API offers this distinction; see readInnerXML(). This API takes quite a different approach to processing XML, though. Try it.
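A minimal sketch of the XMLReader route, assuming PHP 7.1 or later for the nullable return type. The helper name first_inner_xml is hypothetical; it advances a cursor through the document and calls readInnerXml() on the first element with the requested name.

```php
<?php
// Hypothetical helper: inner XML of the first element named $name, or null.
function first_inner_xml(string $xml, string $name): ?string
{
    $reader = new XMLReader();
    $reader->XML($xml); // load the document from a string

    $result = null;
    while ($reader->read()) {
        // XMLReader is a forward-only cursor; stop at the first matching start tag.
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $name) {
            $result = $reader->readInnerXml();
            break;
        }
    }
    $reader->close();

    return $result;
}

$xml = '<qa><question>Who are you?</question>'
     . '<answer>Who who, <strong>who who</strong>, <em>me</em></answer></qa>';

echo first_inner_xml($xml, 'answer'), "\n";
```

Unlike SimpleXML, nothing is loaded into a tree up front, which is why the lookup is written as a loop.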

Finally, I would stress that XML is meant to carry data as values rather than as subtrees, which is why you are running into trouble finding the right API. It would be more 'standard' to store the HTML subtree as a value (with all tags escaped) rather than as an XML subtree. Also beware that some HTML syntax is not always XML-compatible. Anyway, in practice, your approach is definitely more convenient for editing the XML file.

Welford answered 13/6, 2011 at 5:44 Comment(1)
Thanks for this. One issue, though: the code example is slightly broken, $node isn't defined. – Kowalczyk
R
1

I would extend the SimpleXMLElement class:

class MyXmlElement extends SimpleXMLElement{

    final public function innerXML(){
        $tag = $this->getName();
        $value = $this->__toString();
        if('' === $value){
            return null;
        }
        return preg_replace('!<'. $tag .'(?:[^>]*)>(.*)</'. $tag .'>!Ums', '$1', $this->asXml());
    }
}

and then use it like this:

echo $qa->answer->innerXML();
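For the call above to work, the document has to be loaded with the subclass, which simplexml_load_string() accepts as its second argument. A self-contained sketch follows; InnerXmlElement is a hypothetical stand-in for the class above, and it uses DOM child nodes instead of the regex.

```php
<?php
// Hypothetical subclass for illustration; it is DOM-based rather than regex-based.
class InnerXmlElement extends SimpleXMLElement
{
    public function innerXML(): string
    {
        $out = '';
        foreach (dom_import_simplexml($this)->childNodes as $child) {
            $out .= $child->ownerDocument->saveXML($child);
        }
        return $out;
    }
}

// The second argument tells SimpleXML which class to instantiate for each node,
// so $qa->answer is an InnerXmlElement too.
$qa = simplexml_load_string(
    '<qa><answer>Who who, <strong>who who</strong>, <em>me</em></answer></qa>',
    InnerXmlElement::class
);

echo $qa->answer->innerXML(), "\n";
// Who who, <strong>who who</strong>, <em>me</em>
```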
Revolutionize answered 24/8, 2012 at 8:26 Comment(0)
U
0
<?php
    function getInnerXml($xml_text) {           
        //strip the first element
        //check if the strip tag is empty also
        $xml_text = trim($xml_text);
        $s1 = strpos($xml_text,">");        
        $s2 = trim(substr($xml_text,0,$s1)); //get the head with ">" and trim (note that string is indexed from 0)

        if ($s2[strlen($s2)-1]=="/") //tag is empty
            return "";

        $s3 = strrpos($xml_text,"<"); //get last closing "<"        
        return substr($xml_text,$s1+1,$s3-$s1-1);
    }

    var_dump(getInnerXml("<xml />"));
    var_dump(getInnerXml("<xml  /  >faf <  / xml>"));
    var_dump(getInnerXml("<xml      ><  / xml>"));    
    var_dump(getInnerXml("<xml>faf <  / xml>"));
    var_dump(getInnerXml("<xml  >  faf <  / xml>"));      
?>

After searching for a while, I found no satisfying solution, so I wrote my own function. It returns the exact inner XML content (including whitespace, of course). To use it, pass in the result of asXML(), like this: getInnerXml($e->asXML()). The function also works for elements with many prefixes (which was my case, as I could not find any existing method that handles child nodes with different prefixes).

Output:

string '' (length=0)    
string '' (length=0)    
string '' (length=0)    
string 'faf ' (length=4)    
string '  faf ' (length=6)
Undersigned answered 29/2, 2012 at 15:30 Comment(0)
H
0
    function get_inner_xml(SimpleXMLElement $SimpleXMLElement)
    {
        $element_name = $SimpleXMLElement->getName();
        $inner_xml = $SimpleXMLElement->asXML();
        $inner_xml = str_replace('<'.$element_name.'>', '', $inner_xml);
        $inner_xml = str_replace('</'.$element_name.'>', '', $inner_xml);
        $inner_xml = trim($inner_xml);
        return $inner_xml;
    }
Hautesavoie answered 4/10, 2013 at 19:12 Comment(0)
U
0

If you don't want to strip the CDATA section wrapper, comment out lines 6-8 of the function.

function innerXML($i){
    $text=$i->asXML();
    $sp=strpos($text,">");
    $ep=strrpos($text,"<");
    $text=trim(($sp!==false && $sp<=$ep)?substr($text,$sp+1,$ep-$sp-1):'');
    $sp=strpos($text,'<![CDATA[');
    $ep=strrpos($text,"]]>");
    $text=trim(($sp===0 && $ep===strlen($text)-3)?substr($text,$sp+9,-3):$text);
    return($text);
}
Unipod answered 20/3, 2014 at 1:30 Comment(0)
M
0

You can just use this function :)

function innerXML( $node )
{
    $name = $node->getName();
    return preg_replace( '/((<'.$name.'[^>]*>)|(<\/'.$name.'>))/UD', "", $node->asXML() );
}
Millrace answered 19/6, 2014 at 6:14 Comment(0)
E
0

Here is a very fast solution I created:

function InnerHTML($Text)
{
    // Start just after the first '>' and stop at the last '<' (the closing tag).
    $PosStart = strpos($Text, '>') + 1;
    return substr($Text, $PosStart, strrpos($Text, '<') - $PosStart);
}

echo InnerHTML($yourXML->qa->answer->asXML());
Englishry answered 4/12, 2020 at 10:25 Comment(0)
S
-2

Using a regex, you could do this:

preg_match('/<answer(.*)?>(.*)?<\/answer>/', $xml, $match);
$result=$match[0];
print_r($result);
Slattern answered 20/12, 2009 at 21:17 Comment(1)
This is definitely the wrong use case for regex. One should never use it for XML/DOM parsing. Not to mention that $match[0] always contains the full matched text, and that $xml is an object, not a string. – Dottie