Remove namespace from XML using PHP

I have an XML document that looks like this:

<Data 
  xmlns="http://www.domain.com/schema/data" 
  xmlns:dmd="http://www.domain.com/schema/data-metadata"
>
  <Something>...</Something>
</Data>

I am parsing the information using SimpleXML in PHP. I am dealing with arrays and I seem to be having a problem with the namespace.

My question is: How do I remove those namespaces? I read the data from an XML file.

Thank you!
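A minimal reproduction of the symptom, as a sketch. It assumes SimpleXML and the sample document above, with "example" as placeholder content for `<Something>`. Once a default namespace is declared, an XPath query with unprefixed names returns nothing, while plain property access keeps working:

```php
<?php
// The question's document; "example" stands in for the elided content.
$xml = <<<XML
<Data
  xmlns="http://www.domain.com/schema/data"
  xmlns:dmd="http://www.domain.com/schema/data-metadata"
>
  <Something>example</Something>
</Data>
XML;

$data = simplexml_load_string($xml);

// XPath 1.0 has no default namespace: an unprefixed name only matches
// elements in *no* namespace, so this finds nothing.
$result = $data->xpath('//Something');
$found = is_array($result) ? count($result) : 0;

// Plain property access still works, because SimpleXML also matches
// children that live in an unprefixed (default) namespace.
$viaProperty = (string) $data->Something;

echo $found, PHP_EOL;       // 0
echo $viaProperty, PHP_EOL; // example
```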

Lith answered 7/8, 2009 at 17:7 Comment(1)
If you'd like details: my original question was posted here, and a user already answered it (thanks!). But I found out that the namespace causes his loops not to run and to return an empty array. The original question is located here: #1209801 – Lith

If you're using XPath, then this is a limitation of XPath, not PHP; see this explanation of XPath and default namespaces for more info.

More specifically, it's the xmlns="..." attribute on the root node that causes the problem. This means you need to register the namespace and then use a QName (a prefixed name such as a:Something) to refer to elements from then on.

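$feed = simplexml_load_file('data.xml'); // placeholder: path to your XML file
$feed->registerXPathNamespace("a", "http://www.domain.com/schema/data");
// Absolute path: $feed already is the root <Data> element, so a relative "a:Data/..." would find nothing
$result = $feed->xpath("/a:Data/a:Something/...");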

Important: The URI used in the registerXPathNamespace call must be identical to the one that is used in the actual XML file.
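A small sketch of why the exact URI matters. The helper name countSomethings and the inline document are hypothetical, for illustration only. Namespace URIs are compared as plain strings, so even one trailing slash makes the registered prefix point at a different namespace:

```php
<?php
/**
 * Hypothetical helper: loads $xml, binds prefix "a" to $uri and counts
 * the <Something> children of the root <Data> element.
 */
function countSomethings(string $xml, string $uri): int
{
    $data = simplexml_load_string($xml);
    $data->registerXPathNamespace('a', $uri);
    // Absolute path, because the context node is already <Data>.
    $result = $data->xpath('/a:Data/a:Something');
    return is_array($result) ? count($result) : 0;
}

$xml = '<Data xmlns="http://www.domain.com/schema/data">'
     . '<Something>first</Something><Something>second</Something></Data>';

// Same URI as in the document: both elements are found.
$exact = countSomethings($xml, 'http://www.domain.com/schema/data');
// One trailing slash makes it a different namespace: nothing is found.
$trailingSlash = countSomethings($xml, 'http://www.domain.com/schema/data/');

echo $exact, ' / ', $trailingSlash, PHP_EOL; // 2 / 0
```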

Gyre answered 7/8, 2009 at 17:31 Comment(3)
Right, so instead of removing it... I just register the namespace. And this fixed my problem!!! You're the man! Thanks! – Lith
Unfortunately, this seems to be the only way. – Dinkins
Note the Important section. I had missed it the first time I viewed this answer. – Openhearth

I found the answer above to be helpful, but it didn't quite work for me. This ended up working better:

// Gets rid of all namespace definitions 
$xml_string = preg_replace('/xmlns[^=]*="[^"]*"/i', '', $xml_string);

// Gets rid of all namespace references
$xml_string = preg_replace('/[a-zA-Z]+:([a-zA-Z]+[=>])/', '$1', $xml_string);
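Applied to a version of the question's document, the two substitutions leave plain, unprefixed names that SimpleXML and XPath can query directly. This is a sketch: the dmd:id attribute and the dmd:Note element are made up to give the second pattern something to strip.

```php
<?php
$xml_string = '<Data xmlns="http://www.domain.com/schema/data"'
    . ' xmlns:dmd="http://www.domain.com/schema/data-metadata">'
    . '<Something dmd:id="1">hello</Something>'
    . '<dmd:Note>meta</dmd:Note>'
    . '</Data>';

// Gets rid of all namespace definitions (xmlns="..." and xmlns:prefix="...")
$xml_string = preg_replace('/xmlns[^=]*="[^"]*"/i', '', $xml_string);

// Gets rid of all namespace references (the "prefix:" before element/attribute names)
$xml_string = preg_replace('/[a-zA-Z]+:([a-zA-Z]+[=>])/', '$1', $xml_string);

$data = simplexml_load_string($xml_string);

// No namespace registration needed any more
$somethings = $data->xpath('//Something');
$id   = (string) $data->Something['id']; // formerly dmd:id
$note = (string) $data->Note;            // formerly <dmd:Note>

echo count($somethings), ' ', $id, ' ', $note, PHP_EOL; // 1 1 meta
```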
Photocopier answered 3/10, 2011 at 23:3 Comment(3)
I'd get rid of "all namespace references" with something like this: $xml = preg_replace('/(<\/*)[^>:]+:/', '$1', $xml); – Scabrous
One of the few times in my life I've upvoted a solution to manipulate XML with regex. I really don't want to register a default namespace and needlessly clutter up my xpath queries. – Kauai
Almost perfect. It needs to allow for a space after the node name. It strips node content if the content contains a colon (<node>Order:Num</node>), and it also doesn't match names containing digits (<ns:addr2>Content</ns:addr2>). Try: $xml_string = preg_replace('/(<\/|<)[a-zA-Z]+:([a-zA-Z0-9]+[ =>])/', '$1$2', $xml_string); – Sadness

The following PHP code automatically detects the default namespace declared in the XML file and registers it under the alias "default". Now all XPath queries have to be updated to include the prefix default:

So if you want to read XML files whether or not they contain a default namespace definition, and you want to query all Something elements, you could use the following code:

$xml = simplexml_load_file($name);
$namespaces = $xml->getDocNamespaces();
if (isset($namespaces[''])) {
    $defaultNamespaceUrl = $namespaces[''];
    $xml->registerXPathNamespace('default', $defaultNamespaceUrl);
    $nsprefix = 'default:';
} else {
    $nsprefix = '';
}

$somethings = $xml->xpath('//'.$nsprefix.'Something');

echo count($somethings).' times found';
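The same logic can be wrapped in a small function and run against two inline documents, one with and one without a default namespace, as a quick check that a single query path covers both cases. The function name findSomethings and the sample documents are hypothetical:

```php
<?php
/**
 * Hypothetical wrapper around the code above: returns all <Something>
 * elements whether or not the document declares a default namespace.
 */
function findSomethings(SimpleXMLElement $xml): array
{
    $namespaces = $xml->getDocNamespaces();
    if (isset($namespaces[''])) {
        // Default namespace found: bind it to the "default" prefix.
        $xml->registerXPathNamespace('default', $namespaces['']);
        $nsprefix = 'default:';
    } else {
        $nsprefix = '';
    }
    $result = $xml->xpath('//' . $nsprefix . 'Something');
    return is_array($result) ? $result : [];
}

$withNs    = simplexml_load_string('<Data xmlns="http://www.domain.com/schema/data"><Something/><Something/></Data>');
$withoutNs = simplexml_load_string('<Data><Something/></Data>');

echo count(findSomethings($withNs)), ' times found', PHP_EOL;    // 2 times found
echo count(findSomethings($withoutNs)), ' times found', PHP_EOL; // 1 times found
```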
Poncho answered 30/10, 2012 at 15:45 Comment(0)

When you just want your XML parsed and usable, and you don't care about any namespaces, you can simply remove them. Regular expressions work well and are much faster than my method below.

But for a safer approach when removing namespaces, you can parse the XML with SimpleXML and ask it which namespaces it declares, as below:

$xml = '...';
$namespaces = simplexml_load_string($xml)->getDocNamespaces(true);
//getDocNamespaces() returns the default namespace under an empty key, like this: '' => 'url'
//So we remove any default namespace from the array
$namespaces = array_filter(array_keys($namespaces), function($k){return !empty($k);});
$namespaces = array_map(function($ns){return "$ns:";}, $namespaces);

$ns_clean_xml = str_replace("xmlns=", "ns=", $xml);
$ns_clean_xml = str_replace($namespaces, array_fill(0, count($namespaces), ''), $ns_clean_xml);
$xml_obj = simplexml_load_string($ns_clean_xml);

This way you replace only the namespaces, avoiding removing anything else the XML could contain.

I actually use it as a function:

function refined_simplexml_load_string($xml_string) {
  if(false === ($x1 = simplexml_load_string($xml_string)) ) return false;
  
  $namespaces = array_keys($x1->getDocNamespaces(true));
  $namespaces = array_filter($namespaces, function($k){return !empty($k);});
  $namespaces = array_map(function($ns){return "$ns:";}, $namespaces);
  
  return simplexml_load_string($ns_clean_xml = str_replace(
    array_merge(["xmlns="], $namespaces),
    array_merge(["ns="], array_fill(0, count($namespaces), '')),
    $xml_string
  ));
}
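Here is a usage sketch for refined_simplexml_load_string(). The function is repeated so the block runs on its own, and the sample document is based on the question (the dmd:id attribute is made up). After the rewrite, the root carries a plain ns="..." attribute and the dmd: prefix is gone, so neither property access nor XPath needs a namespace:

```php
<?php
// Same function as above, repeated so this block runs on its own.
function refined_simplexml_load_string($xml_string) {
  if (false === ($x1 = simplexml_load_string($xml_string))) return false;

  // Prefixes declared anywhere in the document, minus the default ('') one.
  $namespaces = array_keys($x1->getDocNamespaces(true));
  $namespaces = array_filter($namespaces, function ($k) { return !empty($k); });
  $namespaces = array_map(function ($ns) { return "$ns:"; }, $namespaces);

  // Turn xmlns="..." into a plain ns="..." attribute and drop every "prefix:".
  return simplexml_load_string(str_replace(
    array_merge(["xmlns="], $namespaces),
    array_merge(["ns="], array_fill(0, count($namespaces), '')),
    $xml_string
  ));
}

$xml = '<Data xmlns="http://www.domain.com/schema/data" xmlns:dmd="http://www.domain.com/schema/data-metadata">'
     . '<Something dmd:id="1">hello</Something></Data>';

$data = refined_simplexml_load_string($xml);
$somethings = $data->xpath('//Something');

echo count($somethings), ' ', (string) $data->Something['id'], PHP_EOL; // 1 1
```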
Iong answered 7/2, 2021 at 20:8 Comment(1)
Thanks a lot for sharing your solution. I had another method for doing this (PHP 7.2), and it served me well for years. However, for some weird reason it wasn't doing any cleanup in PHP 8.1. I could not find anything relevant between releases, but your method works with both PHP versions. – Plasterboard

To remove the namespace completely, you'll need to use Regular Expressions (RegEx). For example:

$feed = file_get_contents("http://www.sitepoint.com/recent.rdf");
// This removes ALL default namespace declarations (xmlns="..." / xmlns='...') but keeps the tags themselves.
$feed = preg_replace('/\sxmlns\s*=\s*("[^"]*"|\'[^\']*\')/i', '', $feed);
$xml_feed = simplexml_load_string($feed);

Now you've stripped the default XML namespaces before loading the XML. Be careful with the regex, though, because if you have any fields with something like:

<![CDATA[ <Transfer xmlns="http://redeux.example.com">cool.</Transfer> ]]>

then it will also strip the xmlns from inside the CDATA, which may lead to unexpected results.
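The CDATA caveat can be reproduced with a sketch like the following. It uses a pattern that removes only the xmlns="..." attribute itself and keeps the tags (an assumed pattern, not necessarily the one in the answer's code). The element query works without any namespace registration, but the markup inside the CDATA section is rewritten too:

```php
<?php
// Removes default-namespace declarations only: xmlns="..." or xmlns='...'.
// Prefixed declarations such as xmlns:dmd="..." do not match.
$pattern = '/\sxmlns\s*=\s*("[^"]*"|\'[^\']*\')/i';

$feed = '<Data xmlns="http://www.domain.com/schema/data"><Something>'
      . '<![CDATA[ <Transfer xmlns="http://redeux.example.com">cool.</Transfer> ]]>'
      . '</Something></Data>';

$xml_feed = simplexml_load_string(preg_replace($pattern, '', $feed));

// Works without registering a namespace...
$found = $xml_feed->xpath('//Something');
// ...but the text inside the CDATA section was changed as well.
$cdataText = (string) $xml_feed->Something;

echo count($found), PHP_EOL;     // 1
echo trim($cdataText), PHP_EOL;  // <Transfer>cool.</Transfer>
```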

Gyre answered 7/8, 2009 at 17:48 Comment(1)
Nice, but it doesn't remove closing tags. – Merrifield

© 2022 - 2024 — McMap. All rights reserved.