shredding xml recursively into the database
Asked Answered
L

1

5

I have the following XML data and the Element table.

DECLARE @input XML = '<root>
     <C1>
       <C2>
         <C3>           <C4>data1</C4>       </C3>         
       </C2>
       <C2>
         <C3>data2</C3>
       </C2>
     </C1>
     <D1>
        <D2>data3</D2>
        <D2>data4</D2>
     </D1>
    </root>'

Element table:( this is just an example so can be changed to match an appropriate solution.)

CREATE TABLE Element (  elementId INT IDENTITY PRIMARY KEY, 
elementName VARCHAR (200) NOT NULL, 
parentId INT,   
data VARCHAR(300) );

According to @input the root element is parent of C1 and D1, then C1 is C2 parent, ...

What is the solution for SQL server 2012/2014 to code a stored procedure with CTE (or any other type of SQL object) to recursively put all element names into the Element table?

data column fills with data in this case, the C4 and the second C3, and D2 elements have data the rest of element are null.

I also saw Hierarchical Data type and I wonder if that could be helpful to solve this problem?

Leesaleese answered 19/7, 2014 at 14:28 Comment(3)
Did you think HierarchyId could help you in the process to fill your table or are you open to change the table structure to use HierarchyId?Monogenic
Yeah, I can change the table structure to have HierarchyId type, the goal is shredding the XML into the database, required structure can be made accordingly, the current Element table was the first think that I could think about.Leesaleese
You may want to take a look at creating an edge table with the OpenXml statement. More info here: msdn.microsoft.com/en-us/library/ms175160.aspxKorenblat
M
8

With OpenXML you can get a table representation of your XML with ID and ParentID columns using the metaproperties.

Using the XML query in a merge will allow you to create a mapping table between the elementId identity column and the DOM node id from the XML.

The last step is to use the mapping table to update parentId in Element.

SQL Fiddle

MS SQL Server 2008 Schema Setup:

CREATE TABLE Element (  elementId INT IDENTITY PRIMARY KEY, 
elementName VARCHAR (200) NOT NULL, 
parentId INT,   
data VARCHAR(300) );

Query 1:

declare @input xml = '
<root>
  <C1>
    <C2>
      <C3>
        <C4>data1</C4>
      </C3>
    </C2>
    <C2>
      <C3>data2</C3>
    </C2>
  </C1>
  <D1>
    <D2>data3</D2>
    <D2>data4</D2>
  </D1>
</root>';

-- OpenXML handle
declare @D int;

-- Table that capture output of merge with mapping between 
-- DOM node id and the identity column elementID in Element 
declare @T table
(
  ID int,
  ParentID int,
  ElementID int
);

-- Parse XML and get a handle
exec sp_xml_preparedocument @D output, @input;

-- Add rows to Element and fill the mapping table @T
merge into dbo.Element as E
using ( 
      select *
      from openxml(@D, '//*') with 
        (
          ID int '@mp:id',
          ParentID int '@mp:parentid',
          Data varchar(300) 'text()',
          ElementName varchar(200) '@mp:localname'
        )
      ) as S
on 0 = 1
when not matched by target then
  insert (elementName, data) values (S.ElementName, S.data)
output S.ID, S.ParentID, inserted.elementID into @T;

-- Update parentId in Elemet
update E
set parentId =  T2.ElementID
from dbo.Element as E
  inner join @T as T1
    on E.elementId = T1.ElementID
  inner join @T as T2
    on T1.ParentID = T2.ID


-- Relase the XML document
exec sp_xml_removedocument @D;

select *
from Element;

Results:

| ELEMENTID | ELEMENTNAME | PARENTID |   DATA |
|-----------|-------------|----------|--------|
|         1 |        root |   (null) | (null) |
|         2 |          C1 |        1 | (null) |
|         3 |          C2 |        2 | (null) |
|         4 |          C3 |        3 | (null) |
|         5 |          C4 |        4 |  data1 |
|         6 |          C2 |        2 | (null) |
|         7 |          C3 |        6 |  data2 |
|         8 |          D1 |        1 | (null) |
|         9 |          D2 |        8 |  data3 |
|        10 |          D2 |        8 |  data4 |
Monogenic answered 20/7, 2014 at 6:23 Comment(2)
Thank you this works well, I was wondering if XQuery could do something similar, the real XML files are relatively chunky does it make sense to load them like this?Leesaleese
@FredJand I don't see any issues with loading the XML using this technique. If you are concerned about the target table structure I can only tell you that it depends on how you are going to query the data and what you want to do with it. HierarchyID might be an option for you or you could just dump the entire XML in a XML column one row per XML document. XML indexes and selective XML indexes (SQL Server 2012 SP1) could be helpful if you need to query into the XML. Another option could be to mimic the XML structure as tables with joins.Monogenic

© 2022 - 2024 — McMap. All rights reserved.