Bulk load XML referring to auto-increment parent-id
Asked Answered
V

2

3

In short: I want to do an XML Bulk Load to a SQL Server (2008) database and generate auto-increment-id's for a parent, that can be used in the child. This seems limited by the scope: the parent-node is not finished, so not inserted yet. Does anybody know a way around this?

The longer description (sorry, it's really long, but I try to be complete):

From a customer I got a lot of XML-documents with a similar structure from which to generate a test-DB. They are exported for use by another tool, my customer does not have authority nor contacts to influence the structure nor the contents. (The tools were written by another party for the mother-company.) Nor does he have a formal description of the XML or the database they are exported from.

It turns out that the 'top' XML-nodes <Registration> do have ID's, but that these are not unique across documents. (Top nodes is relative, they do have a root node and a list-node, but in the XML they are the highest element that will make it to the database.) The ID's may be used in other XML-documents, because they refer to another object <Case> that is not in the export. So I need to generate auto-increment-id's to keep all <Registration>-elements unique even across files.

My <Registration>-node has many daughters, e.g. the <Activity>-node. These nodes need to refer to their parent, so they should use the generated auto-increment-id. However, since they are part of an unfinished parent-node, the parent-node is still in scope, and it is not inserted in the table yet, as explained in "Record Subset and the Key Ordering Rule" on msdn and technet. However, the examples on these sites have an explicit unique CustomerId, not an auto-generated Id.

Although this documentation about the "Key Ordering Rule" makes it look like this can not be done, I can not believe there is no way around this for XML-files lacking (unique) ID's. Even stranger is: it does insert a parent-id in the child, but the number is one lower. So I assume this to be the auto-increment-id from the previous scope (where 0 is the default with nothing inserted yet, I did expect a NULL). So I do see one work-around: increment the parent-key in my child-table afterwards (UPDATE Activity SET RegistrationId = RegistrationId + 1). However, this does require keeping a limit (WHERE TimeStamp > ...) and no other (manual or scripting) interventions.

I have tried a lot of different relations-ships and VB-scripts (e.g. I would prefer auto-generation of my tables), but I'll just post my latest attempt. This will also serve to illustrate the insertion of the auto-increment-id from the previous scope.

My main issue is:

  • is it possible to get the right auto-incremented parent-id?

But other tips are very welcome, like:

  • what setting to use to auto-generate the auto-increment identity without an explicit CREATE TABLE-statement in SQL?

Generate the tables:

CREATE TABLE [dbo].[Registration](
  [Id] INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Registration PRIMARY KEY,
  [XmlId] [nvarchar](40) NULL,
)
CREATE TABLE [dbo].[Activity](
  [Id] INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Activity PRIMARY KEY,
  [RegistrationId] INT CONSTRAINT FK_Activity_Registration FOREIGN KEY (RegistrationId) REFERENCES Registration (Id),
  [XmlId] [nvarchar](1000) NULL,
)

The XML-file to import:

<Updates>
  <Registrations>
    <Registration ID="NonUniqCaseId-123">
      <Activities>
        <Activity ID="UniqActId-1234" />
        <Activity ID="UniqActId-1235" />
      </Activities>
    </Registration>
    <Registration ID="NonUniqCaseId-124">
      <Activities>
        <Activity ID="UniqActId-1241" />
        <Activity ID="UniqActId-1242" />
      </Activities>
    </Registration>
  </Registrations>
</Updates>

The VB-script to test the upload (I want to include a loop in a program later, to handle multiple files):

    Dim objBL 
Set objBL = CreateObject("SQLXMLBulkLoad.SQLXMLBulkload.4.0")
objBL.ConnectionString = "provider=SQLOLEDB;data source=localhost;database=Test;integrated security=SSPI"
objBL.ErrorLogFile = "error.log"

objBL.CheckConstraints = False
objBL.XMLFragment = False
objBL.SchemaGen = True
objBL.SGDropTables = False
objBL.KeepIdentity = False

objBL.Execute "BulkTestMapping.xsd", "BulkTestContents.xml"
Set objBL = Nothing

The XSD:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"
    attributeFormDefault="qualified"
    elementFormDefault="qualified"
    xmlns:sql="urn:schemas-microsoft-com:mapping-schema">

  <xs:annotation>
    <xs:appinfo>
      <sql:relationship name="Registration_Activity"
            parent="Registration"
            parent-key="Id"
            child="Activity"
            child-key="RegistrationId"
            inverse="true"
            />
    </xs:appinfo>
  </xs:annotation>

  <xs:element name="Registration"
              sql:relation="Registration"
              sql:key-fields="Id" 
            >
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Activities" minOccurs="0" maxOccurs="unbounded" sql:is-constant="true">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Activity" minOccurs="0" maxOccurs="unbounded"
                     sql:relation="Activity" 
                     sql:key-fields="RegistrationId"
                     sql:relationship="Registration_Activity"
              >
                <xs:complexType>
                  <xs:attribute name="ID" sql:field="XmlId" form="unqualified" type="xs:string" />
                  <xs:attribute name="DbId" sql:identity="ignore" sql:field="Id" msdata:AutoIncrement="true" msdata:ReadOnly="true" type="xs:int" /> 
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="ID" form="unqualified" sql:field="XmlId" />
      <xs:attribute name="DbId" sql:identity="ignore" sql:field="Id" msdata:AutoIncrement="true" type="xs:int" /> 
    </xs:complexType>
  </xs:element>
</xs:schema>

The resulting tables (note that RegistrationId is off by one):

[Registration]
Id  XmlId
1   NonUniqCaseId-123
2   NonUniqCaseId-124

[Activity]
Id  RegistrationId  XmlId
1   0   UniqActId-1234
2   0   UniqActId-1235
3   1   UniqActId-1241
4   1   UniqActId-1242

Edit: It is even worse than I thought. If I add the records again, the foreign key (child key) starts at 0 again! So it is going to be hard to impossible to determine what the correction (per table) should be:

[Registration]
Id  XmlId
1   NonUniqCaseId-123
2   NonUniqCaseId-124
3   NonUniqCaseId-123
4   NonUniqCaseId-124

[Activity]
Id  RegistrationId  XmlId
1   0   UniqActId-1234
2   0   UniqActId-1235
3   1   UniqActId-1241
4   1   UniqActId-1242
5   0   UniqActId-1234
6   0   UniqActId-1235
7   1   UniqActId-1241
8   1   UniqActId-1242
Valuator answered 15/11, 2012 at 12:52 Comment(2)
What version of SQL Server are you on?Wifely
SELECT @@version returns Microsoft SQL Server 2008 R2 (SP1) - 10.50.2500.0 (X64) Jun 17 2011 00:54:03 Copyright (c) Microsoft Corporation Enterprise Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)Valuator
V
0

Well the answer turns out to be very simple: just leave out the inverse in the XSD, so remove this line:

inverse="true"

I introduced this because I have many many-to-many-relationships. (My example is a very short extract to reproduce the problem.) But it seems that I have introduced it at too many places.

Speculation: (Unfortunately I have no time to investigate/confirm this next hypothesis.)

I am assuming now, that inverse should only be used for the side that is the daughter of relationship, not the side that is the mother. E.g. when A and B have a many-to-many relationship A_B, and the XML looks something like this:

<ListOfA>
  <A ID="Uniq_A123">
    <A_B>
      <B ID="NonUniq_B234" />
    </A_B>
    <A_B>
      <B ID="NonUniq_B235" />
    </A_B>
  </A>
  <A ID="Uniq_A124">
    <A_B>
      <B ID="NonUniq_B234" />
    </A_B>
  </A>
</ListOfA>

A is implicitly 'parent' of the relationship in A_B by being the XML-mother, B should then explicitly be inversed from child to parent by specifying inverse. However, since I am generating my own Id's for A and B, I doubt if this would work for me, and I will just run repair-queries afterwards.

Valuator answered 28/11, 2012 at 14:31 Comment(0)
W
0

I don't know bulk load with XML so here is an answer to do this using TSQL instead.

In SQL Server 2008 you can use merge in combination with output to create mappings between the source data and targets auto generated id's.

Using merge..output to get mapping between source.id and target.id

Dr. OUTPUT or: How I Learned to Stop Worrying and Love the MERGE

In this case you can merge to Registration and output the child XML nodes with the generated id to a temporary table or table variabl and then use that table for the insert to Activity.

SQL Fiddle

MS SQL Server 2008 Schema Setup:

CREATE TABLE [dbo].[Registration](
  [Id] INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Registration PRIMARY KEY,
  [XmlId] [nvarchar](40) NULL,
);

CREATE TABLE [dbo].[Activity](
  [Id] INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Activity PRIMARY KEY,
  [RegistrationId] INT CONSTRAINT FK_Activity_Registration FOREIGN KEY (RegistrationId) REFERENCES Registration (Id),
  [XmlId] [nvarchar](1000) NULL,
);

Query 1:

declare @XML xml = '
<Updates>
  <Registrations>
    <Registration ID="NonUniqCaseId-123">
      <Activities>
        <Activity ID="UniqActId-1234" />
        <Activity ID="UniqActId-1235" />
      </Activities>
    </Registration>
    <Registration ID="NonUniqCaseId-124">
      <Activities>
        <Activity ID="UniqActId-1241" />
        <Activity ID="UniqActId-1242" />
      </Activities>
    </Registration>
  </Registrations>
</Updates>';

declare @T table
(
  RegistrationId nvarchar(40),
  Activities xml
);

merge Registration as T
using
  (
  select R.N.value('@ID', 'nvarchar(40)') as XmlId,
         R.N.query('Activities') as Activities
  from @XML.nodes('/Updates/Registrations/Registration')  as R(N)
  ) as S
on 0 = 1
when not matched then
  insert(XmlId) values (S.XmlId)
output inserted.Id, S.Activities into  @T(RegistrationId, Activities);

insert into Activity(RegistrationId, XmlId)
select T.RegistrationId,
       A.N.value('@ID', 'nvarchar(1000)')
from @T as T
  cross apply T.Activities.nodes('Activities/Activity') as A(N);

Query 2:

select *
from Registration;

Results:

| ID |             XMLID |
--------------------------
|  1 | NonUniqCaseId-123 |
|  2 | NonUniqCaseId-124 |

Query 3:

select *
from Activity;

Results:

| ID | REGISTRATIONID |          XMLID |
----------------------------------------
|  5 |              1 | UniqActId-1234 |
|  6 |              1 | UniqActId-1235 |
|  7 |              2 | UniqActId-1241 |
|  8 |              2 | UniqActId-1242 |
Wifely answered 26/11, 2012 at 16:35 Comment(1)
Thank you for your alternative solution. Because my example is a very short extract of my real XSD (just to reproduce the problem), it would imply writing a lot of queries and paths, which I would rather not do. Fortunately I did find an alternative solution!Valuator
V
0

Well the answer turns out to be very simple: just leave out the inverse in the XSD, so remove this line:

inverse="true"

I introduced this because I have many many-to-many-relationships. (My example is a very short extract to reproduce the problem.) But it seems that I have introduced it at too many places.

Speculation: (Unfortunately I have no time to investigate/confirm this next hypothesis.)

I am assuming now, that inverse should only be used for the side that is the daughter of relationship, not the side that is the mother. E.g. when A and B have a many-to-many relationship A_B, and the XML looks something like this:

<ListOfA>
  <A ID="Uniq_A123">
    <A_B>
      <B ID="NonUniq_B234" />
    </A_B>
    <A_B>
      <B ID="NonUniq_B235" />
    </A_B>
  </A>
  <A ID="Uniq_A124">
    <A_B>
      <B ID="NonUniq_B234" />
    </A_B>
  </A>
</ListOfA>

A is implicitly 'parent' of the relationship in A_B by being the XML-mother, B should then explicitly be inversed from child to parent by specifying inverse. However, since I am generating my own Id's for A and B, I doubt if this would work for me, and I will just run repair-queries afterwards.

Valuator answered 28/11, 2012 at 14:31 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.