How do you merge tables with autonumber primary keys?

Asked 29/9, 2010 at 18:19 Answered 23/1 at 21:29

Solved sql-server database-design identity-column

I suppose everyone runs into this problem once in a while: you have two tables that have autonumber primary keys that need to be merged. There are many good reasons why autonumber primary keys are used in favour of say application-generated keys, but merging with other tables must be one of the biggest drawbacks.

Some problems that arise are overlapping ids and out of sync foreign keys. I would like to hear your approach for tackling this. I always run into problems, so I'm very curious if anybody has some sort of a general solution.

-- EDIT --

In response to the answers suggesting to use guids or other non-numeric keys, there are situations where in advance it just seems a better idea to use autonumber keys (and you regret this later), or you're taking over someone else's project, or you get some legacy database that you have to work with. So I'm really looking for a solution where you have no control over the database design anymore.

Parody answered 29/9, 2010 at 18:19 Comment(0)

Hm, I'm kind of enthousiastic about the idea that I just put in a comment at AlexKuznetsov's answer, so I'll make a whole answer about it.

Consider the tables to be named table1 and table2, with id1 and id2 as autonumber primary keys. They will be merged to table3 with id3 (a non-autonumber primary key).

Why not:

Remove all foreign key constraints to table1 and table2
For all foreign key fields referring to table1, execute an UPDATE table SET id1 = id1 * 2, and for FK fields referring to table2, execute an UPDATE table SET id2 = (id2) * 2 + 1
Fill table3 by executing an INSERT INTO table3 SELECT id1 * 2 AS id3, ... FROM table1 UNION ALL SELECT id2 * 2 + 1 AS id3 FROM table2
Create new foreign key constraints to table3

It can even work with 3 or more tables, just by using a higher multiplier.

Dennet answered 29/9, 2010 at 21:3 Comment(0)

Solutions include:

Use GUIDs as primary keys instead of a simpler identity field. Very likely to avoid overlaps, but GUIDs are harder to use and don't play nicely with clustered indexes.
Make the primary key into a multi-column key, the second column resolving overlapping values by identifying the source of the merged data. Portable, works better with clustered indexes, but developers hate multi-column keys.
Use natural keys instead of pseudokeys.
Allocate new primary key values for one of the merged tables, and cascade these changes to any dependent rows. This changes a merge operation into an ETL operation. This is the only solution you can use for legacy data, if you can't change the database design.

I'm not sure there's a one-size-fits-all solution. Choose one of these based on the situation.

Neptunian answered 29/9, 2010 at 18:30 Comment(0)

Hm, I'm kind of enthousiastic about the idea that I just put in a comment at AlexKuznetsov's answer, so I'll make a whole answer about it.

Consider the tables to be named table1 and table2, with id1 and id2 as autonumber primary keys. They will be merged to table3 with id3 (a non-autonumber primary key).

Why not:

Remove all foreign key constraints to table1 and table2
For all foreign key fields referring to table1, execute an UPDATE table SET id1 = id1 * 2, and for FK fields referring to table2, execute an UPDATE table SET id2 = (id2) * 2 + 1
Fill table3 by executing an INSERT INTO table3 SELECT id1 * 2 AS id3, ... FROM table1 UNION ALL SELECT id2 * 2 + 1 AS id3 FROM table2
Create new foreign key constraints to table3

It can even work with 3 or more tables, just by using a higher multiplier.

Dennet answered 29/9, 2010 at 21:3 Comment(0)

One of the standard approaches (if not the standard approach), where you're designing for such an eventuality, is to use GUIDs for primary keys rather than integers - merging is then relatively painless as you are guaranteed not to encounter an overlap.

Barring a redesign, tho', I think you're stuck with having to insert into the table, accept that you'll get new primary keys, and ensure that you maintain the mapping from old-to-new ID - then insert referencing data with FK remapped etc. etc. If you data has a "business key" that will remain unique after the insert, this would save on having to keep track of the mapping.

Rickety answered 29/9, 2010 at 18:30 Comment(0)

I fyou are sure you have only two such tables, you can just have even IDs in one table (0,2,4,6,...) and odd IDs in another (1,3,5,7,...)

Lipstick answered 29/9, 2010 at 18:33 Comment(2)

The question was more meant in a generic way, not so much where you know what's going to happen in advance (because then you could have used guids). – Parody 29/9, 2010 at 18:40

It seems an idea though, to calculate new_id = old_id * 2 for the first table and new_id = (old_id * 2) + 1 for the second table. If you do this for all tables involved everything will match again and you can re-enable foreign key constraints. – Dennet 29/9, 2010 at 20:45

Assuming you also have a natural key in the tables to be merged then the process isn't difficult. The natural key is used to deduplicate and to correctly reassign any references. You can renumber the surrogate key values at any time - that being one of the principal advantages of using a surrogate in the first place.

So I don't see this as a problem with surrogate keys - provided you always enforce the natural key (actually I much prefer the term "business key"). If you haven't got business keys for these tables, well maybe now would be a good time to redesign so that ALL the necessary keys are properly implemented.

Shift answered 29/9, 2010 at 19:13 Comment(0)

Here is how it is implemented in our MssqlMerge, kind of "maintain the mapping from old-to-new ID" suggested by @WillA

Let's say we have two tables:

[Parent] ([ParentID] PRIMARY KEY IDENTITY, [Name])
[Child] ([ChildID] PRIMARY KEY IDENTITY, [ParentID] REFERENCES [Parent] ([ParentID]), [Name]).

Now when we merge [Parent] rows to from one db to another, we do the following steps:

a. in target DB create mapping table with columns (SourceID, TargetID)

b. INSERT [Parent] row with data from source table, without specifying ID:

INSERT INTO [Parent] ([Name])
OUTPUT 123 /* Source ID */, INSERTED.[ParentID] INTO [mapping]
VALUES ('Source parent #1');

c. INSERT [Child] row, taking [ParentID] from mapping:

INSERT INTO [Child] ([ParentID], [Name])
VALUES (
    (SELECT [ParentID] FROM [mapping] WHERE [SourceID] = 123 /* Source ID */),
    'Source child #1');

Filters can be applied to the Parent table rows. Child rows are filtered to include only records with ParentID corresponding to the Parent table filter. Going further, we may have a table like Grandchild referencing Child on ChildID and we may select to import Grandchild rows as well, filtering only rows having ChildID related to selected parents. And so on for any foreign key relations depth, each table is merged with its own mapping.

Mapping tables can be left in the target database after merge and can be useful for a number of goals:

you can split large merge into smaller parts, for example merge lookup data first and then operational data
mapping tables can be used to: identify merge result rows, to match these rows with source db or even to rollback merge without need to restore the full backup.

Information about dependent tables is taken from sys.foreign_keys and sys.foreign_key_columns.

Currently the major limitation of this approach is that it can not be used for tables participating in cycled foreign keys (like A references B and B references A at the same time). There is the way around this by breaking the cycle - ignoring one of the foreign keys and then UPDATEing them after INSERTs completion, but that's the subject of further improvements.

Hobson answered 23/1 at 21:29 Comment(0)

Recommended topics

Hot tags