Checking foreign key constraint "online"

If we have a giant fact table and want to add a new dimension, we can do it like this:

BEGIN TRANSACTION

ALTER TABLE [GiantFactTable]
ADD NewDimValueId INT NOT NULL
CONSTRAINT [temp_DF_NewDimValueId] DEFAULT (-1)
WITH VALUES -- table is not actually rebuilt!

ALTER TABLE [GiantFactTable]
WITH NOCHECK
ADD CONSTRAINT [FK_GiantFactTable_NewDimValue]
FOREIGN KEY ([NewDimValueId])
REFERENCES [NewDimValue] ([Id])

-- drop the default constraint, new INSERTs will specify a value for NewDimValueId column
ALTER TABLE [GiantFactTable]
DROP CONSTRAINT [temp_DF_NewDimValueId]

COMMIT TRANSACTION

NB: all of the above only manipulate table metadata and should be fast regardless of table size. Then we can run a job to backfill GiantFactTable.NewDimValueId in small transactions, such that the FK is not violated. (At this point any INSERTs/UPDATEs - e.g. backfill operation - are verified by the FK since it's enabled, but not "trusted")

After the backfill we know the data is consistent, my question is how can SQL engine become enlightened too? Without taking the table offline.

This command will make the FK trusted but it requires a schema modification (Sch-M) lock and likely take hours (days?) taking the table offline:

ALTER TABLE [GiantFactTable]
WITH CHECK CHECK CONSTRAINT [FK_GiantFactTable_NewDimValue]

About the workload: Table has a few hundred partitions (fixed number), data is appended to one partition at a time (in a round-robin fashion), never deleted. There is also a constant read workload that uses the clustering key to get a (relatively small) range of rows from one partition at a time. Checking one partition at a time, taking it offline, would be acceptable. But I can't find any syntax to do this. Any other ideas?

Redirect workloads and run check constraint offline

Create a new table with the same structure.

Change the "insert" workload to insert into the new table

Copy the data from the partition used by the "read" workload to the new table (or a third table with the same structure)

Change the "read" workload to use the new table

Run alter table to check the constraint and let it take as long as it needs

Change the both workloads back to the main table.

Insert the new rows back into the main table

Drop new table(s)

A variation on the above is to switch the relevant partition to the new table in step 3. That should be faster than copying the data but I think you will have to copy (and not just switch) the data back after the constraint has been checked.

Insert all the data into a new table

Create a new table with the same structure and constraint enabled

Change the "insert" workload to the new table

Copy all the data from old to new table in batches and wait as long as it takes to complete

Change the "read" workload to the new table. If step 3 takes too long and the "read" workload needs rows that have only been inserted into the new table, you will have to manage this changeover manually.

Drop old table

Use index to speed up constraint check?

I have no idea if this works but you can try to create a non-clustered index on the foreign key column. Also make sure there's an index on the relevant unique key on the table referenced by the foreign key. The alter table command might be able to use them to speed up the check (at least by minimizing IO compared to doing a full table scan). The indexes, of course, can be created online to avoid any disruption.

Redirect workloads and run check constraint offline

Insert all the data into a new table

Use index to speed up constraint check?

Recommended topics

Hot tags