How to copy a huge table data into another table in SQL Server
Asked Answered
C

12

49

I have a table with 3.4 million rows. I want to copy this whole data into another table.

I am performing this task using the below query:

select * 
into new_items 
from productDB.dbo.items

I need to know the best possible way to do this task.

Censorious answered 14/3, 2011 at 8:5 Comment(4)
what are 34 lac records? MMillions? Not hugh... granted, not small, but not hugh.Euryale
Could you temporarily set the recover model to Simple?Aquarium
"Not hugh...granted" - classic @TomTom!!Gloss
How do I edit someone else's commentFehr
C
19

If you are copying into a new table, the quickest way is probably what you have in your question, unless your rows are very large.

If your rows are very large, you may want to use the bulk insert functions in SQL Server. I think you can call them from C#.

Or you can first download that data into a text file, then bulk-copy (bcp) it. This has the additional benefit of allowing you to ignore keys, indexes etc.

Also try the Import/Export utility that comes with the SQL Management Studio; not sure whether it will be as fast as a straight bulk-copy, but it should allow you to skip the intermediate step of writing out as a flat file, and just copy directly table-to-table, which might be a bit faster than your SELECT INTO statement.

Carrelli answered 14/3, 2011 at 8:20 Comment(1)
I know this is a bit late, but just wanted to share that in my case (with 70M rows of very large rows) the Import/Export utility tends to display a Deferred prepare could not be completed. Query timeout expired (Microsoft SQL Server Native Client 10.0) error.Magazine
F
103

I had the same problem, except I have a table with 2 billion rows, so the log file would grow to no end if I did this, even with the recovery model set to Bulk-Logging:

insert into newtable select * from oldtable

So I operate on blocks of data. This way, if the transfer is interupted, you just restart it. Also, you don't need a log file as big as the table. You also seem to get less tempdb I/O, not sure why.

set identity_insert newtable on
DECLARE @StartID bigint, @LastID bigint, @EndID bigint
select @StartID = isNull(max(id),0) + 1
from newtable

select @LastID = max(ID)
from oldtable

while @StartID <= @LastID
begin
    set @EndID = @StartID + 1000000

    insert into newtable (FIELDS,GO,HERE)
    select FIELDS,GO,HERE from oldtable (NOLOCK)
    where id BETWEEN @StartID AND @EndId
            
    set @StartID = @EndID + 1
end
set identity_insert newtable off
go

You might need to change how you deal with IDs, this works best if your table is clustered by ID.

Ferial answered 8/9, 2011 at 20:59 Comment(5)
A similar approach is using DECLARE @offset int SELECT COUNT(*) = @offset FROM newtable and ...ORDER BY [key(s)] OFFSET @offset ROWS FETCH NEXT @step ROWS ONLY which is easier if the pk or clustered index is multi-columnOpportina
👆 should say DECLARE @offset int SELECT @offset = COUNT(*) FROM newtableOpportina
The @offset approach may get slower as the offset gets larger, I'm not sure how SQL Server optimizes the offset selection.Ferial
I'm assuming it optimizes OFFSET using indices similarly to how it optimizes WHERE but I'm not a DBA ¯_(ツ)_/¯Opportina
I found that the OFFSET x ROWS FETCH NEXT y ROWS ONLY approach does seem to get slower as the offset gets larger.Friesland
C
19

If you are copying into a new table, the quickest way is probably what you have in your question, unless your rows are very large.

If your rows are very large, you may want to use the bulk insert functions in SQL Server. I think you can call them from C#.

Or you can first download that data into a text file, then bulk-copy (bcp) it. This has the additional benefit of allowing you to ignore keys, indexes etc.

Also try the Import/Export utility that comes with the SQL Management Studio; not sure whether it will be as fast as a straight bulk-copy, but it should allow you to skip the intermediate step of writing out as a flat file, and just copy directly table-to-table, which might be a bit faster than your SELECT INTO statement.

Carrelli answered 14/3, 2011 at 8:20 Comment(1)
I know this is a bit late, but just wanted to share that in my case (with 70M rows of very large rows) the Import/Export utility tends to display a Deferred prepare could not be completed. Query timeout expired (Microsoft SQL Server Native Client 10.0) error.Magazine
S
14

I have been working with our DBA to copy an audit table with 240M rows to another database.

Using a simple select/insert created a huge tempdb file.

Using a the Import/Export wizard worked but copied 8M rows in 10min

Creating a custom SSIS package and adjusting settings copied 30M rows in 10Min

The SSIS package turned out to be the fastest and most efficent for our purposes

Earl

Stench answered 22/12, 2015 at 16:51 Comment(1)
can you remember the "adjusting settings"?Blind
C
10

Here's another way of transferring large tables. I've just transferred 105 million rows between two servers using this. Quite quick too.

  1. Right-click on the database and choose Tasks/Export Data.
  2. A wizard will take you through the steps but you choosing your SQL server client as the data source and target will allow you to select the database and table(s) you wish to transfer.

For more information, see https://www.mssqltips.com/sqlservertutorial/202/simple-way-to-export-data-from-sql-server/

Card answered 17/9, 2015 at 11:47 Comment(1)
This technique builds and runs an SSIS package on your behalf. You do have the option to save but not execute the package it creates. This lets you open it and tweak things like rows per batch to optimize for your table characteristics and system resources before running while saving the trouble of initially writing the package from scratch.Energid
K
4

If it's a 1 time import, the Import/Export utility in SSMS will probably work the easiest and fastest. SSIS also seems to work better for importing large data sets than a straight INSERT.

BULK INSERT or BCP can also be used to import large record sets.

Another option would be to temporarily remove all indexes and constraints on the table you're importing into and add them back once the import process completes. A straight INSERT that previously failed might work in those cases.

If you're dealing with timeouts or locking/blocking issues when going directly from one database to another, you might consider going from one db into TEMPDB and then going from TEMPDB into the other database as it minimizes the effects of locking and blocking processes on either side. TempDB won't block or lock the source and it won't hold up the destination.

Those are a few options to try.

-Eric Isaacs

Kandi answered 30/10, 2015 at 22:53 Comment(0)
F
3

I like the solution from @Mathieu Longtin to copy in batches thereby minimising log file issues and created a version with OFFSET FETCH as suggested by @CervEd.

Others have suggested using the Import/Export Wizard or SSIS packages, but that's not always possible.

It's probably overkill for many but my solution includes some checks for record counts and outputs progress as well.

USE [MyDB]
GO

SET NOCOUNT ON;
DECLARE @intStart int = 1;
DECLARE @intCount int;
DECLARE @intFetch int = 10000;
DECLARE @strStatus VARCHAR(200);
DECLARE @intCopied int = 0;

SET @strStatus = CONVERT(VARCHAR(30), GETDATE()) + ' Getting count of HISTORY records currently in MyTable...';
RAISERROR (@strStatus, 10, 1) WITH NOWAIT;
SELECT @intCount = COUNT(*) FROM [dbo].MyTable WHERE IsHistory = 1;
SET @strStatus = CONVERT(VARCHAR(30), GETDATE()) + ' Count of HISTORY records currently in MyTable: ' + CONVERT(VARCHAR(20), @intCount);
RAISERROR (@strStatus, 10, 1) WITH NOWAIT;  --(note: PRINT resets @@ROWCOUNT to 0 so using RAISERROR instead)
SET @strStatus = CONVERT(VARCHAR(30), GETDATE()) + ' Starting copy...';
RAISERROR (@strStatus, 10, 1) WITH NOWAIT;

WHILE @intStart < @intCount
BEGIN

    INSERT INTO [dbo].[MyTable_History] (
        [PK1], [PK2], [PK3], [Data1], [Data2])
    SELECT
        [PK1], [PK2], [PK3], [Data1], [Data2]
    FROM [MyDB].[dbo].[MyTable]
    WHERE IsHistory = 1
    ORDER BY 
        [PK1], [PK2], [PK3]
        OFFSET @intStart - 1 ROWS 
        FETCH NEXT @intFetch ROWS ONLY;

    SET @intCopied = @intCopied + @@ROWCOUNT;
    SET @strStatus = CONVERT(VARCHAR(30), GETDATE()) + ' Records copied so far: ' + CONVERT(VARCHAR(20), @intCopied); 
    RAISERROR (@strStatus, 10, 1) WITH NOWAIT;

    SET @intStart = @intStart + @intFetch;

END

--Check the record count is correct.
IF @intCopied = @intCount
    BEGIN
        SET @strStatus = CONVERT(VARCHAR(30), GETDATE()) + ' Correct record count.'; 
        RAISERROR (@strStatus, 10, 1) WITH NOWAIT;
    END
ELSE
    BEGIN
        SET @strStatus = CONVERT(VARCHAR(30), GETDATE()) + ' Only ' + CONVERT(VARCHAR(20), @intCopied) + ' records were copied, expected: ' + CONVERT(VARCHAR(20), @intCount);
        RAISERROR (@strStatus, 10, 1) WITH NOWAIT;
    END


GO
Friesland answered 17/3, 2022 at 6:29 Comment(0)
Q
2

Simple Insert/Select sp's work great until the row count exceeds 1 mil. I've watched tempdb file explode trying to insert/select 20 mil + rows. The simplest solution is SSIS setting the batch row size buffer to 5000 and commit size buffer to 1000.

Quince answered 8/8, 2018 at 15:12 Comment(0)
C
1

I know this is late, but if you are encountering semaphore timeouts then you can use row_number to set increments for your insert(s) using something like

INSERT INTO DestinationTable (column1, column2, etc) 
 FROM ( 
SELECT ROW_NUMBER() OVER (ORDER BY ID) AS RN , column1, column2, etc
FROM SourceTable ) AS A
WHERE A.RN >= 1 AND A.RN <= 10000 )

The size of the log file will grow, so there is that to contend with. You get better performance if you disable constraints and index when inserting into an existing table. Then enable the constraints and rebuild the index for the table you inserted into once the insertion is complete.

Choleric answered 16/4, 2021 at 16:35 Comment(0)
A
0

If your focus is Archiving (DW) and are dealing with VLDB with 100+ partitioned tables and you want to isolate most of these resource intensive work on a non production server (OLTP) here is a suggestion (OLTP -> DW) 1) Use backup / Restore to get the data onto the archive server (so now, on Archive or DW you will have Stage and Target database) 2) Stage database: Use partition switch to move data to corresponding stage table
3) Use SSIS to transfer data from staged database to target database for each staged table on both sides 4) Target database: Use partition switch on target database to move data from stage to base table Hope this helps.

Anemometer answered 2/5, 2019 at 14:30 Comment(0)
B
0

I am answering this 12 year old post since this is still relevant and this might be useful by referring to my blog post. It is all about breaking the large table/data into "chunks" and loading the chunks one at a time or in parallel. If there are failures, we just reset the status table that holds the "chunks" and continue from where we left off.

My blog post on copying a Huge table/dataset

The key query that creates the status table with the chunks is this query:

    --++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
--Author: Jana Sattainathan (@SQLJana - https://sqljana.wordpress.com)
--Sourcecode: https://sqljana.wordpress.com/?p=7980
--++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
--## **** Script to split any table into chunks based on a column to chunk it by and create a status tracking table out of the chunk data **** 
--@TableName could be any schema.table 
--@ChunkByColumn can be of any data type but prefer an indexed PK ID type column for performance!
--@ChunkSize decides how big each chunk is when table is broken up into chunks (ranges of ChunkByColumn values)
 
DECLARE @StatusTrackingTable VARCHAR(50) = '[dbo].[JanasTableStatusTracker]';
DECLARE @SourceServerName VARCHAR(50) = @@SERVERNAME;
DECLARE @SourceDBName VARCHAR(50) = DB_NAME();
DECLARE @SourceTableName VARCHAR(50) = '[Sales].[SalesOrderHeader]';
DECLARE @ChunkByColumn VARCHAR(50) = '[SalesOrderID]';
DECLARE @ChunkSize INT = 100;
DECLARE @TargetServerName VARCHAR(50) = @SourceServerName;
DECLARE @TargetDBName VARCHAR(50) = 'DBAT';
DECLARE @TargetTableName VARCHAR(50) = '[dbo].[SalesOrderHeader_Jana]' --CASE WHEN @TargetServerName IS NOT NULL THEN @SourceTableName ELSE NULL END;
 
DECLARE @Batch BIGINT = CAST(FORMAT(GETDATE(),'yyyyMMddHHmmssffff') AS BIGINT);
DECLARE @BatchString VARCHAR(18) = LTRIM(STR(@Batch,18));
DECLARE @SQL NVARCHAR(1600); 
DECLARE @CreateInsertToStatusTableSQL NVARCHAR(1600); 
 
BEGIN
    SET @SQL = 'SELECT          
            [Batch], ' +
            '''' + @SourceServerName + ''' AS SourceServerName, ' +
            '''' + @SourceDBName + ''' AS SourceDBName, ' +
            '''' + @SourceTableName + ''' AS SourceTableName, ' +
            '''' + @ChunkByColumn + ''' AS ChunkByColumn, ' +
            '''' + COALESCE(@TargetServerName,'') + ''' AS TargetServerName, ' +
            '''' + COALESCE(@TargetDBName,'')  + ''' AS TargetDBName, ' +
            '''' + COALESCE(@TargetTableName,'')  + ''' AS TargetTableName, 
            ChunkNumber,
            MIN(Id) AS MinValue,
            MAX(Id) AS MaxValue,
            CAST(COUNT(1) AS BIGINT) AS ChunkRowCount, 
            /*'' SELECT *  FROM ' + @SourceDBName + '.' + @SourceTableName + ' WHERE ' + @ChunkByColumn + ' BETWEEN ''+' + 'LTRIM(STR(MIN(Id)))' + '+'' AND ''+' + 'LTRIM(STR(MAX(Id)))' + ' + '''' AS SimpleChunkSelectSQLForNumIds, */
            '' SELECT *  FROM ' + @SourceDBName + '.' + @SourceTableName + ' WHERE ' + @ChunkByColumn + ' BETWEEN (SELECT MinValue FROM ' + @SourceDBName + '.' + @StatusTrackingTable + ' WHERE [Batch] = ' + @BatchString + ' AND ChunkNumber = ''+' + 'LTRIM(STR(ChunkNumber))' + '+'') AND (SELECT MaxValue FROM ' + @SourceDBName + '.' + @StatusTrackingTable + ' WHERE [Batch] = ' + @BatchString + ' AND ChunkNumber = ''+' + 'LTRIM(STR(ChunkNumber))' + '+'')' + ' '' AS ChunkSelectSQL, ' +
            CASE WHEN (LEN(COALESCE(@StatusTrackingTable,'')) > 0)
                THEN ' CONVERT(VARCHAR(20), NULL) AS Status, 0 AS TargetBatchRowCount, CONVERT(DATETIME, NULL) AS StartDateTime,  CONVERT(DATETIME, NULL) AS EndDateTime'
                ELSE ''
            END + '
        /*INTO STATUSTABLE*/
        FROM
            (
                SELECT ' +
                    @BatchString + ' AS [Batch],
                    ' + @ChunkByColumn + '  AS Id,
                    ROW_NUMBER() OVER (ORDER BY ' + @ChunkByColumn + ') RowNum,
                    CEILING(ROW_NUMBER() OVER (ORDER BY ' + @ChunkByColumn + ')/' + LTRIM(STR(@ChunkSize)) + '.0 /*INT to REAL*/) AS ChunkNumber
                FROM 
                    ' + @SourceTableName + '
            ) splits
        GROUP BY
            [Batch], ChunkNumber
        ORDER BY
            ChunkNumber';
 
    --Create or insert the data into the status tracking table if a name is provided
    IF (LEN(COALESCE(@StatusTrackingTable,'')) = 0)
    BEGIN
        PRINT @SQL;
        EXECUTE(@SQL);
    END
    ELSE
    BEGIN
        IF OBJECT_ID(@StatusTrackingTable, 'U') IS NOT NULL
        BEGIN
            SET @CreateInsertToStatusTableSQL = 'INSERT INTO ' + @StatusTrackingTable + ' ' + @SQL;
        END;
        ELSE
        BEGIN
            SET @CreateInsertToStatusTableSQL = REPLACE(@SQL, '/*INTO STATUSTABLE*/', (' INTO ' + @StatusTrackingTable + ' '));         
        END;
 
        PRINT @CreateInsertToStatusTableSQL;
        EXECUTE(@CreateInsertToStatusTableSQL);
    END;
END;
Baca answered 7/2, 2024 at 12:38 Comment(0)
S
0

For anyone out there looking to accomplish this and remove source records as you go, keep it batched and just scoot the records over to the new table with an OUTPUT clause.

WHILE (SELECT TOP 1 1 FROM SourceTable) IS NOT NULL
BEGIN
    DELETE TOP (100000)
    FROM SourceTable
    OUTPUT DELETED.* --Either * or list columns
    INTO TargetTable
END;

If you're not looking to delete, try using a cursor to populate a table var from your source table, set a check in your cursor so that when the table var hits ~100K rows, it performs an insert then truncates the table var. (Sorry, I don't have time to dive into that at the moment.)

"What can I say, except..."

Skyway answered 5/6, 2024 at 17:52 Comment(0)
E
-17

select * into new_items from productDB.dbo.items

That pretty much is it. THis is the most efficient way to do it.

Euryale answered 14/3, 2011 at 8:8 Comment(1)
Maybe most efficient in terms of characters typed.Balbur

© 2022 - 2025 — McMap. All rights reserved.