BULK INSERT with identity (auto-increment) column
I am trying to bulk-load data into a database from a CSV file.

The Employee table has an auto-incremented ID column (PK):

CREATE TABLE [dbo].[Employee](
 [id] [int] IDENTITY(1,1) NOT NULL,
 [Name] [varchar](50) NULL,
 [Address] [varchar](50) NULL
) ON [PRIMARY]

I am using this query:

BULK INSERT Employee FROM 'path\tempFile.csv'
WITH (FIRSTROW = 2, KEEPIDENTITY, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

CSV file:

Name,Address
name1,addr test 1
name2,addr test 2

but it results in this error message:

Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 2, column 1 (id).

Tactics answered 1/6, 2012 at 13:18 Comment(3)
Can you please post the sample data present in your csv file? – Capwell
I was looking for that KEEPIDENTITY thing... thanks! – Epiphora
There is actually a very simple solution for this: create a view omitting just the ID column, then do your bulk insert into the view. – Uuge

Don't BULK INSERT into your real tables directly.

I would always

  1. insert into a staging table dbo.Employee_Staging (without the IDENTITY column) from the CSV file
  2. possibly edit / clean up / manipulate your imported data
  3. and then copy the data across to the real table with a T-SQL statement like:

    INSERT INTO dbo.Employee(Name, Address) 
       SELECT Name, Address
       FROM dbo.Employee_Staging
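
Putting the three steps together, a minimal sketch (the staging-table layout is assumed from the question's table; adjust names and cleanup logic as needed):

    -- 1. Staging table without the IDENTITY column
    CREATE TABLE dbo.Employee_Staging (
        Name    varchar(50) NULL,
        Address varchar(50) NULL
    );

    -- 2. Load the CSV into the staging table
    BULK INSERT dbo.Employee_Staging
    FROM 'path\tempFile.csv'
    WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

    -- 3. Clean up as needed, then copy into the real table
    INSERT INTO dbo.Employee (Name, Address)
    SELECT Name, Address
    FROM dbo.Employee_Staging;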
    
Petrosal answered 1/6, 2012 at 13:24 Comment(7)
The idea is good, but is there any specific reason why you are using a staging table? – Tactics
@Abhi: I can (1) remove the IDENTITY column, which causes grief, and (2) look at the data, and possibly remove certain rows or update some rows, before actually importing into the real table. – Petrosal
@Petrosal While this is good advice, this doesn't answer the question. jwerts should really get best answer. – Topsoil
Good general advice. But even with a staging table, what do you do if you want a record of the row number from the input file? – Creasy
> Don't BULK INSERT into your real tables directly. - This advice is valid for MSSQL only; there are no such limitations in e.g. Postgres. – Indigestion
And it's as valid a requirement or otherwise for MSSQL as it is for PostGres. – Limitless
This does not answer the question posed; it merely suggests a workaround. Removing the id column is not a usable approach if the exact order of the records in the file is important; in that case the id column is motivated in the staging table. – Cupreous

Add an id column to the csv file and leave it blank:

id,Name,Address
,name1,addr test 1
,name2,addr test 2

Remove the KEEPIDENTITY keyword from the query:

BULK INSERT Employee FROM 'path\tempFile.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

The id identity field will be auto-incremented.

If you assign values to the id field in the csv, they'll be ignored unless you use the KEEPIDENTITY keyword, in which case they'll be used instead of auto-generated values.
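
To illustrate the KEEPIDENTITY case, a hypothetical file with explicit ids:

id,Name,Address
10,name1,addr test 1
20,name2,addr test 2

loaded with the original KEEPIDENTITY query:

BULK INSERT Employee FROM 'path\tempFile.csv'
WITH (FIRSTROW = 2, KEEPIDENTITY, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

would store 10 and 20 in the id column instead of generating new values.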

Callaghan answered 12/9, 2012 at 20:22 Comment(4)
Even with a staging table, if you want a record of the row number from the input file you'll need something like this. I like it! – Creasy
Hi @Josh Werts. So far your solution has been a blessing to me. It works on my local DB. But when I connect to a remote DB with Microsoft SQL Server Management Studio and run the command, I get the error "Cannot bulk load because the file "D:\data.csv" could not be opened. Operating system error code 21 (The device is not ready)". Must the csv file be on the same server as the DB? The csv file is on my computer. – Inoperative
@FokwaBest - I would imagine the remote server has no concept of your D: drive. I think you would need to create a shared folder that the remote server has access to and then reference it that way... something like \\myshare\data.csv. I'm no expert here and really don't work in SQL Server much, so maybe someone else can answer if that doesn't work. – Callaghan
The solution is good, but what if you are not able to update the csv files? What if you have to download a file and upload it into the DB as-is? What should we do in such a case? – Savitt

I had a similar issue, but I needed to be sure that the order of the IDs aligned with the order of the rows in the source file. My solution is to use a VIEW for the BULK INSERT:

Keep your table as it is and create this VIEW (selecting everything except the ID column):

CREATE VIEW [dbo].[VW_Employee]
AS
SELECT [Name], [Address]
FROM [dbo].[Employee];

Your BULK INSERT should then look like:

BULK INSERT [dbo].[VW_Employee] FROM 'path\tempFile.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');
Acker answered 25/4, 2014 at 13:51 Comment(3)
This is by far the best solution to this question. – Mugwump
I agree this is the way to do this. – Watford
The simplest and easiest way to run a bulk insert into a staging table that doesn't match the source file. However, I just discovered today that apparently there is no guarantee that the data will be inserted in the order of the file, which is a killer for imports of mainframe header/detail type files. – Farland

You have to do the bulk insert with a format file:

   BULK INSERT Employee FROM 'path\tempFile.csv'
   WITH (FORMATFILE = 'path\tempFile.fmt');

where the format file (tempFile.fmt) looks like this:

11.0
2
1 SQLCHAR 0 50 ","    2  Name     SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 50 "\r\n" 3  Address  SQL_Latin1_General_CP1_CI_AS

More details here: http://msdn.microsoft.com/en-us/library/ms179250.aspx

Trainbearer answered 27/7, 2014 at 7:32 Comment(1)
This should work; reading the documentation, skipping columns is covered, but skipping PK columns specifically is not. In practice, I have not been able to do this without errors: Msg 4866, Level 16, State 7, Line 6: The bulk load failed. The column is too long in the data file for row 1, column 1. Verify that the field terminator and row terminator are specified correctly. Msg 7301, Level 16, State 2, Line 6: Cannot obtain the required interface ("IID_IColumnsInfo") from OLE DB provider "BULK" for linked server "(null)". – Duumvir

My solution is to add the ID field as the LAST field in the table; bulk insert then ignores it and it gets automatic values. Clean and simple...

For instance, if inserting into a temp table:

CREATE TABLE #TempTable 
(field1 varchar(max), field2 varchar(max), ... 
ROW_ID int IDENTITY(1,1) NOT NULL)

Note that the ROW_ID field MUST always be specified as the LAST field!
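
A sketch of this approach with the question's columns (the temp table name is hypothetical; note that the comments below report mixed results on some SQL Server versions):

CREATE TABLE #EmployeeImport (
    Name    varchar(50) NULL,
    Address varchar(50) NULL,
    ROW_ID  int IDENTITY(1,1) NOT NULL
);

BULK INSERT #EmployeeImport FROM 'path\tempFile.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');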

Rafa answered 4/4, 2016 at 10:16 Comment(3)
This may work in some situations, but when I attempted this, it just gave me a blank table. – Kaif
Does not seem to work in 2008 R2. I get the same error as the OP doing a bulk insert of 4 columns, with 4 columns in my data file and a 5th identity column in the target table. In the error.txt file I get "Row 2 File Offset 528 ErrorFile Offset 0 - HRESULT 0x80020005", and in my stderr: msgtext = 'Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 2, column 5 (QMid).' – Smokeless
With SQL Server 2022 this did not work. I got an error message about an unexpected end of file. I confirmed that the view trick mentioned above does work. – Kei
  1. Create a table with Identity column + other columns;
  2. Create a view over it and expose only the columns you will bulk insert;
  3. BCP into the view, as sketched below.
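
A minimal sketch of steps 2 and 3, reusing the question's table (the view name and bcp switches are illustrative; -c loads character data, -t sets the field terminator, -F 2 skips the header row, -T uses a trusted connection):

CREATE VIEW dbo.VW_Employee AS
SELECT Name, Address FROM dbo.Employee;

then, from a command prompt:

bcp YourDb.dbo.VW_Employee in "path\tempFile.csv" -c -t, -F 2 -T -S yourserver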
Borax answered 16/10, 2018 at 15:37 Comment(1)
Welcome to Stack Overflow. The best answers include more explanation about how your answer is helpful, and possibly about how your answer is different from others. – Transubstantiation

I had this exact same problem, which cost me hours, so I'm inspired to share my findings and the solutions that worked for me.

1. Use an Excel file

This is the approach I adopted. Instead of using a csv file, I used an Excel file (.xlsx) with content like below.

id  username   email                token website

    johndoe   [email protected]        divostar.com
    bobstone  [email protected]        divosays.com

Notice that the id column has no value.

Next, connect to your DB using Microsoft SQL Server Management Studio, right-click on your database and select Import Data (a submenu under Tasks). Select Microsoft Excel as the source. When you arrive at the stage called "Select Source Tables and Views", click Edit Mappings. For the id column under destination, click on it and select ignore. Don't check Enable Identity Insert unless you want to maintain ids, in cases where you are importing data from another database and would like to keep the auto-increment ids of the source DB. Proceed to finish and that's it. Your data will be imported smoothly.

2. Using CSV file

In your csv file, make sure your data looks like below.

id,username,email,token,website
,johndoe,[email protected],,divostar.com
,bobstone,[email protected],,divosays.com

Run the query below:

BULK INSERT Metrics FROM 'D:\Data Management\Data\CSV2\Production Data 2004 - 2016.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

The problem with this approach is that the CSV must be on the DB server, or in some shared folder that the DB can access; otherwise you may get an error like "Cannot open the file. The operating system returned error code 21 (The device is not ready)".

If you are connecting to a remote database, then you can upload your CSV to a directory on that server and reference the path in bulk insert.
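
For example, with a hypothetical share exposed by the server:

BULK INSERT Metrics FROM '\\dbserver\imports\data.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');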

3. Using CSV file and Microsoft SQL Server Management Studio import option

Launch the data import as in the first approach. For the source, select Flat File Source and browse for your CSV file. Make sure the settings in each menu (General, Columns, Advanced, Preview) are OK, and be sure to set the right delimiter under the Columns menu (Column delimiter). Just like in the Excel approach above, click Edit Mappings, and for the id column under destination, click on it and select ignore.

Proceed to finish and that's it. Your data will be imported smoothly.

Inoperative answered 18/11, 2016 at 13:35 Comment(0)
C
1

This is a very old post to answer, but none of the answers given solves the problem without changing the posed conditions, which I can't do.

I solved it by using the OPENROWSET variant of BULK INSERT. This uses the same format file and works in the same way, but it allows the data file to be read with a SELECT statement.

Create your table:

CREATE TABLE target_table(
    id bigint IDENTITY(1,1),
    col1 varchar(256) NULL,
    col2 varchar(256) NULL,
    col3 varchar(256) NULL);

Open a command window and run:

bcp dbname.dbo.target_table format nul -c -x -f C:\format_file.xml -t; -T

This creates the format file based on how the table looks.

Now edit the format file and remove the rows where FIELD ID="1" and COLUMN SOURCE="1", since these do not exist in our data file.
Also adjust the terminators as needed for your data file:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="2" xsi:type="CharTerm" TERMINATOR=";" MAX_LENGTH="256" COLLATION="Finnish_Swedish_CI_AS"/>
  <FIELD ID="3" xsi:type="CharTerm" TERMINATOR=";" MAX_LENGTH="256" COLLATION="Finnish_Swedish_CI_AS"/>
  <FIELD ID="4" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="256" COLLATION="Finnish_Swedish_CI_AS"/>
 </RECORD>
 <ROW>
  <COLUMN SOURCE="2" NAME="col1" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="3" NAME="col2" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="4" NAME="col3" xsi:type="SQLVARYCHAR"/>
 </ROW>
</BCPFORMAT>

Now we can bulk load the data file into our table with a select, thus having full control over the columns; in this case, by not inserting data into the identity column:

INSERT INTO target_table (col1, col2, col3)
SELECT * FROM OPENROWSET(
    BULK 'C:\data_file.txt',
    FORMATFILE = 'C:\format_file.xml') AS t;
Cupreous answered 5/2, 2019 at 17:45 Comment(0)
B
0

Another option, if you're using temporary tables instead of staging tables, could be to create the temporary table as your import expects, then add the identity column after the import.

So your SQL does something like this:

  1. If the temp table exists, drop it
  2. Create the temp table
  3. Bulk import into the temp table
  4. Alter the temp table to add the identity column
  5. < whatever you want to do with the data >
  6. Drop the temp table

Still not very clean, but it's another option... might have to get locks to be safe, too.
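
A sketch of those steps using the question's columns (the names are hypothetical; note that the identity values assigned by the ALTER TABLE come in no guaranteed order):

IF OBJECT_ID('tempdb..#EmployeeImport') IS NOT NULL
    DROP TABLE #EmployeeImport;

-- Create the temp table exactly as the file looks
CREATE TABLE #EmployeeImport (Name varchar(50) NULL, Address varchar(50) NULL);

-- Bulk import into it
BULK INSERT #EmployeeImport FROM 'path\tempFile.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- Add the identity column after the import
ALTER TABLE #EmployeeImport ADD id int IDENTITY(1,1) NOT NULL;

-- ... whatever you want to do with the data ...

DROP TABLE #EmployeeImport;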

Betake answered 17/11, 2015 at 11:54 Comment(0)
