Fastest way for inserting very large number of records into a Table in SQL

The problem is, we have a huge number of records (more than a million) to be inserted into a single table from a Java application. The records are created by the Java code; it's not a move from another table, so INSERT/SELECT won't help.

Currently, my bottleneck is the INSERT statements. I'm using PreparedStatement to speed up the process, but I can't get more than 50 records per second on a normal server. The table is not complicated at all, and there are no indexes defined on it.

The process takes too long, and the time it takes will cause problems.

What can I do to get the maximum speed (INSERT per second) possible?

Database: MS SQL 2008. Application: Java-based, using Microsoft JDBC driver.
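
For context, the current approach is presumably a row-at-a-time loop along these lines (the table and column names here are made up for illustration, not the real schema); each row costs a full round trip and, with auto-commit on, its own commit:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.util.List;

    class RowAtATimeInsert {
        // Hypothetical per-row insert loop; dbo.MyRecords and its columns are placeholders.
        static void insertAll(Connection conn, List<Object[]> rows) throws Exception {
            String sql = "INSERT INTO dbo.MyRecords (id, payload) VALUES (?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (Object[] row : rows) {
                    ps.setLong(1, (Long) row[0]);
                    ps.setString(2, (String) row[1]);
                    ps.executeUpdate();   // one round trip (and, with auto-commit, one commit) per row
                }
            }
        }
    }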

Noblenobleman answered 4/5, 2010 at 14:15 Comment(0)

Use BULK INSERT - it is designed for exactly what you are asking and significantly increases the speed of inserts.

Also (just in case you really do have no indexes), you may want to consider adding an index - some indexes (most notably one on the primary key) may improve the performance of inserts.

The actual rate at which you should be able to insert records will depend on the exact data, the table structure, and the hardware/configuration of the SQL Server itself, so I can't really give you any numbers.
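
As a rough illustration of wiring this up from the Java side (the table name, file path, and options below are placeholders; BULK INSERT reads a data file that the SQL Server instance itself can access, so the application would first write the generated rows out to such a file, local to the server or on a UNC share):

    import java.sql.Connection;
    import java.sql.Statement;

    class BulkInsertLoad {
        // Issue a T-SQL BULK INSERT over JDBC against a file the SQL Server service account can read.
        static void bulkLoad(Connection conn) throws Exception {
            String sql =
                  "BULK INSERT dbo.MyRecords "
                + "FROM 'C:\\load\\myrecords.csv' "   // placeholder path, as seen by the server
                + "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', TABLOCK, BATCHSIZE = 100000)";
            try (Statement stmt = conn.createStatement()) {
                stmt.execute(sql);
            }
        }
    }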

Trumpeter answered 4/5, 2010 at 14:37 Comment(1)
I actually have one index on the PK, which is clustered, and the data are inserted in PK order, so I don't think it will have any effect. I will be trying BULK INSERT - I guess it's my solution.Noblenobleman

Batch the inserts. That is, send 1000 rows at a time rather than one row at a time, so you hugely reduce the number of round trips/server calls.

See Performing Batch Operations on MSDN for the JDBC driver. This is the easiest method, short of re-engineering the code to use genuine bulk-load methods.

Each insert must be parsed, compiled, and executed. A batch means a lot less parsing/compiling, because 1000 inserts (for example) are compiled in one go.

There are better ways, but this works if you are limited to generated INSERTs.
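
A minimal sketch of this with the standard JDBC batch API (table and column names are placeholders for the real schema; the explicit commit per batch reflects the setAutoCommit(false) tip from the comments below):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.util.List;

    class BatchedInsert {
        // Send the generated rows in batches of 1000 instead of one statement per row.
        static void insertAll(Connection conn, List<Object[]> rows) throws Exception {
            String sql = "INSERT INTO dbo.MyRecords (id, payload) VALUES (?, ?)";
            conn.setAutoCommit(false);                // commit once per batch, not once per row
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                int pending = 0;
                for (Object[] row : rows) {
                    ps.setLong(1, (Long) row[0]);
                    ps.setString(2, (String) row[1]);
                    ps.addBatch();
                    if (++pending == 1000) {          // one round trip per 1000 rows
                        ps.executeBatch();
                        conn.commit();
                        pending = 0;
                    }
                }
                if (pending > 0) {                    // flush the final partial batch
                    ps.executeBatch();
                    conn.commit();
                }
            }
        }
    }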

Excommunicative answered 4/5, 2010 at 14:19 Comment(5)
I think the round-trip is a very small part of the delay: at 50 transactions per second, each query takes 20ms to run, while the round-trip is less than 1ms. I have done other optimizations to remove round-trips, but they didn't help much. Unless batching the INSERTs makes a big part of SQL Server's internal processing more efficient - does it?Noblenobleman
@Irchi: Each insert must be parsed, compiled, and executed. A batch means a lot less parsing/compiling, because 1000 inserts (for example) are compiled in one go.Excommunicative
@Irchi: I'd try this before re-engineering the code to use a BCP approach.Excommunicative
One of the reasons this is more effective is that the MySQL query parser does not have to parse each query. I changed a piece of my Java code (talking to clustered MySQL) to use batch inserts of 1000, and the speed increased by 100x (10000%).Elora
I have been testing this, and noticed that performance goes from 25 rows/second to 107 rows/second if I set conn.setAutoCommit(false); this seems to be an essential setting, not mentioned in the MSDN link...Genoese

Have you looked into bulk operations?

Nupercaine answered 4/5, 2010 at 14:18 Comment(1)
I will try it; I guess it will be my best solution. The only problem is that I have to create files and then run the operation, and I will have to code for the different scenarios that can happen with file storage and network conditions.Noblenobleman

Have you considered using batch updates?

Wrens answered 4/5, 2010 at 14:23 Comment(1)
Thanks, I guess this can be helpful too. But I will try BULK INSERT first - it seems more promising!Noblenobleman

Is there any integrity constraint or trigger on the table? If so, dropping it before the inserts will help, but you have to be sure that you can afford the consequences.
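
A related, less drastic option on SQL Server is to disable a foreign key for the duration of the load and re-validate it afterwards. A sketch issued over the same JDBC connection (table and constraint names are placeholders):

    import java.sql.Connection;
    import java.sql.Statement;

    class ConstraintToggle {
        // Disable a foreign key before the bulk load, then re-enable and re-validate it afterwards.
        static void loadWithFkDisabled(Connection conn, Runnable load) throws Exception {
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("ALTER TABLE dbo.MyRecords NOCHECK CONSTRAINT FK_MyRecords_Parent");
                try {
                    load.run();   // the actual inserts / bulk load
                } finally {
                    // WITH CHECK CHECK re-validates existing rows so the constraint is trusted again
                    stmt.execute("ALTER TABLE dbo.MyRecords WITH CHECK CHECK CONSTRAINT FK_MyRecords_Parent");
                }
            }
        }
    }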

Guib answered 4/5, 2010 at 15:31 Comment(1)
There are two FK constraints; I was planning to remove them and give it a try. But BULK INSERT has the option of ignoring the constraints, so I guess with BULK INSERT I will have all the advantages I need.Noblenobleman

Look into SQL Server's bcp utility.

This would mean a big change in your approach in that you'd be generating a delimited file and using an external utility to import the data. But this is the fastest method for inserting a large number of records into a SQL Server database, and it will speed up your load time by orders of magnitude.
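
As a rough sketch of what that could look like from the Java side (server, database, table, and file names are placeholders; -S, -T, -c, -t, and -b are standard bcp options for a character-mode import):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;

    class BcpLoad {
        // Write the generated rows to a delimited file, then shell out to bcp to import it.
        static void load(List<String> csvLines) throws Exception {
            Path dataFile = Files.createTempFile("myrecords", ".csv");
            Files.write(dataFile, csvLines);

            Process bcp = new ProcessBuilder(
                    "bcp", "MyDatabase.dbo.MyRecords", "in", dataFile.toString(),
                    "-S", "myserver",   // target SQL Server instance
                    "-T",               // Windows (trusted) authentication
                    "-c",               // character-mode data file
                    "-t,",              // comma field terminator
                    "-b", "100000")     // commit every 100,000 rows
                    .inheritIO()
                    .start();
            if (bcp.waitFor() != 0) {
                throw new IllegalStateException("bcp failed with exit code " + bcp.exitValue());
            }
        }
    }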

Also, is this a one-time operation you have to perform, or something that will occur on a regular basis? If it's one-time, I would suggest not even coding this process but performing an export/import with a combination of database utilities.

Stroller answered 4/5, 2010 at 14:23 Comment(0)

I would recommend using an ETL engine for this. You can use Pentaho; it's free. ETL engines are optimized for bulk-loading data and for whatever transformation/validation is required.

Sargent answered 4/5, 2010 at 14:47 Comment(0)
