What is the fastest way to get 1,000,000 lines of fixed width text into a SQL Server database?
I have a file with about 1,000,000 lines of fixed width data in it.

I can read it, parse it, do all that.

What I don't know is the best way to put it into a SQL Server database programmatically. I need to do it via T-SQL, Delphi, or C# (in other words, a command-line solution isn't what I need...).

I know about BULK INSERT, but that appears to work with CSV only...?

Should I create a CSV file from my fixed width data and BULK INSERT that?

By "fastest" I mean "Least amount of processing time in SQL Server".

My desire is to automate this so that it is easy for a "clerk" to select the input file and push a button to make it happen.

What's the best way to get the huge number of fixed width records into a SQL Server table?

Strengthen answered 1/5, 2013 at 15:10 Comment(7)
Is this a one-time thing, or will you need to do it repeatedly? It sounds like you already know how to get the column values and do an INSERT INTO. Is it a question of performance? – Valet
Too bad a CLI isn't an option, as BCP is probably the fastest. – Shere
BULK INSERT does not work only with CSV files. You can specify field and row terminators. I believe the default field terminator is tab and the default row terminator is newline, so if you have a straight text file with 1,000,000 rows that contain no tabs, each record on its own row, you should be good to go with the default command. – Pulpit
@Love2Learn: That may not work if the file is truly fixed width (there will be multiple, varying numbers of whitespace characters between each column), unless you can throw out all the redundant delimiters. – Valet
Bulk insertion can be done from T-SQL: BULK INSERT [#MyTestTable] FROM 'D:\MyTextFile.txt'. Provided you can massage the data into a format BULK INSERT can parse, it should be fine. – Purpurin
@RobertHarvey -- I'll need to do it repeatedly. I'm trying to automate it. I'll edit the question to indicate that. – Strengthen
@NickHodges: you can create a bulk copy format file, with the format you require, for the bcp/bulk copy operation. – Herrenvolk
I assume that by "fastest" you mean run-time:

The fastest way to do this from compiled code is to use the SqlBulkCopy class to insert the data directly into your target table. You will have to write your own code to open and read the source file, split each line into the appropriate columns according to their fixed-width offsets, and feed the results to SqlBulkCopy. (I think that I have an example of this somewhere, if you want to go this route.)

The fastest way to do this from T-SQL would be to shell out to DOS and use BCP to load the file directly into your target table. You will need to make a BCP format file that defines the fixed-width columns for this approach.
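For example, a minimal sketch of that shell-out, assuming xp_cmdshell is enabled; the server, database, table, and file names here are all hypothetical:

EXEC master..xp_cmdshell
    'bcp MyDb.dbo.TargetTable in "D:\import\data.txt" -f "D:\import\data.fmt" -T -S MYSERVER';
-- -f points at the BCP format file, -T uses a trusted connection, -S names the server.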

The fastest way to do this from T-SQL, without using any CLI, is to use BULK INSERT to load the file into a staging table with only one column, DATA VARCHAR(MAX) (make that NVARCHAR(MAX) if the file has Unicode data in it). Then execute a SQL query that splits the DATA column into its fixed-width fields and inserts them into your target table. This should take only a single INSERT statement, though it could be a big one. (I have an example of this somewhere as well.)
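A minimal sketch of that staging-and-split approach; the file path, target table, column names, and offsets below are all hypothetical:

CREATE TABLE #Staging (DATA VARCHAR(MAX));

-- Load each line of the file into its own row; fixed-width data normally
-- contains no tabs, so the default field terminator will not split anything.
BULK INSERT #Staging
FROM 'D:\import\data.txt'
WITH (ROWTERMINATOR = '\n');

-- Split each line into its fixed-width fields by character offset.
INSERT INTO dbo.TargetTable (CustomerId, CustomerName, Amount)
SELECT
    CAST(SUBSTRING(DATA, 1, 10) AS INT),             -- positions 1-10
    RTRIM(SUBSTRING(DATA, 11, 30)),                  -- positions 11-40
    CAST(SUBSTRING(DATA, 41, 12) AS DECIMAL(12, 2))  -- positions 41-52
FROM #Staging;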

Your other 'fastest' option would be to use an SSIS package or the SQL Server Import Wizard (they're actually the same thing, under the hood). SSIS has a pretty steep learning curve, so it's only really worth it if you expect to be doing this (or things like this) for other cases in the future as well.

On the other hand, the Wizard is fairly easy to use as a one-off. The Wizard can also make a schedulable job, so if you need to repeat the same thing every night, that's certainly the easiest, as long as it actually works on your case/file/data. If it doesn't, it can be a real headache to get right, but fixed-width data should not be a problem.

The fastest of all of these options has always been (and likely will always be) BCP.

Manganese answered 1/5, 2013 at 15:40 Comment(0)
I personally would do this with an SSIS package. It has the flexibility to handle a fixed-width definition.

If this is a one-time load, use the wizard to import the data. If not, create a package yourself and then schedule it to run periodically.

Nowhere answered 1/5, 2013 at 15:25 Comment(0)
What I do is load an IDataReader that is wired to the import file.

Then I loop over the IDataReader, validate each row, sometimes massage the data in each row, then push that into XML (or into a DataSet, piggybacking on the ds.GetXml() method).

Then, every so many rows (every 1,000, let's say), I push them down to a stored procedure that can handle XML input.

If a single row fails validation, I log it for later. (If I have 1,000,000 rows and it's OK to miss one, leaving 999,999 rows properly imported, I handle the errant entry afterwards.)

If one of my bulk-insert XML batches fails (with 1,000 rows in it), I log that entire XML. You could then go over the failed batch and import its rows one by one, logging the bad ones individually. That is, do 1,000 at a time until a batch fails, then fall back to one at a time for that batch.
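A minimal sketch of the server-side procedure this approach relies on; the table, procedure, and XML element names are hypothetical:

CREATE PROCEDURE dbo.ImportBatch
    @Batch XML
AS
BEGIN
    -- Shred the batch of rows out of the XML and insert them in one statement.
    INSERT INTO dbo.TargetTable (CustomerId, CustomerName, Amount)
    SELECT
        r.value('(CustomerId/text())[1]',   'INT'),
        r.value('(CustomerName/text())[1]', 'VARCHAR(30)'),
        r.value('(Amount/text())[1]',       'DECIMAL(12, 2)')
    FROM @Batch.nodes('/Rows/Row') AS b(r);
END;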

I have an example written here:

http://granadacoder.wordpress.com/2009/01/27/bulk-insert-example-using-an-idatareader-to-strong-dataset-to-sql-server-xml/

Manas answered 1/5, 2013 at 15:19 Comment(1)
What I like about my solution is that you can "tweak" the "1,000" to whatever the Goldilocks number is. And you have in place a code base to pre-validate and to pre-massage the data. And you have a "what if something goes wrong" mechanism. And if the import file changes, you basically only have to alter the IDataReader "wire up". – Manas
You have a number of choices, but it depends what you mean by fastest. Fastest to complete once, timed from "I'll do it now"? There is a wizard in SQL Server Management Studio. Fastest to do it on a monthly basis with a minimum learning curve? There is the DTS wizard in SQL Server Management Studio. Minimum SQL engine cycles for doing it every night? SSIS: http://en.wikipedia.org/wiki/SQL_Server_Integration_Services

Steiner answered 1/5, 2013 at 15:21 Comment(1)
I agree. Does "fastest" mean "quickest to develop" or "least amount of time to execute the actual import"? – Manas
BULK INSERT or bcp is the fastest way to do this, because it can be minimally logged (under the simple or bulk-logged recovery model). You can easily insert 10k rows per second, in my experience.

In order to bulk insert fixed width data, you need to create a bulk copy format file:

http://msdn.microsoft.com/en-us/library/ms178129.aspx
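As a sketch, a non-XML format file for three fixed-width columns might look like the following; the version line (10.0 is SQL Server 2008), column names, widths, and collation are all hypothetical and must match your actual table and release. Fixed-width fields use a prefix length of 0, an empty terminator, and a fixed host-file length, with the row terminator only on the last field:

10.0
3
1   SQLCHAR   0   10   ""       1   CustomerId     ""
2   SQLCHAR   0   30   ""       2   CustomerName   SQL_Latin1_General_CP1_CI_AS
3   SQLCHAR   0   12   "\r\n"   3   Amount         ""

The file is then loaded with BULK INSERT (or bcp) pointed at that format file:

BULK INSERT dbo.TargetTable
FROM 'D:\import\data.txt'
WITH (FORMATFILE = 'D:\import\data.fmt');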

Herrenvolk answered 1/5, 2013 at 16:19 Comment(0)
