How do you convert a CSV string into a table in SQL?
Let's say you want to import a CSV file into a table in your SQL Server, but you don't have the file, just a string in CSV format. In our case, our design won't let us use one of the built-in file-import functions (i.e. we don't actually have access to the file, just a string in CSV format; if you can access the file, see Underhill Julian's response below to save yourself the headache).

After hours of Googling, I was genuinely surprised to find that there isn't a built-in function to do this (at least in SQL Server 2017). Further, I couldn't find anyone who had posted a function that handles a real-world scenario (anything beyond a perfectly formatted string). The problem is that it isn't as easy as just splitting a string on commas and line breaks. What happens if you have a comma or line break within a column?

After much, much head-banging-against-the-wall: Here you go. A truly dynamic procedure that will read in any CSV and spit out a dynamic table. See answer below.
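For concreteness, here is a (hypothetical) export that defeats a naive split: the quoted name contains a field comma, and the notes cell spans two lines.

    id,name,notes
    1,"Smith, John","first line
    second line"
    2,Jane,ok

A splitter that only looks at delimiters would see four records and a ragged column count, which is exactly the case the procedure handles.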

Dehydrate answered 15/10, 2020 at 1:24 Comment(2)
If I read that proc correctly.. you still need to manually create the final table right?Canvasback
Yes, if you want to create a new table. In our case, we actually want to load records to an existing table. This procedure was created as a utility so we can reuse it for CSV strings in other formats. If you call it from another procedure, you can use the table it passes back to insert, update, or create new, or simply modify this procedure to actually create a table rather than using a temporary table.Dehydrate
    /*
    
    --- Description ---
    
    CSV text exported from Excel allows cells to contain commas and line breaks, so splitting
    the CSV isn't as simple as splitting on comma and end-of-line. We need to temporarily
    replace any LF (line feed) or comma characters that are inside a cell so that we can correctly
    split the CSV string.
    
    --- Legend ---
            
    CHAR(10) = LF (line feed/end of line)
    CHAR(13) = CR (carriage return/end of line)
    CHAR(128) = Represents line feed within quoted column
    CHAR(129) = Represents comma within quoted column
    CHAR(130) = Represents quoted text within quoted column
    
    --- Instructions ---
    
    1. This assumes you will create this procedure exactly (so you can reuse it) and build a separate import procedure. 
    In your procedure, get the @csv text from the file (or whatever your source is). Ex:
    
    DECLARE @csv NVARCHAR(MAX)
    
    select @csv = <get your CSV string however you need to>
    
    2. In your procedure, create your temporary table based on the CSV data (must match 
    what you expect to be returned) Ex:
    
    CREATE TABLE #import(
        ...
    )
    
    3. In your procedure, call this procedure and insert the results. Ex:
    
    INSERT INTO #import EXEC sp_00_CSVtoTable @csv 

    4. Now you're ready to do whatever you need with the data (insert into permanent table, etc.)     
    */
    
    -- exec sp_00_CSVtoTable
    CREATE PROCEDURE [dbo].[sp_00_CSVtoTable](
        @csv NVARCHAR(MAX)
    )
    AS
    BEGIN
        SET NOCOUNT ON;
        SET ANSI_WARNINGS OFF
    
        /*************************************
        Clean Text
        **************************************/
        SELECT @csv = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@csv
            , CHAR(239), '') --CHAR(239)+CHAR(187)+CHAR(191) is the UTF-8 byte-order mark (EF BB BF) that Excel prepends to many CSV exports
            , CHAR(187), '')
            , CHAR(191), '')
            , CHAR(13), CHAR(10)) --Windows line ending is CR LF; remove CR and keep LF for consistency across OSes
            , CHAR(10) + CHAR(10), CHAR(10)) --Replace LF LF with LF
    
        
        /*************************************
        Replace LF and commas in-cell with
        temporary characters
        **************************************/
        
        --Separate lines that are surrounded by quotes
        DECLARE @csvReplaceTextInCell TABLE(
            row INT IDENTITY(1,1)
            , m_text NVARCHAR(MAX)
        )
    
        SELECT @csv = REPLACE(@csv, '""', CHAR(130)) --Double quotes represent an escaped quotation mark (a quotation mark used within a cell)
    
        INSERT INTO @csvReplaceTextInCell SELECT value FROM STRING_SPLIT(@csv, '"') -- Each quoted column will be on a separate row. The rows preceding and following will be unquoted
        
        UPDATE @csvReplaceTextInCell SET m_text = REPLACE(REPLACE(m_text
                                                    , CHAR(10), CHAR(128)) --Replace LF on quoted lines with temp character
                                                    , ',', CHAR(129)) --Replace commas on quoted lines with temp character
        WHERE row % 2 = 0 --Every other row will be a quoted column (i.e. even rows will be quoted columns)
    
        --STUFF/STRING_AGG proved unreliable here (STRING_AGG truncates at 8,000 bytes unless the inputs are cast to NVARCHAR(MAX)); re-concatenate the string the old-fashioned way
        DECLARE @thisrow INT, @totalrows INT
    
        SELECT @thisrow = 1, @totalrows = COUNT(*), @csv = ''
        FROM @csvReplaceTextInCell
    
        WHILE @thisrow <= @totalrows
        BEGIN
            SELECT @csv = @csv + m_text
            FROM @csvReplaceTextInCell
            WHERE row = @thisrow
            
            SELECT @thisrow = @thisrow + 1
        END
        
        /*************************************
        Split CSV string into rows
        **************************************/
    
        CREATE TABLE #csvRows (
            attdata NVARCHAR(MAX)
        )
        
        INSERT INTO #csvRows SELECT value FROM STRING_SPLIT(@csv, CHAR(10))
    
        DELETE FROM #csvRows WHERE attdata = '' OR attdata = ',' --remove blank row
    
        ALTER TABLE #csvRows ADD row INT IDENTITY(1,1)
    
        /*************************************
        Create the table
        **************************************/
    
        CREATE TABLE #csvTable(
            row INT
        )
    
        DECLARE @sql NVARCHAR(MAX)= 'ALTER TABLE #csvTable ADD '
    
        SELECT @sql = @sql + 'c' + CONVERT(VARCHAR, ROW_NUMBER() OVER (ORDER BY value)) + ' NVARCHAR(MAX),' --Only the column count matters here; columns are simply named c1..cN
        FROM STRING_SPLIT((SELECT attdata FROM #csvRows WHERE row = 1), ',') --The header row determines the number of columns
    
        SELECT @sql = LEFT(@sql, LEN(@sql) - 1)
    
        --PRINT @sql
    
        EXEC (@sql)
    
        ALTER TABLE #csvTable DROP COLUMN row
    
        /*************************************
        Split CSV string into columns for each row
        **************************************/
    
        SELECT @totalrows = COUNT(*), @thisrow = 1 FROM #csvRows
    
        WHILE @thisrow <= @totalrows
        BEGIN
            SELECT @sql = 'INSERT INTO #csvTable SELECT '
    
            SELECT @sql = @sql + '''' 
                + REPLACE(REPLACE(REPLACE(REPLACE(value
                    , '''', '''''')
                    , CHAR(128), CHAR(10)) --Re-add the line break
                    , CHAR(129), ',') --Re-add commas
                    , CHAR(130), '"') --Re-add quotation mark
                + ''','
            FROM STRING_SPLIT((SELECT attdata FROM #csvRows WHERE row = @thisrow), ',')
    
            SELECT @sql = LEFT(@sql, LEN(@sql) - 1)
    
            --PRINT @sql
            
            BEGIN TRY
                EXEC (@sql)
            END TRY
            BEGIN CATCH
                PRINT 'Error on row ' + CONVERT(VARCHAR, @thisrow) + ': ' + ERROR_MESSAGE()
                PRINT @SQL
                SELECT * FROM #csvRows
                RETURN
            END CATCH
            
            SET @thisrow = @thisrow + 1
        END
    
        /*************************************
        Final output
        **************************************/
    
        SELECT * FROM #csvTable
        
        DROP TABLE #csvTable
       
    END
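Putting steps 1-3 together, a minimal end-to-end call might look like this (the literal CSV and column names are illustrative only). Note that the header line comes back as an ordinary row, so you will likely want to delete it after the insert:

    DECLARE @csv NVARCHAR(MAX) = N'id,name' + CHAR(10)
        + N'1,"Doe, Jane"' + CHAR(10)
        + N'2,Bob'

    CREATE TABLE #import(
        id NVARCHAR(MAX),
        name NVARCHAR(MAX)
    )

    INSERT INTO #import EXEC sp_00_CSVtoTable @csv

    DELETE FROM #import WHERE id = 'id' --drop the header row

    SELECT * FROM #import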
Dehydrate answered 15/10, 2020 at 1:24 Comment(2)
Appreciate this is 2 years old, but this is an excellent solution. My upvote alone isn't enough. I had to remove the I character, as it stripped out all I's in my collation, but it works really well. Excellent job rphello101Fritts
also consider a @jsonTable NVARCHAR(MAX) OUTPUT param, and changing the second-to-last line to SET @jsonTable = (SELECT * FROM #csvTable FOR JSON PATH). With the JSON output you might have less temp-table pre-configuration to deal withFootprint

Some CSV is close enough to JSON to leverage those functions in SQL Server 2017 and above (i.e. if all strings are quoted). If you know how many columns you have, then this version will work. If you need a dynamic number of columns, you could attach the dynamic SQL to the results of the openjson instead of the pivot:

Declare @newline char(2) = CHAR(13) + CHAR(10)

Declare @csv nvarchar(4000) = 
'1,2,3,"2020-01-01","fred"
4,5,6,"2020-01-03","Sarah"
7,8,9,"2020-01-05","Jo"'

Select pvt.*
from
(
    Select row_no = [rows].[key]
    ,      col_no = [cols].[key]
    ,       cols.[value]
    from (
        Select [json] = '[[' + replace(@csv,@newline,'],[') + ']]'
    ) step1
    cross apply openjson([json]) [rows]
    cross apply openjson([rows].[value]) cols
) base
pivot (
    max(base.[value])
    for base.col_no in ([0],[1],[2],[3],[4])
) pvt
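A variant of the same array-of-arrays trick avoids the PIVOT entirely by naming the columns in an OPENJSON WITH clause. This still assumes a known, fixed column count, and the column names and types below are illustrative, matched to the sample @csv above:

Select r.*
from (
    Select [json] = '[[' + replace(@csv,@newline,'],[') + ']]'
) step1
cross apply openjson(step1.[json])
with (
    col1 int           '$[0]',
    col2 int           '$[1]',
    col3 int           '$[2]',
    col4 date          '$[3]',
    col5 nvarchar(50)  '$[4]'
) r

Each top-level array element is one CSV row; the JSON paths '$[0]'..'$[4]' pull out the individual fields, so no aggregation or pivot step is needed.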
Flavio answered 15/10, 2020 at 4:36 Comment(3)
CSV usually contains unquoted text like 1,2,3,some text here,4,5. Quoting is needed only for fields that contain a comma or newline (the field and record separators) and is optional in all other cases. JSON cannot have literal newlines in strings; these have to be escaped as \n, and a backslash itself has to be escaped as \\, so conversion can become messy very quickly. . .Crozier
Yes, good point. This only works for "friendly" csv where all text has been quoted, which is not required in the format standard.Flavio
Using this to pass a multivalued parameter in SSRS and this solution works like a charm to deserialize the complex parameter. Nailed it, James. Thanks heaps :)Deidradeidre

Lots of ways to do it.

1.
bcp dbo.ImportTest in 'C:\ImportData.txt' -T -SserverName\instanceName

2.
IF OBJECT_ID('dbo.SampleCSVTable') IS NOT NULL
    DROP TABLE dbo.SampleCSVTable
GO
 
CREATE TABLE dbo.SampleCSVTable
(
    PersonID INT,
    FullName VARCHAR(512),
    PreferredName VARCHAR(512),
    SearchName VARCHAR(512),
    IsPermittedToLogon BIT,
    LogonName VARCHAR(512)
)
GO

BULK INSERT dbo.SampleCSVTable
FROM 'C:\Sample CSV File.txt'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
)
GO

3. 
INSERT INTO dbo.ImportTest
SELECT * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\ImportData.xls', [Sheet1$])

OR 
INSERT INTO dbo.ImportTest
SELECT * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\ImportData.xls', 'SELECT * FROM [Sheet1$]')

4.
INSERT INTO dbo.ImportTest
SELECT * FROM OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',
'Data Source=C:\ImportData.xls;Extended Properties=Excel 8.0')...[Sheet1$]

5. 
EXEC sp_addlinkedserver 'ImportData',
   'Jet 4.0', 'Microsoft.Jet.OLEDB.4.0',
   'C:\ImportData.xls',
   NULL,
   'Excel 8.0'
GO

...then query the linked server created in step 5:
INSERT INTO dbo.ImportTest
SELECT *
FROM OPENQUERY(ImportData, 'SELECT * FROM [Sheet1$]')

6. PowerShell
Function CreateStagingTable($location, $file, $extension, $server, $database)
{
    $full = $location + $file + $extension
    $all = Get-Content $full
    $columns = $all[0]
    $columns = $columns.Replace(" ","")
    $columns = $columns.Replace(",","] VARCHAR(100), [")
    $table = "CREATE TABLE " + $file + "([" + $columns + "] VARCHAR(100))"
    $connection = New-Object System.Data.SqlClient.SqlConnection
    $buildTable = New-Object System.Data.SqlClient.SqlCommand
    $connection.ConnectionString = "Data Source=" + $server + ";Database=" + $database + ";integrated security=true"
    $buildTable.CommandText = $table
    $buildTable.Connection = $connection
    $connection.Open()
    $buildTable.ExecuteNonQuery()
    $connection.Close()
}
CreateStagingTable -location "C:\files\" -file "savingsrate" -extension ".txt" -server "OURSERVER\OURINSTANCE" -database "StagingDatabase"

ALSO you could use:

Function AutoImportCommaFlatFiles($location, $file, $extension, $server, $database)
{
    $full = $location + $file + $extension
    $all = Get-Content $full
    $columns = $all[0]
    $columns = $columns.Replace(" ","")
    $columns = $columns.Replace(",","] VARCHAR(100), [")
    $table = "CREATE TABLE " + $file + "([" + $columns + "] VARCHAR(100))"
    $connection = New-Object System.Data.SqlClient.SqlConnection
    $buildTable = New-Object System.Data.SqlClient.SqlCommand
    $insertData = New-Object System.Data.SqlClient.SqlCommand
    $connection.ConnectionString = "Data Source=" + $server + ";Database=" + $database + ";integrated security=true"
    $buildTable.CommandText = $table
    $buildTable.Connection = $connection
    ## Added to function
    $x = 0
    $insertData.CommandText = "EXECUTE stp_CommaBulkInsert @1,@2"
    $insertData.Parameters.AddWithValue("@1", $full)
    $insertData.Parameters.AddWithValue("@2", $file)
    $insertData.Connection = $connection
    $connection.Open()
    $buildTable.ExecuteNonQuery()
    $connection.Close()
    ## Added to function
    $x = 1
    if ($x -eq 1) # -eq compares; a bare = would assign and always evaluate true
    {
        $connection.Open()
        $insertData.ExecuteNonQuery()
        $connection.Close()
    }
}
AutoImportCommaFlatFiles -location "C:\files\" -file "savingsrate" -extension ".txt" -server "OURSERVER\OURINSTANCE" -database "StagingDatabase"



 
7. SSIS packages - you can google a tutorial on them. You can even use SSMS to import the file and then save the import as a job, which you can schedule as an SSIS package.
Brothers answered 15/10, 2020 at 3:57 Comment(3)
Thank you for adding this information as well for anyone else looking. I am well aware that most of these were possible (and that is probably the exact reason I couldn't find what I was looking for). However, the major restriction with every option listed above is that you need access to the file (which our team does not have). The files themselves are actually converted to binary strings and stored in a table in the DB. The DBA did not want to make application changes to allow us to access the files, so we had to use the CSV string stored in a table in the database.Dehydrate
Have you tried this: #43574168?Brothers
Yes, we use that on a regular basis when using SSMS. This utility is actually being called by another procedure that is in turn called by our application. Essentially we created a tool to allow our users without database access to upload a CSV file that will then load the records into an existing table. The application was already designed to read the CSV file into a binary string and store it in the DB. We utilized this existing functionality, which unfortunately led to the headache of manually processing a CSV string instead of the file itself.Dehydrate

© 2022 - 2024 — McMap. All rights reserved.