Import CSV File Error : Column Value containing column delimiter
Asked Answered
E

3

1

I am trying to Import a Csv File into SQL SERVER using SSIS

Here's an example how data looks like

Student_Name,Student_DOB,Student_ID,Student_Notes,Student_Gender,Student_Mother_Name
Joseph Jade,2005-01-01,1,Good listener,Male,Amy
Amy Jade,2006-01-01,1,Good in science,Female,Amy
....

Csv Columns are not containing text qualifiers (quotations)

I Created a simple package using SSIS to import it into SQL but sometime the data in SQL looked like below

Student_Name    Student_DOB Student_ID  Student_Notes   Student_Gender  Student_Mother_Name
Ali Jade    2004-01-01  1   Good listener   Bad in science  Male,Lisa

The Reason was that somtimes [Student_Notes] column contains Comma (,) that is used as column delimiter so the Row are not imported Correctly

Any suggestions

Edenedens answered 3/11, 2016 at 22:21 Comment(9)
Change the column delimiter, it is bad practice to have the column delimiter inside column dataYandell
how to do it... i am not the ont who created these csv filesEdenedens
Maybe you can create a program in some programming language so you read these file line by line and process data as you wantYandell
Can you automatically determine which is the offending comma? If so you can write a script to clean it up. It seems like one way to fix this would be to count the number of commas per row and if there are too many, remove them starting from the right. Can you manually apply this algorithm to your data and see if it is correct. If so I can help you write a script that cleans it up beforehand.Cramer
i can add a script component to my package and do some code... can u help me writing itEdenedens
First you need to confirm that that is the algorithm you want to use. No use writing code if it doesn't work. Open your CSV file in notepad and comfirm that removing commas working back from the right hand side will fix it. i.e. do commas appear in any other fields?Cramer
Commas does not appear in any other fields.Edenedens
I really dont understand the algorithm u r talking about. Can you give me some code plz?Edenedens
I will give you some code but a word of advice: never say 'code plz' it's kind of annoying.Cramer
C
1

A word of warning: I'm not a regular C# coder.

But anyway this code does the following:

It opens a file called C:\Input.TXT

It searches each line. If the line has more than 5 commas, it takes all the extra commas out of the third last field (notes)

It writes the result to C:\Output.TXT - that's the one you need to actually import

There are many improvements that could be made:

  • Get file paths from connection managers
  • Error handling
  • An experienced C# programmer could probably do this in hlaf the code

Keep in mind your package will need write access to the appropriate folder

public void Main()
{
    // Search the file and remove extra commas from the third last field
    // Extended from code at
    // https://mcmap.net/q/137091/-open-a-file-and-replace-strings-in-c
    // Nick McDermaid        

    string sInputLine;
    string sOutputLine;
    string sDelimiter = ",";
    String[] sData;
    int iIndex;

    // open the file for read
    using (System.IO.FileStream inputStream = File.OpenRead("C:\\Input.txt"))
    {
        using (StreamReader inputReader = new StreamReader(inputStream))
        {
            // open the output file
            using (StreamWriter outputWriter = File.AppendText("C:\\Output.txt"))
            {
                // Read each line
                while (null != (sInputLine = inputReader.ReadLine()))
                {
                    // Grab each field out
                    sData = sInputLine.Split(sDelimiter[0]);
                    if (sData.Length <= 6)
                    {
                        // 6 or less fields - just echo it out
                        sOutputLine = sInputLine;
                    }
                    else
                    {
                        // line has more than 6 pieces 
                        // We assume all of the extra commas are in the notes field                                

                        // Put the first three fields together
                        sOutputLine =
                            sData[0] + sDelimiter +
                            sData[1] + sDelimiter +
                            sData[2] + sDelimiter;

                        // Put the middle notes fields together, excluding the delimiter
                        for (iIndex=3; iIndex <= sData.Length - 3; iIndex++)
                        {
                            sOutputLine = sOutputLine + sData[iIndex] + " ";
                        }

                        // Tack on the last two fields
                        sOutputLine = sOutputLine +
                            sDelimiter + sData[sData.Length - 2] +
                            sDelimiter + sData[sData.Length - 1];


                    }

                    // We've evaulted the correct line now write it out
                    outputWriter.WriteLine(sOutputLine);
                }
            }
        }
    }


    Dts.TaskResult = (int)Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success;
}
Cramer answered 4/11, 2016 at 12:6 Comment(1)
Thanks for helpingEdenedens
O
1

In The Flat File Connection Manager. Make the File as only one column (DT_STR 8000)

Just add a script Component in the dataflowtask and Add Output Columns (Same as Example Shown)

in The script component split each row using the following Code:

\\Student_Name,Student_DOB,Student_ID,Student_Notes,Student_Gender,Student_Mother_Name

Dim strCells() as string = Row.Column0.Split(CChar(","))

Row.StudentName = strCells(0)
Row.StudentDOB = strCells(1)
Row.StudentID = strCells(2)
Row.StudentMother = strCells(strCells.Length - 1)
Row.StudentGender = strCells(strCells.Length - 2)

Dim strNotes as String = String.Empty

For int I = 3 To strCells.Length - 3

strNotes &= strCells(I)

Next

Row.StudentNotes = strNotes

it worked fine for me

Olden answered 5/11, 2016 at 17:22 Comment(0)
S
0

If import CSV file is not a routine

  1. Import CSV file in Excel
  2. Search error rows with Excel rows filter and rewrite them
  3. Save Excel file in TXT Tab delimited
  4. Import TXT file with SSIS Else make a script that search comma in the Student Notes column range
Sisterhood answered 4/11, 2016 at 5:26 Comment(2)
Please just explain the second step. When working with hundreds of rows !!Edenedens
Order the last column desc to identify wrong rows. Then you can make some data cleaning with Excel only on wrong rows (append cells, etc...). Obviously this workaround works only if there is a single comma in Student Note cell. Another solution is forbidding comma in web application text fieldSisterhood

© 2022 - 2024 — McMap. All rights reserved.