Reading CSV files using C#
Asked Answered
U

12

219

I'm writing a simple import application and need to read a CSV file, show result in a DataGrid and show corrupted lines of the CSV file in another grid. For example, show the lines that are shorter than 5 values in another grid. I'm trying to do that like this:

StreamReader sr = new StreamReader(FilePath);
importingData = new Account();
string line;
string[] row = new string [5];
while ((line = sr.ReadLine()) != null)
{
    row = line.Split(',');

    importingData.Add(new Transaction
    {
        Date = DateTime.Parse(row[0]),
        Reference = row[1],
        Description = row[2],
        Amount = decimal.Parse(row[3]),
        Category = (Category)Enum.Parse(typeof(Category), row[4])
    });
}

but it's very difficult to operate on arrays in this case. Is there a better way to split the values?

Understudy answered 17/8, 2010 at 22:30 Comment(2)
Thanks for your solution. Consider posting it as an answer post - including it in the question doesn't help its readability.Bateman
see also #1941892Alecalecia
P
424

Don't reinvent the wheel. Take advantage of what's already in .NET BCL.

  • add a reference to the Microsoft.VisualBasic (yes, it says VisualBasic but it works in C# just as well - remember that at the end it is all just IL)
  • use the Microsoft.VisualBasic.FileIO.TextFieldParser class to parse CSV file

Here is the sample code:

using (TextFieldParser parser = new TextFieldParser(@"c:\temp\test.csv"))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    while (!parser.EndOfData) 
    {
        //Processing row
        string[] fields = parser.ReadFields();
        foreach (string field in fields) 
        {
            //TODO: Process field
        }
    }
}

It works great for me in my C# projects.

Here are some more links/informations:

Phosphene answered 18/8, 2010 at 2:42 Comment(17)
+1: I just broke the lumenworks Fast CSV reader on a 53Mb file. Looks like the line caching failed after 43,000 rows and scrambled the buffer. Tried the VB TextFieldParser and it did the trick. ThanksRosannrosanna
+1 Great answer, as I find many people don't know this class exists. One thing for future viewers to note is that setting parser.TextFieldType = FieldType.Delimited; is not necessary if you call parser.SetDelimiters(",");, as the method sets the TextFieldType property for you.Triplex
Also check this out: dotnetperls.com/textfieldparser. TextFieldParser has worse performance than String.Split and StreamReader. However, there is a big difference between string.Split and TextFieldParser. TextFieldParser handles strange cases like having a comma in a column: you can name a column like "text with quote"", and comma", and you can get the correct value text with quote", and comma instead of wrongly separated values. So you may want to opt for String.Split if you csv is very simple.Valaria
I've used this to parse files over 12 GB and it never breaks.Yokel
Note that you may need to add a reference to Microsoft.VisualBasic to use this. Right-click on your project in Visual Studio, then choose Add > Reference, and check the box for Microsoft.VisualBasic.Shondrashone
For beginners like me who didn't even know what IL is, check out codeproject.com/Articles/3778/….Cowey
awesome, just check if field is not null. To void a possible null reference exceptionAlf
The "Microsoft.VisualBasic.FileIO.TextFieldParser" will will also parse .txt files, it's very useful.Devest
if there is an error, do i have to close anything, or will the above code automatically close anything that needs to be closed?Germangermana
@BKSpurgeon The parser is in a using block so that ensures it will be disposed of. See here for more detail msdn.microsoft.com/en-us/library/…Poona
Um, @Doug, I think that's it's main purpose.Goddord
DOESN'T WORK if any data contains a COMMA. Normally CSV would have Quotes around each value.Tribromoethanol
@Tribromoethanol is correct. Also : if your data contains mixed fields at all, e.g. 1,"two", then it chokes. Fields can either be surrounded by quotes always, or never. You control which in TextFieldParser.HasFieldsEnclosedInQuotes . Otherwise this would be perfect. It's really disappointing real CSV support isn't core in 2017.Kumkumagai
By the way, this "VB" code is the code used in Excel, Access, and SQL Server, so it is very well-tested.Pickax
using (var parser = FileIO.FileSystem.OpenTextFieldParser(@"c:\1.csv", ",")) does the first few statements before the loopMargerymarget
This DOES work if the text contains a comma, just set TextFieldParser.HasFieldsEnclosedInQuotes = true. This line by line approach does however mean that there cannot be newlines embedded, which is valid in CSV depending on who you ask.Mordred
Thank you! This one works in .NET Framework 4.7.2,Salpingotomy
S
62

I recommend CsvHelper from NuGet.



PS: Regarding other more upvoted answers, I'm sorry but adding a reference to Microsoft.VisualBasic is:

  • Ugly
  • Not cross-platform, because it's not available in .NETCore/.NET5 (and Mono never had very good support of Visual Basic, so it may be buggy).
Such answered 10/12, 2015 at 8:29 Comment(8)
It's exactly as cross-platform as C# is.Pickax
wrong, Microsoft.VisualBasic.dll in Linux comes from Mono sources, which has a different implementation than Microsoft's and there are some things that are not implemented, for example: #6644665Such
(Plus, VB language has never had any focus under the companies that have been involved in creating/developing the Mono project, so it's way behind in terms of efforts, compared to the C# ecosystem/tooling.)Such
Having played with both, I'd add that CsvHelper comes with a built-in row to class mapper; it allows for variations in the column headers (if present), and even apparently variations in the column order (though I've not tested the latter myself). All in all it feels much more "high-level" than TextFieldParser.Garzon
yup, Microsoft.VisualBasic namespace is not available on .NET Core 2.1Felder
This should be the new answer, I couldn't get Microsoft.VisualBasic.FileIO.TextFieldParser in my MVC 5 appVoltmeter
I ran into an issue with CSVHelper because the CSV had XML in it. The XML had attributes that CSVHelper stripped out. I had to find an alternative to prevent this from happening.Aalborg
@Brett did you file a bug about it?Such
S
41

My experience is that there are many different csv formats. Specially how they handle escaping of quotes and delimiters within a field.

These are the variants I have ran into:

  • quotes are quoted and doubled (excel) i.e. 15" -> field1,"15""",field3
  • quotes are not changed unless the field is quoted for some other reason. i.e. 15" -> field1,15",fields3
  • quotes are escaped with \. i.e. 15" -> field1,"15\"",field3
  • quotes are not changed at all (this is not always possible to parse correctly)
  • delimiter is quoted (excel). i.e. a,b -> field1,"a,b",field3
  • delimiter is escaped with \. i.e. a,b -> field1,a\,b,field3

I have tried many of the existing csv parsers but there is not a single one that can handle the variants I have ran into. It is also difficult to find out from the documentation which escaping variants the parsers support.

In my projects I now use either the VB TextFieldParser or a custom splitter.

Stella answered 18/8, 2010 at 9:47 Comment(3)
Love this answer for the test cases you provided!Antepast
The main problem is that most implementations don't care about RFC 4180 that describes the CSV format and how delimiters should be escaped.Phew
RFC-4180 is from 2005, which seems old now, but remember: the .Net framework was first out in 2001. Also, RFCs aren't always official standards, and in this case it doesn't carry the same weight as, say, ISO-8601 or RFC-761.Affaire
S
23

Sometimes using libraries are cool when you do not want to reinvent the wheel, but in this case one can do the same job with fewer lines of code and easier to read compared to using libraries. Here is a different approach which I find very easy to use.

  1. In this example, I use StreamReader to read the file
  2. Regex to detect the delimiter from each line(s).
  3. An array to collect the columns from index 0 to n

using (StreamReader reader = new StreamReader(fileName))
    {
        string line; 

        while ((line = reader.ReadLine()) != null)
        {
            //Define pattern
            Regex CSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");

            //Separating columns to array
            string[] X = CSVParser.Split(line);

            /* Do something with X */
        }
    }
Symbolism answered 14/12, 2015 at 11:25 Comment(3)
Surely that has problems with data that itself contains new lines?Cytologist
Now CSV datafiles are not know for containing empty lines between data, but if you have a source that does that, in that case i would just simple done a simple regex test to remove whitespaces or lines containing nothing before running the reader. check here for different examples: #7648216Symbolism
Surely a char-based approach is more natural for this sort of problem than a regex. Depending on the presence of quotation marks the behavior is supposed to be different.Tommie
L
8

CSV can get complicated real fast.

Use something robust and well-tested:
FileHelpers: www.filehelpers.net

The FileHelpers are a free and easy to use .NET library to import/export data from fixed length or delimited records in files, strings or streams.
Lease answered 18/8, 2010 at 7:35 Comment(1)
I think FileHelper is trying to do to much in one go. Parsing files is a 2 step process where you first split lines into fields and then parse the fields into data. Combining the functions makes it difficult to handle things like master-detail and line filtering.Stella
L
8

Another one to this list, Cinchoo ETL - an open source library to read and write CSV files

For a sample CSV file below

Id, Name
1, Tom
2, Mark

Quickly you can load them using library as below

using (var reader = new ChoCSVReader("test.csv").WithFirstLineHeader())
{
   foreach (dynamic item in reader)
   {
      Console.WriteLine(item.Id);
      Console.WriteLine(item.Name);
   }
}

If you have POCO class matching the CSV file

public class Employee
{
   public int Id { get; set; }
   public string Name { get; set; }
}

You can use it to load the CSV file as below

using (var reader = new ChoCSVReader<Employee>("test.csv").WithFirstLineHeader())
{
   foreach (var item in reader)
   {
      Console.WriteLine(item.Id);
      Console.WriteLine(item.Name);
   }
}

Please check out articles at CodeProject on how to use it.

Disclaimer: I'm the author of this library

Levison answered 12/11, 2017 at 20:13 Comment(2)
Hi, can you load csv to Sql table - I dont know the header in the CSV table before hand. Just mirror whats in csv to Sql tableRatable
Yes, you can. please see this link #20759802Levison
F
4

I use this here:

http://www.codeproject.com/KB/database/GenericParser.aspx

Last time I was looking for something like this I found it as an answer to this question.

Fibrilliform answered 18/8, 2010 at 9:51 Comment(0)
C
3

Here's a solution I coded up today for a situation where I needed to parse a CSV without relying on external libraries. I haven't tested performance for large files since it wasn't relevant to my particular use case but I'd expect it to perform reasonably well for most situations.

        static List<List<string>> ParseCsv(string csv) {
            var parsedCsv = new List<List<string>>();
            var row = new List<string>();
            string field = "";
            bool inQuotedField = false;

            for (int i = 0; i < csv.Length; i++) {
                char current = csv[i];
                char next = i == csv.Length - 1 ? ' ' : csv[i + 1];

                // if current character is not a quote or comma or carriage return or newline (or not a quote and currently in an a quoted field), just add the character to the current field text
                if ((current != '"' && current != ',' && current != '\r' && current != '\n') || (current != '"' && inQuotedField)) {
                    field += current;
                } else if (current == ' ' || current == '\t') {
                    continue; // ignore whitespace outside a quoted field
                } else if (current == '"') {
                    if (inQuotedField && next == '"') { // quote is escaping a quote within a quoted field
                        i++; // skip escaping quote
                        field += current;
                    } else if (inQuotedField) { // quote signifies the end of a quoted field
                        row.Add(field);
                        if (next == ',') {
                            i++; // skip the comma separator since we've already found the end of the field
                        }
                        field = "";
                        inQuotedField = false;
                    } else { // quote signifies the beginning of a quoted field
                        inQuotedField = true; 
                    }
                } else if (current == ',') { //
                    row.Add(field);
                    field = "";
                } else if (current == '\n') {
                    row.Add(field);
                    parsedCsv.Add(new List<string>(row));
                    field = "";
                    row.Clear();
                }
            }

            return parsedCsv;
        }

Columbuscolumbyne answered 1/8, 2020 at 6:1 Comment(2)
Great answer. When using StringBuilder instead of string concatenation, it's a bit more performant for long CSV filesDormeuse
One small bug I found. this method does not add the last row to the "parsedCsv" collection if the string does not end with a newlineDormeuse
G
2
private static DataTable ConvertCSVtoDataTable(string strFilePath)
        {
            DataTable dt = new DataTable();
            using (StreamReader sr = new StreamReader(strFilePath))
            {
                string[] headers = sr.ReadLine().Split(',');
                foreach (string header in headers)
                {
                    dt.Columns.Add(header);
                }
                while (!sr.EndOfStream)
                {
                    string[] rows = sr.ReadLine().Split(',');
                    DataRow dr = dt.NewRow();
                    for (int i = 0; i < headers.Length; i++)
                    {
                        dr[i] = rows[i];
                    }
                    dt.Rows.Add(dr);
                }

            }

            return dt;
        }

        private static void WriteToDb(DataTable dt)
        {
            string connectionString =
                "Data Source=localhost;" +
                "Initial Catalog=Northwind;" +
                "Integrated Security=SSPI;";

            using (SqlConnection con = new SqlConnection(connectionString))
                {
                    using (SqlCommand cmd = new SqlCommand("spInsertTest", con))
                    {
                        cmd.CommandType = CommandType.StoredProcedure;

                        cmd.Parameters.Add("@policyID", SqlDbType.Int).Value = 12;
                        cmd.Parameters.Add("@statecode", SqlDbType.VarChar).Value = "blagh2";
                        cmd.Parameters.Add("@county", SqlDbType.VarChar).Value = "blagh3";

                        con.Open();
                        cmd.ExecuteNonQuery();
                    }
                }

         }
Goiter answered 20/7, 2017 at 14:38 Comment(2)
from where did you copy this solution?Fabled
what if your CSV uses qualifiers?Windham
F
1

To complete the previous answers, one may need a collection of objects from his CSV File, either parsed by the TextFieldParser or the string.Split method, and then each line converted to an object via Reflection. You obviously first need to define a class that matches the lines of the CSV file.

I used the simple CSV Serializer from Michael Kropat found here: Generic class to CSV (all properties) and reused his methods to get the fields and properties of the wished class.

I deserialize my CSV file with the following method:

public static IEnumerable<T> ReadCsvFileTextFieldParser<T>(string fileFullPath, string delimiter = ";") where T : new()
{
    if (!File.Exists(fileFullPath))
    {
        return null;
    }

    var list = new List<T>();
    var csvFields = GetAllFieldOfClass<T>();
    var fieldDict = new Dictionary<int, MemberInfo>();

    using (TextFieldParser parser = new TextFieldParser(fileFullPath))
    {
        parser.SetDelimiters(delimiter);

        bool headerParsed = false;

        while (!parser.EndOfData)
        {
            //Processing row
            string[] rowFields = parser.ReadFields();
            if (!headerParsed)
            {
                for (int i = 0; i < rowFields.Length; i++)
                {
                    // First row shall be the header!
                    var csvField = csvFields.Where(f => f.Name == rowFields[i]).FirstOrDefault();
                    if (csvField != null)
                    {
                        fieldDict.Add(i, csvField);
                    }
                }
                headerParsed = true;
            }
            else
            {
                T newObj = new T();
                for (int i = 0; i < rowFields.Length; i++)
                {
                    var csvFied = fieldDict[i];
                    var record = rowFields[i];

                    if (csvFied is FieldInfo)
                    {
                        ((FieldInfo)csvFied).SetValue(newObj, record);
                    }
                    else if (csvFied is PropertyInfo)
                    {
                        var pi = (PropertyInfo)csvFied;
                        pi.SetValue(newObj, Convert.ChangeType(record, pi.PropertyType), null);
                    }
                    else
                    {
                        throw new Exception("Unhandled case.");
                    }
                }
                if (newObj != null)
                {
                    list.Add(newObj);
                }
            }
        }
    }
    return list;
}

public static IEnumerable<MemberInfo> GetAllFieldOfClass<T>()
{
    return
        from mi in typeof(T).GetMembers(BindingFlags.Public | BindingFlags.Instance | BindingFlags.Static)
        where new[] { MemberTypes.Field, MemberTypes.Property }.Contains(mi.MemberType)
        let orderAttr = (ColumnOrderAttribute)Attribute.GetCustomAttribute(mi, typeof(ColumnOrderAttribute))
        orderby orderAttr == null ? int.MaxValue : orderAttr.Order, mi.Name
        select mi;            
}
Fortney answered 6/9, 2016 at 8:42 Comment(0)
M
1

First of all need to understand what is CSV and how to write it.

  1. Every next string ( /r/n ) is next "table" row.
  2. "Table" cells is separated by some delimiter symbol. Most often used symbols is \t or ,
  3. Every cell possibly can contain this delimiter symbol (cell must to start with quotes symbol and ends with this symbol in this case)
  4. Every cell possibly can contains /r/n sybols (cell must to start with quotes symbol and ends with this symbol in this case)

The easiest way for C#/Visual Basic to work with CSV files is to use standard Microsoft.VisualBasic library. You just need to add needed reference, and the following string to your class:

using Microsoft.VisualBasic.FileIO;

Yes, you can use it in C#, don't worry. This library can read relatively big files and supports all of needed rules, so you will be able to work with all of CSV files.

Some time ago I had wrote simple class for CSV read/write based on this library. Using this simple class you will be able to work with CSV like with 2 dimensions array. You can find my class by the following link: https://github.com/ukushu/DataExporter

Simple example of using:

Csv csv = new Csv("\t");//delimiter symbol

csv.FileOpen("c:\\file1.csv");

var row1Cell6Value = csv.Rows[0][5];

csv.AddRow("asdf","asdffffff","5")

csv.FileSave("c:\\file2.csv");
Moulden answered 1/4, 2017 at 4:13 Comment(0)
A
1

I'd highly suggest using CsvHelper.

Here's a quick example:

public class csvExampleClass
{
    public string Id { get; set; }
    public string Firstname { get; set; }
    public string Lastname { get; set; }
}

var items = DeserializeCsvFile<List<csvExampleClass>>( csvText );

public static List<T> DeserializeCsvFile<T>(string text)
{
    CsvReader csv = new CsvReader( new StringReader( text ) );
    csv.Configuration.Delimiter = ",";
    csv.Configuration.HeaderValidated = null;
    csv.Configuration.MissingFieldFound = null;
    return (List<T>)csv.GetRecords<T>();
}

Full documentation can be found at: https://joshclose.github.io/CsvHelper

Auger answered 12/6, 2020 at 0:3 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.