Import CSV file to strongly typed data structure in .Net [closed]
Asked Answered
L

11

108

What's the best way to import a CSV file into a strongly-typed data structure?

Lienlienhard answered 5/8, 2008 at 4:43 Comment(5)
Check out FileHelpers Open Source Library.Alessi
This is a duplicate of #1103995Sliwa
Considering this was created a year earlier than 1103495, I think that question is a duplicate of this one.Lienlienhard
Thanks, Matt. I was just trying to link them together, not indicate which one came first. You'll see I have exactly the same text on the other question pointing at this one.Is there a better way to tie two questions together?Sliwa
Reading a CSV file in .NET? Is an open duplicate of this questionHectare
C
78

Microsoft's TextFieldParser is stable and follows RFC 4180 for CSV files. Don't be put off by the Microsoft.VisualBasic namespace; it's a standard component in the .NET Framework, just add a reference to the global Microsoft.VisualBasic assembly.

If you're compiling for Windows (as opposed to Mono) and don't anticipate having to parse "broken" (non-RFC-compliant) CSV files, then this would be the obvious choice, as it's free, unrestricted, stable, and actively supported, most of which cannot be said for FileHelpers.

See also: How to: Read From Comma-Delimited Text Files in Visual Basic for a VB code example.

Countermove answered 1/4, 2009 at 19:58 Comment(3)
There's actually nothing VB-specific about this class other than its unfortunately-named namespace. I would definitely choose this library if I only needed a "simple" CSV parser, because there's nothing to download, distribute, or worry about in general. To that end I've edited the VB-focused phrasing out of this answer.Cordwainer
@Cordwainer I think your edits are mostly an improvement. Although that RFC is not necessarily authoritative, as many CSV writers do not comply with it e.g. Excel does not always use a comma in "CSV" files. Also didn't my previous answer already say the class could be used from C#?Countermove
The TextFieldParser will work for tab-delimited and other weird Excel-generated cruft too. I realize that your previous answer wasn't claiming that the library was VB-specific, it just came across to me as implying that it was really meant for VB, and not intended to be used from C#, which I don't think is the case - there are some really useful classes in MSVB.Cordwainer
D
22

Use an OleDB connection.

String sConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\\InputDirectory\\;Extended Properties='text;HDR=Yes;FMT=Delimited'";
OleDbConnection objConn = new OleDbConnection(sConnectionString);
objConn.Open();
DataTable dt = new DataTable();
OleDbCommand objCmdSelect = new OleDbCommand("SELECT * FROM file.csv", objConn);
OleDbDataAdapter objAdapter1 = new OleDbDataAdapter();
objAdapter1.SelectCommand = objCmdSelect;
objAdapter1.Fill(dt);
objConn.Close();
Doggo answered 5/11, 2008 at 14:41 Comment(5)
This requires file system access. As far as i know there is no way to make OLEDB work with in-memory streams :(Ferreby
@UserControl, of course it requires file system access. He asked about importing a CSV fileDoggo
I'm not complaining. In fact I'd prefer OLEDB solution over the rest but I was frustrated so many times when needed to parse CSV in ASP.NET applications so wanted to note it.Ferreby
This appears to require some kind of driver: System.InvalidOperationException: 'The 'Microsoft.Jet.OLEDB.4.0' provider is not registered on the local machine.'. Not really the "out-of-the-box" solution I was hoping for...Terrence
@KyleMcClellan I'm sorry my 14 year old answer didn't meet a criteria of your search today that wasn't part of the original questionಠ_ಠDoggo
A
13

If you're expecting fairly complex scenarios for CSV parsing, don't even think up of rolling our own parser. There are a lot of excellent tools out there, like FileHelpers, or even ones from CodeProject.

The point is this is a fairly common problem and you could bet that a lot of software developers have already thought about and solved this problem.

Ally answered 16/8, 2008 at 23:44 Comment(2)
While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From ReviewPilcher
Thanks @Pilcher I hope you did note that this post was from the beta period of StackOverflow :D That being said nowadays CSV tools are better sourced from Nuget packages -- so I'm not sure if even link answers are immune from 8-year-old evolution cycles of technologyAlly
S
10

I agree with @NotMyself. FileHelpers is well tested and handles all kinds of edge cases that you'll eventually have to deal with if you do it yourself. Take a look at what FileHelpers does and only write your own if you're absolutely sure that either (1) you will never need to handle the edge cases FileHelpers does, or (2) you love writing this kind of stuff and are going to be overjoyed when you have to parse stuff like this:

1,"Bill","Smith","Supervisor", "No Comment"

2 , 'Drake,' , 'O'Malley',"Janitor,

Oops, I'm not quoted and I'm on a new line!

Slusher answered 16/8, 2008 at 23:53 Comment(0)
P
9

Brian gives a nice solution for converting it to a strongly typed collection.

Most of the CSV parsing methods given don't take into account escaping fields or some of the other subtleties of CSV files (like trimming fields). Here is the code I personally use. It's a bit rough around the edges and has pretty much no error reporting.

public static IList<IList<string>> Parse(string content)
{
    IList<IList<string>> records = new List<IList<string>>();

    StringReader stringReader = new StringReader(content);

    bool inQoutedString = false;
    IList<string> record = new List<string>();
    StringBuilder fieldBuilder = new StringBuilder();
    while (stringReader.Peek() != -1)
    {
        char readChar = (char)stringReader.Read();

        if (readChar == '\n' || (readChar == '\r' && stringReader.Peek() == '\n'))
        {
            // If it's a \r\n combo consume the \n part and throw it away.
            if (readChar == '\r')
            {
                stringReader.Read();
            }

            if (inQoutedString)
            {
                if (readChar == '\r')
                {
                    fieldBuilder.Append('\r');
                }
                fieldBuilder.Append('\n');
            }
            else
            {
                record.Add(fieldBuilder.ToString().TrimEnd());
                fieldBuilder = new StringBuilder();

                records.Add(record);
                record = new List<string>();

                inQoutedString = false;
            }
        }
        else if (fieldBuilder.Length == 0 && !inQoutedString)
        {
            if (char.IsWhiteSpace(readChar))
            {
                // Ignore leading whitespace
            }
            else if (readChar == '"')
            {
                inQoutedString = true;
            }
            else if (readChar == ',')
            {
                record.Add(fieldBuilder.ToString().TrimEnd());
                fieldBuilder = new StringBuilder();
            }
            else
            {
                fieldBuilder.Append(readChar);
            }
        }
        else if (readChar == ',')
        {
            if (inQoutedString)
            {
                fieldBuilder.Append(',');
            }
            else
            {
                record.Add(fieldBuilder.ToString().TrimEnd());
                fieldBuilder = new StringBuilder();
            }
        }
        else if (readChar == '"')
        {
            if (inQoutedString)
            {
                if (stringReader.Peek() == '"')
                {
                    stringReader.Read();
                    fieldBuilder.Append('"');
                }
                else
                {
                    inQoutedString = false;
                }
            }
            else
            {
                fieldBuilder.Append(readChar);
            }
        }
        else
        {
            fieldBuilder.Append(readChar);
        }
    }
    record.Add(fieldBuilder.ToString().TrimEnd());
    records.Add(record);

    return records;
}

Note that this doesn't handle the edge case of fields not being deliminated by double quotes, but meerley having a quoted string inside of it. See this post for a bit of a better expanation as well as some links to some proper libraries.

Potpie answered 8/8, 2008 at 16:20 Comment(0)
B
6

I was bored so i modified some stuff i wrote. It try's to encapsulate the parsing in an OO manner whle cutting down on the amount of iterations through the file, it only iterates once at the top foreach.

using System;

using System.Collections.Generic;

using System.Linq;

using System.Text;

using System.IO;

namespace ConsoleApplication1
{
    class Program
    {

        static void Main(string[] args)
        {

            // usage:

            // note this wont run as getting streams is not Implemented

            // but will get you started

            CSVFileParser fileParser = new CSVFileParser();

            // TO Do:  configure fileparser

            PersonParser personParser = new PersonParser(fileParser);

            List<Person> persons = new List<Person>();
            // if the file is large and there is a good way to limit
            // without having to reparse the whole file you can use a 
            // linq query if you desire
            foreach (Person person in personParser.GetPersons())
            {
                persons.Add(person);
            }

            // now we have a list of Person objects
        }
    }

    public abstract  class CSVParser 
    {

        protected String[] deliniators = { "," };

        protected internal IEnumerable<String[]> GetRecords()
        {

            Stream stream = GetStream();
            StreamReader reader = new StreamReader(stream);

            String[] aRecord;
            while (!reader.EndOfStream)
            {
                  aRecord = reader.ReadLine().Split(deliniators,
                   StringSplitOptions.None);

                yield return aRecord;
            }

        }

        protected abstract Stream GetStream(); 

    }

    public class CSVFileParser : CSVParser
    {
        // to do: add logic to get a stream from a file

        protected override Stream GetStream()
        {
            throw new NotImplementedException();
        } 
    }

    public class CSVWebParser : CSVParser
    {
        // to do: add logic to get a stream from a web request

        protected override Stream GetStream()
        {
            throw new NotImplementedException();
        }
    }

    public class Person
    {
        public String Name { get; set; }
        public String Address { get; set; }
        public DateTime DOB { get; set; }
    }

    public class PersonParser 
    {

        public PersonParser(CSVParser parser)
        {
            this.Parser = parser;
        }

        public CSVParser Parser { get; set; }

        public  IEnumerable<Person> GetPersons()
        {
            foreach (String[] record in this.Parser.GetRecords())
            {
                yield return new Person()
                {
                    Name = record[0],
                    Address = record[1],
                    DOB = DateTime.Parse(record[2]),
                };
            }
        }
    }
}
Brown answered 8/8, 2008 at 9:39 Comment(0)
F
5

There are two articles on CodeProject that provide code for a solution, one that uses StreamReader and one that imports CSV data using the Microsoft Text Driver.

Feebleminded answered 5/8, 2008 at 5:24 Comment(0)
D
2

A good simple way to do it is to open the file, and read each line into an array, linked list, data-structure-of-your-choice. Be careful about handling the first line though.

This may be over your head, but there seems to be a direct way to access them as well using a connection string.

Why not try using Python instead of C# or VB? It has a nice CSV module to import that does all the heavy lifting for you.

Darin answered 5/8, 2008 at 4:49 Comment(1)
Don't jump to python from VB for the sake of a CSV parser. There's one in VB. Although weirdly it seems to have been ignored in the answers to this question. msdn.microsoft.com/en-us/library/…Countermove
C
1

I had to use a CSV parser in .NET for a project this summer and settled on the Microsoft Jet Text Driver. You specify a folder using a connection string, then query a file using a SQL Select statement. You can specify strong types using a schema.ini file. I didn't do this at first, but then I was getting bad results where the type of the data wasn't immediately apparent, such as IP numbers or an entry like "XYQ 3.9 SP1".

One limitation I ran into is that it cannot handle column names above 64 characters; it truncates. This shouldn't be a problem, except I was dealing with very poorly designed input data. It returns an ADO.NET DataSet.

This was the best solution I found. I would be wary of rolling my own CSV parser, since I would probably miss some of the end cases, and I didn't find any other free CSV parsing packages for .NET out there.

EDIT: Also, there can only be one schema.ini file per directory, so I dynamically appended to it to strongly type the needed columns. It will only strongly-type the columns specified, and infer for any unspecified field. I really appreciated this, as I was dealing with importing a fluid 70+ column CSV and didn't want to specify each column, only the misbehaving ones.

Croquet answered 16/8, 2008 at 22:15 Comment(1)
Why not the VB.NET built in CSV parser? msdn.microsoft.com/en-us/library/…Countermove
S
1

I typed in some code. The result in the datagridviewer looked good. It parses a single line of text to an arraylist of objects.

    enum quotestatus
    {
        none,
        firstquote,
        secondquote
    }
    public static System.Collections.ArrayList Parse(string line,string delimiter)
    {        
        System.Collections.ArrayList ar = new System.Collections.ArrayList();
        StringBuilder field = new StringBuilder();
        quotestatus status = quotestatus.none;
        foreach (char ch in line.ToCharArray())
        {                                
            string chOmsch = "char";
            if (ch == Convert.ToChar(delimiter))
            {
                if (status== quotestatus.firstquote)
                {
                    chOmsch = "char";
                }                         
                else
                {
                    chOmsch = "delimiter";                    
                }                    
            }

            if (ch == Convert.ToChar(34))
            {
                chOmsch = "quotes";           
                if (status == quotestatus.firstquote)
                {
                    status = quotestatus.secondquote;
                }
                if (status == quotestatus.none )
                {
                    status = quotestatus.firstquote;
                }
            }

            switch (chOmsch)
            {
                case "char":
                    field.Append(ch);
                    break;
                case "delimiter":                        
                    ar.Add(field.ToString());
                    field.Clear();
                    break;
                case "quotes":
                    if (status==quotestatus.firstquote)
                    {
                        field.Clear();                            
                    }
                    if (status== quotestatus.secondquote)
                    {                                                                           
                            status =quotestatus.none;                                
                    }                    
                    break;
            }
        }
        if (field.Length != 0)            
        {
            ar.Add(field.ToString());                
        }           
        return ar;
    }
Stegman answered 9/9, 2011 at 10:2 Comment(0)
T
0

If you can guarantee that there are no commas in the data, then the simplest way would probably be to use String.split.

For example:

String[] values = myString.Split(',');
myObject.StringField = values[0];
myObject.IntField = Int32.Parse(values[1]);

There may be libraries you could use to help, but that's probably as simple as you can get. Just make sure you can't have commas in the data, otherwise you will need to parse it better.

Truncation answered 5/8, 2008 at 5:2 Comment(3)
this is not an optimal solutionStepdaughter
very bad on memory usage and lots of overhead. Small should be less thank a few kilobytes. Definitely not good for a 10mb csv!Brutal
It depends on the size of your memory and the file.Rattly

© 2022 - 2024 — McMap. All rights reserved.