Load csv into oleDB and force all inferred datatypes to string
Asked Answered
F

2

6

Im trying to load a csv file into a datatable using oledb.

This is no problem but unfortunately one of the fields which looks numeric has a string value in about 3% of the fields and so is not being populated.

because im converting the csv into xml i really don't care about inferring datatypes and simply need the data in a string as i can cast it later in a Linq2XMl phase.

I am hoping to be able to do this in the connection string.

I don't want to just copy the table, set it up with new columns with the datatype I want and then write the data into it because that would involve loading the csv file twice.

any ideas?

my current connection string is

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + thefile.DirectoryName + ";Extended Properties='text;HDR=Yes;FMT=Delimited'";

Foramen answered 6/11, 2009 at 15:57 Comment(0)
F
7

Did some researchand the answer is use a schema.ini but generate it on the fly for your dataset.

http://msdn.microsoft.com/en-us/library/ms709353(VS.85).aspx

contains the info required. to construct the schema:

   public static void ConstructSchema(FileInfo theFile)
    {
        StringBuilder schema = new StringBuilder(); 
        DataTable data = LoadCSV(theFile); 
        schema.AppendLine("[" + theFile.Name + "]");
        schema.AppendLine("ColNameHeader=True"); 
        for (int i = 0; i < data.Columns.Count; i++)
        {
            schema.AppendLine("col" + (i + 1).ToString() + "=" + data.Columns[i].ColumnName + " Text");
        }   
        string schemaFileName = theFile.DirectoryName + @"\Schema.ini";
        TextWriter tw = new StreamWriter(schemaFileName);   
        tw.WriteLine(schema.ToString());
        tw.Close();  
    }

to load the csv as datatable

public static DataTable LoadCSV(FileInfo theFile)
    {   
        string sqlString = "Select * FROM [" + theFile.Name + "];";
        string conStr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source="
            + theFile.DirectoryName + ";" + "Extended Properties='text;HDR=YES;'";
        DataTable theCSV = new DataTable();

        using (OleDbConnection conn = new OleDbConnection(conStr))
        {
            using (OleDbCommand comm = new OleDbCommand(sqlString, conn))
            {
                using (OleDbDataAdapter adapter = new OleDbDataAdapter(comm))
                {
                    adapter.Fill(theCSV);
                }
            }
        }
        return theCSV;
    }

to convert to xml

 public static XElement GetXMLFromCSV(FileInfo theFile, string rootNodeName, string itemName)
    {
        XElement retVal;
        DataTable data;
        data = CrateCsvAndSchema(theFile); 
        DataSet ds = new DataSet(rootNodeName);
        data.TableName = itemName;
        ds.Tables.Add(data); 
        retVal = XElement.Parse(ds.GetXml());  
        return retVal;
    }
Foramen answered 8/11, 2009 at 12:37 Comment(3)
Sorry for dragging up an old question but surely this loads the csv before there is a schema.ini file present and as such when you call LoadCSV you're loading your datatable with the incorrect data again? Ultimately you're going to end up having to call loadcsv again to actually use the generated schema.ini no?Cirrate
i load the csv to create the schema file then load it again to get the data.read the link to the microsoft article on how schema files work. The code here is just snippets of the key parts really.Foramen
np glad this is still useful to people! I haven't had to do this since - everyone uses json now ;pForamen
R
0

For reading a CSV into a DataTable I recommend this CSV parser.

It's really easy to use. Here's how you can use it to fill a DataTable with data from a comma delimited, quote qualified CSV:

    DataTable dt = null;
    using (GenericParserAdapter gp = new GenericParser.GenericParserAdapter(yourCsvFullname)) {
        dt = gp.GetDataTable();
    }

There are a number of options you can set: the delimiter, the text qualifer character(s)whether the first line in the CSV show column headers (if true, each DataColumn in your DataTable will be named accordingly), etc.

There are a number of fast, flexible CSV parsers out there but for simple requirements this one can't be beat.

Relly answered 6/11, 2009 at 16:46 Comment(2)
thankyou, ill will check that library out. I really want to solve this without using external dlls if at all possible. But thanks.Foramen
You don't need external DLLs for this. The article includes the class which you can include in your project.Relly

© 2022 - 2024 — McMap. All rights reserved.