OleDB & mixed Excel datatypes : missing data
Asked Answered
M

6

55

I have an Excel worksheet I want to read into a datatable - all is well except for one particular column in my Excel sheet. The column, 'ProductID', is a mix of values like ########## and n#########.

I tried to let OleDB handle everything by itself automatically by reading it into a dataset/datatable, but any values in 'ProductID' like n###### are missing, ignored, and left blank. I tried manually creating my DataTable by looping through each row with a datareader, but with the exact same results.

Here's the code :

// add the column names manually to the datatable as column_1, column_2, ...
for (colnum = 0; colnum < num_columns; colnum ++){
  ds.Tables["products"].Columns.Add("column_" +colnum , System.Type.GetType("System.String")); 
}
while(myDataReader.Read()){
  // loop through each excel row adding a new respective datarow to my datatable 
  DataRow a_row = ds.Tables["products"].NewRow();
  for (col = 0; col < num_columns; col ++){
    try {  a_row[col] = rdr.GetString(col);  }
    catch {  a_row[col] = rdr.GetValue(col).ToString(); }
  }
  ds.Tables["products"].Rows.Add(a_row);
}

I don't understand why it won't let me read in values like n######. How can I do this?

Month answered 12/7, 2010 at 21:7 Comment(0)
W
106

Using .Net 4.0 and reading Excel files, I had a similar issue with OleDbDataAdapter - i.e. reading in a mixed data type on a "PartID" column in MS Excel, where a PartID value can be numeric (e.g. 561) or text (e.g. HL4354), even though the excel column was formatted as "Text".

From what I can tell, ADO.NET chooses the data type based on the majority of the values in the column (with a tie going to numeric data type). i.e. if most of the PartID's in the sample set are numeric, ADO.NET will declare the column to be numeric. Therefore ADO.Net will attempt to cast each cell to a number, which will fail for the "text" PartID values and not import those "text" PartID's.

My solution was to set the OleDbConnection connectionstring to use Extended Properties=IMEX=1;HDR=NO to indicate this is an Import and that the table(s) will not include headers. The excel file has a header row, so in this case tell ado.net not to use it. Then later in the code, remove that header row from the dataset and voilà you have mixed data type for that column.

string sql = "SELECT F1, F2, F3, F4, F5 FROM [sheet1$] WHERE F1 IS NOT NULL";

OleDbConnection connection = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + PrmPathExcelFile + @";Extended Properties=""Excel 8.0;IMEX=1;HDR=NO;TypeGuessRows=0;ImportMixedTypes=Text""");

OleDbCommand cmd = new OleDbCommand(sql, connection);
OleDbDataAdapter da = new OleDbDataAdapter(cmd);

DataSet ds = new DataSet();
ds.Tables.Add("xlsImport", "Excel");
da.Fill(ds, "xlsImport");

// Remove the first row (header row)
DataRow rowDel = ds.Tables["xlsImport"].Rows[0];
ds.Tables["xlsImport"].Rows.Remove(rowDel);

ds.Tables["xlsImport"].Columns[0].ColumnName = "LocationID";
ds.Tables["xlsImport"].Columns[1].ColumnName = "PartID";
ds.Tables["xlsImport"].Columns[2].ColumnName = "Qty";
ds.Tables["xlsImport"].Columns[3].ColumnName = "UserNotes";
ds.Tables["xlsImport"].Columns[4].ColumnName = "UserID";

connection.Close(); 

// now you can use LINQ to search the fields

    var data = ds.Tables["xlsImport"].AsEnumerable();
    var query = data.Where(x => x.Field<string>("LocationID") == "COOKCOUNTY").Select(x =>
                new Contact
                {
                    LocationID= x.Field<string>("LocationID"),
                    PartID = x.Field<string>("PartID"),
                    Quantity = x.Field<string>("Qty"),
                    Notes = x.Field<string>("UserNotes"),
                    UserID = x.Field<string>("UserID")
                });
Wafture answered 19/4, 2011 at 19:20 Comment(11)
man "Excel 8.0;IMEX=1;HDR=NO;TypeGuessRows=0;ImportMixedTypes=Text" saved the day for somebodySnailpaced
This article shows how to set the parameters in the registry: blog.lab49.com/archives/196 I didn't know you could set TypeGuessRows = 0; ImportMixedTypes = Text in the connection string.Equitant
It appears that the ImportMixedTypes can't be set in the connection string, with the ACE OleDb driver, which you need for Excel 12.0Tracheid
Actually "Extended Properties=IMEX=1;" fix the issues. There's no need to include HDR=NO. But thanks anywayEmbry
@Embry - your solution worked for me too! On a sheet that office warned that "the number in this cell is formatted as text or preceded by an apostrophe", using Extended Properties=\"Excel 8.0;IMEX=1;\"; solved itFrictional
To avoid the installable ISAM error you need to add quotes around the Extended Properties content. Extended Properties="Excel8.0;IMEX=1"Bloodstream
It turns out that if you change your SQL like this "SELECT F1, F2, F3, F4, F5 FROM [sheet1$] WHERE F1 IS NOT NULL And F1<>'PartID'" It still works and you don't need to exclude the PartID-row later on in code. This can be of great advantage if you are doing more complex queries which for example directly update a database from the Excel rows.Kilo
After trying everything else, some of which I was already aware of, it was setting HDR=No and skipping that row that finally did the trick for me. The other steps let the data load without error, but it was silently generating empty strings for non-numeric values until that change. Thanks!Earleneearley
IMEX=1 was all that was needed, was driving me crazy why it was working for some imports with mixed types and not for others. Worse that instead of erroring (which it should do) it just inserted blanks in where the mixed type wasnt the one it decided to use.Agglomerate
Watch out for "IMEX=1" only solution, I thought that it works, but when I imported over 1k Excel files for some of them there were still empty cells in some files. All excel files were copy of same "template", all cells in each file were "text" type (but contained only numbers sometime) so I have no idea why it didn't work for some files. However adding HDR=NO;TypeGuessRows=0;ImportMixedTypes=Text part solved problem.Remonaremonetize
When I execute update query by setting IMEX=1 is throwing error like, "Execute Non Query for Failed Records: Operation must use an updateable query."Interrelate
C
10

Several forums I found claim that by adding IMEX=1;TypeGuessRows=0;ImportMixedTypes=Text to the Extended Properties in the connection string would fix the problem, but this was not the case. I finally solved this problem by adding "HDR=NO" to the Extended Properties in the connection string (as Brian Wells shows above) so that I could import mixed types.

I then added some generic code to name the columns after the first row of data, then remove the first row.

    public static DataTable ImportMyDataTableFromExcel(string filePath)
    {
        DataTable dt = new DataTable();

        string fullPath = Path.GetFullPath(filePath);

        string connString =
           "Provider=Microsoft.Jet.OLEDB.4.0;" +
           "Data Source=\"" + fullPath + "\";" +
           "Extended Properties=\"Excel 8.0;HDR=No;IMEX=1;\"";

        string sql = @"SELECT * FROM [sheet1$]";

        using (OleDbDataAdapter dataAdapter = new OleDbDataAdapter(sql, connString))
        {
            dataAdapter.Fill(dt);
        }

        dt = BuildHeadersFromFirstRowThenRemoveFirstRow(dt);

        return dt;
    }

    private static DataTable BuildHeadersFromFirstRowThenRemoveFirstRow(DataTable dt)
    {
        DataRow firstRow = dt.Rows[0];

        for (int i = 0; i < dt.Columns.Count; i++)
        {
            if(!string.IsNullOrWhiteSpace(firstRow[i].ToString())) // handle empty cell
              dt.Columns[i].ColumnName = firstRow[i].ToString().Trim();
        }

        dt.Rows.RemoveAt(0);

        return dt;
    }
Capricorn answered 21/8, 2012 at 21:5 Comment(4)
All of this difficulty with accessing data in Excel spreadsheets raises an obvious question: Why doesn't Microsoft provide a more modern (and performant) mechanism to get at the data than using the crappy Jet driver? There are any number of 3rd party tools for this that do a much better job than anything MS provides.Dexter
@Dexter do you know of something that can read an XLS file. I tried a few things but they can only read the newer XLSX grrrLaetitia
@Laetitia The mechanism discussed in this thread, as clunky as it is, provides a reliable (but extremely non-performant) means to extract data from an XLS file. If you need something that's faster, you are either going to have to write your own parser or buy a 3rd party component (such as Spreadsheet Gear)Dexter
This technique works well and is a much more generic solution than other answers - except it throws an exception if the file has duplicate data in 2 cells of row 1. I proposed a revision but it was rejected, so use care if copying this solution!Goulash
W
6

No problem sh4, glad it helps w/ the mixed type issue.

The DateTime column is whole other animal that I recall caused me grief in the past... we have one excel file we process that the OleDbDataAdapter will sometimes convert dates to a double data type (apparently Excel stores dates as doubles, which encode the number of days elapsed since January 0, 1900 ).

The workaround was to use:

OleDbConnection mobjExcelConn = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + txtExcelFile.Text + @";Extended Properties=""Excel 8.0;IMEX=1;HDR=Yes;""");

OleDbDataAdapter mobjExcelDataAdapter = new OleDbDataAdapter("Select * from [" + txtSheet.Text + "$] where [Supplier ID] <> '' ", mobjExcelConn);


DateTime dtShipStatus = DateTime.MinValue;
shipStatusOrig = excelRow["Est Ship Date"].ToString(); // excelRow is DataRow in the DataSet via the OleDbDataAdapter             

if (shipStatusOrig != string.Empty)
{
    // Date may be read in via oledb adapter as a double
    if (IsNumeric(shipStatusOrig))
    {
        double d = Convert.ToDouble(shipStatusOrig);
        dtShipStatus = DateTime.FromOADate(d);

        if (DateTime.TryParse(dtShipStatus.ToString(), out dtShipStatus))
        {
            validDate = true;
            Debug.WriteLine("{0} converted: ", dtShipStatus.ToString("s"));
        }
    }
    else
    {
        if (ValidateShipDate(shipStatusOrig))
        {
            dtShipStatus = DateTime.Parse(shipStatusOrig);
            validDate = true;
            Debug.WriteLine("{0} converted: ", dtShipStatus.ToString("s"));
        }
        else
        {
            validDate = false;
            MessageBox.Show("Invalid date format in the Excel spreadsheet.\nLine # " + progressBar1.Value + ", the 'Ship Status' value '" + shipStatusOrig + "' is invalid.\nDate should be in a valid date time format.\ne.g. M/DD/YY, M.D.Y, YYYY-MM-DD, etc.", "Invaid Ship Status Date");
        }
    }
...
}
        public static Boolean IsNumeric (Object Expression)
        {
            if(Expression == null || Expression is DateTime)
                return false;

            if(Expression is Int16 || Expression is Int32 || Expression is Int64 || Expression is Decimal || Expression is Single || Expression is Double || Expression is Boolean)
                return true;

            try
            {
                if(Expression is string)
                    Double.Parse(Expression as string);
                else
                   Double.Parse(Expression.ToString());
                return true;
            } catch {} // just dismiss errors but return false

            return false;
        }

        public bool ValidateShipDate(string shipStatus)
        {
            DateTime startDate;
            try
            {
                startDate = DateTime.Parse(shipStatus);
                return true;
            }
            catch
            {
                return false;
            }
        }
Wafture answered 29/4, 2011 at 14:48 Comment(0)
M
5

There are two ways to handle mixed datatypes & excel.

Method 1

  • Open up your excel spreadsheet and set the column format to the desired format manually. In this case, 'Text'.

Method 2

  • There is a "hack" that consists of appending "IMEX=1" to your connection string like so:

    Provider=Microsoft.Jet.OLEDB.4.0;Data Source=myfile.xls;Extended Properties=Excel 8.0;IMEX=1

  • This will attempt to handle mixed Excel formats according to how it is set in your registry. This can be set locally by you, but for a server, this is probably not an option.

Month answered 12/7, 2010 at 21:22 Comment(4)
It gives error : System.Data.OleDb.OleDbException was unhandled by user code Message="Could not find installable ISAM." Source="Microsoft JET Database Engine" ErrorCode=-2147467259Duque
I know this thread's now old but I have the same issue as Pratik... I can't specify IMEX=1 because of "Could not find installable ISAM" error :(Osullivan
To advoid the installable ISAM error you need to add quotes around the Extended Properties content. Extended Properties="Excel8.0;IMEX=1"Alkmaar
I can't believe I had to read this far to find the most obvious solution. I had thought this might work, but for some reason never bothered to try it. This is the best solution: Open up your excel spreadsheet and set the column format to the desired format manually. In this case, 'Text'. I may post some code in a bit that automates this process in VBA.Nice
C
1

@Brian Wells Thank you, your suggestion did the trick, but not totally... Worked for the mixed field int-string, but the datetime columns went with strange characters after that, so i applied a "hack" over the "hack".

1.- Do a System.Io.File.Copy and create a copy of the excel file.

2.- Modify the Datetime column headers programatically at runtime to something in datetime format, i.e. "01/01/0001".

3.- Save the excel, and then apply your trick doing the query with HDR=NO to the modified file.

Tricky, yes, but worked, and reasonabily fast, if anyone has any alternative to this, i will be glad to hear.

Greetings.

P.D. Excuse my english, it isn't my native language.

Calesta answered 28/4, 2011 at 14:33 Comment(1)
No problem glad it helps! I'll post a DateTime resolution I've used previously in a different Answer in this Post (not enough chars given here).Wafture
E
1

Shortcut --> if you have a mixed type column in Excel: Sort your column Z to A

I pretty much went through all of the answers here and some of them worked for me and some did not, however none was desirable for me because somehow ADO did not pick the data in a mixed type column that I had in my Excel file. I had to set HDR=NO to make ADO read my spreadsheet column that is a mix of text and numbers and that way I lose the ability of using column headers in my SQL statements which is not good. If the order of columns change in the Excel file, the SQL statement will result in error or wrong output.

In a mixed data type column the key is the first 8 rows. ADO determines the data type for the column based on the first 8 rows So if you still want to modify your connection string with the extended parameters, simply sort your column Z to A on your Excel file before reading the data by ADO so this way the rows on top are the text ones and then your column will be picked as text.

If your initial rows are numbers (regardless if your column is set to format TEXT in Excel) ADO will determine that columns as a numeric type, so once it read the text rows below, it cannot cast those into number. On the opposite case, if the column is determined text, if any row if number, it can be cast as text.

Endogamy answered 5/6, 2018 at 17:56 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.