Accessing Excel Spreadsheet with C# occasionally returns blank value for some cells
Asked Answered
A

10

11

I need to access an excel spreadsheet and insert the data from the spreadsheet into a SQL Database. However the Primary Keys are mixed, most are numeric and some are alpha-numeric.

The problem I have is that when the numeric and alpha-numeric Keys are in the same spreadsheet the alpha-numeric cells return blank values, whereas all the other cells return their data without problems.

I am using the OleDb method to access the Excel file. After retrieving the data with a Command string I put the data into a DataAdapter and then I fill a DataSet. I iterate through all the rows (dr) in the first DataTable in the DataSet.

I reference the columns by using, dr["..."].ToString()

If I debug the project in Visual Studio 2008 and I view the "extended properties", by holding my mouse over the "dr" I can view the values of the DataRow, but the Primary Key that should be alpha-numeric is {}. The other values are enclosed in quotes, but the blank value has braces.

Is this a C# problem or an Excel problem?

Has anyone ever encountered this problem before, or maybe found a workaround/fix?

Thanks in advance.

Alinealinna answered 18/11, 2008 at 13:35 Comment(1)
Possible duplicate: https://mcmap.net/q/333111/-oledb-amp-mixed-excel-datatypes-missing-data/2291Sisile
D
27

Solution:

Connection String:

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=FilePath;Extended Properties="Excel 8.0;HDR=Yes;IMEX=1";

  1. HDR=Yes; indicates that the first row contains columnnames, not data. HDR=No; indicates the opposite.

  2. IMEX=1; tells the driver to always read "intermixed" (numbers, dates, strings etc) data columns as text. Note that this option might affect excel sheet write access negative.

SQL syntax SELECT * FROM [sheet1$]. I.e. excel worksheet name followed by a $ and wrapped in [ ] brackets.

Important:

  • Check out the [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel] located registry REG_DWORD "TypeGuessRows". That's the key to not letting Excel use only the first 8 rows to guess the columns data type. Set this value to 0 to scan all rows. This might hurt performance.

  • If the Excel workbook is protected by a password, you cannot open it for data access, even by supplying the correct password with your connection string. If you try, you receive the following error message: "Could not decrypt file."

Doddered answered 10/8, 2009 at 4:41 Comment(2)
Perfect! Helped me with the same problem. Escape the double quote around: Extended Properties="Excel 8.0;HDR=Yes;IMEX=1" becomes Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=1\" And all your problems will be fine!Cheeks
The registry key is different for the ACE Excel driver: Excel 2007: HKEY_LOCAL_MACHINE\Software\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows Excel 2010: HKEY_LOCAL_MACHINE\Software\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows Excel 2013: HKEY_LOCAL_MACHINE\Software\Microsoft\Office\15.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows sourceHydrogenize
S
3

The Excel data source picks a column type for the entire column. If one of the cells doesn't match that type exactly, it leaves blanks like that. We had issues where our typist entered a " 8" (a space before the number, so Excel converted it to a string for that cell) in a numeric column. It would make sense to me that it would try the .Net Parse methods as they are more robust, but I guess that's not how the Excel driver works.

Our fix, since we were using database import services, was to log all the rows that 'failed' this way. Then, we went back to the XLS document and re-typed those cells, to ensure the underlying type was correct. (We found just deleting the space didn't fix it--we had to Clear the whole cell first, than re-type the '8'.) Feels hacky and isn't elagent, but that was the best method we found. If the Excel driver can't read it in correctly by itself, there's nothing you can do to get that data out of there once you're in .Net.

Just another case where Office hides the important details from users in the name of simplicity, and therefore making it more difficult when you have to be exact for power uses.

Sisile answered 5/3, 2009 at 16:47 Comment(2)
Ditto on this, and if Excel is being incredibly picky, you can try just accessing that one single cell: SELECT F1 FROM [MyWorksheet$B12:B12]Gastroenteritis
The option of IMEX=1 in the answer by @Doddered does work for us in some situations.Sisile
S
1

The {} means this is some sort of empty object and not a string. When you hover over the object you should be able to see its type. Likewise, when you use quickwatch to view dr["..."] you should see the object type. What type is the object you receive?

Schulte answered 18/11, 2008 at 14:35 Comment(0)
A
1

The ItemArray is an Object Array. So I assume that the "column" in the DataRow, that I am trying to reference, is of type object.

Alinealinna answered 19/11, 2008 at 4:48 Comment(1)
I checked the data type of the column when it gets a primary key and the column in the DataRow is double, the primary key has a letter in it. I understand why it is not getting the value, but how do I force the DataRow column's data type to string.Anthodium
G
1

For VISTA compatibility you can use EXCEL 12.0 driver in connection string. This should resolve your issue. It did mine.

Geller answered 18/2, 2009 at 15:49 Comment(0)
S
1

Solution:

  1. You put HDR=No so that the first row is not considered the column header. Connection String: Provider=Microsoft.Jet.OLEDB.4.0;Data Source=FilePath;Extended Properties="Excel 8.0;HDR=No;IMEX=1";
  2. You ignore the first row and you acces the data by any means you want (DataTable, DataReader ect). You acces the columns by numeric indexes, instead of column names.

It worked for me. This way you don't have to modify registers!

Subclimax answered 3/9, 2009 at 9:53 Comment(0)
C
1

I answered a similar question here. Here I've copied and pasted the same answer for your convenience:

I had this same problem, but was able to work around it without resorting to the Excel COM interface or 3rd party software. It involves a little processing overhead, but appears to be working for me.

  1. First read in the data to get the column names
  2. Then create a new DataSet with each of these columns, setting each of their DataTypes to string.
  3. Read the data in again into this new dataset. Voila - the scientific notation is now gone and everything is read in as a string.

Here's some code that illustrates this, and as an added bonus, it's even StyleCopped!

public void ImportSpreadsheet(string path)
{
    string extendedProperties = "Excel 12.0;HDR=YES;IMEX=1";
    string connectionString = string.Format(
        CultureInfo.CurrentCulture,
        "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"{1}\"",
        path,
        extendedProperties);

    using (OleDbConnection connection = new OleDbConnection(connectionString))
    {
        using (OleDbCommand command = connection.CreateCommand())
        {
            command.CommandText = "SELECT * FROM [Worksheet1$]";
            connection.Open();

            using (OleDbDataAdapter adapter = new OleDbDataAdapter(command))
            using (DataSet columnDataSet = new DataSet())
            using (DataSet dataSet = new DataSet())
            {
                columnDataSet.Locale = CultureInfo.CurrentCulture;
                adapter.Fill(columnDataSet);

                if (columnDataSet.Tables.Count == 1)
                {
                    var worksheet = columnDataSet.Tables[0];

                    // Now that we have a valid worksheet read in, with column names, we can create a
                    // new DataSet with a table that has preset columns that are all of type string.
                    // This fixes a problem where the OLEDB provider is trying to guess the data types
                    // of the cells and strange data appears, such as scientific notation on some cells.
                    dataSet.Tables.Add("WorksheetData");
                    DataTable tempTable = dataSet.Tables[0];

                    foreach (DataColumn column in worksheet.Columns)
                    {
                        tempTable.Columns.Add(column.ColumnName, typeof(string));
                    }

                    adapter.Fill(dataSet, "WorksheetData");

                    if (dataSet.Tables.Count == 1)
                    {
                        worksheet = dataSet.Tables[0];

                        foreach (var row in worksheet.Rows)
                        {
                            // TODO: Consume some data.
                        }
                    }
                }
            }
        }
    }
}
Cumming answered 9/9, 2010 at 16:18 Comment(0)
H
1

Order the records in the xls file by ascii code in descending order so that alpha-numeric fields will appear at the top below the header row. This ensures that the first row of data read will define the data type as "varchar" or "nvarchar"

Hera answered 25/10, 2011 at 16:40 Comment(0)
A
0

hi all this code is gets alphanumeric values also

using System.Data.OleDb;

string ConnectionString = @"Provider=Microsoft.Jet.OLEDB.4.0;" + "Data Source=" + filepath + ";" + "Extended Properties="+(char)34+"Excel 8.0;IMEX=1;"+(char)34;

string CommandText = "select * from [Sheet1$]";

OleDbConnection myConnection = new OleDbConnection(ConnectionString);
myConnection.Open();

OleDbDataAdapter myAdapter = new OleDbDataAdapter(CommandText, myConnection);

ds = null;
ds = new DataSet();
myAdapter.Fill(ds);
Actaeon answered 13/5, 2011 at 8:48 Comment(0)
T
0

This isn't completely right! Apparently, Jet/ACE ALWAYS assumes a string type if the first 8 rows are blank, regardless of IMEX=1. Even when I made the rows read to 0 in the registry, I still had the same problem. This was the only sure fire way to get it to work:

try
{
    Console.Write(wsReader.GetDouble(j).ToString());
}
catch   //Lame unfixable bug
{
    Console.Write(wsReader.GetString(j));
}

Tolly answered 16/7, 2012 at 19:49 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.