Problem with using OleDbDataAdapter to fetch data from a Excel sheet
Asked Answered
T

3

6

First, I want to say that I'm out on deep water here, since I'm just doing some changes to code that is written by someone else in the company, using OleDbDataAdapter to "talk" to Excel and I'm not familiar with that. There is one bug there I just can't follow.

I'm trying to use a OleDbDataAdapter to read in a excel file with around 450 lines.

In the code it's done like this:

connection = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;" + "Data Source='" + path + "';" + "Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=1;\"");
connection.Open();
OleDbDataAdapter objAdapter = new OleDbDataAdapter(objCommand.CommandText, connection);
objAdapter.Fill(objDataSet, "Excel");

foreach (DataColumn dataColumn in objTable.Columns) {
  if (dataColumn.Ordinal > objDataSet.Tables[0].Columns.Count - 1) {
    objDataSet.Tables[0].Columns.Add();
  }
  objDataSet.Tables[0].Columns[dataColumn.Ordinal].ColumnName = dataColumn.ColumnName;
  objImport.Columns.Add(dataColumn.ColumnName);
}

foreach (DataRow dataRow in objDataSet.Tables[0].Rows) {
   ...
}

Everything seems to be working fine except for one thing. The second column is filled with mostly four digit numbers like 6739, 3920 and so one, but fice rows have alphanumeric values like 8201NO and 8205NO. Those five cells are reported as having blank contents instead of their alphanumeric content. I have checked in excel, and all the cells in this columns are marked as Text.

This is an xls file by the way, and not xlsx.

Do anyone have any clue as why these cells are shown as blank in the DataRow, but the numeric ones are shown fine? There are other columns with alphanumeric content that are shown just fine.

Takao answered 28/9, 2010 at 22:6 Comment(1)
Thanks to all for helping me out with this issue. You made me understand why this happened, so I could find the proper solution. I still think the way it works is quite horrible, but that's another story :)Kinross
U
8

What's happening is that excel is trying to assign a data type to the spreadsheet column based on the first several values in that column. I suspect that if you look at the properties in that column it will say it is a numerical column.

The problem comes when you start trying to query that spreadsheet using jet. When it thinks it's dealing with a numerical column and it finds a varchar value it quietly returns nothing. Not even a cryptic error message to go off of.

As a possible work around can you move one of the alpha numeric values to the first row of data and then try parsing. I suspect you will start getting values for the alpha numeric rows then...

Take a look at this article. It goes into more detail on this issue. it also talks about a possible work around which is:

However, as per JET documentation, we can override the registry setting thru the Connection String, if we set IMEX=1( as part of Extended Properties), the JET will set the all column type as UNICODE VARCHAR or ADVARWCHAR irrespective of ‘ImportMixedTypes’ key value.hey

Unchartered answered 28/9, 2010 at 22:15 Comment(1)
I have tested this now, and indeed if I put the first row alphanumeric, then it works as expected. My problem is that I can't make this a general rule, as the customers will read in their own sheets. However, my solution was to "cheat", so that I change HDR= No in the connection string to make sure that the alphanumeric header is read to make the column alphanumeric, then I cut the first line of the resulting DataTable. It's quite nasty, but I don't see any other options here. Thanks a lot for your help getting me in the right direction.Kinross
F
1

IMEX=1 means "Read mixed data as text."

There are some gotchas, however. Jet will only use several rows to determine whether the data is mixed, and if so happens these rows are all numeric, you'll get this behaviour.

See connectionstrings.com for details:

Check out the [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel] located registry REG_DWORD "TypeGuessRows". That's the key to not letting Excel use only the first 8 rows to guess the columns data type. Set this value to 0 to scan all rows. This might hurt performance. Please also note that adding the IMEX=1 option might cause the IMEX feature to set in after just 8 rows. Use IMEX=0 instead to be sure to force the registry TypeGuessRows=0 (scan all rows) to work.

Ferial answered 28/9, 2010 at 22:21 Comment(2)
I don't know how Jet is exposed in Excel, but in Access, you can change things like that at runtime in your current instance of the Jet db engine without having to alter the registry and restart Access.Hebe
Thanks for this information. It worked but was suprisingly slow, so I had to go with the "read and dispose header" trick instead.Kinross
E
1

I would advise against using the OleDb data provider stuff to access Excel if you can help it. I've had nothing but problems, for exactly the reasons that others have pointed out. The performance tends to be atrocious as well when you are dealing with large spreadsheets.

You might try this open source solution: http://exceldatareader.codeplex.com/

Enterectomy answered 28/9, 2010 at 23:24 Comment(1)
I totally agree with you Mark. I think it's quite horrible, but in this case I don't have any options as I'm assigned to fix this bug in an existing program, and have not been alloted the time to do any large refactorings. I'll keep your link in mind if I need to do this from scratch later on.Kinross

© 2022 - 2024 — McMap. All rights reserved.