Excel Cell Formatting Problem
Asked Answered
H

2

7

I'm using the OleDB DataReader to get data from my excel files (but this problem occurs in DataTable.Fill as well). The thing is that I have a column that should return strings. All's fine and working but recently, an issue came up, because the column's cells had different formats. Some were in numbers and others in text. When I checked using dataReader.GetSchema() it showed that the column in question was inferred as type System.String. The problem with this was that all non-text cells were immediately set to null.

Is there a way to suggest to the reader that that column should just parse the columns as System.Object instead of inferring it as System.String and dumping all non-string cells?

The connection string I'm using is:

string connString = "Provider=Microsoft.Jet.OLEDB.4.0;" + "Data Source=" + filePath + ";" + "Extended Properties=Excel 8.0;";

and the code is:

using ( OleDbConnection connection = new OleDbConnection( connString ) )
{
    connection.Open();
    foreach ( string worksheetName in worksheetNames )
    {
        using ( OleDbCommand command = 
            new OleDbCommand( "SELECT * FROM [" + worksheetName + "]", connection ) )
        {
            TEntity entity;
            using ( OleDbDataReader dataReader = command.ExecuteReader() )
            {
                while ( dataReader.Read() )
                {
                    entity = GetDataFromDataTable( dataReader );

                    if ( entity != null )
                    {
                        entityList.Add( entity );
                    }
                }
            }
        }
    }
    connection.Close();
}
Heritor answered 21/6, 2010 at 1:46 Comment(0)
P
7

You need to add IMEX or MAXSCANROWS to the connection string.

MAXSCANROWS - The number of rows to scan to determine the data type of each column. The data type is determined given the maximum number of kinds of data found. If data is encountered that does not match the data type guessed for the column, the data type will be returned as a NULL value.

For the Microsoft Excel driver, you can enter a number from 1 to 16 for the rows to scan. The value defaults to 8; if it is set to 0, all rows are scanned. (A number outside the limit will return an error.)

Moreover, check out the [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel] located registry REG_DWORD "TypeGuessRows". That's the key to not letting Excel use only the first 8 rows to guess the columns data type. Set this value to 0 to scan all rows. This might hurt performance.

Prattle answered 21/6, 2010 at 15:33 Comment(1)
FYI: The MaxScanRows=0 is ignored when using the Microsoft.Jet.OLEDB.4.0 provider; it still only scans the first 8 rows. The Microsoft.ACE.OLEDB.12.0 provider does not seem to suffer the same problem. See the section on Rows to Scan in the KBHanna
T
2

Two things that come to mind:

  1. What type is 'TEntity'. Is it restricted to strings or another type?

  2. What type of list is entityList. Is it Generic, or is it a List(of TEntity), or some other type?

Turnaround answered 21/6, 2010 at 15:7 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.