Excel data extraction - Issue with column data type
Asked Answered
J

1

7

I am writing a C# library to read in Excel files (both xls and xlsx) and I'm coming across an issue.

Exactly the same as what was expressed in this question, if my Excel file has a column that has string values, but has a numeric value in the first row, the OLEDB provider assumes that column to be numeric and returns NULL for the values in that column that are not numeric.

I am aware that, as in the answer provided, I can make a change in the registry, but since this is a library I plan to use on many machines and don't want to change every user's registry values, I was wondering if there is a better solution.

Maybe a DB provider other than ACE.OLEDB (and it seems JET is no longer supported well enough to be considered)?

Also, since this needs to work on XLS / XLSX, options such as EPPlus / XML readers won't work for the xls version.

Joyless answered 28/7, 2015 at 15:37 Comment(4)
The way I get around this issue currently is by using the Excel COM interface and some C#, to open the file in Excel and re-save it as a .csv file. I also have a macro which is loaded and run to tidy stuff up but this way removes all the formatting and allows an easier import.Banlieue
Thanks, @AeroX, I was thinking about that, but am trying to avoid Interop if at all possible since that opens up dozens of other cans of worms :) ... I'm hoping someone else has another solution, but thanks so much for sharing your method!!Joyless
Use OpenXml Sdk instead.Nolte
@andrei.ciprian, that has the issue with only working on xlsx, not xls, no?Joyless
A
0

Your connection string should look like this

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=c:\myFolder\myExcelfile.xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES;IMEX=1";

IMEX=1 in the connection string is the part that you need to treat the column as mixed datatype. This should work fine without the need to edit the registry.

HDR=Yes is simply to mark the first row as column headers and is not needed in your particular problem, however I've included it anyways.

To always use IMEX=1 is a safer way to retrieve data for mixed data columns.

Source: https://www.connectionstrings.com/excel/

Edit:

Here is the data I'm using:

data

Here is the output:

enter image description here

This is the exact code I used:

string connString = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\test.xlsx;Extended Properties=""Excel 12.0 Xml;HDR=YES;IMEX=1""";

using (DbClass db = new DbClass(connString))
{
    var x = db.dataReader("SELECT * FROM [Sheet1$]");
    while (x.Read())
    {
        for (int i = 0; i < x.FieldCount; i++)
            Console.Write(x[i] + "\t");
        Console.WriteLine("");
    }
}

The DbClass is a simple wrapper I made in order to make life easier. It can be found here:

http://tech.reboot.pro/showthread.php?tid=4713

Ashelyashen answered 29/7, 2015 at 19:16 Comment(7)
FYI, Sadly, Kobe, I found this online and it seems to make sense: set "IMEX=1" in the Extended Properties section of the connection string. This enforces the ImportMixedTypes=Text registry setting. You can change the enforcement of type by changing [HKLM\Software\Microsoft\Jet\4.0\Engines\Excel\ImportMixedTypes] to numeric as well. - Basically IMEX=1 reinforces the registry setting, but is not good enough in-and-of itself.Joyless
I added the exact code I used to test. Can you provide a link to where you found that? Also, I do not have those registry settings, for either JET or ACE. So, I do not see how this can be so.Ashelyashen
It's not on MSDN or anything, but it seems to verify what I've been seeing to be true - codeproject.com/Articles/37055/… - about 1/4 way down the page....Joyless
Thanks for the update @KoBe - Are you sure you don't have the regstry keys set? I'm trying, but stil lnot getting the data returned.... You FOR SURE have a column beginning with (and most consisting of) numbers and with strings in there too and you're getting the string values back too??Joyless
Updated the post again. I'm sure. I've checked the Wow6432Node. What is the path you're looking at? Are you x64 or x86?Ashelyashen
This drives me crazy - I'm obviously doing something wrong, but have no clue where, then!!!! - In reply to your question, it's a x64 machine. Regardless, than you so much for your help and perseverance....Joyless
Ha, computers are crazy. You may not be doing anything wrong, as well as there could be some wild issue that's causing either your computer to not work, or mine to function without the keys. Regardless, I hope you figure out a solution. Perhaps pulling the Excel automation dlls into your project and going down that path.Ashelyashen

© 2022 - 2024 — McMap. All rights reserved.