Is there a better way to indicate "null" values in Excel?
Asked Answered
R

1

4

I have an Excel 2007 workbook that contains tables of data that I'm importing into DataTable objects using ADO.NET.

Through some experimentation, I've managed to find two different ways to indicate that a cell should be treated as "null" by ADO.NET:

  1. The cell is completely blank.
  2. The cell contains #N/A.

Unfortunately, both of these are problematic:

  1. Most of my columns of data in Excel are generated via formulas, but it's not possible in Excel to generate a formula that results in a completely blank cell. And only a completely blank cell will be considered null (an empty string will not work).

  2. Any formula that evaluates to #N/A (either due to an actual lookup error or because the NA() function was used) will be considered null. This seemed like the ideal solution until I discovered that the Excel workbook must be open for this to work. As soon as you close the workbook, OLEDB suddenly starts seeing all those #N/As as strings. This causes exceptions like the following to be thrown when filling the DataTable:

    Input string was not in a correct format. Couldn't store <#N/A> in Value Column. Expected type is Int32.

Question: How can I indicate a null value via an Excel formula without having to have the workbook open when I fill the DataTable? Or what can be done to make #N/A values be considered null even when the workbook is closed?

In case it's important, my connection string is built using the following method:

var builder = new OleDbConnectionStringBuilder
{
    Provider = "Microsoft.ACE.OLEDB.12.0",
    DataSource = _workbookPath
};
builder.Add("Extended Properties", "Excel 12.0 Xml;HDR=Yes;IMEX=0");
return builder.ConnectionString;

(_workbookPath is the full path to the workbook).

I've tried both IMEX=0 and IMEX=1 but it makes no difference.

Ruination answered 21/4, 2011 at 1:7 Comment(18)
I can't really see the question? Would it be possible for you to modify the formulars?Sarcophagus
@Cilvic, Sorry, you're right. I'll add one.Ruination
@Cilvic, And yes, I can definitely edit the formulas. What I need to know is what formula result will in null. Once I know that, I can edit the formulas to produce that value any time I need null.Ruination
@Civic what is evaluating for null values later that makes them necessary? And is there a sample of the existing formula generating the #N/A values?Landlocked
@datatoo, an example formula: =IF(A1="",NA(),A1).Ruination
@datatoo, I'm not sure I understand your other question. Are you asking why I need null values in my database?Ruination
@DanM - Does the field type of #N/A only get represented when the formatting of field is specific to a certain type?Pycnometer
@Robbie, All my data in Excel is formatted as "General" at the moment.Ruination
@DanM - Have you tried to Format Text fields as Text and Number Fields as Number, i know it might be a very simple and basic question, but i have a process that Converts Excel to CSV for me, and i was getting the #N/A coming through when i had the formatting incorrectly stated. To be honest i dont think #N/A is anything like NULL, and dont believe you can do anything apart from In your DataTable updating #N/A with NULLPycnometer
@Robbie, You're probably right...it's just strange to me that #N/A works perfectly when the workbook is open. Somehow, the way the data is transmitted changes when the workbook is closed.Ruination
Could You check in the Script for the String #N/A ard replace it wich null? Maybe even treat the Wohle column as string and convert to it later.Sarcophagus
Also you could try to keep the empty strings =IF(A1="","",A1)Sarcophagus
@DanM I think Civic is right, leave the empty strings empty rather than making them #N/A, also do you have excel setup in options>calculation>save external link values ? If this is a large range calculation might be an issue.Landlocked
Does it help to treat the column as string (so that it can have #N/A and numbers converted to string)? Also, when importing it will require code to translate. i.e. if it is #N/A, assume it to be null, else cast it to an int32.Shiva
@Cilvic, @datatoo, I already know empty strings won't work. That was my whole point that a cell has to be "completely blank" (as in you have to clear the contents of the cell) for it to be treated as null.Ruination
@datatoo, I do have "Save external link values" checked in my Excel options, but I don't have any external data in my workbook.Ruination
And to those who are suggesting writing a script to check for #N/A and convert it to null, yes, I could do that, but scanning through every value of every table will definitely slow things down. I was hoping for a more elegant solution.Ruination
There's a reason people compare strings for empty string OR null you know. You're going to have to set a dummy value or something.Ehr
A
7

You're hitting the brickwall that many very frustrated users of Excel are experiencing. Unfortunately Excel as a company tool is widespread and seems quite robust, unfortunately because each cell/column/row has a variant data type it makes it a nightmare to handle with other tools such as MySQL, SQL Server, R, RapidMiner, SPSS and the list goes on. It seems that Excel 2007/2010 is not very well supported and even more so when taking 32/64 bit versions into account, which is scandalous in this day and age.

The main problem is that when ACE/Jet access each field in Excel they use a registry setting 'TypeGuessRows' to determine how many rows to use to assess the datatype. The default for "Rows to Scan" is 8 rows. The registry setting 'TypeGuessRows' can specify an integer value from one (1) to sixteen (16) rows, or you can specify zero (0) to scan all existing rows. If you can't change the registry setting (such as in 90% of office environments) it makes life difficult as the rows to guess are limited to the first 8.

For example, without the registry change If the first occurrence of #N/A is within the first 8 rows then IMEX = 1 will return the error as a string "#N/A". If IMEX = 0 then an #N/A will return 'Null'.

If the first occurrence of #N/A is beyond the first 8 rows then both IMEX = 0 & IMEX = 1 both return 'Null' (assuming required data type is numeric).

With the registry change (TypeGuessRows = 0) then all should be fine.

Perhaps there are 4 options:

  1. Change the registry setting TypeGuessRows = 0

  2. List all possible type variations in the first 8 rows as 'dummy data' (eg memo fields/nchar(max)/ errors #N/A etc)

  3. Correct ALL data type anomalies in Excel

  4. Don't use Excel - Seriously worth considering!

Edit: Just to put the boot in :) another 2 things that really annoy me are; if the first field on a sheet is blank over the first 8 rows and you can't edit the registry setting then the whole sheet is returned as blank (Many fun conversations telling managers they're fools for merging cells!). Also, if in Excel 2007/2010 you have a department return a sheet with >255 columns/fields then you have huge problems if you need non-contiguous import (eg key in col 1 and data in cols 255+)

Acquit answered 22/4, 2011 at 23:34 Comment(4)
+1 and thanks. I think I've been able to avoid the whole "guesses data types using the first 8 rows issue" by importing the schema of my database into the DataTable prior to filling it with the Excel data. This doesn't solve the null issue, though. I've come up with a solution using com interop that seems to provide me with both the speed and flexibility I need. So far, it's actually faster than my OLEDB solution. Finally, I'm sticking with Excel for now because it's just such a good fit for the kind of work I do (and all my coworkers use it), but there's definitely room for improvement.Ruination
Be careful.. using a schema is fine and the correct thing to do but the data/recordset returned from ACE/Jet does not necessarily match the schema you've defined.Acquit
True...I guess what I was thinking was that by using a schema, at least an exception would be thrown to indicate that the Excel-determined data type isn't correct. That said, using interop instead of OLEDB avoids the issue altogether.Ruination
If you have a standard template being used a dirty workaround to help in the future would be to create a VBA Workbook_BeforeClose event and paste the data to a named/hidden import sheet as values and replace all cells containing #.Acquit

© 2022 - 2024 — McMap. All rights reserved.