SSIS Excel Data Source - Is it possible to override column data types?
Asked Answered
H

10

22

When an excel data source is used in SSIS, the data types of each individual column are derived from the data in the columns. Is it possible to override this behaviour?

Ideally we would like every column delivered from the excel source to be string data type, so that data validation can be performed on the data received from the source in a later step in the data flow.

Currently, the Error Output tab can be used to ignore conversion failures - the data in question is then null, and the package will continue to execute. However, we want to know what the original data was so that an appropriate error message can be generated for that row.

Hydropathy answered 12/2, 2009 at 10:55 Comment(1)
I've answered one question in another thread similar to yours, please refer to the link below. dba.stackexchange.com/questions/39252/…Bijouterie
I
14

Yes, you can. Just go into the output column list on the Excel source and set the type for each of the columns.

To get to the input columns list right click on the Excel source, select 'Show Advanced Editor', click the tab labeled 'Input and Output Properties'.

A potentially better solution is to use the derived column component where you can actually build "new" columns for each column in Excel. This has the benefits of

  1. You have more control over what you convert to.
  2. You can put in rules that control the change (i.e. if null give me an empty string, but if there is data then give me the data as a string)
  3. Your data source is not tied directly to the rest of the process (i.e. you can change the source and the only place you will need to do work is in the derived column)
Incendiarism answered 12/2, 2009 at 11:8 Comment(5)
Robert - Thanks for your response - I'm still having a problem seeing how to do this. I am using the Excel Source component which has a columns form, but the this only has the name of each column, not the datatype. The Excel Connection manager does not have a columns form as far as I can see.Hydropathy
I think you are going to the wrong property editor (Excel source has two, a simple and an advanced). I have updated the answer with a bit more clarification and a step-by-step to get to the right place.Incendiarism
Getting there now - I have used the advanced editor and changed the input and output columns to all have a datatype of DT_WSTR. However the error output column data typs have to match the input columns data types, and it does not seem possible to change these... Thanks for your continuing help!Hydropathy
Why was this question marked as answered? The above comment clearly states that this does not work.Jaquiss
We need to set the source column type, not the output type.Rigid
D
15

According to this blog post, the problem is that the SSIS Excel driver determines the data type for each column based on reading values of the first 8 rows:

  • If the top 8 records contain equal number of numeric and character types – then the priority is numeric
  • If the majority of top 8 records are numeric then it assigns the data type as numeric and all character values are read as NULLs
  • If the majority of top 8 records are of character type then it assigns the data type as string and all numeric values are read as NULLs

The post outlines two things you can do to fix this:

  1. First, add IMEX=1 to the end of your Excel driver connection string. This will allow Excel to read the values as Unicode. However, this is not sufficient if the data in the first 8 rows are numeric.
  2. In the registry, change the value for HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Nod\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows to 0. This will ensure that the driver looks at all the rows to determine the data type for the column.
Denounce answered 13/8, 2014 at 17:19 Comment(0)
I
14

Yes, you can. Just go into the output column list on the Excel source and set the type for each of the columns.

To get to the input columns list right click on the Excel source, select 'Show Advanced Editor', click the tab labeled 'Input and Output Properties'.

A potentially better solution is to use the derived column component where you can actually build "new" columns for each column in Excel. This has the benefits of

  1. You have more control over what you convert to.
  2. You can put in rules that control the change (i.e. if null give me an empty string, but if there is data then give me the data as a string)
  3. Your data source is not tied directly to the rest of the process (i.e. you can change the source and the only place you will need to do work is in the derived column)
Incendiarism answered 12/2, 2009 at 11:8 Comment(5)
Robert - Thanks for your response - I'm still having a problem seeing how to do this. I am using the Excel Source component which has a columns form, but the this only has the name of each column, not the datatype. The Excel Connection manager does not have a columns form as far as I can see.Hydropathy
I think you are going to the wrong property editor (Excel source has two, a simple and an advanced). I have updated the answer with a bit more clarification and a step-by-step to get to the right place.Incendiarism
Getting there now - I have used the advanced editor and changed the input and output columns to all have a datatype of DT_WSTR. However the error output column data typs have to match the input columns data types, and it does not seem possible to change these... Thanks for your continuing help!Hydropathy
Why was this question marked as answered? The above comment clearly states that this does not work.Jaquiss
We need to set the source column type, not the output type.Rigid
L
10

If your Excel file contains a number in the column in question in the first row of data, it seems that the SSIS engine will reset the type to a numeric type. It kept resetting mine. I went into my Excel file and changed the numbers to "Numbers stored as text" by placing a single quote in front of them. They are now read as text.

I also noticed that SSIS uses the first row to IGNORE what the programmer has indicated is the actual type of the data (I even told Excel to format the entire column as TEXT, but SSIS still used the data, which was a bunch of digits), and reset it. Once I fixed that by putting a single-quote in my Excel file in front of the number in the first row of data, I thought it would get it right, but no, there is additional work.

In fact, even though the SSIS External DataSource Column now has the type DT_WSTR, it will still read 43567192 as 4.35671E+007. So you have to go back into your Excel file and put single quotes in front of all the numbers.

Pretty LAME, Microsoft! But there's your solution. I have no idea what to do if the Excel file is not under your control.

Logistician answered 23/9, 2009 at 17:4 Comment(0)
I
5

I was looking for a solution for the similar issue, but didn't find anything on the internet. Although most of the found solutions work at design time, they don't work when you want to automate your SSIS package.

I resolved the issue and made it work by changing the properties of "Excel Source". By default the AccessMode property is set to OpenRowSet. If you change it to SQL Command, you can write your own SQL to convert any column as you wish.

For me SSIS was treating the NDCCode column as float, but I needed it as a string and so I used following SQL:

Select [Site], Cstr([NDCCode]) as NDCCode From [Sheet1$]

enter image description here

enter image description here

Isoline answered 18/8, 2017 at 17:34 Comment(1)
I'm upvoting because in theory this solution could work. I tried it but Windows 8 + Visual Studio 2017 + the latest BIDS is crashing every time I get to a certain point with it. So I can't tell for sureUncovered
F
4

Excel source is SSIS behaves crazy. SSIS determines the type of data in a particualr column by reading first 10 rows.. hence the issue. If you have a text column with null values in first 10 roes, SSIS takes the data type as Int. With a bit of struggle, here is a workaround

  1. Insert a dummy row (preferrably first row) in the worksheet. I prefer doing this thru a Script task, you may consider using some service to preprocess the file before SSIS connects to it

  2. With the duummy row, you are sure that the datatypes will be set as you need

  3. Read the data using Excel source and filter out the dummy row before you take it for further processing.

I know it is a bit shabby, but it works :)

Foveola answered 28/2, 2014 at 22:21 Comment(0)
R
1

I could fix this issue. while creating the SSIS package, I manually changed the specific column to text (Open the excel file select the column, right click on column, select format cells, in number tab select Text and save the excel). Now create the SSIS package and test it. It works. Now try to use the excel file where this column was not set as text.

It worked for me and I could execute the package successfully.

Rozele answered 26/6, 2014 at 2:24 Comment(0)
M
0

This should be resolved simply, just untick the box "Frist row as column names" and all data will be collected as text data type. Only downside of this choice is that you have to manage the columns names from the auto names (column 1, 2 etc) and handle the first row which contains the column names.

Menagerie answered 11/12, 2019 at 12:9 Comment(0)
R
0

We are not permitted to edit the source files due to data integrity and audit trails, etc, so we cannot open the files and export the sheets as CSV or insert dummy rows. Formatting the columns as "Text" instead of "General" works but we need a fully automated solution. I designed a C# script task to programmatically open the workbooks and format the columns as text, then closes and saves it with an incremented version number. The issue there was that using it as a script task fails because the Excel Interops are not available on the Dts server, which we cannot control as far as installation of Office, etc. So...I made the code into an executable that runs in a job step before the package runs and passes the new name in to the variable that supplies the file name. That works well and also avoids the way Excel is sometimes left open if there is a failure and that locks the file. If there is interest I will share my custom classes here as soon as I get a second to breathe. I am working on a basic POC now but plan to parameterize it further for more use and to make it atomic.

Rhombic answered 28/6, 2023 at 20:5 Comment(0)
S
-1

I had trouble implementing the solution here - I could follow the instructions, but it only gave new errors.

I solved my conversion issues by using a Data Conversion entity. This can be found on the SSIS Toolbox under Data Flow Transformations. I placed the Data Conversion between my Excel Source and OLE DB Destination, linked Excel to Data C, Data C to OLE DB, double clicked Data C to bring up a list of the data columns. Gave the problem column a new Alias, and changed the Data Type column.

Lastly, in the Mappings of the OLE DB Destination, use the Alias column name, rather than the original Excel column name. Job done.

Schonfield answered 14/1, 2014 at 2:50 Comment(4)
It looks like you forgot to link to the solution your referenceingHaihaida
This is an elaboration of Farouq's suggestion from two years earlier, and has the same problem.Allotrope
Yes, it was an elaboration - to show all the steps necessary to make the process work - and it did work. Did you try it before deciding that it was wrong and needed a down vote?Schonfield
This does have the same problem as the other solution (yes, I tried it). The problem is that the excel source determines the data type on the INPUT into the excel source. By the time it gets to either output or onto the next object like a data conversion, it's too late. In my case, if the first eight rows are six strings and two numerics, the numerics come in null. I attempted to fix it using Alok B's solution of making it a SQL command writing a query instead. This might work but Windows 8 + Visual Studio 2017 + the latest BIDS is crashing every time I get to a certain point with it.Uncovered
G
-2

You can use a Data Conversion component to convert to the desired data types.

Gopak answered 30/3, 2012 at 21:18 Comment(1)
If SSIS, in its infinite wisdom, decides that a column is numeric, reading text values will cause the data source to fail. There's no opportunity to convert the data downstream.Allotrope

© 2022 - 2025 — McMap. All rights reserved.