SSIS: Code page goes back to 65001
Asked Answered
B

7

16

In an SSIS package that I'm writing, I have a CSV file as a source. On the Connection Manager General page, it has 65001 as the Code page (I was testing something). Unicode is not checked.

The columns map to a SQL Server destination table with varchar (among others) columns.

There's an error at the destination: The column "columnname" cannot be processed because more than one code page (65001 and 1252) are specified for it.

My SQL columns have to be varchar, not nvarchar due to other applications that use it.

On the Connection Manager General page I then change the Code page to 1252 (ANSI - Latin I) and OK out, but when I open it again it's back to 65001. It doesn't make a difference if (just for test) I check Unicode or not.

As a note, all this started happening after the CSV file and the SQL table had columns added and removed (users, you know.) Before that, I had no issues whatsoever. Yes, I refreshed the OLE DB destination in the Advanced Editor.

This is SQL Server 2012 and whichever version of BIDS and SSIS come with it.

Bradway answered 26/1, 2018 at 0:56 Comment(6)
Try removing and readding the Flat File Connection ManagerRoundel
I wanted to avoid that and going and redefining the types of the columns for hundreds of columns and then remapping them to the SQL table. But I'll give it a try.Bradway
I have not done this for this property, but I have done this in other places. Open the dtsx using notepad and edit the property in there.Oder
There's no property with 65001 value in dtsx or commgr files. I started from scratch, a new solution, and still, it forces code page to 65001. Must be something in the data.Bradway
It could be that the file is marked unicodeOder
The flat file code page is 65001 = Unicode (UTF-8), you cannot change it because the Code Page property in the flat file connection manager is designed to Specify the code page for non-Unicode text. check my answer for more details.Roundel
R
9

65001 Code page = Unicode (UTF-8)

Based on this Microsoft article (Flat File Connection Manager):

Code page
Specify the code page for non-Unicode text.

Also

You can configure the Flat File connection manager in the following ways:

Specify the file, locale, and code page to use. The locale is used to interpret locale-sensitive data such as dates, and the code page is used to convert string data to Unicode.

So when the flat file has a Unicode encoding:

Then this property cannot be changed, it will always return to it original encoding.

For more infor about the Code Page identifiers, you can refer to this article:

Roundel answered 27/1, 2018 at 11:31 Comment(9)
I'm not sure I follow. Does it mean that always a CSV file will be 65001? Or does it mean that it detects something in the data of this particulate file that forces it to be 65001? There's no "header" or any config data in a CSV file other than the actual data.Bradway
In any case, what I did is to import the CSV file into a SQL table with nvarchars that I'm using just as a staging table. Then I copy the data to my " real" table.Bradway
@Bradway i think that staging table is a good idea, also you can use a script task to change the file encoding. Also you can use Data Conversion Transformation to achieve this within the SSIS. But i think that your choice (staging table) is the easiest way.Roundel
@Bradway The text file is saved using an encoding, this encoding can be detected by reading some bytes from the text metadata which is not visible in the user interface. Also there are many topic and article about Characters Set detection or File Encoding Detection you can search to find more detailed information. Anyway if the file is saved with encoding 65001 or other unicode character set, it cannot be changed from the flat file connection manager.Roundel
@Bradway the simplest way to change a file encoding just open it using notepad, choose to Save As, in the save file dialog, you will see a drop down list to choose the encoding of the file, you can choose other encodingsRoundel
Thanks. I never noticed that! In any case, this CSV is being generated by another system and this will be set as SQL job that invokes the SSIS package, so I can't open it and save as. The staging table is my safest bet. Thanks for the info, though.Bradway
@Bradway its something that i learned while searching for this issue. So thank youRoundel
@Bradway Do you need any other help?Roundel
no. I'm sticking to the stating table solution. Thanks!Bradway
C
15

If it is a CSV file column text stream [DT_TEXT] to SQL varchar(max) data type that you want to convert to, change the flat file Connection Manager Editor property Code page to 1252 (ANSI - Latin I).

enter image description here

Concernment answered 7/2, 2019 at 10:19 Comment(0)
R
9

65001 Code page = Unicode (UTF-8)

Based on this Microsoft article (Flat File Connection Manager):

Code page
Specify the code page for non-Unicode text.

Also

You can configure the Flat File connection manager in the following ways:

Specify the file, locale, and code page to use. The locale is used to interpret locale-sensitive data such as dates, and the code page is used to convert string data to Unicode.

So when the flat file has a Unicode encoding:

Then this property cannot be changed, it will always return to it original encoding.

For more infor about the Code Page identifiers, you can refer to this article:

Roundel answered 27/1, 2018 at 11:31 Comment(9)
I'm not sure I follow. Does it mean that always a CSV file will be 65001? Or does it mean that it detects something in the data of this particulate file that forces it to be 65001? There's no "header" or any config data in a CSV file other than the actual data.Bradway
In any case, what I did is to import the CSV file into a SQL table with nvarchars that I'm using just as a staging table. Then I copy the data to my " real" table.Bradway
@Bradway i think that staging table is a good idea, also you can use a script task to change the file encoding. Also you can use Data Conversion Transformation to achieve this within the SSIS. But i think that your choice (staging table) is the easiest way.Roundel
@Bradway The text file is saved using an encoding, this encoding can be detected by reading some bytes from the text metadata which is not visible in the user interface. Also there are many topic and article about Characters Set detection or File Encoding Detection you can search to find more detailed information. Anyway if the file is saved with encoding 65001 or other unicode character set, it cannot be changed from the flat file connection manager.Roundel
@Bradway the simplest way to change a file encoding just open it using notepad, choose to Save As, in the save file dialog, you will see a drop down list to choose the encoding of the file, you can choose other encodingsRoundel
Thanks. I never noticed that! In any case, this CSV is being generated by another system and this will be set as SQL job that invokes the SSIS package, so I can't open it and save as. The staging table is my safest bet. Thanks for the info, though.Bradway
@Bradway its something that i learned while searching for this issue. So thank youRoundel
@Bradway Do you need any other help?Roundel
no. I'm sticking to the stating table solution. Thanks!Bradway
A
4

I solved this in SSIS through Derived Column Transformation

Derived Column

Arizona answered 1/7, 2020 at 15:22 Comment(1)
I tried the Data Conversion Transformation, and while it would allow me to edit the code page on each column, it never saved the changes. I looked at the Derived Column Transformation, but this seems to be a column-by-column operation, not bulk. I've got 56 columns coming in an utf-8 encoded csv out of my sql. It's insane that SSIS is a major regression from Import Wizard which would just do the conversions automatically.Residentiary
R
1

If it's a csv file, you can still use code page 1252 to process it. When you open the flat file connection manager it shows you the code page for the file, but you don't need to save that setting. If you have other changes to make in the connection manager, change the code page back to 1252 before you save the changes. It will process fine if there are no unicode characters in the file.

Republican answered 2/6, 2020 at 14:25 Comment(0)
N
1

In my case the file was generated in Excel and (mistakenly) saved as CSV UTF-8 (Comma delimited) (*.csv) instead of simply CSV (Comma delimited) (*.csv). Once I saved the file as the correct form of CSV, the code page no longer changed from 1252 (ANSI - Latin I).

Northwester answered 19/11, 2021 at 17:21 Comment(1)
It worked for me, also I changed the Sql-Server Destination to OLE DB Destination.Sherrisherrie
C
1

If you encounter this issue ensure that the DataType for your columns in the Flat File Connection Manager are set to Unicode string [DT_WSTR] and not string [DT_STR]

You can then use the Data Conversion task to convert to DT_STR with code page 1252 and it will retain the settings.

Chablis answered 28/3, 2023 at 15:40 Comment(0)
I
0

I was running into a similar challenge, which is how I ended up on this page looking for a solution. I resolved it using a different approach. I opened the csv in Notepad++. One of the menu options is called Encoding. If you select that, it will give you the option to "Convert to ANSI." I knew that my file did not contain any Unicode specific characters. When I went back to the SSIS package, I edited the flat file connection and it automatically changed it to 1252.

Inalterable answered 16/4, 2021 at 12:38 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.