Microsoft.Jet.OLEDB.4.0 Converting Characters
Asked Answered
T

2

7

I'm working with a CSV that contains characters like:

” and •

I am reading the CSV via OleDb and the provider is Microsoft.Jet.OLEDB.4.0. when the data is loaded into the OleDbCommand, the characters are converted to the following respectively:

“ and •

I suspected there might be a collation setting in the connection string but I was unable to find anything about this.

I can confirm the following:

  1. I can see the original character in the CSV when I open it.
  2. If I run a select on the file through OleDb WHERE [field] LIKE '%•%' I get 0 rows but if SELECT WHERE [field] LIKE '%“%' I get rows returned.

Any thoughts?

Trisaccharide answered 26/4, 2011 at 17:25 Comment(2)
what exactly is your question?Bulbar
How can I get the data to load using an OleDbConnection and stop it from converting the characters when a select is run?Trisaccharide
T
15

Finally! Thanks to @HABJAN I was able to get to the resolution which is as simple as setting the CharacterSet in the Extended Properties of the connection string. For my situation it was UTF-8... commonly used by default in PHPMyAdmin which is where my data was retrieved from.

Resulting working connection string:

"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"{0}\";Extended Properties=\"text;HDR=Yes;FMT=Delimited;CharacterSet=65001;\""

Key is CharacterSet=65001 (Code Page Identifiers) which might have been obvious to some collation-savvy individuals but I've somehow managed to avoid these issues over the years and never come across it in this respect.

I was also able to get HABJAN's solution to work when also following the documentation found at https://learn.microsoft.com/en-us/sql/odbc/microsoft/schema-ini-file-text-file-driver and setting the CharacterSet to the same as above.

For my situation, this is the better method as it is a simpler/more maintainable solution, but +1 to HABJAN for helping me get there!

Thanks

Trisaccharide answered 26/4, 2011 at 20:27 Comment(1)
as per MSDN i see there are only 2 options to set for CharacterSet as ANSII or OEM. But, I see you mentioned the code explicitely for UTF8. Is this working and any other code like 1252 works as well? learn.microsoft.com/en-us/sql/odbc/microsoft/…Rubinrubina
B
4

You can create schema.ini file and play with format and CharacterSet properties.

Take a look at this sample: How to read data from Unicode formatted text file and import to Data Table using .Net

And here is another sample that will show you how to read csv file with schema.ini: Importing CSV file into Database with Schema.ini

Bicentennial answered 26/4, 2011 at 18:16 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.