OleDB, Misses the first character of data
Asked Answered
K

4

5

I have a CSV Reading code for ASP.NET application I maintain. This ASP.NET website is running fine from 3 yrs now, and CSV reading code that use Ole.JetDB.4.0 is doing its work fine, except that once in a while some CSV with more than 4K-5K records create a problem. Usually the problem is that a record at random position [random row] miss the first character of it.

CSV File is just bunch of name and addresses per row, and they are in ASNI Format. CSV is comma seperate, no data have "comma" in data and now enclosing of field in Single or Double quote. Also, it doesn't happen often, We use the same code for say 70K record upload they works fine, but some time say in 3 yrs about 3-4 files have this problem only, we upload about one file daily.

For those who need what I did

using (System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection
    ("Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties='text;HDR=Yes;FMT=Delimited';Data Source=" + HttpContext.Current.Server.MapPath("/System/SaleList/"))
{
   string sql_select = "select * from [" + this.FileName + "]";
   System.Data.OleDb.OleDbDataAdapter da = new System.Data.OleDb.OleDbDataAdapter();
   da.SelectCommand = new System.Data.OleDb.OleDbCommand(sql_select, conn);
   DataSet ds = new DataSet();
   // Read the First line of File to know the header
   string[] lines = System.IO.File.ReadAllLines(HttpContext.Current.Server.MapPath("/System/SaleList/") + FileName);
   string header = "";
   if (lines.Length > 0)
      header = lines[0];

   string[] headers = header.Split(',');
   CreateSchema(headers, FileName);
   da.Fill(ds, "ListData");
   DataTable dt = ds.Tables["ListData"];
}

And this code is working fine except the mention thing. I cut some unrelated part so, might not work by copy paste.

EDIT: More information

  1. I try to use ODBC with Microsoft Text Driver, then I use ACE Driver with OleDB. the result is same with all three drive.

  2. If I swap the problem record, with the preceding Row those rows are read quite well, until the next problem row [if more than one row is having problem in original file], if those are only problem row it works fine.

So from above it looks like that something is there that distract character counter, but how I can ensure it working smooth is still a quiz.

EDIT 2: I have submitted it as bug to Microsoft here : https://connect.microsoft.com/VisualStudio/feedback/details/811869/oledb-ace-driver-12-jet-4-0-or-odbc-text-driver-all-fail-to-read-data-properly-from-csv-text-file

Kingkingbird answered 17/12, 2013 at 18:38 Comment(7)
That looked like a real file with private information. I removed it from the question but you should remove the source.Pedanticism
Sumit Gupta do you have a way of debugging with any of those files that fail ?Issykkul
Yes, I can use Visual Studio or any tool you want me to debug with. But I am not really sure how to debug it. My Dataset shows all Record been read except the one record missing just the first character.Kingkingbird
@LarsTech, thanks, I just thought someone want to see the file and hence I upload it.Kingkingbird
@SumitGupta This question might help!Parasynthesis
Sumit, I have two questions 1) does this file open properly in other editors like Excel / Notepad++, etc. 2) Can you please provide me the erroneous file along with the line number where you face this problem. Probably I might take a look at it.Eagan
@ConsultYarla Yes, the file gets open in Excel/Notepad, infact the file I got was last edit using Excel 2010, then I save it again using Excel 2013, 2010 on various machine to test. And I edit it using notepad++ only, but unless I change the character count it didn't work fine for me. I have posted the file in my Bug report to Microsoft, as LarsTech said it is been sensitive data to share in public forum. Maybe you provide me your email id to send file to you?Kingkingbird
S
6

I would suggest you examine a problem file with a hex editor - inspect the line that causes the problem and the line immediately preceding it.

In particular look at the line terminators (CR/LF? CR only? LF only?) and look for any non-printable characters.

Sessions answered 18/12, 2013 at 12:6 Comment(6)
I did all that, infact I try to change the file from Ansi to Unicode or visa versa, As I stated, when I change the order of problem row with anyone above it will solve for that line, but still give problem for the next error line. What I think is that it is problem in counting the buffer within library.. maybe.Kingkingbird
@SumitGupta I don't think you should try messing with the encoding as that will cause you more problems. As Joe suggests use a HEX Editor to check your row values if your rows are separated with a CR/LF combination look for 0D0A then check the characters directly after that.Parasynthesis
I did open the file in HEX Editor, it has OD as Line break in there, and it is consistent through out the file. No special/non printable character etc.Kingkingbird
@SumitGupta So your file contains OD (CR) only, this could be your problem as you're using FMT=Delimited in the provider string.Parasynthesis
Delimited is used for "," delimitation and not for line break. Also the source of all our files is same computer and same installation of Excel. It is as you understand is automate system, we get 10-15 CSV daily with 200-70K records in these CSV and all parse fine, just some 4-5 CSV have this problem, even if we change the OD to OA or ODOA, in any case it still give same problem.Kingkingbird
@Joe, I use Excel and this time save file as MS-DOS (*.csv) instead of normal Comma Seperated CSV option, this make file readable, One thing that I observe was that it make line break ODOA, but I am not sure what other change it might have done. So, I think it was line break that was causing issue, though I do not fully agree with it, as when changing order of record it still get read, so there is something wrong withing those Drivers/provider that cause the inconsistencyKingkingbird
O
1

Try using ACE Driver instead of JET (it's available on x86 and x64 servers, JET is only x86!)

using (System.Data.OleDb.OleDbConnection conn 
     = new System.Data.OleDb.OleDbConnection
    ("Provider=Microsoft.ACE.OLEDB.12.0;Extended Properties="Excel 12.0 Xml;HDR=YES";
     Data Source=" + HttpContext.Current.Server.MapPath("/System/SaleList/"))
 {
Odaodab answered 23/12, 2013 at 16:47 Comment(1)
I already use that as well as ODBC Text driver, but all three provide same error in data read.Kingkingbird
O
1

I got the same OleDB, Missing characters of data problem, see here:

enter image description here

The characters go missing because the Microsoft.Jet.OLEDB.4.0 driver tries to guess the column datatype. In my case its was treating the data as hexadecimal not alphanumeric.

Problematic oledbProviderString:

oledbProviderString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"
{0}\";Extended Properties=\"Text;HDR=No;FMT=Delimited\"";

To fix the problem I added TypeGuessRows=0

oledbProviderString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"  
{0}\";Extended Properties=\"Text;HDR=No;FMT=Delimited;TypeGuessRows=0\"";

Repro:

Create a Book1.csv file with this content:

KU88,G6,CC
KU88,F7,CC

Step through this code as pictured above.

private void button1_Click(object sender, EventArgs e)
{
    string folder = @"G:\Developers\Folder";
    ReproProblem(folder);
}

static string oledbProviderString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"{0}\";Extended Properties=\"Text;HDR=No;FMT=Delimited\"";

private void ReproProblem(string folderPath)
{
    using (OleDbConnection oledbConnection = new OleDbConnection(string.Format(oledbProviderString, folderPath)))
    {
        string sqlStatement = "Select * from [Book1.csv]";
        //open the connection
        oledbConnection.Open();
        //Create an OleDbDataAdapter for our connection
        OleDbDataAdapter adapter = new OleDbDataAdapter(sqlStatement, oledbConnection);
        //Create a DataTable and fill it with data
        DataTable table = new DataTable();
        adapter.Fill(table);
        //close the connection
        oledbConnection.Close();
    }
}
Oddfellow answered 18/2, 2015 at 3:53 Comment(6)
Wrapping the G7 and F7 with quotes is a workaround, if you want to fix it in the CSV file, eg KU88,"G7",CCOddfellow
Well, can you please try to save your CSV to "MS-DOS csv" using Excel and then try on old problematic Connection string, I am sure that will work even than. As it solve our problem, as now we use that Save as option for our CSV and never had problem again.Kingkingbird
I have included the repro code @Sumit, please try it yourself and save a file as CSV and another as MS-DOS CSV so you can see the accepted answer doesn't work in my case. If this thread contained any solutions that did work then I wouldn't have posted.Oddfellow
this is really getting interested. I try the demo saved Book1.csv, data is not read, Saved it as MSDos, it again misses even with TypeGuessRows. The only thing that work for me is to wrap all data in ". Even if I wrap G7 or F6 it doesn't work as per your comment.Kingkingbird
Hmm that's really oddOddfellow
well, my problem always get sorted by saving as ms-dos as then I use hex editor to find the files usually have inconsistent line ending. however, this file is strange, maybe G7 or F6 are read as some special string they looks like Cell Number as well.Kingkingbird
P
0

why dont u just use this:

using (System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection
("Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties='text;HDR=Yes;FMT=Delimited';Data Source=" + HttpContext.Current.Server.MapPath("/System/SaleList/"))
        {
           string sql_select = "select * from [" + this.FileName + "]";
           System.Data.OleDb.OleDbDataAdapter da = new System.Data.OleDb.OleDbDataAdapter();
           da.SelectCommand = new System.Data.OleDb.OleDbCommand(sql_select, conn);
           DataSet ds = new DataSet();

           // Read the First line of File to know the header
           string[] lines = System.IO.File.ReadAllLines(HttpContext.Current.Server.MapPath("/System/SaleList/") + FileName);
        DataTable mdt=new DataTable("ListData");
        for (int i = 1; i < lines.Length; i++)
        {
         string[] sep=lines[i].Split(',');
            foreach (var item in sep)
           {
               mdt.Rows.Add(sep);
           }
        }
           string header = "";
           if (lines.Length > 0)
              header = lines[0];

           string[] headers = header.Split(',');
        ds.Tables.Add(mdt);
        CreateSchema(headers, FileName);
        da.Fill(ds, "ListData");
       DataTable dt = mdt;}

i didnt debugged it. i hope there is no problem but if there is im here for you.

thank you very much

Pazice answered 29/12, 2013 at 9:39 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.