Restoring dataset from delimiter separated values file
Asked Answered
E

4

1

In short, i'm new to delphi and I want to achieve the following:

  • I have table definition as .cds file {index, data, date}, and some data in .csv format.
  • I want to load the .csv file to the table and show log it's changes and errors (ex: invalid date format).

Question

How to solve this task elegantly?

Eclectic answered 17/6, 2011 at 12:29 Comment(5)
surely this problem has been solved many times before. What's special about your problem that needs a custom solution.Moneywort
@David Heffernan : That's what I tough. It disturbs me, that I could not find a working example at Google - i'm very new to delphi, so probably that's the reason.Eclectic
@Sertac Akyuz: I tried to use that or just setting cdsMaster.LogChanges := false; for a moment, but my program choked at 50k entries test data. So I'm not sure if I used it correctly.Eclectic
It looks to me like you are doing database backup/restoreMoneywort
Sorry, I had deleted the comment. Having read all, I still couldn't decide if it is relevant or not (if anyone wonders it was about MergeChangeLog).Cannell
C
3

You can read line by line from the .csv, set each line to 'DelimitedText' of a StringList, append a record to the dataset, loop the string list to set each field's value and then post to the dataset.
You can put the 'field value assinging'/'posting' in a try-except block and log any error message of raised exceptions together with information you like (e.g. malformed field value/name, line number, and/or entire line etc.) to a file f.i.

(I don't understand what you mean by 'changes', from what I understood, lines from the .csv will be inserted to a dataset, hence all changes will be inserts.)


edit: To be able to discuss on something concrete (I'm having a hard time grasping the task :))

Sample data (part of CodeGear sample 'Clients.cds'):

Davis;Jennifer;1023495,0000;100 Cranberry St.;Wellesley;MA;02181;516-292-3945;01.01.93 Jones;Arthur;2094056,0000;10 Hunnewell St;Los Altos;CA;94024;415-941-4321;07.02.81 Parker;Debra;1209395,0000;74 South St;Atherton;CA;98765;916-213-2234;23.10.90 Sawyer;Dave;3094095,0000;101 Oakland St;Los Altos;CA;94022;415-948-9998;21.12.89 White;Cindy;1024034,0000;1 Wentworth Dr;Los Altos;CA;94022;415-948-6547;01.10.92

procedure TForm1.FormCreate(Sender: TObject);
begin
  CDS.FieldDefs.Add('LAST_NAME', ftString, 20);
  CDS.FieldDefs.Add('FIRST_NAME', ftString, 20);
  CDS.FieldDefs.Add('ACCT_NBR', ftInteger);
  CDS.FieldDefs.Add('ADDRESS_1', ftString, 30);
  CDS.FieldDefs.Add('CITY', ftString, 15);
  CDS.FieldDefs.Add('STATE', ftString, 2);
  CDS.FieldDefs.Add('ZIP', ftString, 5);
  CDS.FieldDefs.Add('TELEPHONE', ftString, 12);
  CDS.FieldDefs.Add('DATE_OPEN', ftDate);
  CDS.CreateDataSet;
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  csv: TextFile;
  Rec: string;
  Fields: TStringList;
  LineNo: Integer;
  i: Integer;
begin
  Fields := TStringList.Create;
  try
    Fields.StrictDelimiter := True;
    Fields.Delimiter := ';';

    AssignFile(csv, ExtractFilePath(Application.ExeName) + 'clients.csv');
    try
      Reset(csv);

      LineNo := 0;
      while not Eof(csv) do begin
        Inc(LineNo);
        Readln(csv, Rec);

        Fields.DelimitedText := Rec;
        CDS.Append;

        for i := 0 to Fields.Count - 1 do
          try
            CDS.Fields[i].Value := Fields[i];   // Variant conversion will raise
                                 // exception where conversion from string fails
          except
            on E:EDatabaseError do begin
              CDS.Cancel;        // Failed, discard the record

              // log the error instead of showing a message
              ShowMessage(Format('Cannot set field "%s" at line %d' + sLineBreak +
                  'Error: %s', [CDS.Fields[i].FieldName, LineNo, E.Message]));
              Break;             // Continue with next record
            end;
          end;

        if CDS.State = dsInsert then // It's not dsInsert if we Cancelled the Insert
          try
            CDS.Post;
          except
            on E:EDatabaseError do begin
              // log error instead of showing
              ShowMessage(Format('Cannot post line %d' + sLineBreak + 'Error: %s',
                  [LineNo, E.Message]));
              CDS.Cancel;
            end;
          end;

      end;
    finally
      CloseFile(csv);
    end;
  finally
    Fields.Free;
  end;
end;

procedure TForm1.CDSBeforePost(DataSet: TDataSet);
begin
  // Superficial posting error
  if CDS.FieldByName('LAST_NAME').AsString = '' then
    raise EDatabaseError.Create('LAST_NAME cannot be empty');
end;
Cannell answered 21/6, 2011 at 1:5 Comment(22)
@Sertac Akyuz : Dataset does not need to be empty, and delete + insert counts as update. Also i can expect, that some .csv data can be malformed. Getting the error inputs stored, is harder then it seems.Eclectic
@Eclectic - How do you decide a line in the .csv is supposed to be an insert or a delete?Cannell
@Sertac: This won't work well. First, a .csv file uses a comma as a delimiter, not a ;. But the problem is that you also have to deal with quoted values, where a comma can be not a delimiter, but part of the content, for instance Smith,John,"123 First St., Apt A" - the comma between "First St.," and " Apt A" isn't a delimiter. There are other rules about .csv files, too, such as allowing a row to be split by a line feed and continued, that TStrings won't deal with correctly. That's why I suggested TTextData, which deals with much of this for you.Poolroom
@Ken - I obtained the above sample by opening 'Clients.dbf' in Excel and then saving as '.csv'. It works fine with delimiter in fields when quoted (i.e. "101; Oakland St" is posted as 101; Oakland St). But I'm not against TTextData. Parsing/importing .csv can be achieved this way or that way. I posted the above code to get feedback from Margus to understand what the code is supposed to do besides impoting data.Cannell
As I understand it when the decimal separator is a comma, generally a semicolon is used as a value separator (link). Excel uses the 'List separator' in 'Regional Options' (link).Cannell
@Sertac, I didn't say anything about decimal separator. I specifically said '.csv', which stands for 'comma-separated values'. Since @Eclectic said nothing about the data (decimal separators, etc.), I went with the most generic solution. Not saying your answer is wrong (I didn't down-vote, after all); just pointing out some of the problems with trying to let TStringList do the parsing for you. :)Poolroom
@Ken - A significant part of the world (including my country, I don't know where Margus is :)) uses a semicolon as a value separator in '.csv' files. Wikipedia: "although in locales where the comma is used as a decimal separator, the semicolon is used instead as a delimiter". Currently I see no problem with my approach apart from a line extending to the next. And, I believe, that wouldn't grant a downvote. :)Cannell
@Ken - Just looked at the code for TTextDataSet. It loads the file to a TStringList, and treats each line a record.Cannell
@Sertac: I did NOT down-vote you, just to make things clear, like I said in my last comment. See "(I **didn't down-vote, after all)". As far as the source for TTextDataSet, that's all the more reason to use it; it already handles the loading and parsing, but lets you just access like normal TFields instead of doing the work yourself. And the OP has the CDS, with data and indices already.Poolroom
Downvoted for suggesting CSV parsing with TStringList, which I have said repeatedly is a terrible idea.Fahey
@Ken - Thanks, I know you would inform about a downvote if you'd change your mind for any reason. :)Cannell
@Warren - Thanks for stating the reason, since I didn't know about your repetitions I couldn't guess.. :) I attached working code to the question..., incidentally I'm trying to make the 'CsvDataDemo' work, which throws an exception with the included sample .csv, .. a quoted comma in a field value. .. Which the code in this answer handles just fine. And what an exception, cannot recover gracefully, have to "reset" the debug session. Anyway, thanks again for informing..Cannell
Even if you get a demo working, you should never make a production system that relies on TStringList as a CSV parser, it will fail.Fahey
Sertac: that's a regression in the latest then. It has always worked for me.Fahey
@Warren - Works fine.. For whatever reason the demo project has hardcoded paths for 'JvCsvData' and 'JvCsvParse', thus I was not using the latest sources actually. Removing these units from the .dpr made the demo work. BTW, please expand on when will "TStringList as a CSV parser will fail".Cannell
I have written extensively about it, and it's too big to post here. I will try to use your example code to make a failure case, to make it clearer.Fahey
Your example code, for instance, is not using TStringList to do the line-breaking, you're using the ancient ReadLn to do that. What would you suggest for someone who is using UTF8 delimited text files? You can't use Textfile+ReadLn to read that file. The problem of doing the line-breaks is itself non-trivial, and then following after that is the problem of breaking up the fields, using proper delimited text escape or quote character processing, as required.Fahey
@Warren - No one mentioned a UTF8 csv. Besides I didn't ask about it. I asked about what's wrong with parsing csv with a string list, as it is your downvote reason. A string list is perfectly capable handling 'line-breaks', 'quote escape', 'quote character processing' and 'breaking up fields'. C'mon, you've been repeatedly telling and extensively writing about it. Just give one concrete reason/example. All I want is to learn why I shouldn't be using a string list to parse csv.Cannell
First problem is the line splitting problem with embedded carriage-return linefeeds, which only affects people who have embedded cr/lf or other special formatting. Second problem is delimited text quoting rules in general, so semicolons may appear as a separator, or inside the body of a field. Third problem is TStringList splitting bugs, which are rare in test data, and not so rare in the real world.Fahey
@Warren - "First problem", I admitted this code would not handle that case in a comment to Ken. I'd be very surprised that this would be a problem though.. - "Second problem", What's all that got to do with TStringList?. You're telling general difficulties of csv parsing as a cause of not using a string list? - "Third problem", I have never heard of it, can you give an example? I'd like to find out how these splitting bugs would effect csv parsing.Cannell
Great! All I know is now is that, using a TStringList to parse csv is a terrible idea because of TStringList splitting bugs, which I have no clue about what are they..Cannell
It turns out I have stated some incorrect things in comments above. I appreciate being corrected. Thanks everybody. TStringList in Delphi 2010 and XE is not too bad at CSV parsing at all. TStringList prior to Delphi 2010, is terrible at parsing CSVs. This distinction is VERY important, and I was mistaken before in widely panning TStringList for CSV parsing.Fahey
F
4

I would use JvCsvDataSet (JEDI JVCL component) because it parses CSV files properly, and then use a data-pump component, to move the data into the client dataset, along with some validation.

But if all you really need to do is provide a CSV file, to a data-aware control, I would leave out the ClientDataSet completely, and just use a component built for the purpose you are trying to do. Don't use a screw as a nail, or a nail as a screw. They are both made of metal, but they do different jobs.

CSV file table definitions are quite different in purpose, to a CDS table definition, and the JvCsvDataSet provides a simple string property which you can set up to give the metadata (field datatypes like integer or string or date-time, and associated field names, for CSV files that lack a header row) more easily, than you could hope to do it in ClientDatSet.

Fahey answered 21/6, 2011 at 15:15 Comment(0)
C
3

You can read line by line from the .csv, set each line to 'DelimitedText' of a StringList, append a record to the dataset, loop the string list to set each field's value and then post to the dataset.
You can put the 'field value assinging'/'posting' in a try-except block and log any error message of raised exceptions together with information you like (e.g. malformed field value/name, line number, and/or entire line etc.) to a file f.i.

(I don't understand what you mean by 'changes', from what I understood, lines from the .csv will be inserted to a dataset, hence all changes will be inserts.)


edit: To be able to discuss on something concrete (I'm having a hard time grasping the task :))

Sample data (part of CodeGear sample 'Clients.cds'):

Davis;Jennifer;1023495,0000;100 Cranberry St.;Wellesley;MA;02181;516-292-3945;01.01.93 Jones;Arthur;2094056,0000;10 Hunnewell St;Los Altos;CA;94024;415-941-4321;07.02.81 Parker;Debra;1209395,0000;74 South St;Atherton;CA;98765;916-213-2234;23.10.90 Sawyer;Dave;3094095,0000;101 Oakland St;Los Altos;CA;94022;415-948-9998;21.12.89 White;Cindy;1024034,0000;1 Wentworth Dr;Los Altos;CA;94022;415-948-6547;01.10.92

procedure TForm1.FormCreate(Sender: TObject);
begin
  CDS.FieldDefs.Add('LAST_NAME', ftString, 20);
  CDS.FieldDefs.Add('FIRST_NAME', ftString, 20);
  CDS.FieldDefs.Add('ACCT_NBR', ftInteger);
  CDS.FieldDefs.Add('ADDRESS_1', ftString, 30);
  CDS.FieldDefs.Add('CITY', ftString, 15);
  CDS.FieldDefs.Add('STATE', ftString, 2);
  CDS.FieldDefs.Add('ZIP', ftString, 5);
  CDS.FieldDefs.Add('TELEPHONE', ftString, 12);
  CDS.FieldDefs.Add('DATE_OPEN', ftDate);
  CDS.CreateDataSet;
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  csv: TextFile;
  Rec: string;
  Fields: TStringList;
  LineNo: Integer;
  i: Integer;
begin
  Fields := TStringList.Create;
  try
    Fields.StrictDelimiter := True;
    Fields.Delimiter := ';';

    AssignFile(csv, ExtractFilePath(Application.ExeName) + 'clients.csv');
    try
      Reset(csv);

      LineNo := 0;
      while not Eof(csv) do begin
        Inc(LineNo);
        Readln(csv, Rec);

        Fields.DelimitedText := Rec;
        CDS.Append;

        for i := 0 to Fields.Count - 1 do
          try
            CDS.Fields[i].Value := Fields[i];   // Variant conversion will raise
                                 // exception where conversion from string fails
          except
            on E:EDatabaseError do begin
              CDS.Cancel;        // Failed, discard the record

              // log the error instead of showing a message
              ShowMessage(Format('Cannot set field "%s" at line %d' + sLineBreak +
                  'Error: %s', [CDS.Fields[i].FieldName, LineNo, E.Message]));
              Break;             // Continue with next record
            end;
          end;

        if CDS.State = dsInsert then // It's not dsInsert if we Cancelled the Insert
          try
            CDS.Post;
          except
            on E:EDatabaseError do begin
              // log error instead of showing
              ShowMessage(Format('Cannot post line %d' + sLineBreak + 'Error: %s',
                  [LineNo, E.Message]));
              CDS.Cancel;
            end;
          end;

      end;
    finally
      CloseFile(csv);
    end;
  finally
    Fields.Free;
  end;
end;

procedure TForm1.CDSBeforePost(DataSet: TDataSet);
begin
  // Superficial posting error
  if CDS.FieldByName('LAST_NAME').AsString = '' then
    raise EDatabaseError.Create('LAST_NAME cannot be empty');
end;
Cannell answered 21/6, 2011 at 1:5 Comment(22)
@Sertac Akyuz : Dataset does not need to be empty, and delete + insert counts as update. Also i can expect, that some .csv data can be malformed. Getting the error inputs stored, is harder then it seems.Eclectic
@Eclectic - How do you decide a line in the .csv is supposed to be an insert or a delete?Cannell
@Sertac: This won't work well. First, a .csv file uses a comma as a delimiter, not a ;. But the problem is that you also have to deal with quoted values, where a comma can be not a delimiter, but part of the content, for instance Smith,John,"123 First St., Apt A" - the comma between "First St.," and " Apt A" isn't a delimiter. There are other rules about .csv files, too, such as allowing a row to be split by a line feed and continued, that TStrings won't deal with correctly. That's why I suggested TTextData, which deals with much of this for you.Poolroom
@Ken - I obtained the above sample by opening 'Clients.dbf' in Excel and then saving as '.csv'. It works fine with delimiter in fields when quoted (i.e. "101; Oakland St" is posted as 101; Oakland St). But I'm not against TTextData. Parsing/importing .csv can be achieved this way or that way. I posted the above code to get feedback from Margus to understand what the code is supposed to do besides impoting data.Cannell
As I understand it when the decimal separator is a comma, generally a semicolon is used as a value separator (link). Excel uses the 'List separator' in 'Regional Options' (link).Cannell
@Sertac, I didn't say anything about decimal separator. I specifically said '.csv', which stands for 'comma-separated values'. Since @Eclectic said nothing about the data (decimal separators, etc.), I went with the most generic solution. Not saying your answer is wrong (I didn't down-vote, after all); just pointing out some of the problems with trying to let TStringList do the parsing for you. :)Poolroom
@Ken - A significant part of the world (including my country, I don't know where Margus is :)) uses a semicolon as a value separator in '.csv' files. Wikipedia: "although in locales where the comma is used as a decimal separator, the semicolon is used instead as a delimiter". Currently I see no problem with my approach apart from a line extending to the next. And, I believe, that wouldn't grant a downvote. :)Cannell
@Ken - Just looked at the code for TTextDataSet. It loads the file to a TStringList, and treats each line a record.Cannell
@Sertac: I did NOT down-vote you, just to make things clear, like I said in my last comment. See "(I **didn't down-vote, after all)". As far as the source for TTextDataSet, that's all the more reason to use it; it already handles the loading and parsing, but lets you just access like normal TFields instead of doing the work yourself. And the OP has the CDS, with data and indices already.Poolroom
Downvoted for suggesting CSV parsing with TStringList, which I have said repeatedly is a terrible idea.Fahey
@Ken - Thanks, I know you would inform about a downvote if you'd change your mind for any reason. :)Cannell
@Warren - Thanks for stating the reason, since I didn't know about your repetitions I couldn't guess.. :) I attached working code to the question..., incidentally I'm trying to make the 'CsvDataDemo' work, which throws an exception with the included sample .csv, .. a quoted comma in a field value. .. Which the code in this answer handles just fine. And what an exception, cannot recover gracefully, have to "reset" the debug session. Anyway, thanks again for informing..Cannell
Even if you get a demo working, you should never make a production system that relies on TStringList as a CSV parser, it will fail.Fahey
Sertac: that's a regression in the latest then. It has always worked for me.Fahey
@Warren - Works fine.. For whatever reason the demo project has hardcoded paths for 'JvCsvData' and 'JvCsvParse', thus I was not using the latest sources actually. Removing these units from the .dpr made the demo work. BTW, please expand on when will "TStringList as a CSV parser will fail".Cannell
I have written extensively about it, and it's too big to post here. I will try to use your example code to make a failure case, to make it clearer.Fahey
Your example code, for instance, is not using TStringList to do the line-breaking, you're using the ancient ReadLn to do that. What would you suggest for someone who is using UTF8 delimited text files? You can't use Textfile+ReadLn to read that file. The problem of doing the line-breaks is itself non-trivial, and then following after that is the problem of breaking up the fields, using proper delimited text escape or quote character processing, as required.Fahey
@Warren - No one mentioned a UTF8 csv. Besides I didn't ask about it. I asked about what's wrong with parsing csv with a string list, as it is your downvote reason. A string list is perfectly capable handling 'line-breaks', 'quote escape', 'quote character processing' and 'breaking up fields'. C'mon, you've been repeatedly telling and extensively writing about it. Just give one concrete reason/example. All I want is to learn why I shouldn't be using a string list to parse csv.Cannell
First problem is the line splitting problem with embedded carriage-return linefeeds, which only affects people who have embedded cr/lf or other special formatting. Second problem is delimited text quoting rules in general, so semicolons may appear as a separator, or inside the body of a field. Third problem is TStringList splitting bugs, which are rare in test data, and not so rare in the real world.Fahey
@Warren - "First problem", I admitted this code would not handle that case in a comment to Ken. I'd be very surprised that this would be a problem though.. - "Second problem", What's all that got to do with TStringList?. You're telling general difficulties of csv parsing as a cause of not using a string list? - "Third problem", I have never heard of it, can you give an example? I'd like to find out how these splitting bugs would effect csv parsing.Cannell
Great! All I know is now is that, using a TStringList to parse csv is a terrible idea because of TStringList splitting bugs, which I have no clue about what are they..Cannell
It turns out I have stated some incorrect things in comments above. I appreciate being corrected. Thanks everybody. TStringList in Delphi 2010 and XE is not too bad at CSV parsing at all. TStringList prior to Delphi 2010, is terrible at parsing CSVs. This distinction is VERY important, and I was mistaken before in widely panning TStringList for CSV parsing.Fahey
P
2

AFAIK, there's no direct way to load .csv data into a TClientDataset.

The easiest way I can think of would be to use the TTextDataSet (found in Demos\Delphi\Database\TextData, available from Start->All Programs->Embarcadero RAD Studio XE->Samples). You can use it just like any other TDataSet, meaning you can read from it's Fields or use FieldByName, and it supports Bof, Eof, Next, and Prior.

You can simply iterate through and try to assign to your CDS columns, and it will generate errors you can then handle or log.

You can install TTextDataset like any other component, or just add the unit to the uses clause and create it at runtime. There's a readme.htm file in the folder that doesn't explain much; the key properties are FileName and Active. :)

It includes both a pre-designed package (TextPkg.dproj) and a test app (TextTest.dproj). There's also a project group (TextDataGroup.groupproj) - you can simply open this in the IDE, build and install the TextPkg package, and then compile and run the test app. The source for the test app shows usage pretty well.

Poolroom answered 20/6, 2011 at 23:9 Comment(0)
P
1

In the off-chance that your database is DBISAM, you can simply use the IMPORT SQL statement.

import table "tablename" from "myinputfile.csv" Delimiter ',';

Other databases may have a similar feature.

Pickle answered 21/6, 2011 at 15:21 Comment(2)
MySQL has a load data infile see: dev.mysql.com/doc/refman/5.5/en/load-data.htmlHinny
Nice, but CDS is TClientDataSet, which isn't DBISAM. :) It also isn't MySQL.Poolroom

© 2022 - 2024 — McMap. All rights reserved.