Recently I've been informed by a reputable SO user, that TStringList
has splitting bugs which would cause it to fail parsing CSV data. I haven't been informed about the nature of these bugs, and a search on the internet including Quality Central did not produce any results, so I'm asking. What are TStringList splitting bugs?
Please note, I'm not interested in unfounded opinion based answers.
What I know:
Not much... One is that, these bugs show up rarely with test data, but not so rarely in real world.
The other is, as stated, they prevent proper parsing of CSV. Thinking that it is difficult to reproduce the bugs with test data, I am (probably) seeking help from whom have tried using a string list as a CSV parser in production code.
Irrelevant problems:
I obtained the information on a 'Delphi-XE' tagged question, so failing parsing due to the "space character being considered as a delimiter" feature do not apply. Because the introduction of the StrictDelimiter
property with Delphi 2006 resolved that. I, myself, am using Delphi 2007.
Also since the string list can only hold strings, it is only responsible for splitting fields. Any conversion difficulty involving field values (f.i. date, floating point numbers..) arising from locale differences etc. are not in scope.
Basic rules:
There's no standard specification for CSV. But there are basic rules inferred from various specifications.
Below is demonstration of how TStringList handles these. Rules and example strings are from Wikipedia. Brackets ([
]
) are superimposed around strings to be able to see leading or trailing spaces (where relevant) by the test code.
Spaces are considered part of a field and should not be ignored.
Test string: [1997, Ford , E350] Items: [1997] [ Ford ] [ E350]
Fields with embedded commas must be enclosed within double-quote characters.
Test string: [1997,Ford,E350,"Super, luxurious truck"] Items: [1997] [Ford] [E350] [Super, luxurious truck]
Fields with embedded double-quote characters must be enclosed within double-quote characters, and each of the embedded double-quote characters must be represented by a pair of double-quote characters.
Test string: [1997,Ford,E350,"Super, ""luxurious"" truck"] Items: [1997] [Ford] [E350] [Super, "luxurious" truck]
Fields with embedded line breaks must be enclosed within double-quote characters.
Test string: [1997,Ford,E350,"Go get one now they are going fast"] Items: [1997] [Ford] [E350] [Go get one now they are going fast]
In CSV implementations that trim leading or trailing spaces, fields with such spaces must be enclosed within double-quote characters.
Test string: [1997,Ford,E350," Super luxurious truck "] Items: [1997] [Ford] [E350] [ Super luxurious truck ]
Fields may always be enclosed within double-quote characters, whether necessary or not.
Test string: ["1997","Ford","E350"] Items: [1997] [Ford] [E350]
Testing code:
var
SL: TStringList;
rule: string;
function GetItemsText: string;
var
i: Integer;
begin
for i := 0 to SL.Count - 1 do
Result := Result + '[' + SL[i] + '] ';
end;
procedure Test(TestStr: string);
begin
SL.DelimitedText := TestStr;
Writeln(rule + sLineBreak, 'Test string: [', TestStr + ']' + sLineBreak,
'Items: ' + GetItemsText + sLineBreak);
end;
begin
SL := TStringList.Create;
SL.Delimiter := ','; // default, but ";" is used with some locales
SL.QuoteChar := '"'; // default
SL.StrictDelimiter := True; // required: strings are separated *only* by Delimiter
rule := 'Spaces are considered part of a field and should not be ignored.';
Test('1997, Ford , E350');
rule := 'Fields with embedded commas must be enclosed within double-quote characters.';
Test('1997,Ford,E350,"Super, luxurious truck"');
rule := 'Fields with embedded double-quote characters must be enclosed within double-quote characters, and each of the embedded double-quote characters must be represented by a pair of double-quote characters.';
Test('1997,Ford,E350,"Super, ""luxurious"" truck"');
rule := 'Fields with embedded line breaks must be enclosed within double-quote characters.';
Test('1997,Ford,E350,"Go get one now'#10#13'they are going fast"');
rule := 'In CSV implementations that trim leading or trailing spaces, fields with such spaces must be enclosed within double-quote characters.';
Test('1997,Ford,E350," Super luxurious truck "');
rule := 'Fields may always be enclosed within double-quote characters, whether necessary or not.';
Test('"1997","Ford","E350"');
SL.Free;
end;
If you've read it all, the question was :), what are "TStringList splitting bugs?"