SSIS does 2 things in relation to handling flat files which are particularly frustrating, and it seems there should be a way around them, but I can't figure it out. If you define a flat file with 10 columns, tab delimited with CRLF as the end of row marker this will work perfectly for files where there are exactly 10 columns in every row. The 2 painful scenarios are these:
If someone supplies a file with an 11th column anywhere, it would be nice if SSIS simply ignored it, since you haven't defined it. It should just read the 10 columns you have defined then skip to the end of row marker, but what is does instead is concatenate any additional data with the data in the 10th column and bung all that into the 10th column. Kind of useless really. I realise this happens because the delimiter for the 10th column is not tab like all the others, but CRLF, so it just grabs everything up to the CRLF, replacing extra tabs with nothing as it does so. This is not smart, in my opinion.
If someone supplies a file with only 9 columns something even worse happens. It will temporarily disregard the CRLF it has unexpectedly found and pad any missing columns with columns from the start of the next row! Not smart is an understatement here. Who would EVER want that to happen? The remainder of the file is garbage at that point.
It doesn't seem unreasonable to have variations in file width for whatever reason (of course only variations at the end of a row can reaonably be handled (x fewer or extra columns) but it looks like this is simply not handled well, unless I'm missing something.
So far our only solution to this is to load a row as one giant column (column0) and then use a script task to dynamically split it using however many delimiters it finds. This works well, except that it limits row widths to 4000 chars (the max width of one unicode column). If you need to import a wider row (say with multiple 4000 wide columns for text import) then you need to define multiple columns as above, but you are then stuck with requiring a strict number of columns per row.
Is there any way around these limitations?