Assuming your source data is in the correct order, and you want the last record in each set, there aren't any out-of-the-box transformations that would handle this situation. However, a Script Transformation can deal with it fairly easily.
Here's a sample Data Flow:
I'm using FF_SRC_AllRows
and FF_DST_SelectedRows
as flat file source and destination (respectively) for simplicity's sake, using the sample data you provided; your specific needs will vary. The script transformation SCR_SelectLastRow
is configured as a transformation (both input and output):
Select all your input columns (with usage type ReadOnly
):
Create one output (I named mine OutgoingRows
, but you can name it whatever you wish), and set the SynchronousInputID
property to None
. This will let your script filter out the rows you don't want.
Add output columns corresponding to the input columns:
And use code along these lines:
/* Microsoft SQL Server Integration Services Script Component
* Write scripts using Microsoft Visual C# 2008.
* ScriptMain is the entry point class of the script.*/
using System;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
class IncomingRowData
{
public string Name1;
public string Name2;
public string Name3;
public string Col;
public IncomingRowData(IncomingRowsBuffer Row)
{
Name1 = Row.Name1;
Name2 = Row.Name2;
Name3 = Row.Name3;
Col = Row.Col;
}
public bool KeysDiffer(IncomingRowData other)
{
return (Name1 != other.Name1
|| Name2 != other.Name2
|| Name3 != other.Name3);
}
public void WriteToOutputBuffer(OutgoingRowsBuffer Row)
{
Row.AddRow();
Row.Name1 = Name1;
Row.Name2 = Name2;
Row.Name3 = Name3;
Row.Col = Col;
}
}
private IncomingRowData _previousRow;
public override void IncomingRows_ProcessInputRow(IncomingRowsBuffer Row)
{
if (_previousRow == null)
{
_previousRow = new IncomingRowData(Row);
}
IncomingRowData currentRow = new IncomingRowData(Row);
if (currentRow.KeysDiffer(_previousRow))
{
_previousRow.WriteToOutputBuffer(this.OutgoingRowsBuffer);
}
_previousRow = currentRow;
}
public override void FinishOutputs()
{
if (_previousRow != null)
{
_previousRow.WriteToOutputBuffer(this.OutgoingRowsBuffer);
}
base.FinishOutputs();
}
}
One nice thing about this technique is that it allows you to handle the data in a single pass and requires neither the use of staging tables nor keeping the entire source data set in memory. Depending on how big your data set is, either of those could cause significant performance issues.