ErrorColumn value does not exist as Lineage ID
Asked Answered
O

3

6

During the insert into a destination table, any error that occurs is redirected to Errors table where we can see the ErrorCode and ErrorColumn. The problem is that we got a value in ErrorColumn which does not exist anywhere within the package. Namely, there is not a single column that has LineageID that is equal to ErrorColumn.

Later, while enabling NULL entry in every single column, one by one, I found which column caused the problem. When I analyzed the column inside of a Data Flow task it did not have the LineageID that was reported in an ErrorColumn. For example, the ErrorColumn reported was 413, but LineageID is 84 during the first merge, and it changes during various sortings. Regardless of that, it never becomes 413. This ErrorColumn ID (413) does not exist at all yet it is reported during the error redirection insert into destination source (Error table).

I've checked many sites, but they all suggested enumerating through ComponenteMetaData.InputCollection or ComponentMetaData.OutputCollection in a script task, and then enumerating through columns in order to find LineageID, but it was without any success.

As I said, I've solved the problem, but as we are in the early stage of ETL process, this might happen in some other cases. How can this problem be tackled?

Overflow answered 3/12, 2013 at 12:49 Comment(0)
D
7

I'm copying my answer so we can get an authoritative Q&A on the site

What is the simple way to find the column name from Lineageid in SSIS

I remember saying this can't be that hard, I can write some script in the error redirect to lookup the column name from the input collection.

string badColumn = this.ComponentMetaData.InputCollection[Row.ErrorColumn].Name;

What I learned was the failing column isn't in that collection. Well, it is but the ErrorColumn reported is not quite what I needed. I couldn't find that package but here's an example of why I couldn't get what I needed. Hopefully you will have better luck.

This is a simple data flow that will generate an error once it hits the derived column due to division by zero. The Derived column generates a new output column (LookAtMe) as the result of the division. The data viewer on the Error Output tells me the failing column is 73. Using the above script logic, if I attempted to access column 73 in the input collection, it's going to fail because that is not in the collection. LineageID 73 is LookAtMe and LookAtMe is not in my error branch, it's only in the non-error branch.

Basic data flow

This is a copy of my XML and you can see, yes, the outputColumn id 73 is LookAtme.

<outputColumn id="73" name="LookAtMe" description="" lineageId="73" precision="0" scale="0" length="0" dataType="i4" codePage="0" sortKeyPosition="0" comparisonFlags="0" specialFlags="0" errorOrTruncationOperation="Computation" errorRowDisposition="RedirectRow" truncationRowDisposition="RedirectRow" externalMetadataColumnId="0" mappedColumnId="0"><properties>

I really wanted that data though and I'm clever so I can union all my results back together and then conditional split it back out to get that. The problem is, Union All is an asynchronous transformation. Async transformations result in the data being copied from one set of butters to another resulting in...new lineage ids being assigned so even with a union all bringing the two streams back together, you wouldn't be able to call up the data flow chain to find that original lineage id because it's in a different buffer.

Around this point, I conceded defeat and decided I could live without intelligent/helpful error reporting in my packages.

2012

The 2012 release of SSIS changed how they used LineageID to keep columns in sync. Instead of components mapping a number from a source to sink, they went with a textual representation of the element. The XML referenced above would now look like

 <outputColumn
 refId="Package\DFT Generate Errors\DER Divide by SomeNumber.Outputs[Derived Column Output].Columns[LookAtMe]"
 dataType="i4"
 errorOrTruncationOperation="Computation"
 errorRowDisposition="RedirectRow"
 lineageId="Package\DFT Generate Errors\DER Divide by SomeNumber.Outputs[Derived Column Output].Columns[LookAtMe]"
 name="LookAtMe"
 truncationRowDisposition="FailComponent">

If you look at the ErrorColumn now, they don't even reference the textual lineageid. Instead, they reference column 6. If I search through the source XML, I'm not going to find a reference to Column 6 anywhere. It must be some run-time magic.

enter image description here

The net result unfortunately is the same - you cannot access the error column because it's being created in this component because it only exists in the Output columns collection. It isn't available in the Error Columns collection.

Diamagnetism answered 3/12, 2013 at 13:51 Comment(4)
I like you response, but the outputColumn that you show in your example will not be of that format in SSIS 2012. LineageID is not a numeric value any more, but rather a string value that looks like a path. Anyway thanks for the response, but there must be a way to identify the problematic column.Overflow
Agreed that the XML was dated so I updated the answer to address 2012. I look forward to you solving the problem and posting an answer as I have yet to discover it despite repeated attempts.Diamagnetism
Hi Downvoter (@TheOptimusPrimus) Can you help me understand what about this answer is not useful as the DV button indicates so that I may improve the answer?Diamagnetism
Since this is the highest voted answer at this point, I'd like to point the a solution I found in this answerCalciferol
C
2

I realize this is a late answer, but I'm new to the SSIS game. I have created a solution that

  1. Works at package runtime (not pre-populating)
  2. Is automated through a Script Task and Component
  3. Doesn't involve installing new assemblies or custom components
  4. Is nicely BIML compatible

Check out the full solution here.

EDIT

Here is the short version.

  1. Create 2 Object variables, execsObj and lineageIds
  2. Create Script Task in Control flow, give it ReadWrite access to both variables
  3. Insert the following code into your Script Task
Dictionary<int, string> lineageIds = null;

public void Main()
{
    // Grab the executables so we have to something to iterate over, and initialize our lineageIDs list
    // Why the executables?  Well, SSIS won't let us store a reference to the Package itself...
    Dts.Variables["User::execsObj"].Value = ((Package)Dts.Variables["User::execsObj"].Parent).Executables;
    Dts.Variables["User::lineageIds"].Value = new Dictionary<int, string>();
    lineageIds = (Dictionary<int, string>)Dts.Variables["User::lineageIds"].Value;
    Executables execs = (Executables)Dts.Variables["User::execsObj"].Value;

    ReadExecutables(execs);

    Dts.TaskResult = (int)ScriptResults.Success;
}

private void ReadExecutables(Executables executables)
{
    foreach (Executable pkgExecutable in executables)
    {
        if (object.ReferenceEquals(pkgExecutable.GetType(), typeof(Microsoft.SqlServer.Dts.Runtime.TaskHost)))
        {
            TaskHost pkgExecTaskHost = (TaskHost)pkgExecutable;
            if (pkgExecTaskHost.CreationName.StartsWith("SSIS.Pipeline"))
            {
                ProcessDataFlowTask(pkgExecTaskHost);
            }
        }
        else if (object.ReferenceEquals(pkgExecutable.GetType(), typeof(Microsoft.SqlServer.Dts.Runtime.ForEachLoop)))
        {
            // Recurse into FELCs
            ReadExecutables(((ForEachLoop)pkgExecutable).Executables);
        }
    }
}

private void ProcessDataFlowTask(TaskHost currentDataFlowTask)
{
    MainPipe currentDataFlow = (MainPipe)currentDataFlowTask.InnerObject;
    foreach (IDTSComponentMetaData100 currentComponent in currentDataFlow.ComponentMetaDataCollection)
    {
        // Get the inputs in the component.
        foreach (IDTSInput100 currentInput in currentComponent.InputCollection)
            foreach (IDTSInputColumn100 currentInputColumn in currentInput.InputColumnCollection)
                lineageIds.Add(currentInputColumn.ID, currentInputColumn.Name);

        // Get the outputs in the component.
        foreach (IDTSOutput100 currentOutput in currentComponent.OutputCollection)
            foreach (IDTSOutputColumn100 currentoutputColumn in currentOutput.OutputColumnCollection)
                lineageIds.Add(currentoutputColumn.ID, currentoutputColumn.Name);
    }
}

4. Create Script Component in Dataflow with ReadOnly access to lineageIds and the following code.

public override void Input0_ProcessInputRow(Input0Buffer Row)
  {
      Dictionary<int, string> lineageIds = (Dictionary<int, string>)Variables.lineageIds;

      int? colNum = Row.ErrorColumn;
      if (colNum.HasValue && (lineageIds != null))
      {
          if (lineageIds.ContainsKey(colNum.Value))
              Row.ErrorColumnName = lineageIds[colNum.Value];

          else
              Row.ErrorColumnName = "Row error";
      }
      Row.ErrorDescription = this.ComponentMetaData.GetErrorDescription(Row.ErrorCode);
  }
Calciferol answered 15/9, 2015 at 1:43 Comment(2)
Some of this goes away with the 2016 release of SSIS but I haven't decided whether it's going to be more hassle than something like you've proposed.Diamagnetism
Definitely a solid approach but to fit SO community standards, you might want to see if you can incorporate some of the link material in here. Theory being if the link goes dead, so too the usefulness of the answer to future readersDiamagnetism
G
1

Right click on CSV component (and many other components) and select "show advanced editor". Go to the last tab, "input and output properties"

On this screen look in the tree menu for you output columns (NOT(!) external columns). These have the lineage id in the "common properties"

There is a lot of jabber that this is hard to find, impossible to find without re-writting the code of visual studio itself and what not but i just checked this on both 2010 and 2012 on different systems and it isnt any harder then what i just described.

Goodden answered 5/3, 2014 at 9:5 Comment(2)
I know this method, but I need to do this through Script Component so that I can add column name in the error table in a database. Manually doing this is not a solution for my project.Overflow
The point is that the LineageId that you see in this view does not correspond to the ErrorColumn reported by the Error data flowMetaphrast

© 2022 - 2024 — McMap. All rights reserved.