Specify columns as both feature and label in multiple combined regression models (ML.NET)
Asked Answered
I

1

2

I'm using ML.NET to predict a series of values using a regression model. I am only interested in one column being predicted (the score column). However, the values of some of the other columns are not available for the prediction class. I can't leave them at 0 as this would upset the prediction, so I guess they would have to also be predicted.

I saw a similar question here on predicting multiple values. The answer suggests creating two models, but I can see that the feature columns specified in each model do not include the label column of the other model. So this implies that those columns would not be used when making the prediction. Am I wrong, or should the label column of each model also be included in the feature column of the other model?

Here's some example code to try and explain in code:

public class FooInput
{
    public float Feature1 { get; set; }
    public float Feature2 { get; set; }
    public float Bar {get; set; }
    public float Baz {get; set; }
}

public class FooPrediction : FooInput
{
    public float BarPrediction { get; set; }
    public float BazPrediction { get; set; }
}

public ITransformer Train(IEnumerable<FooInput> data)
{
    var mlContext = new MLContext(0);
    var trainTestData = mlContext.Data.TrainTestSplit(mlContext.Data.LoadFromEnumerable(data));

    var pipelineBar = mlContext.Transforms.CopyColumns("Label", "Bar")
        .Append(mlContext.Transforms.CopyColumns("Score", "BarPrediction"))
        .Append(mlContext.Transforms.Concatenate("Features", "Feature1", "Feature2", "Baz"))
        .Append(mlContext.Regression.Trainers.FastTree());

    var pipelineBaz = mlContext.Transforms.CopyColumns("Label", "Baz")
        .Append(mlContext.Transforms.CopyColumns("Score", "BazPrediction"))
        .Append(mlContext.Transforms.Concatenate("Features", "Feature1", "Feature2", "Bar"))
        .Append(mlContext.Regression.Trainers.FastTree());

    return pipelineBar.Append(pipelineBaz).Fit(trainTestData.TestSet);
}

This is effectively the same as the aforementioned answer, but with the addition of Baz as a feature for the model where Bar is to be predicted, and conversely the addition of Bar as a feature for the model where Baz is to be predicted.

Is this the correct approach, or does the answer on the other question achieve the desired result, being that the prediction of each column will utilise the values of the other predicted column from the loaded dataset?

Introrse answered 13/10, 2019 at 21:2 Comment(0)
D
1

One technique you can use is called "Imputation", which replaces these unknown values with some "guessed" value. Imputation is simply the process of substituting the missing values of our dataset.

In ML.NET, what you're looking for is the ReplaceMissingValues transform. You can find samples on learn.microsoft.com.

The technique you are discussing above is also a form of imputation, where your unknowns are replaced by predicting the value from the other known values. This can work as well. I guess I would try both forms and see what works best for your dataset.

Donoho answered 24/10, 2019 at 14:19 Comment(1)
Thanks for your answer (apologies for delay in replying). ReplaceMissingValues only seems to insert default, min, max or mean average values rather then attempt to predict them. Does the imputation technique I coded not insert the missing values more predictively using ML? Also, I've tried to apply a similar technique to a time series model, but having a problem. Perhaps you could see my other question regarding this.Introrse

© 2022 - 2024 — McMap. All rights reserved.