I'm using ML.NET to predict a series of values using a regression model. I am only interested in one column being predicted (the score column). However, the values of some of the other columns are not available for the prediction class. I can't leave them at 0 as this would upset the prediction, so I guess they would have to also be predicted.
I saw a similar question here on predicting multiple values. The answer suggests creating two models, but I can see that the feature columns specified in each model do not include the label column of the other model. So this implies that those columns would not be used when making the prediction. Am I wrong, or should the label column of each model also be included in the feature column of the other model?
Here's some example code to try and explain in code:
public class FooInput
{
public float Feature1 { get; set; }
public float Feature2 { get; set; }
public float Bar {get; set; }
public float Baz {get; set; }
}
public class FooPrediction : FooInput
{
public float BarPrediction { get; set; }
public float BazPrediction { get; set; }
}
public ITransformer Train(IEnumerable<FooInput> data)
{
var mlContext = new MLContext(0);
var trainTestData = mlContext.Data.TrainTestSplit(mlContext.Data.LoadFromEnumerable(data));
var pipelineBar = mlContext.Transforms.CopyColumns("Label", "Bar")
.Append(mlContext.Transforms.CopyColumns("Score", "BarPrediction"))
.Append(mlContext.Transforms.Concatenate("Features", "Feature1", "Feature2", "Baz"))
.Append(mlContext.Regression.Trainers.FastTree());
var pipelineBaz = mlContext.Transforms.CopyColumns("Label", "Baz")
.Append(mlContext.Transforms.CopyColumns("Score", "BazPrediction"))
.Append(mlContext.Transforms.Concatenate("Features", "Feature1", "Feature2", "Bar"))
.Append(mlContext.Regression.Trainers.FastTree());
return pipelineBar.Append(pipelineBaz).Fit(trainTestData.TestSet);
}
This is effectively the same as the aforementioned answer, but with the addition of Baz
as a feature for the model where Bar
is to be predicted, and conversely the addition of Bar
as a feature for the model where Baz
is to be predicted.
Is this the correct approach, or does the answer on the other question achieve the desired result, being that the prediction of each column will utilise the values of the other predicted column from the loaded dataset?
ReplaceMissingValues
only seems to insert default, min, max or mean average values rather then attempt to predict them. Does the imputation technique I coded not insert the missing values more predictively using ML? Also, I've tried to apply a similar technique to a time series model, but having a problem. Perhaps you could see my other question regarding this. – Introrse