ML.NET, "Score Column" is missing

I want to build my first app with ML.NET, and I chose the Wisconsin Prognostic Breast Cancer Dataset. I generated the .csv file myself. One record of that file looks like this:

B;11.62;18.18;76.38;408.8;0.1175;0.1483;0.102;0.05564;0.1957;0.07255;0.4101;1.74;3.027;27.85;0.01459;0.03206;0.04961;0.01841;0.01807;0.005217;13.36;25.4;88.14;528.1;0.178;0.2878;0.3186;0.1416;0.266;0.0927

Each record has 31 columns: the diagnosis followed by 30 numeric features.

My CancerData.cs looks like this:

class CancerData
{

    [Column(ordinal: "0")]
    public string Diagnosis;

    [Column(ordinal: "1")]
    public float RadiusMean;

    [Column(ordinal: "2")]
    public float TextureMean;

    [Column(ordinal: "3")]
    public float PerimeterMean;

    //.........

    [Column(ordinal: "28")]
    public float ConcavPointsWorst;

    [Column(ordinal: "29")]
    public float SymmetryWorst;

    [Column(ordinal: "30")]
    public float FractalDimensionWorst;

    [Column(ordinal: "31", name: "Label")]
    public string Label;
}

And CancerPrediction.cs:

class CancerPrediction
{
    [ColumnName("PredictedLabel")]
    public string Diagnosis;

}

My Program.cs:

class Program
{

    static void Main(string[] args)
    {
        PredictionModel<CancerData, CancerPrediction> model = Train();
        Evaluate(model);
    }

    public static PredictionModel<CancerData, CancerPrediction> Train()
    {
        var pipeline = new LearningPipeline();
        pipeline.Add(new TextLoader("Cancer-train.csv").CreateFrom<CancerData>(useHeader: true, separator: ';'));
        pipeline.Add(new Dictionarizer(("Diagnosis", "Label")));
        pipeline.Add(new ColumnConcatenator(outputColumn: "Features",
            "RadiusMean",
            "TextureMean",
            "PerimeterMean",
            //... all of the features
            "FractalDimensionWorst"));
        pipeline.Add(new StochasticDualCoordinateAscentBinaryClassifier());
        pipeline.Add(new PredictedLabelColumnOriginalValueConverter() { PredictedLabelColumn = "PredictedLabel" });
        PredictionModel<CancerData, CancerPrediction> model = pipeline.Train<CancerData, CancerPrediction>();
        model.WriteAsync(modelPath);
        return model;

    }

    public static void Evaluate(PredictionModel<CancerData, CancerPrediction> model)
    {
        var testData = new TextLoader("Cancer-test.csv").CreateFrom<CancerData>(useHeader: true, separator: ';');
        var evaluator = new ClassificationEvaluator();
        ClassificationMetrics metrics = evaluator.Evaluate(model, testData);
        var accuracy = Math.Round(metrics.AccuracyMicro, 2);
        Console.WriteLine("The accuracy is: " + accuracy);
        Console.ReadLine();
    }
}

What I get is:

ArgumentOutOfRangeException: Score column is missing

It is thrown on this line in Evaluate:

ClassificationMetrics metrics = evaluator.Evaluate(model, testData);

When I add a Score column to CancerPrediction, I still get the same exception.

I saw that someone else had the same problem on Stack Overflow, but it seems to have no answer, and I can't comment on it because I don't have enough reputation. Is this a bug, or is my data not prepared properly? I'm using ML.NET version 0.5.0.

Thanks for any advice!

EDIT1:

When I add this field to CancerPrediction.cs:

class CancerPrediction
{
    [ColumnName("PredictedLabel")]
    public string PredictedDiagnosis;

    [ColumnName("Score")]
    public string Score; // => new column!
}

I get an exception:

System.InvalidOperationException: 'Can't bind the IDataView column 'Score' of type 'R4' to field or property 'Score' of type 'System.String'.'

on the line:

PredictionModel<CancerData, CancerPrediction> model = pipeline.Train<CancerData, CancerPrediction>();

EDIT3

I changed the separator to ',' and loaded the original dataset instead of the file I prepared, and it still complains that there is no Score column. So annoying.

Contact asked on 14/9/2018 at 15:29.
Comments:
I believe the Score column needs to be a float, which may be why you're getting the second exception. - Chute
@Chute Still the same: Score does not exist. - Contact

I believe I know what the problem is.

You are using a StochasticDualCoordinateAscentBinaryClassifier, which is a binary classifier.

You are trying to evaluate results using ClassificationEvaluator, which is a multiclass classification evaluator.

I suggest you use BinaryClassificationEvaluator to evaluate binary classifier models.
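A minimal sketch of that change, assuming the pre-0.6 namespaces that ship with 0.5.0 (PredictionModel in Microsoft.ML, TextLoader in Microsoft.ML.Data, the evaluators and metrics in Microsoft.ML.Models). The class name BinaryEvaluation and the generic signature are illustrative only, so the snippet compiles without the CancerData/CancerPrediction classes; with them you would call BinaryEvaluation.Evaluate(model, "Cancer-test.csv").

```csharp
using System;
using Microsoft.ML;          // PredictionModel<TInput, TOutput> (pre-0.6 API)
using Microsoft.ML.Data;     // TextLoader
using Microsoft.ML.Models;   // BinaryClassificationEvaluator, BinaryClassificationMetrics

// Hypothetical helper class, not part of ML.NET.
static class BinaryEvaluation
{
    // Evaluates a model trained with a binary learner such as
    // StochasticDualCoordinateAscentBinaryClassifier. The binary evaluator
    // reads the scalar 'Score' column that such a learner produces.
    public static BinaryClassificationMetrics Evaluate<TInput, TOutput>(
        PredictionModel<TInput, TOutput> model, string testPath)
        where TInput : class
        where TOutput : class, new()
    {
        // Same loader settings as the question: header row, ';' separator.
        var testData = new TextLoader(testPath).CreateFrom<TInput>(useHeader: true, separator: ';');

        var evaluator = new BinaryClassificationEvaluator();
        BinaryClassificationMetrics metrics = evaluator.Evaluate(model, testData);

        // Binary metrics expose Accuracy rather than the multiclass AccuracyMicro.
        Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
        Console.WriteLine($"AUC:      {metrics.Auc:P2}");
        Console.WriteLine($"F1 score: {metrics.F1Score:P2}");
        return metrics;
    }
}
```

Note that the metric names change as well: ClassificationMetrics.AccuracyMicro from the question becomes BinaryClassificationMetrics.Accuracy.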

The exact problem is as follows: the evaluator expects the column 'Score' to be a vector column that contains a score for every class. What it finds is a 'Score' column that is a scalar (just the score of the positive class).

So it throws with the somewhat convoluted message

Score column is missing
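To make the shape difference concrete, here is a sketch of two output classes. Both class names are hypothetical. The float Score in the binary case matches the 'R4' type reported in the EDIT1 error, and the float[] Score in the multiclass case reflects the per-class score vector described above. The ColumnName attribute is assumed to come from Microsoft.ML.Runtime.Api, as in the pre-0.6 API.

```csharp
using Microsoft.ML.Runtime.Api; // ColumnName attribute (pre-0.6 API, assumed)

// Hypothetical output class for a binary learner: 'Score' is a single
// R4 value, so it binds to float (binding it to string fails, as in EDIT1).
class CancerBinaryPrediction
{
    [ColumnName("PredictedLabel")]
    public string Diagnosis;

    [ColumnName("Score")]
    public float Score;
}

// Hypothetical output class for a multiclass learner: 'Score' is a vector
// with one entry per class, which is the shape ClassificationEvaluator expects.
class CancerMulticlassPrediction
{
    [ColumnName("PredictedLabel")]
    public string Diagnosis;

    [ColumnName("Score")]
    public float[] Score;
}
```

Binding Score as a float only fixes the binding error from EDIT1. The "Score column is missing" exception goes away only when the evaluator matches the learner.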

Shaylashaylah answered on 14/9/2018 at 20:20.
Comments:
Thanks! Now it works. It's a shame there is so little information about this framework, even in the documentation. - Contact
@Contact We are working on a new API with friendlier error messages. In fact, this particular issue will probably become a compile-time error. - Shaylashaylah
To be honest, it's my fault for not going deeper into the documentation; it comes down to my lack of experience with ML in general. I'm very excited about the new features of ML.NET, especially an API for Keras/TF or another CNN implementation to play with images ;) - Contact
