How to return transformed data from an ML.Net pipeline before a predictor is applied

Here is the creation of the ML.Net pipeline object copied from the TaxiFarePrediction example.

        LearningPipeline pipeline = new LearningPipeline
        {
            new TextLoader(TrainDataPath).CreateFrom<TaxiTrip>(separator:','),
            new ColumnCopier(("FareAmount", "Label")),
            new CategoricalOneHotVectorizer("VendorId","RateCode","PaymentType"),
            new ColumnConcatenator("Features","VendorId","RateCode","PassengerCount","TripDistance","PaymentType"),
            new FastTreeRegressor()
        };

Essentially, I'd like to return the data after the ColumnCopier, the CategoricalOneHotVectorizer and the ColumnConcatenator have been applied.

Hirsch answered 21/9, 2018 at 17:57 Comment(7)
Why not make the regressor null and retrieve data within the pipeline? (Didn't try this, I'm just wondering) - Decanter
@KevinAvignon How does one retrieve the data within the pipeline? - Hirsch
So in other words, you want the pipeline to apply the transformations to your data and return a new, transformed dataset to you? - Stegman
@YuraZaletskyy Yes, I'd like to return the transformed dataset. - Hirsch
@Gaspare, did you try my answer? - Stegman
@YuraZaletskyy Not yet. Will do later today. - Hirsch
@YuraZaletskyy Your answer worked, but because the LearningPipelineDebugProxy class is internal sealed, anything beyond the first 10 rows can't be viewed. - Hirsch

To visualize a pipeline in the debugger, Microsoft wrote the class

LearningPipelineDebugProxy

That class has two quite informative properties: Rows and Columns. Of course, a class intended for debugging is not easy to instantiate yourself, because it is internal sealed:

namespace Microsoft.ML
{
  /// <summary>
  /// The debug proxy class for a LearningPipeline.
  /// Displays the current columns and values in the debugger Watch window.
  /// </summary>
  internal sealed class LearningPipelineDebugProxy
  {

That declaration comes from the source code. When debugger visualization is not enough, I fall back on reflection. To create an instance of LearningPipelineDebugProxy in the TaxiTrip example, I used the following tricks:

  1. Created an instance of LearningPipelineDebugProxy via the Activator.CreateInstance method.
  2. Added a method that reads the needed property via reflection.

In code that fragment looked like this:

    public static PredictionModel<TaxiTrip, TaxiTripFarePrediction> Train()
    {
        var pipeline = new LearningPipeline();
        pipeline.Add(new TextLoader(_datapath).CreateFrom<TaxiTrip>(useHeader: true, separator: ','));
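
        // LearningPipelineDebugProxy is internal, so look its Type up by name
        // among the loaded assemblies (requires using System.Linq).
        Type proxyType = AppDomain.CurrentDomain.GetAssemblies()
            .SelectMany(a => a.GetTypes())
            .First(t => String.Equals(t.Name, "LearningPipelineDebugProxy", StringComparison.Ordinal));

        // Instantiate the proxy, passing the pipeline as its constructor argument.
        var instObject = Activator.CreateInstance(proxyType, new object[] { pipeline });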

        pipeline.Add(new ColumnCopier(("FareAmount", "Label")));
        pipeline.Add(new CategoricalOneHotVectorizer("VendorId", "RateCode", "PaymentType"));
        pipeline.Add(new ColumnConcatenator("Features", "VendorId", "RateCode", "PassengerCount", "TripDistance", "PaymentType"));

        var rws = GetPropValue(instObject, "Rows");
        var clms = GetPropValue(instObject, "Columns");

        pipeline.Add(new FastTreeRegressor());

        PredictionModel<TaxiTrip, TaxiTripFarePrediction> model = pipeline.Train<TaxiTrip, TaxiTripFarePrediction>();
        return model;
    }

    public static object GetPropValue(object src, string propName)
    {
        return src.GetType().GetProperty(propName).GetValue(src, null);
    }

After this, Rows is available in the debugger window and in code as well:

[Image: rows in ML Net]
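
Since GetPropValue returns a plain object, reading the individual rows in code takes one more reflection step. Below is a minimal sketch of that step, assuming the returned value is enumerable and that each item exposes its data through public properties; neither assumption is checked against the ML.NET source here. DumpItems and FakeRow are hypothetical names used only for illustration, and FakeRow merely stands in for whatever row type the proxy really returns.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class ReflectionDump
{
    // Hypothetical stand-in for an inaccessible row type; not an ML.NET class.
    private sealed class FakeRow
    {
        public string VendorId { get; set; }
        public int PassengerCount { get; set; }
    }

    // Formats the public instance properties of every item as "Name=Value".
    public static List<string> DumpItems(object source)
    {
        var lines = new List<string>();
        if (!(source is IEnumerable items))
            return lines; // not a sequence, nothing to dump

        foreach (object item in items)
        {
            if (item == null) continue;
            IEnumerable<string> parts = item.GetType()
                .GetProperties(BindingFlags.Public | BindingFlags.Instance)
                .Where(p => p.GetIndexParameters().Length == 0) // skip indexers
                .Select(p => p.Name + "=" + p.GetValue(item, null));
            lines.Add(string.Join(", ", parts));
        }
        return lines;
    }

    public static void Main()
    {
        // In the answer's code this would be the rws object instead.
        object rows = new[]
        {
            new FakeRow { VendorId = "VTS", PassengerCount = 1 },
            new FakeRow { VendorId = "CMT", PassengerCount = 2 },
        };
        foreach (string line in DumpItems(rows))
            Console.WriteLine(line);
    }
}
```

With the code from the answer, the call would be DumpItems(rws). If the real row type keeps its values in nested collections rather than flat properties, the helper would need to recurse into them.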

Stegman answered 25/9, 2018 at 11:52 Comment(0)
