Unknown category '2' encountered. Set `add_nan=True` to allow unknown categories pytorch_forecasting

Asked 13/2, 2022 at 6:47 Answered 5/4, 2022 at 9:47

time-series transformer-model pytorch-forecasting

I get the error "Unknown category '2' encountered. Set add_nan=True to allow unknown categories" while creating a time series dataset in PyTorch Forecasting:

training = TimeSeriesDataSet(
    train,
    time_idx="index",
    target=dni,
    group_ids=["Solar Zenith Angle", "Relative Humidity", "Dew Point", "Temperature", "Precipitable Water", "Wind Speed"],
    min_encoder_length=max_encoder_length // 2,  # keep encoder length long (as it is in the validation set)
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    static_reals=["Wind Direction"],
    time_varying_known_reals=["index", "Solar Zenith Angle", "Relative Humidity", "Dew Point", "Temperature", "Precipitable Water"],
    # time_varying_unknown_categoricals=[],
    time_varying_unknown_reals=[dhi, dni, ghi],
    categorical_encoders={data.columns[2]: NaNLabelEncoder(add_nan=True)},
    target_normalizer=GroupNormalizer(
        groups=["Solar Zenith Angle", "Relative Humidity", "Dew Point", "Temperature", "Precipitable Water", "Wind Speed"],
        transformation="softplus",
    ),  # use softplus and normalize by group
    add_relative_time_idx=True,
    add_target_scales=True,
    add_encoder_length=True,
)

Buckshot asked 13/2, 2022 at 6:47

Try passing pytorch_forecasting.data.encoders.NaNLabelEncoder(add_nan=True) in categorical_encoders for every column that is encoded as a category, as in this example:

import pytorch_forecasting
from pytorch_forecasting import TimeSeriesDataSet

max_prediction_length = 1
max_encoder_length = 27

training = TimeSeriesDataSet(
    sales_train,
    time_idx='dayofyear',
    target="QTT",
    group_ids=['S100','I100','C100','C101'],
    min_encoder_length=0,  
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    static_categoricals=[],
    static_reals=['S100','I100','C100','C101'],
    time_varying_known_categoricals=[],  
    time_varying_known_reals=['DATE'],
    time_varying_unknown_categoricals=[],
    time_varying_unknown_reals=['DATE'],
    categorical_encoders={
        'S100': pytorch_forecasting.data.encoders.NaNLabelEncoder(add_nan=True),
        'I100':pytorch_forecasting.data.encoders.NaNLabelEncoder(add_nan=True),
        'C100':pytorch_forecasting.data.encoders.NaNLabelEncoder(add_nan=True),
        'C101':pytorch_forecasting.data.encoders.NaNLabelEncoder(add_nan=True)
    },
    add_relative_time_idx=True,
    add_target_scales=True,
    add_encoder_length=True,
    allow_missing_timesteps=True,
)
print('Executed')
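To illustrate what the add_nan switch changes, here is a minimal sketch in plain Python. TinyLabelEncoder is a hypothetical, simplified stand-in written for this answer, not the library's NaNLabelEncoder; it only mirrors the behavior implied by the error message: without add_nan, a value that was not seen during fitting raises an error, while with add_nan it is mapped to a reserved "unknown" code.

```python
class TinyLabelEncoder:
    """Hypothetical stand-in that mimics an add_nan switch (not the library class)."""

    def __init__(self, add_nan=False):
        self.add_nan = add_nan
        self.classes_ = {}

    def fit(self, values):
        # When add_nan is enabled, code 0 is reserved for unknown values,
        # so known categories start at 1.
        offset = 1 if self.add_nan else 0
        unique = sorted(set(values))
        self.classes_ = {v: i + offset for i, v in enumerate(unique)}
        return self

    def transform(self, values):
        if self.add_nan:
            # Unknown categories fall back to the reserved code 0.
            return [self.classes_.get(v, 0) for v in values]
        try:
            return [self.classes_[v] for v in values]
        except KeyError as e:
            raise KeyError(f"Unknown category '{e.args[0]}' encountered") from None


train_values = ["0", "1"]   # categories seen while fitting
new_values = ["1", "2"]     # "2" was never seen

strict = TinyLabelEncoder(add_nan=False).fit(train_values)
strict_error = None
try:
    strict.transform(new_values)
except KeyError as err:
    strict_error = err.args[0]
print(strict_error)

lenient = TinyLabelEncoder(add_nan=True).fit(train_values)
lenient_codes = lenient.transform(new_values)
print(lenient_codes)
```

In this sketch the strict encoder fails on "2", while the lenient one encodes it with the reserved code. Note that this only hides unseen values; if they appear because a numeric column was loaded as text, fixing the column type is the better remedy.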

Exordium answered 14/3, 2022 at 16:45

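Probably a numerical feature in your data set has a string type. When pandas reads a CSV file it infers each column's type, and a column that contains even one value it cannot parse as a number is loaded as strings unless you specify a type explicitly. Such a column is then treated as categorical, and any value not seen during fitting triggers this error.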

In my case, I forgot to cast the target variable to a numerical type. The problem was solved immediately after changing the variable's type to np.float64.
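A minimal sketch of that check and cast, assuming pandas is installed. The column name DNI, the inline CSV, and the "--" placeholder are made up for illustration and are not taken from the question's data:

```python
import io

import pandas as pd

# Hypothetical CSV: the stray "--" placeholder prevents pandas from
# inferring a numeric type for the DNI column.
csv_text = "index,DNI\n0,512.0\n1,--\n2,498.5\n"
df = pd.read_csv(io.StringIO(csv_text))

# Check whether the column was loaded as a numeric type.
was_numeric = pd.api.types.is_numeric_dtype(df["DNI"])
print(was_numeric)

# Cast to float64; values that cannot be parsed become NaN.
df["DNI"] = pd.to_numeric(df["DNI"], errors="coerce").astype("float64")
print(df.dtypes)
```

Values that could not be parsed end up as NaN after the cast, so they still need to be filled or dropped before the dataset is built.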

I hope you find my experience useful.

Roborant answered 5/4, 2022 at 9:47
