I am trying to train a U-Net model on the Cityscapes dataset, which has 20 'useful' semantic classes and a number of background classes that can be ignored (e.g. sky, ego vehicle, mountains, street lights). To train the model to ignore these background pixels, I am using the following popular solution from the internet:
- Assign a common `ignore_label` (e.g. `ignore_label = 255`) to all the pixels belonging to the ignore classes.
- Train the model using the `cross_entropy` loss for each pixel prediction.
- Pass the `ignore_label` parameter to the `cross_entropy` loss, so that the computed loss ignores the pixels with the unnecessary classes.
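The steps above can be sketched in PyTorch, where the corresponding parameter of `F.cross_entropy` is called `ignore_index` (the tensor shapes and class count here are toy values, not the real Cityscapes ones):

```python
import torch
import torch.nn.functional as F

# Toy setup: 4 classes; 255 is the common ignore label (as in Cityscapes).
IGNORE_LABEL = 255

logits = torch.randn(2, 4, 8, 8)          # (batch, num_classes, H, W) model output
target = torch.randint(0, 4, (2, 8, 8))   # ground-truth class index per pixel
target[:, :2, :] = IGNORE_LABEL           # pretend the top rows are background

# Pixels labelled IGNORE_LABEL contribute nothing to the loss or its gradient.
loss = F.cross_entropy(logits, target, ignore_index=IGNORE_LABEL)
print(loss.item())
```

The mean is taken only over the non-ignored pixels, which is exactly why the model receives no training signal at all on the background regions.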
But this approach has a problem: once trained, the model ends up classifying these background pixels as belonging to one of the 20 classes. This is expected, since the loss never penalizes the model for whatever it predicts on the background pixels.
The second obvious solution is therefore to use an extra class for all the background pixels, making it the 21st class in Cityscapes. However, I am worried that this will 'waste' my model's capacity by teaching it to classify this additional, unnecessary class.
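In case it helps frame the question, this second option amounts to a small label-remapping step before computing the loss (the names `NUM_CLASSES`, `BACKGROUND_CLASS`, and `remap_background` are mine, just for illustration):

```python
import torch

NUM_CLASSES = 20                 # the 'useful' Cityscapes classes
IGNORE_LABEL = 255               # common label for all ignore classes
BACKGROUND_CLASS = NUM_CLASSES   # class index 20, i.e. the 21st class

def remap_background(target: torch.Tensor) -> torch.Tensor:
    # Fold every ignored pixel into one explicit background class, so the
    # model's output layer now needs NUM_CLASSES + 1 channels and the loss
    # is computed over all pixels (no ignore_index needed).
    target = target.clone()
    target[target == IGNORE_LABEL] = BACKGROUND_CLASS
    return target

t = torch.tensor([[0, 5, 255],
                  [255, 19, 3]])
print(remap_background(t))
```

With this remapping the model is explicitly trained to output "background" on those pixels, which is what avoids the misclassification problem above, at the cost of spending capacity on the extra class.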
What is the most accurate way of handling the background pixel classes?