How to convert the 35 classes of the Cityscapes dataset to 19 classes?

Asked 18/6, 2019 at 13:23 Answered 7/10, 2020 at 11:21

Solved computer-vision pytorch image-segmentation

The following is a small snippet of my code. Using this, I can train my model called 'lolnet' on the Cityscapes dataset, but the dataset contains 35 classes/labels [0-34].

imports ***

trainloader = torch.utils.data.DataLoader(
    datasets.Cityscapes('/media/farshid/DataStore/temp/cityscapes/', split='train', mode='fine',
                    target_type='semantic', target_transform =trans,
                    transform=input_transform ), batch_size = batch_size, num_workers = 2)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = lolNet()
criterion = CrossEntropyLoss2d()

net.to(device)
num_of_class = 34  # name must match the one used in the view/reshape calls below

lr = 0.0001
# Create the optimizer once: building it inside the batch loop resets Adam's state on every step
optimizer = optim.Adam(net.parameters(), lr=lr)

for epoch in range(200000):

    running_loss = 0.0  # loss accumulated over this epoch

    for batch, data in enumerate(trainloader, 0):

        inputs, labels = data
        labels = labels.long()
        inputs, labels = inputs.to(device), labels.to(device)

        labels = labels.view([-1, ])

        optimizer.zero_grad()
        outputs = net(inputs)

        outputs = outputs.view(-1, num_of_class)


        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

        # Move the scores to the CPU as a (pixels, classes) NumPy array
        outputs = outputs.detach().cpu().numpy()
        outputs = outputs.reshape([-1, num_of_class])

        # Predicted class per pixel: argmax over the class dimension
        mask = np.argmax(outputs, axis=1)

        IoU = jaccard_score(labels.to('cpu').data, mask, average='micro')

But I want to train my model only on the 19 classes. These 19 classes are found here. The labels to train on are the ones marked "ignoreInEval" = False. This PyTorch dataset helper for this dataset doesn't provide any clue.

So my question is: how can I train my model on the desired 19 classes of this dataset using PyTorch's "datasets.Cityscapes" API?

Ambi answered 18/6, 2019 at 13:23 Comment(7)

Can you provide print(net)? – Articulation 18/6, 2019 at 14:37

The input for the net is (batch, 3, 256, 256) and the output shape is [1, 256*256, num_class] – Ambi 18/6, 2019 at 15:7

OK, is this a pretrained net: net = lolNet()? – Articulation 18/6, 2019 at 15:9

It's a basic ResNet – Ambi 18/6, 2019 at 15:10

So it is derived from ResNet, and it has the architecture of some ResNet. – Articulation 18/6, 2019 at 15:12

It's a ResNet50, from PyTorch – Ambi 18/6, 2019 at 15:25

Hi, some months ago I posted an answer down here. I believe you don't have this issue anymore, but if you think it is right, please accept it so that it can be useful for others. – Pembroke 4/1, 2021 at 23:24

It's been a while, but I'm leaving an answer as it may be useful for others:

First, create a mapping to 19 classes + background. Background collects the less important classes that carry the ignore flag, as said here.

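# Mapping of ignore categories (-> 0, background) and valid ones (numbered from 1-19)
mapping_20 = {
    0: 0,
    1: 0,
    2: 0,
    3: 0,
    4: 0,
    5: 0,
    6: 0,
    7: 1,    # road
    8: 2,    # sidewalk
    9: 0,
    10: 0,
    11: 3,   # building
    12: 4,   # wall
    13: 5,   # fence
    14: 0,
    15: 0,
    16: 0,
    17: 6,   # pole
    18: 0,
    19: 7,   # traffic light
    20: 8,   # traffic sign
    21: 9,   # vegetation
    22: 10,  # terrain
    23: 11,  # sky
    24: 12,  # person
    25: 13,  # rider
    26: 14,  # car
    27: 15,  # truck
    28: 16,  # bus
    29: 0,
    30: 0,
    31: 17,  # train
    32: 18,  # motorcycle
    33: 19,  # bicycle
    -1: 0    # license plate
}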
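If typing the table by hand feels error-prone, the same dictionary can be generated from the list of the 19 valid ids. This is only a sketch: VALID_IDS and build_mapping are names I made up for illustration, and the id list is simply read off the mapping above.

```python
# Original Cityscapes ids of the 19 valid classes, in the order used by the mapping above
VALID_IDS = [7, 8, 11, 12, 13, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 31, 32, 33]


def build_mapping(valid_ids, all_ids=range(-1, 34)):
    """Map each valid id to 1..len(valid_ids) and every other id to 0 (background)."""
    new_id = {orig: i + 1 for i, orig in enumerate(valid_ids)}
    return {orig: new_id.get(orig, 0) for orig in all_ids}


mapping_20 = build_mapping(VALID_IDS)
```

This should give the same 35 entries as the hand-written table, so either form can be used with the function below.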

Then, for each label image that you load for training (the gray images where each pixel contains a class id, with the file name pattern "{city}_{number}_{number}_gtFine_labelIds.png"), run the function below.

It converts each pixel according to the mapping above, so your label images (masks) will now have only 20 different values (19 classes + 1 background) instead of 35.

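import numpy as np

def encode_labels(mask):
    """Remap a labelIds mask to the 20 values defined in mapping_20."""
    label_mask = np.zeros_like(mask)
    for k in mapping_20:
        # Every pixel with original id k receives the new id mapping_20[k]
        label_mask[mask == k] = mapping_20[k]
    return label_mask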

Then you can train your model normally with this new number of classes.
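Since the question asks about datasets.Cityscapes specifically, one possible way to hook the remapping in is through target_transform. The sketch below swaps the per-class loop for a lookup table, which does the same remapping in one indexing step. VALID_IDS, LUT and encode_labels_lut are illustrative names of mine, and I am assuming the target arrives as the single-channel labelIds image.

```python
import numpy as np

# Original ids of the 19 valid classes, in the same order as the mapping above
VALID_IDS = [7, 8, 11, 12, 13, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 31, 32, 33]

# Lookup table indexed by original id (0-255); ids not listed stay 0 (background)
LUT = np.zeros(256, dtype=np.int64)
LUT[VALID_IDS] = np.arange(1, len(VALID_IDS) + 1)


def encode_labels_lut(target):
    """Remap a labelIds mask to the values 0..19 with one indexing operation.

    `target` can be a NumPy array or anything np.asarray accepts, e.g. the
    PIL image that a dataset passes to target_transform.
    """
    # uint8 keeps every value inside the table; -1 wraps to 255, which maps to 0
    mask = np.asarray(target).astype(np.uint8)
    return LUT[mask]


# Tiny example: road (7), ego vehicle (1), car (26), bicycle (33)
example = np.array([[7, 1], [26, 33]], dtype=np.uint8)
encoded = encode_labels_lut(example)  # [[1, 0], [14, 19]]
```

Passing target_transform=encode_labels_lut to datasets.Cityscapes (combined with any resizing you already do in your own transform) should then yield masks with values 0-19, and the network needs 20 output channels. The result is a NumPy array, so wrap it with torch.from_numpy if you need a tensor at that point.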

Pembroke answered 7/10, 2020 at 11:21 Comment(2)

what do you think @dpetrini, is it better to use CrossEntropy(ignore=bg_class) or to train the network to output bg_class correctly? I am unsure – Hebetate 30/6 at 11:9

I prefer not to ignore it, but you can always train the model both ways and compare. The idea is always to maximize accuracy. – Pembroke 2/7 at 17:1
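To make the two options from these comments concrete, here is a small NumPy sketch of what "ignoring" a class in the loss means: pixels with the ignored label are simply left out of the average. The function is my own illustration, not the CrossEntropyLoss2d from the question; in PyTorch the equivalent switch is the ignore_index argument of nn.CrossEntropyLoss.

```python
import numpy as np


def cross_entropy(logits, labels, ignore_index=None):
    """Mean cross-entropy over pixels; pixels equal to ignore_index are skipped.

    logits: float array of shape (N, C); labels: int array of shape (N,).
    """
    # Numerically stable log-softmax over the class dimension
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Boolean mask of the pixels that contribute to the loss
    keep = np.ones(labels.shape, dtype=bool)
    if ignore_index is not None:
        keep = labels != ignore_index

    rows = np.arange(len(labels))[keep]
    return -log_probs[rows, labels[keep]].mean()


# Four pixels, 20 classes; the model predicts class 5 everywhere
logits = np.zeros((4, 20))
logits[:, 5] = 2.0
labels = np.array([0, 0, 5, 5])  # two background pixels, two pixels of class 5

loss_all = cross_entropy(logits, labels)                 # background is penalised
loss_fg = cross_entropy(logits, labels, ignore_index=0)  # background is skipped
```

With the background ignored, the two wrongly predicted background pixels no longer contribute, so loss_fg comes out smaller than loss_all. Note that if every pixel in a batch is ignored, the mean is taken over an empty set and the result is NaN.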

You download the model and the weights.

import torch
import torch.nn as nn
import torchvision.models as models

r = models.resnet50(pretrained=True)

Note that the original ResNet has 1000 categories/classes, so when you download the pretrained model, the last fc layer will be for 1000 classes.

Here is the forward() method you have, and above that code is your model.

You can remove the last fully connected (fc) layer from the original resnet50 model, add your new fc layer with exactly 19 classes (19 outputs), and train the classifier only for that last layer. The other layers, except the last one, should be frozen.

So you will learn just the 19 classes you need.

Note that the ResNet __init__ method can also take the number of classes, so you may try that, but in this case you cannot load the pretrained weights, so you need to use pretrained=False and train from scratch.

import torch
import torch.nn as nn
import torchvision.models as models

r = models.resnet50(num_classes=19, pretrained=False)

Articulation answered 18/6, 2019 at 15:38 Comment(2)

The cityscapes dataset has 35 classes. I need to train the model on 19 of them. (see the links). Changing the output is not the problem. – Ambi 18/6, 2019 at 15:43

The question was how do I train my model on the specific 19 classes of the dataset. – Ambi 18/6, 2019 at 15:44
