How to implement multi-class semantic segmentation?

Asked 10/5, 2017 at 18:27 Answered 10/10, 2018 at 16:51

python machine-learning deep-learning keras image-segmentation

I'm able to train a U-net with labeled images that have a binary classification.

But I'm having a hard time figuring out how to configure the final layers in Keras/Theano for multi-class classification (4 classes).

I have 634 images and 634 corresponding masks that are uint8 and 64 x 64 pixels.

My masks, instead of being black (0) and white (1), have color-labeled objects in 3 categories plus background, as follows:

black (0), background
red (1), object class 1
green (2), object class 2
yellow (3), object class 3

Before training runs, the array containing masks is one-hot encoded as follows:

mask_train = to_categorical(mask_train, 4)

This makes mask_train.shape go from (634, 1, 64, 64) to (2596864, 4).

My model closely follows the U-Net architecture. However, the final layers seem problematic: I'm unable to flatten the structure so that it matches the one-hot encoded array.

[...]
up3 = concatenate([UpSampling2D(size=(2, 2))(conv7), conv2], axis=1)
conv8 = Conv2D(128, (3, 3), activation='relu', padding='same')(up3)
conv8 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv8)

up4 = concatenate([UpSampling2D(size=(2, 2))(conv8), conv1], axis=1)
conv9 = Conv2D(64, (3, 3), activation='relu', padding='same')(up4)
conv10 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv9)

# here I used number classes = number of filters and softmax although
# not sure if a dense layer should be here instead
conv11 = Conv2D(4, (1, 1), activation='softmax')(conv10)

model = Model(inputs=[inputs], outputs=[conv11])

# here categorical cross entropy is being used but may not be correct
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])

return model

Do you have any suggestions on how to modify the final portions of the model so this trains successfully? I get a variety of shape mismatch errors, and the few times I managed to make it run, the loss did not change throughout epochs.

Thacker answered 10/5, 2017 at 18:27 Comment(0)

You should have your target as (634,4,64,64) if you're using channels_first.
Or (634,64,64,4) if channels_last.

Each channel of your target should be one class. Each channel is an image of 0's and 1's, where 1 means that pixel is that class and 0 means that pixel is not that class.

Then, your target is 634 groups, each group containing four images, each image having 64x64 pixels, where a pixel value of 1 indicates the presence of the desired feature.
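As a minimal sketch of that layout (the tiny 2x2 mask and the variable names are made up for illustration, not taken from the question's data), each class gets its own binary image along a leading channel axis:

```python
import numpy as np

# Hypothetical 2x2 label mask with class ids 0..3 (not the real data).
mask = np.array([[0, 1],
                 [2, 3]], dtype=np.uint8)

num_classes = 4
# One binary image per class, stacked on a leading channel axis (channels_first).
target = np.stack([(mask == c) for c in range(num_classes)]).astype(np.float32)

print(target.shape)        # (4, 2, 2)
print(target[1])           # 1 only where the mask holds class 1
print(target.sum(axis=0))  # every pixel belongs to exactly one class
```

For channels_last, the same data would be arranged as (2, 2, 4) instead, for example with np.moveaxis(target, 0, -1).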

I'm not sure the result will be ordered correctly, but you can try:

mask_train = to_categorical(mask_train, 4)
mask_train = mask_train.reshape((634,64,64,4))
# I chose channels last here because to_categorical is outputting your classes last: (2596864,4)

#moving the channel:
mask_train = np.moveaxis(mask_train,-1,1)
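One way to check the ordering is to run the same reshape on random labels and compare against the original label map. The sketch below assumes to_categorical flattens its input in row-major (C) order, which is what the (2596864, 4) shape in the question suggests; it emulates that step with NumPy rather than calling Keras, and uses a smaller stand-in array instead of the real masks.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the real masks: same layout (N, 1, H, W), smaller N.
labels = rng.integers(0, 4, size=(5, 1, 64, 64)).astype(np.uint8)

# Emulates a to_categorical that flattens its input: one row per pixel.
flat_one_hot = np.eye(4, dtype=np.float32)[labels.ravel()]   # (5*64*64, 4)

restored = flat_one_hot.reshape((5, 64, 64, 4))   # channels_last
restored = np.moveaxis(restored, -1, 1)           # channels_first: (5, 4, 64, 64)

# argmax over the class axis should give back the original label map.
ordering_ok = np.array_equal(restored.argmax(axis=1), labels[:, 0])
print(restored.shape, ordering_ok)
```

If ordering_ok is True under that assumption, the reshape followed by moveaxis preserves the pixel order, because the singleton channel axis of the input does not change how the pixels are flattened.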

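If the ordering doesn't work properly, you can do it manually (starting from the original integer masks, before calling to_categorical):

newMask = np.zeros((634,4,64,64))

for samp in range(len(mask_train)):
    im = mask_train[samp,0]  # (64, 64) image of class ids
    for x in range(len(im)):
        row = im[x]
        for y in range(len(row)):
            y_val = int(row[y])  # class id, used as the channel index
            newMask[samp,y_val,x,y] = 1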
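The loop above can also be written without Python-level iteration. This is a sketch, assuming the masks are an integer array of shape (N, 1, H, W); the helper name one_hot_channels_first is made up here, and the small random array stands in for the real data.

```python
import numpy as np

def one_hot_channels_first(labels, num_classes):
    """labels: integer array (N, 1, H, W) -> float array (N, num_classes, H, W)."""
    # Indexing the identity matrix by class id gives one one-hot row per pixel.
    one_hot = np.eye(num_classes, dtype=np.float32)[labels[:, 0]]   # (N, H, W, C)
    return np.moveaxis(one_hot, -1, 1)

rng = np.random.default_rng(1)
labels = rng.integers(0, 4, size=(3, 1, 8, 8)).astype(np.uint8)

# Reference result from the explicit triple loop.
looped = np.zeros((3, 4, 8, 8), dtype=np.float32)
for n in range(3):
    for x in range(8):
        for y in range(8):
            looped[n, labels[n, 0, x, y], x, y] = 1

vectorized = one_hot_channels_first(labels, 4)
print(np.array_equal(vectorized, looped))
```

On 634 masks of 64x64 pixels the loop visits about 2.6 million pixels one at a time, so the indexed version is likely to be noticeably faster.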

Crenel answered 10/5, 2017 at 19:37 Comment(12)

I'm using the Theano back end, so that means channels first—do you think the final layers in my model appear correct? – Thacker 10/5, 2017 at 19:39

What defines if channels go first or last is "keras", not theano. The default is channels last. – Belorussia 10/5, 2017 at 19:41

yes but I have my Keras set up correctly for Theano in .keras.json, so my concern has turned to the model, since I'm unsure how it should be shaped at the final stages – Thacker 10/5, 2017 at 19:42

To see if your last layer is ok, do a model.summary() and see if its output is (None,64,64,4). It seems correct, since you have 4 filters it will give you four channels, but I can't say the convolutions will result in (64,64). If you use padding = 'same' in all layers, including the last, it will probably be ok. – Belorussia 10/5, 2017 at 19:43

interesting thought, thanks—.summary() shows (None, 4, 64, 64) for last layer... – Thacker 10/5, 2017 at 19:44

Ok, then you have channels_first indeed, and you just have to shape your mask_train exactly as in my answer :) – Belorussia 10/5, 2017 at 19:45

I'll try that now. Are softmax and categorical_crossentropy adequate choices at the end of the model? – Thacker 10/5, 2017 at 19:47

If you use softmax, you have to apply it along the right axis, for example by adding a Softmax(axis=1) layer. (That's because you expect more than one 1 in the result: one for each pixel.) Unfortunately, I'm not certain which axis is the proper one; I think it's axis 1, which is the channels axis. (You want only one 1 among the four channels for the same pixel.) – Belorussia 10/5, 2017 at 19:50

that's helpful, the classes are mutually exclusive (there should not be more than one 1 per pixel) – Thacker 10/5, 2017 at 19:52

I never used softmax in one axis, I don't know how to do it exactly. It's mentioned here, but not explained: keras.io/activations -- In the worst case, you'd have to add a Lambda layer just for using that softmax function at the end of your model. – Belorussia 10/5, 2017 at 19:59

Let us continue this discussion in chat. – Thacker 10/5, 2017 at 20:4

Why is the background needed as a separate class to be predicted? – Alfonzoalford 7/7, 2021 at 14:32
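To illustrate the softmax-axis point raised in the comments above, here is a NumPy sketch of what a softmax over the channels axis computes for a channels_first output. It is not the Keras layer itself; the softmax helper and the random logits are assumptions for illustration. As far as I know, Keras applies activation='softmax' over the last axis by default, which is why the class axis would need to be given explicitly when the classes sit on axis 1.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(2)
logits = rng.normal(size=(2, 4, 64, 64))   # (batch, classes, height, width)

probs = softmax(logits, axis=1)            # normalize over the class axis

# Each pixel's four class probabilities sum to 1.
print(np.allclose(probs.sum(axis=1), 1.0))
```

With axis=-1 the normalization would instead run across the 64 pixels of each row, which does not give one probability distribution per pixel.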

Bit late but you should try

mask_train = to_categorical(mask_train, num_classes=None)

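Depending on the Keras version, the result either keeps the input's dimensions with the class axis appended last, i.e. (634, 1, 64, 64, 4), or is flattened to (2596864, 4) as in the question. Either way it still has to be rearranged into (634, 4, 64, 64), which gives a binary mask for each individual class (one-hot encoded).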

The last conv layer, activation and loss look good for multi-class segmentation.
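For reference, this is roughly what categorical cross-entropy measures in the segmentation setting: the usual per-sample formula applied at every pixel and then averaged. The NumPy function below is a sketch with a made-up name and a made-up 2x2 example, not the Keras implementation.

```python
import numpy as np

def pixelwise_categorical_crossentropy(y_true, y_pred, class_axis=1, eps=1e-7):
    """Mean over all pixels of -sum_c y_true * log(y_pred)."""
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid log(0)
    per_pixel = -(y_true * np.log(y_pred)).sum(axis=class_axis)
    return per_pixel.mean()

# Hypothetical batch: 1 image, 4 classes, 2x2 pixels, channels_first.
labels = np.array([[[0, 1], [2, 3]]])              # (1, 2, 2)
y_true = np.moveaxis(np.eye(4)[labels], -1, 1)     # (1, 4, 2, 2)

uniform = np.full_like(y_true, 0.25)               # every class equally likely
loss_uniform = pixelwise_categorical_crossentropy(y_true, uniform)
loss_perfect = pixelwise_categorical_crossentropy(y_true, y_true)

print(loss_uniform)   # log(4), about 1.386
print(loss_perfect)   # close to 0
```

A prediction that spreads probability evenly over four classes scores log(4), so that value is a useful baseline when reading the training loss.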

Whiteheaded answered 10/10, 2018 at 16:51 Comment(0)
