Is it possible to use a collection of hyperspectral 1x1 pixels in a CNN model purposed for more conventional datasets (CIFAR-10/MNIST)?

I have created a working CNN model in Keras/TensorFlow and have successfully used the CIFAR-10 & MNIST datasets to test it. The working code is shown below:

import keras
from keras.datasets import cifar10
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout, Conv2D, Flatten, MaxPooling2D
from keras.layers import BatchNormalization

(X_train, y_train), (X_test, y_test) = cifar10.load_data()

#reshape data to fit model
X_train = X_train.reshape(50000,32,32,3)
X_test = X_test.reshape(10000,32,32,3)

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)


# Building the model
model = Sequential()

#1st Convolutional Layer
model.add(Conv2D(filters=64, input_shape=(32,32,3), kernel_size=(11,11), strides=(4,4), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'))

#2nd Convolutional Layer
model.add(Conv2D(filters=224, kernel_size=(5, 5), strides=(1,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'))

#3rd Convolutional Layer
model.add(Conv2D(filters=288, kernel_size=(3,3), strides=(1,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))

#4th Convolutional Layer
model.add(Conv2D(filters=288, kernel_size=(3,3), strides=(1,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))

#5th Convolutional Layer
model.add(Conv2D(filters=160, kernel_size=(3,3), strides=(1,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'))

model.add(Flatten())

# 1st Fully Connected Layer
model.add(Dense(4096, input_shape=(32,32,3,)))
model.add(BatchNormalization())
model.add(Activation('relu'))
# Add Dropout to prevent overfitting
model.add(Dropout(0.4))

#2nd Fully Connected Layer
model.add(Dense(4096))
model.add(BatchNormalization())
model.add(Activation('relu'))
#Add Dropout
model.add(Dropout(0.4))

#3rd Fully Connected Layer
model.add(Dense(1000))
model.add(BatchNormalization())
model.add(Activation('relu'))
#Add Dropout
model.add(Dropout(0.4))

#Output Layer
model.add(Dense(10))
model.add(BatchNormalization())
model.add(Activation('softmax'))


#compile model using accuracy to measure model performance
opt = keras.optimizers.Adam(learning_rate = 0.0001)
model.compile(optimizer=opt, loss='categorical_crossentropy', 
              metrics=['accuracy'])


#train the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=30)

After utilising the aforementioned datasets, I wanted to go a step further and use a dataset with more channels than greyscale or RGB provide, hence the inclusion of a hyperspectral dataset. When looking for a hyperspectral dataset I came across the Indian Pines dataset used below.

The issue at this stage was realising that this hyperspectral dataset is a single image, with each value in the ground truth corresponding to a single pixel. I therefore reformatted the data into a collection of individual hyperspectral pixels.

Code reformatting the corrected dataset into x_train & x_test:

import keras
import scipy
import numpy as np
import matplotlib.pyplot as plt
from keras.utils import to_categorical
from scipy import io

mydict = scipy.io.loadmat('Indian_pines_corrected.mat')
dataset = np.array(mydict.get('indian_pines_corrected'))


#This is creating the split between x_train and x_test from the original dataset 
# x_train after this code runs will have a shape of (121, 145, 200) 
# x_test after this code runs will have a shape of (24, 145, 200)
x_train = np.zeros((121,145,200), dtype=int)
x_test = np.zeros((24,145,200), dtype=int)

xtemp = np.array_split(dataset, [121])
x_train = np.array(xtemp[0])
x_test = np.array(xtemp[1])

# x_train will have a shape of (17545, 200) 
# x_test will have a shape of (3480, 200)
x_train = x_train.reshape(-1, x_train.shape[-1])
x_test = x_test.reshape(-1, x_test.shape[-1])

Code reformatting the ground-truth dataset into Y_train & Y_test:

truthDataset = scipy.io.loadmat('Indian_pines_gt.mat')
gTruth = truthDataset.get('indian_pines_gt')

#This is creating the split between Y_train and Y_test from the original dataset 
# Y_train after this code runs will have a shape of (121, 145) 
# Y_test after this code runs will have a shape of (24, 145)

Y_train = np.zeros((121,145), dtype=int)
Y_test = np.zeros((24,145), dtype=int)

ytemp = np.array_split(gTruth, [121])
Y_train = np.array(ytemp[0])
Y_test = np.array(ytemp[1])

# Y_train will have a shape of (17545,)
# Y_test will have a shape of (3480,)
Y_train = Y_train.reshape(-1)
Y_test = Y_test.reshape(-1)


#17 classes, labelled 0-16

#Y_train one-hot encode target column
Y_train = to_categorical(Y_train)

#Y_test one-hot encode target column
Y_test = to_categorical(Y_test, num_classes = 17)

My thought process was that, even though the original image is broken down into 1x1 patches, the large number of channels each patch possesses, with their respective values, would still make categorisation of the dataset possible.

Essentially, I want to feed this reformatted data into my model (shown in the first code fragment of this post), but I'm uncertain whether I'm taking the wrong approach due to my inexperience with this area. I was expecting each input to have shape (1, 1, 200), i.e. x_train & x_test would have shapes (17545, 1, 1, 200) & (3480, 1, 1, 200) respectively.
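
For reference, a minimal sketch of that final reshape (the names x_train_cnn/x_test_cnn are just illustrative, building on the x_train/x_test arrays above): each hyperspectral pixel becomes its own 1x1 "image" with 200 channels.

# Each hyperspectral pixel becomes a 1x1 "image" with 200 channels
x_train_cnn = x_train.reshape(-1, 1, 1, 200)   # (17545, 1, 1, 200)
x_test_cnn = x_test.reshape(-1, 1, 1, 200)     # (3480, 1, 1, 200)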

Scientific answered 4/12, 2021 at 14:44 Comment(0)

First, let's note that the hyperspectral image you are using is really aimed at a semantic segmentation problem rather than a classification one.


If we look at what a convolutional layer in a neural network actually does, applying it to isolated 1x1 pixels is unlikely to work well. It might work, but there are probably better approaches.

Let's look at this 2D convolution animation (by Michael Plotke, licensed under CC BY-SA 3.0).

We can see that, at its core, a 2D convolution operation amounts to applying a filter of a certain size to a region of an image, then repeating this operation over all regions of the image. 2D convolutions are often used in neural networks to learn/find spatial features, i.e. the relationships between neighboring pixels.

An excerpt from CS231n - Convolutional Networks

As we slide the filter over the width and height of the input volume we will produce a 2-dimensional activation map that gives the responses of that filter at every spatial position. Intuitively, the network will learn filters that activate when they see some type of visual feature such as an edge of some orientation or a blotch of some color on the first layer, or eventually entire honeycomb or wheel-like patterns on higher layers of the network.
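
For intuition, here is a tiny NumPy sketch (purely illustrative, not part of the original answer) of that sliding operation: a single 3x3 filter applied to every 3x3 region of a toy single-channel image, producing one activation per spatial position.

import numpy as np

image = np.random.rand(8, 8)     # toy single-channel image
kernel = np.random.rand(3, 3)    # one filter

# 'valid' sliding: (8 - 3 + 1) = 6 positions per dimension
out = np.zeros((6, 6))
for i in range(6):
    for j in range(6):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(out.shape)  # (6, 6) activation map for this single filter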

By using patches of size 1x1, you have essentially stripped the data of its spatial dimensions, so applying a 2D convolution no longer makes much sense (especially given the size of the filters used in that architecture, such as the 11x11 kernel in the first layer).
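
To make that concrete, here is a small sketch (assuming TensorFlow 2.x Keras; not code from the question) showing that on a 1x1 "image" a Conv2D layer with 'same' padding only ever sees one pixel, so it degenerates into a per-pixel dense layer over the 200 channels and the large spatial kernel is wasted.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

x = np.random.rand(4, 1, 1, 200).astype("float32")   # four hyperspectral "pixels"

conv = keras.Sequential([
    keras.Input(shape=(1, 1, 200)),
    layers.Conv2D(64, kernel_size=(11, 11), padding="same"),  # only the centre tap ever touches real data
])
dense = keras.Sequential([
    keras.Input(shape=(200,)),
    layers.Dense(64),
])

print(conv(x).shape)                    # (4, 1, 1, 64) -- no spatial structure left to exploit
print(dense(x.reshape(4, 200)).shape)   # (4, 64) -- effectively the same computation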


Suggested approaches:

  • Finding a bigger dataset with multiple images designed for classification: this is probably the way to go. In data-driven problems, the most important part is the data.
  • If classifying the areas of this image is important to you, you could either use a simpler network architecture and/or classical machine learning techniques directly on your spectral pixels (a small sketch follows this list). This might work, but you still lose the spatial relationships between neighboring pixels.
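
As a sketch of what the second suggestion might look like (one possible interpretation, not a prescribed architecture): a small multilayer perceptron that treats each pixel as a plain 200-dimensional spectral vector, reusing the x_train/Y_train arrays built in the question (where x_train has shape (17545, 200)).

from tensorflow import keras
from tensorflow.keras import layers

mlp = keras.Sequential([
    keras.Input(shape=(200,)),               # one spectrum per sample, no spatial context
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu"),
    layers.Dense(17, activation="softmax"),  # 17 ground-truth classes (0-16)
])
mlp.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# mlp.fit(x_train, Y_train, validation_data=(x_test, Y_test), epochs=30)
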
Tartaric answered 16/12, 2021 at 10:27 Comment(1)
Appreciate the detailed response, including the idea of semantic segmentation as well as the theoretical breakdown of 2D convolutions; it's helped with my understanding of issues I didn't realise I faced here. Of your suggested approaches I'm more inclined to move towards the first: I've previously tried to source a larger hyperspectral dataset without success, so I will need to revisit this, as I want to keep the proposed architecture in my model. – Scientific

If the hyperspectral dataset is given to you as a large image with many channels, I suppose that the classification of each pixel should depend on the pixels around it (otherwise there would be no point in formatting the data as an image, i.e. with a grid structure). Given this assumption, breaking the input picture up into 1x1 parts is not a good idea, as you are losing the grid structure.

I further suppose that the order of the channels is arbitrary, which implies that convolving over the channel dimension is probably not meaningful (which you did not plan to do anyway).

Instead of reformatting the data the way you did, you may want to create a model that takes an image as input and also outputs an "image" containing the classifications for each pixel. I.e. if you have 10 classes and take a (145, 145, 200) image as input, your model would output a (145, 145, 10) image. In that architecture you would not have any fully-connected layers. Your output layer would also be a convolutional layer.
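
As an illustration of that idea, here is a minimal fully convolutional sketch (an assumption about one way to realise it, using the actual 145x145x200 Indian Pines cube and its 17 classes rather than the hypothetical 10): every layer is convolutional, spatial size is preserved, and the output contains one class distribution per pixel.

from tensorflow import keras
from tensorflow.keras import layers

fcn = keras.Sequential([
    keras.Input(shape=(145, 145, 200)),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.Conv2D(17, (1, 1), padding="same", activation="softmax"),  # per-pixel class scores
])
fcn.compile(optimizer="adam", loss="categorical_crossentropy")
print(fcn.output_shape)   # (None, 145, 145, 17)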

That, however, means you will not be able to keep your current architecture, because the tasks for MNIST/CIFAR-10 and your hyperspectral dataset are not the same. For MNIST/CIFAR-10 you want to classify an image in its entirety, while for the other dataset you want to assign a class to each pixel (most likely also using the pixels around each pixel).


Some further ideas:

  • If you want to turn the pixel classification task on the hyperspectral dataset into a classification task for an entire image, you could reformulate it as "classifying a hyperspectral image as the class of its center (or top-left, or bottom-right, or (21st, 104th), or whatever) pixel". To obtain the data from your single hyperspectral image, for each pixel I would shift the image so that the target pixel is at the desired location (e.g. the center); all pixels that "fall off" the border can be inserted at the other side of the image (a rough sketch follows this list).
  • If you want to stick with a pixel classification task but need more data, you could split the single hyperspectral image into many smaller images (e.g. 10x10x200). You may even want to use images of many different sizes. If your model only has convolution and pooling layers and you make sure to maintain the spatial size of the image, that should work out.
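
A rough sketch of the shifting idea from the first bullet (the function name and indices are hypothetical, purely for illustration): np.roll moves the target pixel to the centre and lets pixels that fall off one edge reappear on the opposite side.

import numpy as np

def centred_view(cube, row, col):
    """Shift a (H, W, C) cube so that pixel (row, col) lands at the centre, with wraparound."""
    h, w, _ = cube.shape
    return np.roll(cube, shift=(h // 2 - row, w // 2 - col), axis=(0, 1))

# Hypothetical usage with the cube loaded earlier as `dataset`:
# sample = centred_view(dataset, 21, 104)   # one (145, 145, 200) training example
# label = gTruth[21, 104]                   # its class = class of the original pixel
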
Once answered 16/12, 2021 at 10:18 Comment(0)