How to perform object detection model training on more than 1 class?

Link: https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/eager_few_shot_od_training_tf2_colab.ipynb

I tried the Google Colab notebook above to train an object detection model with 1 class, as shown in the example.

I am trying to understand how to modify this code to train on 2 classes.

In the example above, after I annotate the images with boxes, the notebook runs the following code to create the category_index and the image/box tensors. Suppose I set num_classes = 2 and add another class to category_index; how do I proceed from there? For example, I believe the one-hot encoding as written only works for 1 class. How do I modify the code to make it work with 2 classes?

# By convention, our non-background classes start counting at 1.  Given
# that we will be predicting just one class, we will therefore assign it a
# `class id` of 1.
duck_class_id = 1
num_classes = 1

category_index = {duck_class_id: {'id': duck_class_id, 'name': 'rubber_ducky'}}

# Convert class labels to one-hot; convert everything to tensors.
# The `label_id_offset` here shifts all classes by a certain number of indices;
# we do this here so that the model receives one-hot labels where non-background
# classes start counting at the zeroth index.  This is ordinarily just handled
# automatically in our training binaries, but we need to reproduce it here.
label_id_offset = 1
train_image_tensors = []
gt_classes_one_hot_tensors = []
gt_box_tensors = []
for (train_image_np, gt_box_np) in zip(
    train_images_np, gt_boxes):
  train_image_tensors.append(tf.expand_dims(tf.convert_to_tensor(
      train_image_np, dtype=tf.float32), axis=0))
  gt_box_tensors.append(tf.convert_to_tensor(gt_box_np, dtype=tf.float32))
  zero_indexed_groundtruth_classes = tf.convert_to_tensor(
      np.ones(shape=[gt_box_np.shape[0]], dtype=np.int32) - label_id_offset)
  gt_classes_one_hot_tensors.append(tf.one_hot(
      zero_indexed_groundtruth_classes, num_classes))
print('Done prepping data.')

To make the single-class detection tutorials (Rubber Ducky detector or Zombie detector) work with multiple classes, changes like the following need to be made:

The category_index variable has to look like this (for a very good example, see: category_index example):

    # Paths to the training images
    train_image_filenames = [
        './datasets/train_images/train_image0001.jpg',
        './datasets/train_images/train_image0002.jpg',
    ]
    # Label map ids start from 1 (non-background classes start counting at 1)
    category_index = {
        1: {'id': 1, 'name': 'cat'},
        2: {'id': 2, 'name': 'dog'},
    }
    # Class ids: one array per image, one id per bounding box
    Gt_labels = [
        np.array([1, 1]),
        np.array([1, 2, 2]),
    ]
    # Bounding boxes: one [n, 4] array per image, each row [miny, minx, maxy, maxx]
    Gt_boxes = [
        np.array([[0.436, 0.591, 0.629, 0.712], [0.539, 0.583, 0.73, 0.71]], dtype=np.float32),
        np.array([[0.464, 0.414, 0.626, 0.548], [0.313, 0.308, 0.648, 0.526], [0.256, 0.444, 0.484, 0.629]], dtype=np.float32),
    ]
    NUM_CLASSES = len(category_index)
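
Since the labels and boxes are now maintained by hand as parallel lists, it can help to sanity-check them before building tensors. The following is only a sketch: `check_annotations` is a hypothetical helper (it is not part of the tutorial or of the Object Detection API), and it assumes the boxes use normalized `[miny, minx, maxy, maxx]` coordinates in the range 0 to 1, as the sample values above do.

```python
import numpy as np


def check_annotations(category_index, gt_labels, gt_boxes):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    if len(gt_labels) != len(gt_boxes):
        problems.append(f"{len(gt_labels)} label arrays vs {len(gt_boxes)} box arrays")
    for i, (labels, boxes) in enumerate(zip(gt_labels, gt_boxes)):
        # Every image needs an [n, 4] box array.
        if boxes.ndim != 2 or boxes.shape[1] != 4:
            problems.append(f"image {i}: boxes must have shape [n, 4], got {boxes.shape}")
            continue
        # One class id per box.
        if labels.shape[0] != boxes.shape[0]:
            problems.append(f"image {i}: {labels.shape[0]} labels for {boxes.shape[0]} boxes")
        # Every id must be a key of category_index.
        unknown = sorted(set(labels.tolist()) - set(category_index))
        if unknown:
            problems.append(f"image {i}: ids {unknown} not in category_index")
        if boxes.size > 0:
            # Assumption: coordinates are normalized to [0, 1].
            if boxes.min() < 0.0 or boxes.max() > 1.0:
                problems.append(f"image {i}: box coordinates outside [0, 1]")
            if np.any(boxes[:, 0] >= boxes[:, 2]) or np.any(boxes[:, 1] >= boxes[:, 3]):
                problems.append(f"image {i}: each box needs miny < maxy and minx < maxx")
    return problems


category_index = {1: {'id': 1, 'name': 'cat'}, 2: {'id': 2, 'name': 'dog'}}
Gt_labels = [np.array([1, 1]), np.array([1, 2, 2])]
Gt_boxes = [
    np.array([[0.436, 0.591, 0.629, 0.712], [0.539, 0.583, 0.73, 0.71]], dtype=np.float32),
    np.array([[0.464, 0.414, 0.626, 0.548], [0.313, 0.308, 0.648, 0.526],
              [0.256, 0.444, 0.484, 0.629]], dtype=np.float32),
]

problems = check_annotations(category_index, Gt_labels, Gt_boxes)
print(problems or "annotations look consistent")
```

A mismatch between the number of labels and the number of boxes for an image is the kind of mistake that otherwise only surfaces later as a shape error during training.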

Here np.ones(shape=[gt_box_np.shape[0]], dtype=np.int32) only makes sense for a single class (the same applies to the Rubber Ducky detector): it assigns class id 1 to every box, which is a very awkward way of building the ground-truth classes tensor. Each gt_classes entry must be a one-hot encoded float32 tensor (float32 is important) in the format Tensor("Const:0", shape=(num_boxes, NUM_CLASSES), dtype=float32), which is (1, NUM_CLASSES) when the image has a single box.
To get that, replace np.ones with the real per-box labels and apply tf.one_hot (together with tf.reshape, if needed, to reach that shape). Example of creating gt_classes_one_hot_tensors correctly:

LABEL_ID_OFFSET = 1
train_image_tensors = []
gt_classes_one_hot_tensors = []
gt_box_tensors = []
# One memory-costly process is the internal conversion from (usually) NumPy to tf.Tensor. If you are sure that your GPU can handle the batches themselves:
# manually convert the data to tf tensors in CPU RAM, and only then pass it to your model (eventually using the GPU).
print("\nUsing the CPU for the one-hot tensors in case the GPU has no room for them:", str(tf.config.experimental.list_physical_devices('CPU')[0]))
with tf.device('/cpu:0'):  # Recommended for this step if the GPU runs out of memory
    for (train_image_np, gt_box_np, gt_label_np) in zip(train_images_np, Gt_boxes, Gt_labels):
        print("|", end="")  # progress indicator, one bar per image
        train_image_tensors.append(tf.expand_dims(tf.convert_to_tensor(train_image_np, dtype=tf.float32), axis=0))  # image with a leading batch dimension
        gt_box_tensors.append(tf.convert_to_tensor(gt_box_np, dtype=tf.float32))  # [n, 4] boxes as a tensor
        zero_indexed_groundtruth_classes = tf.convert_to_tensor(gt_label_np - LABEL_ID_OFFSET)  # shift ids so that classes start at 0
        gt_classes_one_hot_tensors.append(tf.one_hot(zero_indexed_groundtruth_classes, NUM_CLASSES))  # [n] ids -> [n, NUM_CLASSES] one-hot, float32
print('Done prepping data. Num Data: ', len(train_images_np), "\n")
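
To see what the offset and one-hot steps produce for one image, here is a minimal NumPy-only sketch. `one_hot_labels` is a hypothetical stand-in written for illustration, not the tutorial's code; for ids that are in range it should give the same values and shape as tf.one_hot. Unlike tf.one_hot, which returns an all-zero row for an out-of-range index, this sketch raises an error instead.

```python
import numpy as np

LABEL_ID_OFFSET = 1
NUM_CLASSES = 2  # cat and dog


def one_hot_labels(gt_label_np, num_classes, offset=LABEL_ID_OFFSET):
    """Hypothetical NumPy stand-in for the offset + one-hot step."""
    # Shift ids so that the first non-background class sits at index 0.
    zero_indexed = np.asarray(gt_label_np, dtype=np.int64) - offset
    if zero_indexed.size and (zero_indexed.min() < 0 or zero_indexed.max() >= num_classes):
        raise ValueError(f"class ids must lie in [{offset}, {offset + num_classes - 1}]")
    # Row k of the identity matrix is the one-hot vector for class index k.
    return np.eye(num_classes, dtype=np.float32)[zero_indexed]


# Labels of the second sample image: one cat, two dogs.
encoded = one_hot_labels(np.array([1, 2, 2]), NUM_CLASSES)
print(encoded.shape)  # (3, 2)
print(encoded)
# [[1. 0.]
#  [0. 1.]
#  [0. 1.]]
```

Each row corresponds to one bounding box, so the first dimension always matches the number of rows in that image's box array.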

More info and a full example: full example, extra github example link

TIP: If you are starting out with these tutorials (Rubber Ducky detector), I recommend reading: Does tensorflow's object detection api support multi-class multi-label detection?

Warning, in case you want to use it with multiple bounding boxes per image ([n, 4] boxes): warn important
