How to input multiple N-D arrays to a net in caffe?
I want to create a custom loss layer for semantic segmentation in caffe that requires multiple inputs. I want this loss function to take an additional input in order to penalize missed detections of small objects.

To do that I have created a ground-truth image that contains a weight for each pixel. If the pixel belongs to a small object, the weight is high.

I am a newbie in caffe and I do not know how to feed my net three 2-D signals at the same time (image, gt-mask and the per-pixel weights). I have doubts about how caffe makes the correspondence between the RGB data and the ground-truth data.
I want to extend this so that there are two ground truths: one for the class-label image and another to supply this factor to the loss function.

Can you give me a hint on how to achieve this?

Thanks,

Halliburton answered 18/7, 2017 at 19:16 Comment(4)
Try reading this post. It should get you started. Once you have more specific questions, feel free to ask them here. As it currently stands, your question is "too broad". Stoneware
Thanks Shai for your answer. I already read your post, but I still have the same doubts. First of all, I am using fcn_alexnet as a reference network (github.com/NVIDIA/DIGITS/blob/master/examples/…). The first thing I do not understand: this net has 4 input data layers (train data, train gt, val data, val gt). How does caffe sync this information? I mean, how does caffe make the correspondence between train data and train gt? This is my first doubt. Halliburton
My other doubt: I want to modify the loss function to incorporate a scale factor that penalizes missed detections of small objects. I understand that I can give the loss layer two extra bottoms, the train GT and my new GT with the per-pixel weights. But then my input data layers would be: train data, train gt, train weight gt, val data, val gt, val weight gt. How does caffe manage this? How can I be sure that the current train data corresponds to the correct train gt and train weight gt? Halliburton
I took the liberty of rephrasing your question. Please make sure the edited version reflects your original question. Stoneware
You want caffe to use several N-D signals for each training sample, but you are concerned that the default "Data" layer can only handle one image per sample.
There are several ways to address this:

  1. Using several "Data" layers (as was done in the model you linked to). To keep the three "Data" layers in sync, you need to know that caffe reads samples from the underlying LMDB sequentially. So, if you prepare your three LMDBs in the same order, caffe will read one sample at a time from each LMDB in the order in which the samples were stored, and the three inputs will stay in sync during training/validation.
    Note that convert_imageset has a 'shuffle' flag. Do NOT use it, as it would shuffle the samples differently in each of the three LMDBs and break the sync. You are strongly advised to shuffle the samples yourself before preparing the LMDBs, but in a way that the same "shuffle" is applied to all three inputs, leaving them in sync with each other.
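    The "same shuffle for all three inputs" idea can be sketched as follows (file names are hypothetical, and the actual serialization of each sample into its LMDB via caffe's Datum is omitted): shuffle a single index permutation once, apply it to all three lists, then write each list to its own LMDB in that order.

```python
import random

# Hypothetical file lists; entry k of each list belongs to the same sample.
images  = ["img_0.png", "img_1.png", "img_2.png"]
gts     = ["gt_0.png",  "gt_1.png",  "gt_2.png"]
weights = ["w_0.png",   "w_1.png",   "w_2.png"]

# Shuffle ONE permutation of indices, not each list independently.
perm = list(range(len(images)))
random.seed(42)          # fixed seed so the run is reproducible
random.shuffle(perm)

# Apply the identical permutation to all three lists.
images  = [images[i]  for i in perm]
gts     = [gts[i]     for i in perm]
weights = [weights[i] for i in perm]

# Entry k of each list still refers to the same sample, so writing the
# lists to three LMDBs in this order keeps them in sync.
```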

  2. Using a 5-channel input. caffe can store N-D data in LMDB, not only color/gray images. You can use python to create an LMDB in which each "image" is a 5-channel array: the first three channels are the image's RGB and the last two are the ground-truth labels and the weights for the per-pixel loss.
    In your model you then only need to add a "Slice" layer on top of your "Data" layer:

    layer {
      name: "slice_input"
      type: "Slice"
      bottom: "raw_input" # 5-channel "image" stored in LMDB
      top: "rgb"
      top: "gt"
      top: "weight"
      slice_param { 
        axis: 1
        slice_point: 3
        slice_point: 4
      }
    }
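    A minimal sketch of how such a 5-channel sample could be assembled with numpy before it is written to the LMDB (toy shapes and constant values, not real data; the Datum/LMDB writing step is omitted). The "Slice" layer above then cuts the array back into rgb (channels 0-2), gt (channel 3) and weight (channel 4):

```python
import numpy as np

H, W = 4, 5                                          # toy spatial size
rgb    = np.zeros((3, H, W), dtype=np.float32)       # channels 0-2: RGB image
gt     = np.ones((1, H, W), dtype=np.float32)        # channel 3: class labels
weight = np.full((1, H, W), 2.0, dtype=np.float32)   # channel 4: per-pixel loss weight

# Stack along the channel axis; this (5, H, W) array is one LMDB entry,
# matching slice_point: 3 and slice_point: 4 on axis: 1 above.
sample = np.concatenate([rgb, gt, weight], axis=0)
```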
    
  3. Using an "HDF5Data" layer (my personal favorite). You can store your inputs in binary hdf5 format and have caffe read from these files. "HDF5Data" is much more flexible in caffe and allows you to shape the inputs as you like. In your case you need to prepare a binary hdf5 file with three "datasets": 'rgb', 'gt' and 'weight'. You need to make sure the samples are synced when you create the hdf5 file(s). Once the files are ready, you can add an "HDF5Data" layer with three "top"s to your model.
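    Creating such a file could look like this minimal h5py sketch (toy sizes and zero/one data, hypothetical file names). Row i of every dataset belongs to the same sample, which is what keeps the three inputs in sync; caffe's "HDF5Data" layer takes a plain-text list file of hdf5 file names as its source:

```python
import numpy as np
import h5py

N, H, W = 2, 4, 5                              # toy number of samples and size
with h5py.File("train_data.h5", "w") as f:
    # Dataset names must match the "top"s of the "HDF5Data" layer.
    f.create_dataset("rgb",    data=np.zeros((N, 3, H, W), dtype=np.float32))
    f.create_dataset("gt",     data=np.zeros((N, 1, H, W), dtype=np.float32))
    f.create_dataset("weight", data=np.ones((N, 1, H, W), dtype=np.float32))

# The "HDF5Data" layer's source parameter points at a text file listing
# one hdf5 file per line.
with open("train_h5.txt", "w") as f:
    f.write("train_data.h5\n")
```

    The matching prototxt layer would then declare "rgb", "gt" and "weight" as tops and point its hdf5_data_param source at train_h5.txt.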

  4. Writing your own "Python" input layer. I will not go into the details here, but you can implement your own input layer in python. See this thread for more details.

Stoneware answered 19/7, 2017 at 5:48 Comment(3)
Hi Shai, thanks for your detailed explanation. From what you have explained, it seems the most versatile solution is HDF5Data, so I will try to implement it that way. But I have one more question about HDF5Data: how does caffe manage the memory used? I mean, at training initialization, does caffe load the entire HDF5 data into RAM, or only the information needed for each batch? Halliburton
@Halliburton to be honest, I have no idea. But in my experience you can have very large HDF5 files and caffe still works flawlessly, so I suppose it only reads the portions it needs right away. Stoneware
Thanks @Stoneware for your help. I will try this and report back. Thanks again. Halliburton
