Building custom Caffe layer in python

Asked 18/11, 2015 at 10:57 Answered 2/4, 2016 at 5:42

Solved python neural-network deep-learning caffe pycaffe

After reading many links about building Caffe layers in Python, I still have difficulty understanding a few concepts. Can someone please clarify them?

Blobs and weights python structure for network is explained here: Finding gradient of a Caffe conv-filter with regards to input.
Network and Solver structure is explained here: Cheat sheet for caffe / pycaffe?.
Example of defining python layer is here: pyloss.py on git.
Layer tests here: test layer on git.
Development of new layers for C++ is described here: git wiki.

What I am still missing is:

1. setup() method: what should I do here? Why, in the example, should I compare the length of the 'bottom' param with 2? Why should it be 2? It does not seem to be the batch size, since that is arbitrary. And as I understand it, bottom is a blob, so the first dimension is the batch size?
2. reshape() method: as I understand it, the 'bottom' input param is the blob of the layer below, the 'top' param is the blob of the layer above, and I need to reshape the top blob according to the output shape of my forward-pass calculations. But why do I need to do this on every forward pass if these shapes do not change from pass to pass and only the weights change?
3. The reshape and forward methods use index 0 on the 'top' input param. Why would I need to use top[0].data=... or top[0].input=... instead of top.data=... and top.input=...? What is this index about? If we do not use the rest of this top list, why is it exposed this way? I suspect it comes from the C++ backbone, but it would be good to know exactly.
4. reshape() method, the line with:
```
if bottom[0].count != bottom[1].count:
```
What am I doing here? Why is the dimension 2 again? And what am I counting here? Why should both blobs (0 and 1) have the same number of elements (count)?
5. forward() method: what do I define with this line:
```
self.diff[...] = bottom[0].data - bottom[1].data
```
When is it used after the forward pass, if I define it? Can we just use
```
diff = bottom[0].data - bottom[1].data
```
instead, to compute the loss later in this method without assigning to self, or is it done for some purpose?
6. backward() method: what is this about: for i in range(2):? Why is the range 2 again?
7. backward() method, propagate_down parameter: why is it defined? If it is True, the gradient should be assigned to bottom[X].diff as I see it, but why would someone call a method with propagate_down = False if it does nothing and still loops inside?

I'm sorry if those questions are too obvious, I just wasn't able to find a good guide to understand them and asking for help here.

Inadmissible answered 18/11, 2015 at 10:57 Comment(1)

Don't forget to build pycaffe with the flag WITH_PYTHON_LAYER=1, see here – Salsala 22/3, 2016 at 5:6

You asked a lot of questions here. I'll give you some highlights and pointers that I hope will clarify matters for you, but I will not explicitly answer every question.

It seems like you are most confused about the difference between a blob and a layer's input/output. Indeed, most layers have a single blob as input and a single blob as output, but this is not always the case. Consider a loss layer: it has two inputs, predictions and ground truth labels. So in this case bottom is a vector of length 2(!), with bottom[0] being a (4-D) blob representing the predictions and bottom[1] another blob holding the labels. Thus, when constructing such a layer you must verify that you have exactly (hard coded) 2 input blobs (see e.g. ExactNumBottomBlobs() in the AccuracyLayer definition).

The same goes for top blobs as well: indeed in most cases there is a single top for each layer, but it's not always the case (see e.g., AccuracyLayer). Therefore, top is also a vector of 4-D blobs, one for each top of the layer. Most of the time there would be a single element in that vector, but sometimes you might find more than one.

I believe this covers your questions 1, 3, 4 and 6.
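To make the "bottom is a list of blobs" point concrete, here is a minimal sketch that does not use Caffe at all. `FakeBlob` and `check_loss_inputs` are hypothetical names invented for this illustration; only the two checks themselves mirror what the pyloss.py example does in setup() and reshape().

```python
class FakeBlob:
    """Hypothetical stand-in for a Caffe blob; it only carries a shape."""

    def __init__(self, *shape):
        self.shape = shape

    @property
    def count(self):
        # Total number of elements in the blob (product of all dimensions).
        n = 1
        for d in self.shape:
            n *= d
        return n


def check_loss_inputs(bottom):
    """Mirror the checks a two-input loss layer makes on its bottoms."""
    # bottom is a list with one blob per layer input; its length is the
    # number of inputs, not the batch size.
    if len(bottom) != 2:
        raise ValueError("Need two inputs to compute distance.")
    # Both inputs must hold the same number of elements so that they can
    # be subtracted element-wise.
    if bottom[0].count != bottom[1].count:
        raise ValueError("Inputs must have the same dimension.")
    return True


# Batch size is the first dimension of each blob (10 here).
predictions = FakeBlob(10, 3, 1, 1)
labels = FakeBlob(10, 3, 1, 1)
ok = check_loss_inputs([predictions, labels])
```

Here `len(bottom)` is 2 regardless of the batch size, which lives in `predictions.shape[0]`.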

As for reshape() (Q.2): this function is not called every forward pass; it is called only when the net is set up, to allocate space for inputs/outputs and params.
Occasionally, you might want to change the input size of your net (e.g., for detection nets); then you need to call reshape() for all layers of the net to accommodate the new input size.

As for the propagate_down parameter (Q.7): since a layer may have more than one bottom, you would need, in principle, to pass the gradient to all bottoms during backprop. However, what is the meaning of a gradient with respect to the label bottom of a loss layer? There are cases where you do not want to propagate to all bottoms: this is what this flag is for. (Here's an example of a loss layer with three bottoms that expects gradients to all of them.)
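As a rough illustration of how propagate_down is used, the following NumPy-only sketch follows the arithmetic of the pyloss.py Euclidean loss example. `FakeBlob` and the free functions `forward` and `backward` are hypothetical stand-ins for this sketch, not Caffe's API. It also suggests why the example keeps the difference around after the forward pass: the backward pass reuses it.

```python
import numpy as np


class FakeBlob:
    """Hypothetical stand-in for a Caffe blob with data and diff arrays."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=np.float32)
        self.diff = np.zeros_like(self.data)  # gradient buffer
        self.num = self.data.shape[0]         # batch size


def forward(bottom):
    # Element-wise difference between predictions and labels; it is
    # returned so that backward() can reuse it.
    diff = bottom[0].data - bottom[1].data
    loss = np.sum(diff ** 2) / bottom[0].num / 2.0
    return loss, diff


def backward(bottom, diff, propagate_down):
    # One iteration per bottom; bottoms flagged False are skipped.
    for i in range(2):
        if not propagate_down[i]:
            continue
        sign = 1 if i == 0 else -1
        bottom[i].diff[...] = sign * diff / bottom[i].num


pred = FakeBlob([[1.0, 2.0], [3.0, 4.0]])
label = FakeBlob([[0.0, 0.0], [0.0, 0.0]])
loss, diff = forward([pred, label])
# The labels need no gradient, so only the first bottom is flagged.
backward([pred, label], diff, propagate_down=[True, False])
```

After this call `pred.diff` holds the gradient while `label.diff` is left untouched, which is the effect of a False entry in propagate_down.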

For more information, see this "Python" layer tutorial.

Prorogue answered 19/11, 2015 at 6:56 Comment(1)

Thank you, Shai, that makes much more sense now. – Inadmissible 19/11, 2015 at 10:20

Why it should be 2?

That specific gist is about the Euclidean loss layer. Euclidean loss is based on the squared error between 2 vectors, so the layer must receive 2 input blobs. The two must have the same number of elements because the difference is computed element-wise. You can see this check in the reshape method.

Thanks.

Prank answered 2/4, 2016 at 5:42 Comment(1)

This should be a comment – Fourway 2/4, 2016 at 6:6
