I am a little confused about how I should use/insert the "BatchNorm" layer in my models.
I see several different approaches, for instance:
ResNets: "BatchNorm" + "Scale" (no parameter sharing)
The "BatchNorm" layer is followed immediately by a "Scale" layer:
layer {
    bottom: "res2a_branch1"
    top: "res2a_branch1"
    name: "bn2a_branch1"
    type: "BatchNorm"
    batch_norm_param {
        use_global_stats: true
    }
}
layer {
    bottom: "res2a_branch1"
    top: "res2a_branch1"
    name: "scale2a_branch1"
    type: "Scale"
    scale_param {
        bias_term: true
    }
}
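For reference, here is a minimal Python sketch of what this pair of layers computes. It assumes that "BatchNorm" only normalizes to zero mean and unit variance, and that "Scale" with bias_term: true supplies the learnable scale and shift. The function names, the eps value, and the sample numbers are illustrative and not taken from caffe:

```python
import math

def batch_norm(xs, eps=1e-5):
    """Normalize activations to zero mean and (approximately) unit variance."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]

def scale(xs, gamma=1.0, beta=0.0):
    """Affine step: gamma * x + beta (beta plays the role of bias_term: true)."""
    return [gamma * x + beta for x in xs]

# Hypothetical activations for one channel across a mini-batch.
activations = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(activations)              # what "BatchNorm" alone would produce
output = scale(normalized, gamma=2.0, beta=0.5)   # what "BatchNorm" + "Scale" would produce
```

Under these assumptions, using "BatchNorm" alone corresponds to fixing gamma = 1 and beta = 0.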
cifar10 example: only "BatchNorm"
In the cifar10 example provided with caffe, "BatchNorm" is used without any "Scale" following it:
layer {
    name: "bn1"
    type: "BatchNorm"
    bottom: "pool1"
    top: "bn1"
    param {
        lr_mult: 0
    }
    param {
        lr_mult: 0
    }
    param {
        lr_mult: 0
    }
}
cifar10: different batch_norm_param for TRAIN and TEST
The batch_norm_param setting use_global_stats is changed between the TRAIN and TEST phases:
layer {
    name: "bn1"
    type: "BatchNorm"
    bottom: "pool1"
    top: "bn1"
    batch_norm_param {
        use_global_stats: false
    }
    param {
        lr_mult: 0
    }
    param {
        lr_mult: 0
    }
    param {
        lr_mult: 0
    }
    include {
        phase: TRAIN
    }
}
layer {
    name: "bn1"
    type: "BatchNorm"
    bottom: "pool1"
    top: "bn1"
    batch_norm_param {
        use_global_stats: true
    }
    param {
        lr_mult: 0
    }
    param {
        lr_mult: 0
    }
    param {
        lr_mult: 0
    }
    include {
        phase: TEST
    }
}
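To illustrate what the use_global_stats switch changes, here is a rough Python sketch. It assumes the layer normalizes with the statistics of the current mini-batch when the flag is false, and with previously stored statistics when it is true. The function name, the eps value, and the numbers are hypothetical:

```python
import math

def batch_norm_forward(xs, global_mean, global_var, use_global_stats, eps=1e-5):
    """Normalize xs with either stored (global) or mini-batch statistics."""
    if use_global_stats:
        # TEST-style behaviour: rely on statistics accumulated earlier.
        mean, var = global_mean, global_var
    else:
        # TRAIN-style behaviour: compute statistics from this mini-batch.
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]

# Hypothetical mini-batch whose statistics differ from the stored ones.
batch = [10.0, 12.0, 14.0]
train_out = batch_norm_forward(batch, 0.0, 1.0, use_global_stats=False)
test_out = batch_norm_forward(batch, 0.0, 1.0, use_global_stats=True)
```

In this sketch train_out is centred on zero regardless of the stored values, while test_out depends entirely on them, so the same input gives different outputs in the two phases.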
So what should it be?
How should one use the "BatchNorm" layer in caffe?
decay_mult in BN, just use lr_mult: 0. Am I right? – Patricio
decay_mult and lr_mult are meaningless for the "BatchNorm" layer, as its parameters are updated based on the input statistics rather than the backprop gradients. AFAIK, recent versions of caffe automatically set lr_mult to zero for this layer. – Kingkingbird
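The point about statistics-driven updates can be sketched as a running average maintained during the forward pass, with no gradient or learning rate involved. This is a simplified exponential moving average with a hypothetical momentum value; caffe's own bookkeeping differs in detail, so treat it as an illustration of the idea only:

```python
def update_running_stats(running_mean, running_var, batch, momentum=0.9):
    """Blend the current mini-batch statistics into the stored ones."""
    batch_mean = sum(batch) / len(batch)
    batch_var = sum((x - batch_mean) ** 2 for x in batch) / len(batch)
    new_mean = momentum * running_mean + (1.0 - momentum) * batch_mean
    new_var = momentum * running_var + (1.0 - momentum) * batch_var
    return new_mean, new_var

# Hypothetical initial statistics and two mini-batches.
running_mean, running_var = 0.0, 1.0
for batch in ([1.0, 3.0], [2.0, 4.0]):
    running_mean, running_var = update_running_stats(running_mean, running_var, batch)
```

Since nothing here depends on a gradient, a solver-side multiplier such as lr_mult has nothing to act on in this sketch.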