I'm training a small FCN (about 10M weights, on 12K images); see e.g. Long et al., 2015. The architecture is the following (it starts at the fc7 layer of FCN8s):
fc7->relu1->dropout->conv2048->conv1024->conv512->deconv1->deconv2->deconv3->deconv4->deconv5->crop->softmax_with_loss
When I initialized all deconv layers with Gaussian weights, I got reasonable results some of the time, though not always. Then I decided to do it the right way and used the scripts provided by Shelhamer (e.g. https://github.com/zeakey/DeepSkeleton/blob/master/examples/DeepSkeleton/solve.py).
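For context, here is a minimal NumPy sketch of what a bilinear initialization of a deconvolution layer typically looks like. The function names `bilinear_kernel` and `bilinear_weights` are illustrative and are not taken from the linked script; the sketch also assumes a square kernel and an equal number of input and output channels.

```python
import numpy as np


def bilinear_kernel(size):
    """Return a (size, size) 2D bilinear interpolation kernel."""
    factor = (size + 1) // 2
    # The kernel centre sits on a pixel for odd sizes, between pixels for even sizes.
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    rows, cols = np.ogrid[:size, :size]
    return (1 - abs(rows - center) / factor) * (1 - abs(cols - center) / factor)


def bilinear_weights(channels, size):
    """Build a (channels, channels, size, size) weight blob.

    Assumption: each channel is upsampled independently, so only the
    "diagonal" filters (input channel i -> output channel i) are non-zero.
    """
    weights = np.zeros((channels, channels, size, size), dtype=np.float32)
    weights[range(channels), range(channels)] = bilinear_kernel(size)
    return weights


# Example: weights for a 2-channel layer with an 8x8 kernel.
w = bilinear_weights(2, 8)
```

With pycaffe, a blob like `w` would then be copied into the layer's parameters before training starts; whether the linked script does exactly this is an assumption on my part.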
The Deconvolution layers look like this (first one):
layer {
  name: "upscore2"
  type: "Deconvolution"
  bottom: "upsample"
  top: "upscore2"
  param {
    lr_mult: 2
  }
  convolution_param {
    # num_output: number of output channels, here cow + background
    num_output: 2
    kernel_size: 8
    stride: 2
    bias_term: false
  }
}
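As a shape sanity check for the layer above: without dilation, a Caffe Deconvolution layer produces stride * (input - 1) + kernel_size - 2 * pad output pixels along each spatial dimension. The helper below is a hypothetical illustration rather than part of Caffe, and the input size of 16 is an assumed example value.

```python
def deconv_output_size(input_size, kernel_size, stride, pad=0):
    """Spatial output size of a deconvolution (transposed convolution), no dilation."""
    return stride * (input_size - 1) + kernel_size - 2 * pad


# kernel_size: 8 and stride: 2, as in the "upscore2" layer; pad defaults to 0.
out = deconv_output_size(16, kernel_size=8, stride=2)
print(out)  # 38
```

With these settings the output is 2 * input + 6, slightly more than twice the input size, so the result has to be cropped to line up with the labels.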
The output I get is really weird: the loss drops fast (within 1000 iterations) and then stays at around 1, but the model is entirely useless on the test set. Any suggestions? I reduced the learning rate, but nothing seems to work. This is my solver:
net: "mcn-train_finetune11_slow_bilinear.prototxt"
solver_mode: GPU
# REDUCE LEARNING RATE
base_lr: 1e-8
lr_policy: "fixed"
iter_size: 1
max_iter: 100000
# REDUCE MOMENTUM TO 0.5
momentum: 0.5
weight_decay: 0.016
test_interval: 1000
test_iter: 125
display: 1000
average_loss: 1000
type: "Nesterov"
snapshot: 1000
snapshot_prefix: "mcn_finetune11_slow_bilinear"
debug_info: false
PS: a short excerpt from the training log:
I0723 08:38:56.772249 29191 solver.cpp:272] Solving MyCoolNetwork, MCN
I0723 08:38:56.772260 29191 solver.cpp:273] Learning Rate Policy: fixed
I0723 08:38:56.775032 29191 solver.cpp:330] Iteration 0, Testing net (#0)
I0723 08:39:02.331010 29191 blocking_queue.cpp:49] Waiting for data
I0723 08:39:18.075814 29191 solver.cpp:397] Test net output #0: loss = 37.8394 (* 1 = 37.8394 loss)
I0723 08:39:18.799008 29191 solver.cpp:218] Iteration 0 (-2.90699e-35 iter/s, 22.0247s/1000 iters), loss = 42.4986
I0723 08:39:18.799057 29191 solver.cpp:237] Train net output #0: loss = 42.4986 (* 1 = 42.4986 loss)
I0723 08:39:18.799067 29191 sgd_solver.cpp:105] Iteration 0, lr = 1e-08
I0723 08:46:12.581365 29201 data_layer.cpp:73] Restarting data prefetching from start.
I0723 08:46:12.773717 29200 data_layer.cpp:73] Restarting data prefetching from start.
I0723 08:51:14.609473 29191 solver.cpp:447] Snapshotting to binary proto file mcn_finetune11_slow_bilinear_iter_1000.caffemodel
I0723 08:51:15.245028 29191 sgd_solver.cpp:273] Snapshotting solver state to binary proto file mcn_finetune11_slow_bilinear_iter_1000.solverstate
I0723 08:51:15.298612 29191 solver.cpp:330] Iteration 1000, Testing net (#0)
I0723 08:51:20.888267 29203 data_layer.cpp:73] Restarting data prefetching from start.
I0723 08:51:21.194495 29202 data_layer.cpp:73] Restarting data prefetching from start.
I0723 08:51:36.276700 29191 solver.cpp:397] Test net output #0: loss = 1.18519 (* 1 = 1.18519 loss)
I0723 08:51:36.886041 29191 solver.cpp:218] Iteration 1000 (1.35488 iter/s, 738.075s/1000 iters), loss = 3.89015
I0723 08:51:36.887783 29191 solver.cpp:237] Train net output #0: loss = 1.82311 (* 1 = 1.82311 loss)
I0723 08:51:36.887807 29191 sgd_solver.cpp:105] Iteration 1000, lr = 1e-08
I0723 08:53:34.997433 29201 data_layer.cpp:73] Restarting data prefetching from start.
I0723 08:53:35.040670 29200 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:00:35.779531 29201 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:00:35.791441 29200 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:03:31.710410 29191 solver.cpp:447] Snapshotting to binary proto file mcn_finetune11_slow_bilinear_iter_2000.caffemodel
I0723 09:03:32.383363 29191 sgd_solver.cpp:273] Snapshotting solver state to binary proto file mcn_finetune11_slow_bilinear_iter_2000.solverstate
I0723 09:03:32.09 29203 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:03:44.351140 29202 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:03:52.166584 29191 solver.cpp:397] Test net output #0: loss = 1.14507 (* 1 = 1.14507 loss)
I0723 09:03:52.777982 29191 solver.cpp:218] Iteration 2000 (1.35892 iter/s, 735.881s/1000 iters), loss = 2.60843
I0723 09:03:52.778029 29191 solver.cpp:237] Train net output #0: loss = 3.07199 (* 1 = 3.07199 loss)
I0723 09:03:52.778038 29191 sgd_solver.cpp:105] Iteration 2000, lr = 1e-08
I0723 09:07:57.400295 29201 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:07:57.448870 29200 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:14:58.070508 29201 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:14:58.100841 29200 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:15:48.708067 29191 solver.cpp:447] Snapshotting to binary proto file mcn_finetune11_slow_bilinear_iter_3000.caffemodel
I0723 09:15:49.358572 29191 sgd_solver.cpp:273] Snapshotting solver state to binary proto file mcn_finetune11_slow_bilinear_iter_3000.solverstate
I0723 09:15:49.411862 29191 solver.cpp:330] Iteration 3000, Testing net (#0)
I0723 09:16:05.268878 29203 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:16:05.502995 29202 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:16:08.177001 29191 solver.cpp:397] Test net output #0: loss = 1.115 (* 1 = 1.115 loss)
I0723 09:16:08.767503 29191 solver.cpp:218] Iteration 3000 (1.35874 iter/s, 735.979s/1000 iters), loss = 2.57038
I0723 09:16:08.768218 29191 solver.cpp:237] Train net output #0: loss = 2.33784 (* 1 = 2.33784 loss)
I0723 09:16:08.768534 29191 sgd_solver.cpp:105] Iteration 3000, lr = 1e-08
I0723 09:22:16.315538 29201 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:22:16.349555 29200 data_layer.cpp:73] Restarting data prefetching from start.
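To compare the train and test loss more easily than by reading the raw log, the loss values can be pulled out with a couple of regular expressions. This is a small sketch that assumes the log format shown above; `parse_caffe_log` is a hypothetical helper, and the sample lines are copied from the excerpt.

```python
import re

# "Iteration N (... iter/s, ...), loss = X" carries the smoothed training loss.
TRAIN_RE = re.compile(r"Iteration (\d+) \([^)]*\), loss = (\S+)")
# "Test net output #0: loss = X" carries the test loss.
TEST_RE = re.compile(r"Test net output #0: loss = (\S+)")


def parse_caffe_log(lines):
    """Return ({iteration: smoothed train loss}, [test losses in log order])."""
    train, test = {}, []
    for line in lines:
        match = TRAIN_RE.search(line)
        if match:
            train[int(match.group(1))] = float(match.group(2))
            continue
        match = TEST_RE.search(line)
        if match:
            test.append(float(match.group(1)))
    return train, test


sample = [
    "I0723 08:39:18.075814 29191 solver.cpp:397] Test net output #0: loss = 37.8394 (* 1 = 37.8394 loss)",
    "I0723 08:39:18.799008 29191 solver.cpp:218] Iteration 0 (-2.90699e-35 iter/s, 22.0247s/1000 iters), loss = 42.4986",
    "I0723 08:39:18.799057 29191 solver.cpp:237] Train net output #0: loss = 42.4986 (* 1 = 42.4986 loss)",
    "I0723 08:51:36.276700 29191 solver.cpp:397] Test net output #0: loss = 1.18519 (* 1 = 1.18519 loss)",
    "I0723 08:51:36.886041 29191 solver.cpp:218] Iteration 1000 (1.35488 iter/s, 738.075s/1000 iters), loss = 3.89015",
]
train_loss, test_loss = parse_caffe_log(sample)
```

For a full run, the same function can be fed the lines of the log file, e.g. `parse_caffe_log(open(path))`, where `path` is wherever the Caffe output was saved.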