I am using the TensorFlow Object Detection API to train my own object detector. I downloaded the faster_rcnn_inception_v2_coco_2018_01_28
from the model zoo (here) and made my own dataset (train.record (~221 MB), test.record and the label map) to fine-tune it.
But when I run it:
python train.py --logtostderr --pipeline_config_path=/home/username/Documents/Object_Detection/training/faster_rcnn_inception_v2_coco_2018_01_28/pipeline.config --train_dir=/home/username/Documents/Object_Detection/training/
the process is killed during the "filling up shuffle buffer" operation, which looks like an OOM problem (I have 16 GB of RAM)...
2018-06-07 12:02:51.107021: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:94] Filling up shuffle buffer (this may take a while): 410 of 2048
Process stopped
Is there a way to reduce the shuffle buffer size? And what does its size affect?
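If I read the log correctly, 2048 is the size of that shuffle buffer, and I assume it comes from the input reader defaults. I suspect (I have not confirmed this field exists in my version of input_reader.proto) that it could be lowered with something like:
train_input_reader: {
  # assumption: shuffle_buffer_size is the field behind the "x of 2048" message
  shuffle_buffer_size: 200
}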
Then I added some swap (115 GB of swap + 16 GB of RAM) and the "filling up shuffle buffer" step finished, but the training ate all the RAM and swap after step 4, even though my train.record is only about 221 MB!
I already added these lines to my pipeline.config > train_config:
batch_size: 1
batch_queue_capacity: 10
num_batch_queue_threads: 8
prefetch_queue_capacity: 9
and these to my pipeline.config > train_input_reader:
queue_capacity: 2
min_after_dequeue: 1
num_readers: 1
following this post.
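In case the structure matters, my whole train_input_reader block now looks roughly like this (the paths below are placeholders, not my real ones):
train_input_reader: {
  tf_record_input_reader {
    input_path: "path/to/train.record"  # placeholder path
  }
  label_map_path: "path/to/label_map.pbtxt"  # placeholder path
  queue_capacity: 2
  min_after_dequeue: 1
  num_readers: 1
}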
I know my images are very (very, very) large: 25 MB each. But since I only used 9 images to build my train.record (just to test whether my installation went well), it should not be this memory-hungry, right?
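To get a rough idea of what one decoded image weighs in memory (a 25 MB JPEG expands to roughly width * height * 3 bytes once decoded to RGB), I ran a quick sanity check with Pillow. This is just a sketch, and the glob pattern is a placeholder for my image folder:
import glob
from PIL import Image

for path in glob.glob("path/to/images/*.jpg"):  # placeholder pattern
    with Image.open(path) as img:               # only reads the header, not the pixel data
        w, h = img.size
        decoded_mb = w * h * 3 / (1024 ** 2)    # uncompressed RGB size in MB
        print("%s: %dx%d -> ~%.0f MB decoded" % (path, w, h, decoded_mb))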
Any other ideas about why it uses so much RAM?
(BTW, I am training on CPU only.)