Let's go through your questions one by one:
- Batch size is the number of images processed at a time during training, testing, or validation. You can find the respective parameters and their default values defined in the script:
parser.add_argument(
'--train_batch_size',
type=int,
default=100,
help='How many images to train on at a time.'
)
parser.add_argument(
'--test_batch_size',
type=int,
default=-1,
help="""\
How many images to test on. This test set is only used once, to evaluate
the final accuracy of the model after training completes.
A value of -1 causes the entire test set to be used, which leads to more
stable results across runs.\
"""
)
parser.add_argument(
'--validation_batch_size',
type=int,
default=100,
help="""\
How many images to use in an evaluation batch. This validation set is
used much more often than the test set, and is an early indicator of how
accurate the model is during training.
A value of -1 causes the entire validation set to be used, which leads to
more stable results across training iterations, but may be slower on large
training sets.\
"""
)
So if you want to decrease the training batch size, run the script with this flag (in addition to any others you need):
python -m retrain --train_batch_size=16
I also recommend specifying the batch size as a power of 2 (16, 32, 64, 128, ...). The right value depends on the GPU you are using: the less memory the GPU has, the smaller the batch size you should use. With 8 GB of GPU memory, you can try a batch size of 16.
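If you are not sure what your GPU can handle, it helps to first check which devices TensorFlow can see at all. A minimal sketch, assuming TensorFlow 2.x (on older versions the same call lives under tf.config.experimental):

import tensorflow as tf

# List the physical GPUs visible to TensorFlow; an empty list means
# training will run on the CPU regardless of the batch size you pick.
gpus = tf.config.list_physical_devices('GPU')
print('Visible GPUs:', gpus)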
- To discover whether you are using a GPU at all, you can follow the steps in the TensorFlow documentation you mentioned: just put
tf.debugging.set_log_device_placement(True)
as the first statement of your script.
Device placement logging causes every tensor allocation and operation to be printed along with the device it was assigned to.
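As a minimal, self-contained sketch (again assuming TensorFlow 2.x), the following script shows the logging in action; if a GPU is actually being used, the log should report ops placed on a device such as /device:GPU:0:

import tensorflow as tf

# Enable device placement logging before any tensors or ops are created.
tf.debugging.set_log_device_placement(True)

# A small operation whose placement gets logged, e.g.
# "Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0".
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 1.0], [1.0, 1.0]])
print(tf.matmul(a, b))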