Following the instructions provided with the model, --train_crop_size
is set to a value much smaller than the size of the training images. For instance:
python deeplab/train.py \
--logtostderr \
--training_number_of_steps=90000 \
--train_split="train" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--train_crop_size="769,769" \
--train_batch_size=1 \
--dataset="cityscapes" \
--tf_initial_checkpoint=${PATH_TO_INITIAL_CHECKPOINT} \
--train_logdir=${PATH_TO_TRAIN_DIR} \
--dataset_dir=${PATH_TO_DATASET}
But what does this option actually do? Does it take a random crop of each training image? If so, the network's input dimensions in this example would be 769x769 (WxH), not the full image size.

Yet the same instructions set the eval crop size to 2049x1025. How can a network trained on 769x769 inputs accept 2049x1025 inputs when the instructions never mention resizing? Wouldn't that cause a shape mismatch?
Are the instructions conflicting?
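For reference, here is roughly what I imagine the 769x769 random crop would look like in the input pipeline. This is just my own sketch using tf.image.random_crop, not the actual preprocessing code from the repo:

import tensorflow as tf

def my_guess_at_random_crop(image, label, crop_height=769, crop_width=769):
    """My guess at what --train_crop_size=769,769 means: take one random
    769x769 window out of each (image, label) pair. Hypothetical sketch,
    not DeepLab's actual preprocessing."""
    image = tf.cast(image, tf.float32)   # (H, W, 3)
    label = tf.cast(label, tf.float32)   # (H, W, 1)

    # Concatenate so the image and its label get the same crop window.
    combined = tf.concat([image, label], axis=-1)

    # One random 769x769x4 window; assumes the source is at least 769 px
    # in both dimensions, which holds for 1024x2048 Cityscapes images.
    cropped = tf.image.random_crop(combined, size=[crop_height, crop_width, 4])

    cropped_image = cropped[:, :, :3]
    cropped_label = tf.cast(cropped[:, :, 3:], tf.int32)
    return cropped_image, cropped_label

If something along these lines is what happens at training time, I still don't see how the 2049x1025 eval input fits the same network without resizing.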