I am running a model written in TensorFlow 1.x on 4x RTX 3090, and it takes much longer to start training than on 1x RTX 3090. Once training actually starts, however, it finishes sooner on 4x than on 1x. I am using CUDA 11.1 and TensorFlow 1.14 on both setups.
Secondly, on 1x RTX 2080 Ti with CUDA 10.2 and TensorFlow 1.14, training starts much faster than on 1x RTX 3090 with CUDA 11.1 and TensorFlow 1.14. Roughly, it takes about 5 minutes on 1x RTX 2080 Ti, 30-35 minutes on 1x RTX 3090, and 1.5 hours on 4x RTX 3090 to start training on one of the datasets.
I'd be grateful if anyone can help me resolve this issue.
Both the 2080 Ti and 3090 machines run Ubuntu 16.04 with a Core i9-10980XE CPU and 32 GB of RAM.
EDIT: I found out that TF can take a long start-up time on Ampere-architecture GPUs, according to this, but I'm still unclear whether that is what is happening here, and if it is, does any solution exist for it?
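If the cause really is PTX JIT compilation for Ampere (compute capability 8.6, which the prebuilt TF 1.14 binaries don't ship precompiled kernels for), would enlarging and persisting the CUDA JIT cache help, so that only the first run pays the compilation cost? This is just a guess on my part; the environment variables below are the standard CUDA JIT-cache settings, and the warm-up op is only a placeholder. A minimal sketch of what I have in mind:

```python
import os

# Assumption: the slow start is PTX JIT compilation for compute capability 8.6.
# Enlarging the CUDA JIT cache should let the compiled kernels be reused across runs.
os.environ["CUDA_CACHE_DISABLE"] = "0"                               # keep the JIT cache enabled
os.environ["CUDA_CACHE_MAXSIZE"] = str(4 * 1024**3)                  # 4 GiB, the documented maximum
os.environ["CUDA_CACHE_PATH"] = os.path.expanduser("~/.nv/ComputeCache")

import tensorflow as tf  # import TF only after the cache variables are set

with tf.Session() as sess:
    # A small GPU op forces CUDA initialization and kernel loading/JIT,
    # so the one-time cost is paid (and cached) here.
    sess.run(tf.reduce_sum(tf.random.normal([1024, 1024])))
```

Would this actually make subsequent start-ups fast, or is the only real fix a TF 1.x build compiled with Ampere support?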