For some reason, the time needed to extract results using .float_val is extremely high.
Here is an example scenario along with its output:
import time

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# imgs_array, self.serving_grpc_port and self.max_total_detections come from the surrounding class.
t2 = time.time()

# Open a gRPC channel with a raised receive-message limit and build the request.
options = [('grpc.max_receive_message_length', 100 * 4000 * 4000)]
channel = grpc.insecure_channel('{host}:{port}'.format(host='localhost', port=str(self.serving_grpc_port)), options=options)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = 'ivi-detector'
request.model_spec.signature_name = 'serving_default'
request.inputs['inputs'].CopyFrom(tf.make_tensor_proto(imgs_array, shape=imgs_array.shape))

# Run the prediction (100 s timeout) and time it.
res = stub.Predict(request, 100.0)
print("Time to detect:")
t3 = time.time(); print("t3:", t3 - t2)

# Extract each output via .float_val and time each extraction separately.
t11 = time.time()
boxes_float_val = res.outputs['detection_boxes'].float_val
t12 = time.time(); print("t12:", t12 - t11)
classes_float_val = res.outputs['detection_classes'].float_val
t13 = time.time(); print("t13:", t13 - t12)
scores_float_val = res.outputs['detection_scores'].float_val
t14 = time.time(); print("t14:", t14 - t13)

# Reshape the flat float lists into [batch_size, max_detections, ...] arrays.
boxes = np.reshape(boxes_float_val, [len(imgs_array), self.max_total_detections, 4])
classes = np.reshape(classes_float_val, [len(imgs_array), self.max_total_detections])
scores = np.reshape(scores_float_val, [len(imgs_array), self.max_total_detections])
t15 = time.time(); print("t15:", t15 - t14)
Time to detect:
t3: 1.4687104225158691
t12: 1.9140026569366455
t13: 3.719329833984375e-05
t14: 9.298324584960938e-06
t15: 0.0008063316345214844
TensorFlow Serving is running an object detection model from TensorFlow's Object Detection API (faster_rcnn_resnet101). As you can see, extracting the detected boxes takes longer than the prediction itself.
The current shape of the detected boxes is [batch_size, 100, 4], with 100 being the maximum number of detections. As a workaround I can lower the maximum number of detections, which significantly decreases the time needed to extract these values, but it still seems (in my view) unnecessarily high.
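For reference, here is a minimal sketch of an alternative extraction path I could time against .float_val: tf.make_ndarray converts an entire output TensorProto into a NumPy array in one call (res is the PredictResponse from the snippet above; the output names are the same ones I already use).

# Sketch: convert each output TensorProto to NumPy in one call instead of
# going through the repeated float_val field.
import time
import tensorflow as tf

t_a = time.time()
boxes = tf.make_ndarray(res.outputs['detection_boxes'])      # already shaped [batch, max_detections, 4]
classes = tf.make_ndarray(res.outputs['detection_classes'])  # [batch, max_detections]
scores = tf.make_ndarray(res.outputs['detection_scores'])    # [batch, max_detections]
t_b = time.time(); print("make_ndarray extraction:", t_b - t_a)

If this path is fast while the per-field .float_val access is slow, that would at least isolate where the time is going.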
I'm using tensorflow-serving 2.3.0-gpu as a Docker container along with tensorflow-serving-api==2.3.0.
Also, it's important to mention that I tried to reproduce this behaviour with a public saved model (trained purely on ImageNet) and the slow .float_val performance did not happen, which suggests the problem may be specific to my custom-trained model. I have already tried exporting the saved model from the .ckpt files in different ways, but the problem still occurs; and if I use any of those export methods on the downloaded model (which ships with both .ckpt files and SavedModel files), the problem does not occur, so the export methods themselves seem fine.
Now I suspect that something is wrong or different with the model I trained... but why? Does it make sense that this would affect .float_val from tensorflow-serving-api?
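One thing I plan to check (a debugging sketch, not a confirmed explanation) is how the two models encode their outputs in the response: a TensorProto can carry its data either as packed bytes in tensor_content or as a repeated float_val list, and the two could behave differently on the client side. Comparing my custom model's response with the public model's might show a difference (res is the PredictResponse from above):

# Sketch: inspect how each output tensor is encoded in the PredictResponse.
for name, tensor in res.outputs.items():
    print(name,
          "dtype:", tensor.dtype,
          "shape:", [d.size for d in tensor.tensor_shape.dim],
          "tensor_content bytes:", len(tensor.tensor_content),
          "float_val entries:", len(tensor.float_val))

If the custom model's outputs arrive one way and the public model's the other, that difference would be worth digging into.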
The code I used (with fast results): https://github.com/denisb411/tfserving-od/blob/master/inference-using-tfserving-docker.ipynb
I don't know how to proceed, as my custom training uses almost the same pipeline.config as the original, so there is nothing different in the training process.
How can I fix this? And how is it related to .float_val, if there is any relation at all?
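One way I could narrow this down (a sketch, under the assumption that the cost is paid on the Python side when the repeated field is materialised) is to time the first touch of float_val separately from a single bulk copy into NumPy:

# Sketch: is the cost in the first access to the repeated field,
# or in how the values are consumed afterwards?
import time
import numpy as np

t_start = time.time()
raw = res.outputs['detection_boxes'].float_val         # first touch of the repeated field
t_touched = time.time()
arr = np.array(raw, dtype=np.float32)                   # one bulk copy into NumPy
t_copied = time.time()

print("first access of float_val:", t_touched - t_start)
print("bulk np.array conversion:", t_copied - t_touched)

I'd expect this to show whether the ~1.9 s is paid at the first access of the repeated field or during the copy.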
In case this is a bug: a while ago I created a GitHub issue about this problem, but it didn't get enough attention.