TensorFlow Serving frequent request timeouts

Problem description

The problem we encounter is the following. Serving is configured to load and serve 7 models, and as the number of models increases, Serving requests time out more frequently. Conversely, when the number of models decreases, request timeouts become insignificant. On the client side, the timeout is configured to 5 seconds.

Interestingly, the maximum batch processing duration is approximately 700 ms, with a configured maximum batch size of 10. The average batch processing duration is ~60 ms.

Logs and screenshots

We've checked the TensorFlow Serving logs but found no warnings or errors. In addition, we've monitored the network of the GPU machines running Serving and of the hosts sending inference requests to it, but no network issues were identified either.

Temporary solution

Decreasing the number of loaded and served models works, but it is not the solution we want, because it requires setting up multiple distinct GPU instances, each loading and serving only a subset of the models.

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
TensorFlow Serving installed from (source or binary): source
TensorFlow Serving version: 1.9

TensorFlow Serving runs on multiple AWS g2.2xlarge instances. We run TensorFlow Serving using Docker, with the base image nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04.

What could be the root cause of such behaviour? How is Serving expected to handle requests when multiple models are loaded in memory? How does it switch the model context?

Saucepan answered 3/9, 2018 at 13:22

Adding the argument --rest_api_timeout_in_ms=0, which Docker passes through to TensorFlow Serving, worked for me.

Example:

docker run -p 8501:8501 \
  --mount type=bind,\
source=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_cpu,\
target=/models/half_plus_two \
  -e MODEL_NAME=half_plus_two -t tensorflow/serving --rest_api_timeout_in_ms=0 &
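
To check whether the flag makes a difference, it can help to send a REST request with an explicit client-side timeout, such as the 5 seconds mentioned in the question. Below is a minimal sketch using only the Python standard library. The helper names build_predict_request and predict are hypothetical, and localhost:8501 is an assumption based on the port mapping in the command above. Note that the flag concerns the REST API, so this only applies if the client talks to Serving over REST rather than gRPC.

```python
import json
import urllib.request


def build_predict_request(instances, model="half_plus_two",
                          host="localhost", port=8501):
    """Build a POST request for TensorFlow Serving's REST predict endpoint.

    host, port and model are assumptions taken from the docker command above.
    """
    url = f"http://{host}:{port}/v1/models/{model}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(instances, timeout_s=5.0, **kwargs):
    """Send the request and return the "predictions" field of the response.

    timeout_s is the client-side timeout; urlopen raises an exception
    if the server does not respond within that time.
    """
    request = build_predict_request(instances, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read())["predictions"]
```

With the container running, calling predict([1.0, 2.0, 5.0]) sends one request. Timing such calls before and after adding the flag shows whether the server-side REST timeout was the limiting factor, or whether the client timeout itself is being hit.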
Gaul answered 22/9, 2020 at 16:57
