I'm trying to speed up my model's inference by converting it to OnnxRuntime. However, I'm getting strange results when measuring the inference time.
When running only 1 iteration, OnnxRuntime's CPUExecutionProvider greatly outperforms the OpenVINOExecutionProvider:
- CPUExecutionProvider - 0.72 seconds
- OpenVINOExecutionProvider - 4.47 seconds
But if I run, say, 5 iterations, the result is different:
- CPUExecutionProvider - 3.83 seconds
- OpenVINOExecutionProvider - 14.13 seconds
And if I run 100 iterations, the result is drastically different:
- CPUExecutionProvider - 74.19 seconds
- OpenVINOExecutionProvider - 46.96 seconds
It seems that the inference time of OpenVinoEP does not grow linearly with the iteration count: it averages 4.47 seconds for a single run but only about 0.47 seconds per run over 100 iterations (46.96 / 100), as if there were a large one-time cost. I don't understand why. So my questions are:
- Why does OpenVINOExecutionProvider behave this way?
- Which ExecutionProvider should I use? (For context, the snippet below checks which providers my build exposes.)
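This only shows what is installed, not which provider is faster:

import onnxruntime as rt
# Providers are returned in ONNX Runtime's default priority order.
print(rt.get_available_providers())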
The benchmark code itself is very basic:
import onnxruntime as rt
import numpy as np
import time
from tqdm import tqdm

limit = 5  # iteration count; I used 1, 5 and 100 for the timings above

# MODEL
device = 'CPU_FP32'  # OpenVINO device string
model_file_path = 'road.onnx'
image = np.random.rand(1, 3, 512, 512).astype(np.float32)  # dummy input

# OnnxRuntime (plain CPU)
# 'device_type' is an OpenVINO EP option, so no provider_options are passed here.
sess = rt.InferenceSession(model_file_path, providers=['CPUExecutionProvider'])
input_name = sess.get_inputs()[0].name
start = time.time()
for i in tqdm(range(limit)):
    out = sess.run(None, {input_name: image})
end = time.time()
inference_time = end - start
print(inference_time)

# OnnxRuntime + OpenVinoEP
sess = rt.InferenceSession(model_file_path, providers=['OpenVINOExecutionProvider'], provider_options=[{'device_type': device}])
input_name = sess.get_inputs()[0].name
start = time.time()
for i in tqdm(range(limit)):
    out = sess.run(None, {input_name: image})
end = time.time()
inference_time = end - start
print(inference_time)
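To investigate the non-linearity, I would time each run individually instead of the whole loop, to see whether OpenVinoEP's first inference carries a one-time setup cost (a minimal sketch reusing the OpenVINO session, input_name and image from above; the choice of 10 runs is arbitrary):

runs = 10
timings = []
for i in range(runs):
    start = time.perf_counter()
    sess.run(None, {input_name: image})
    timings.append(time.perf_counter() - start)
print('first run: %.2f s' % timings[0])
print('mean of remaining runs: %.2f s' % np.mean(timings[1:]))

If the first run turns out to be far slower than the rest, that would at least explain why the totals only favour OpenVinoEP at higher iteration counts.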