Jetson NX optimize tensorflow model using TensorRT

I am trying to speed up a segmentation model (unet-mobilenet-512x512). I converted my TensorFlow model to TensorRT with FP16 precision mode, and the speed is lower than I expected. Before the optimization I had 7 FPS at inference with the .pb frozen graph; after the TensorRT optimization I have 14 FPS.

According to the Jetson NX benchmark results on NVIDIA's site, UNet 256x256 segmentation runs at 146 FPS. Since 512x512 has four times as many pixels, I thought my UNet 512x512 should be at most 4 times slower in the worst case.

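The expectation above can be written out as arithmetic. Assuming, as the question does, that inference time grows at most linearly with the number of input pixels, the 146 FPS figure for 256x256 translates to roughly 36 FPS for 512x512. This is a back-of-the-envelope sketch only: the benchmarked UNet and unet-mobilenet are different networks, so the scaling is indicative at best.

```python
def expected_fps(ref_fps: float, ref_side: int, new_side: int) -> float:
    """Scale a reference FPS figure, assuming cost grows linearly with pixel count."""
    pixel_ratio = (new_side * new_side) / (ref_side * ref_side)
    return ref_fps / pixel_ratio


def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


if __name__ == "__main__":
    est = expected_fps(146, 256, 512)  # 146 / 4 = 36.5 FPS
    print(f"estimated 512x512: {est:.1f} FPS ({frame_time_ms(est):.1f} ms/frame)")
    # Figures quoted in the question, expressed as per-frame latency budgets.
    for label, fps in [("frozen .pb", 7), ("TF-TRT FP16", 14), ("target", 30)]:
        print(f"{label}: {fps} FPS = {frame_time_ms(fps):.1f} ms/frame")
```

Thinking in milliseconds makes the gap concrete: going from 14 FPS to 30 FPS means cutting each frame from about 71 ms to about 33 ms.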

Here is my code for optimizing a TensorFlow SavedModel with TensorRT:

import numpy as np
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

# TF-TRT conversion parameters: 4 GiB workspace, FP16 precision.
params = trt.DEFAULT_TRT_CONVERSION_PARAMS
params = params._replace(
    max_workspace_size_bytes=(1 << 32))
params = params._replace(precision_mode="FP16")
converter = tf.experimental.tensorrt.Converter(input_saved_model_dir='./model1', conversion_params=params)
converter.convert()

def my_input_fn():
    # One representative input batch so the engines can be built ahead of time.
    inp1 = np.random.normal(size=(1, 512, 512, 3)).astype(np.float32)
    yield [inp1]

converter.build(input_fn=my_input_fn)  # Generate corresponding TRT engines
output_saved_model_dir = "trt_graph2"
converter.save(output_saved_model_dir)  # Generated engines will be saved.


print("------------------------freezing the graph---------------------")


# Load the converted SavedModel and take its default serving signature.
saved_model_loaded = tf.saved_model.load(
    output_saved_model_dir, tags=[tf.compat.v1.saved_model.SERVING])
graph_func = saved_model_loaded.signatures[
    tf.compat.v1.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
# Replace variables with constants so the graph can be written as a single .pb.
frozen_func = convert_variables_to_constants_v2(graph_func)

tf.io.write_graph(graph_or_graph_def=frozen_func.graph,
                  logdir="./",
                  name="unet_frozen_graphTensorRt.pb",
                  as_text=False)
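When comparing FPS figures such as 7 vs. 14, it helps to exclude warm-up runs, since the first few calls usually include one-time setup costs (GPU initialization, memory allocation, and possibly engine building). A minimal, framework-agnostic timing sketch follows; `infer` is a hypothetical callable that performs one forward pass and blocks until the result is ready.

```python
import statistics
import time
from typing import Callable


def benchmark_fps(infer: Callable[[], object], warmup: int = 10, runs: int = 100) -> float:
    """Return the average frames per second of `infer`, excluding warm-up calls.

    `infer` should run one forward pass and wait for the result, e.g. by
    calling `.numpy()` on TensorFlow output tensors, so GPU work is counted.
    """
    for _ in range(warmup):
        infer()  # one-time setup costs land here and are not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        timings.append(time.perf_counter() - start)
    return 1.0 / statistics.mean(timings)
```

For the TF-TRT model above, `infer` could wrap a call to `graph_func` (signature functions take keyword arguments named after the model's inputs) followed by `.numpy()` on each output. Timing the same way for both the frozen graph and the TF-TRT model keeps the comparison fair.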

I downloaded the repository, that was used for Jetson NX benchmarking ( https://github.com/NVIDIA-AI-IOT/jetson_benchmarks ), and the speed of unet256x256 really is ~146FPS. But there is no pipeline to optimize the model. How can I get the similar results? I am looking for the solutions to get speed of my model(unet-mobilenet-512x512) close to 30FPS
Maybe I should run inference in other way(without tensorflow) or change some converting parameters?
Any suggestions, thanks

Silvertongued asked 7/2, 2021 at 12:20

As far as I can see, the repository you linked to uses command line tools that use TensorRT (TRT) under the hood. Note that TensorRT is not the same as "TensorRT in TensorFlow" aka TensorFlow-TensorRT (TF-TRT) which is what you are using in your code. Both TF-TRT and TRT models run faster than regular TF models on a Jetson device but TF-TRT models still tend to be slower than TRT ones (source 1, source 2).
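To try the plain TRT route from the command line, one option is `trtexec`, the engine-building and benchmarking tool that ships with TensorRT. The sketch below builds an FP16 engine from an ONNX file by calling it from Python. The install path `/usr/src/tensorrt/bin/trtexec` is an assumption (it is the usual JetPack location, but check your image), and the file names are hypothetical placeholders.

```python
import shlex
import subprocess
from typing import List

# Assumed location of trtexec on JetPack; adjust for your installation.
TRTEXEC = "/usr/src/tensorrt/bin/trtexec"


def trtexec_command(onnx_path: str, engine_path: str, fp16: bool = True) -> List[str]:
    """Build a trtexec invocation that parses an ONNX model and saves the engine."""
    cmd = [TRTEXEC, f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # allow FP16 kernels, like the question's TF-TRT setup
    return cmd


def build_engine_with_trtexec(onnx_path: str, engine_path: str) -> None:
    """Run trtexec; raises CalledProcessError if the build fails."""
    cmd = trtexec_command(onnx_path, engine_path)
    print("running:", " ".join(shlex.quote(c) for c in cmd))
    subprocess.run(cmd, check=True)
```

Usage, with hypothetical file names: `build_engine_with_trtexec("unet.onnx", "unet_fp16.plan")`. After building, trtexec typically runs a short benchmark and prints latency and throughput, which are closer in spirit to the published numbers than timings taken through TensorFlow; a saved engine can be re-benchmarked with `--loadEngine=`.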

The downside of TRT is that the conversion to TRT needs to be done on the target device, and that it can be quite difficult to implement successfully because there are various TensorFlow operations that TRT does not support. In that case you need to write a custom plugin, pray to God that someone on the internet has already done so, or use TensorRT only for part of your model and do pre-/postprocessing in TensorFlow.

There are basically two ways to convert TensorFlow models to TensorRT "engines" aka "plan files", both of which use intermediate formats:

  • TF -> UFF -> TRT
  • TF -> ONNX -> TRT

In both cases, the graphsurgeon/onnx-graphsurgeon libraries can be used to modify the TF/ONNX graph to achieve compatibility of graph operations. Unsupported operations can be added by means of TensorRT plugins, as mentioned above. (This is really the main challenge here: Different graph file formats and different target GPUs support different graph operations.)

There's also a third way where you do TF -> Caffe -> TRT, and apparently a fourth one where you use Nvidia's Transfer Learning Toolkit (TLT) (based upon TF/Keras) and a tool called tlt-converter, but I'm not familiar with it. The latter link does mention converting a UNet model, though.

Note that the paths involving UFF and Caffe are now deprecated and support will be removed in TensorRT 9.0, so if you want something future-proof, you should probably go for ONNX. That being said, most sample code I've come across online still uses UFF, and TensorRT 9.0 is still some time away.

Anyway, I haven't tried converting a UNet to TensorRT yet, but the following repositories provide sample code which might give you an idea of how it works in principle:

Note that even if you don't manage to pull off the conversion from ONNX to TRT for your model, using the ONNX runtime for inference could potentially still give you a performance gain, especially when you're using the CUDA or the TensorRT execution provider which will be enabled automatically provided you're on a Jetson device and running the correct ONNXRuntime build. (I'm not sure how it compares to TF-TRT or TRT, though, but it might still be worth a shot.)
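Depending on the onnxruntime version, the execution providers may need to be requested explicitly when creating a session. A small stdlib-only sketch that picks providers in order of preference from whatever the installed build reports; the preference order is an assumption, while the provider names are the ones ONNX Runtime uses.

```python
from typing import List, Sequence

# ONNX Runtime provider names; earlier entries are preferred.
PREFERRED_PROVIDERS = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
)


def pick_providers(available: Sequence[str]) -> List[str]:
    """Return the preferred providers that are available, in priority order."""
    chosen = [p for p in PREFERRED_PROVIDERS if p in available]
    if not chosen:
        raise RuntimeError(f"none of {PREFERRED_PROVIDERS} available: {list(available)}")
    return chosen
```

With onnxruntime installed, this would be used as `onnxruntime.InferenceSession("model.onnx", providers=pick_providers(onnxruntime.get_available_providers()))`, where `model.onnx` is a placeholder. If the TensorRT provider is missing from the list, the installed build was not compiled with it.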

Finally, for completeness's sake let me also mention that at least my team has been dabbling with the idea of switching from TF to PyTorch, partly because the Nvidia support has been getting a lot better lately and Nvidia employees seem to gravitate towards PyTorch, too. In particular, there are now two separate ways to convert models to TRT:

Gaelic answered 13/2, 2021 at 9:58
Many thanks! The best answer I have ever seen on Stack Overflow. – Silvertongued
I'm glad I could help! I just thought I'd tell you what I wish someone else had told me when I started working with TensorRT a few months ago. It really is a messy ecosystem and I'm still anything but an expert. I just googled a lot haha. Anyway, please let me (and everyone else here) know how it goes – as far as I can see, no one else has converted a UNet to TensorRT yet, so your experiences might really help out other people here! – Gaelic
Hi! I have tried Keras/TF -> ONNX -> TRT, but without success. I ran into issues with the Upsampling layer and tried a lot of things: different TRT versions, the ONNX simplifier, changing the TRT source code, all without success. It seems that really no one has done it yet. As I work on localization, I decided to use a custom architecture without Upsampling and without segmentation at all. Thank you. – Silvertongued
What JetPack version are you using? – Warmedover
This is literally all the research I did in the past months, and you wrote it all in one answer 😅. Thanks for the awesome efforts. – Inshore
@Gaelic Thanks for the awesome efforts. However, when trying these methods, I am still unable to get my Mask-RCNN model running in TensorRT. Please share if you've dealt with Mask-RCNN + TensorRT. – Inshore
@PeDro I'm afraid I won't be of much help as I haven't used Mask-RCNN so far, sorry! – Gaelic
@Gaelic Thanks for the good explanation. Do you know what options exist for Darknet weights? – Hards
@Hards Thank you for your kind words! Unfortunately, I'm not familiar with Darknet. – Gaelic

Hi, can you share the errors you are getting? It should work with the following steps:

  1. Convert the TensorFlow/Keras model to a .pb file.
  2. Convert the .pb file to ONNX format.
  3. Create a TensorRT engine.
  4. Run inference from the TensorRT engine.

I am not sure about UNet (I will check), but your model may contain some operations that ONNX does not support (please share your errors).

Here is an example with Resnet-50.

Conversion to .pb:

import os

import tensorflow as tf
import keras
import keras.backend as K

# Written for TensorFlow 1.x with standalone Keras (2.2.5+/2.3.x), where the
# Keras model lives in a TF graph/session that can be frozen.
K.set_learning_phase(0)  # build the model in inference mode

def keras_to_pb(model, output_filename, output_node_names):

    """
    This is the function to convert the Keras model to pb.

    Args:
       model: The Keras model.
       output_filename: The output .pb file name.
       output_node_names: The output nodes of the network. If None, then
       the function gets the last layer name as the output node.

    Returns:
       The input node name and the list of output node names.
    """

    # Get the names of the input and output nodes.
    in_name = model.layers[0].get_output_at(0).name.split(':')[0]

    if output_node_names is None:
        output_node_names = [model.layers[-1].get_output_at(0).name.split(':')[0]]

    sess = K.get_session()

    # Replace variables with constants holding their current values.
    frozen_graph_def = tf.compat.v1.graph_util.convert_variables_to_constants(
        sess,
        sess.graph_def,
        output_node_names)

    sess.close()
    # write_graph takes the directory and the file name separately.
    out_dir, out_name = os.path.split(output_filename)
    tf.io.write_graph(frozen_graph_def, out_dir or '.', out_name, as_text=False)

    return in_name, output_node_names

# load the ResNet-50 model pretrained on imagenet
model = keras.applications.resnet.ResNet50(include_top=True, weights='imagenet', input_tensor=None, input_shape=None, pooling=None, classes=1000)

# Convert the Keras ResNet-50 model to a .pb file
in_tensor_name, out_tensor_names = keras_to_pb(model, "models/resnet50.pb", None)

Then you need to convert the .pb model to the ONNX format. To do this, you will need to install tf2onnx. Example:

python -m tf2onnx.convert  --input /Path/to/resnet50.pb --inputs input_1:0 --outputs probs/Softmax:0 --output resnet50.onnx 
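`keras_to_pb` returns node names such as `input_1`, while tf2onnx's `--inputs`/`--outputs` expect tensor names with an output index such as `input_1:0`. A small helper, hypothetical and not part of tf2onnx, that assembles the command above from the returned names, using only the flags shown in that command:

```python
import sys
from typing import List, Sequence


def tensor_name(node_name: str, index: int = 0) -> str:
    """Turn a graph node name into a tensor name, e.g. 'input_1' -> 'input_1:0'."""
    return node_name if ":" in node_name else f"{node_name}:{index}"


def tf2onnx_command(pb_path: str, in_name: str, out_names: Sequence[str], onnx_path: str) -> List[str]:
    """Assemble the tf2onnx invocation for a frozen graph."""
    return [
        sys.executable, "-m", "tf2onnx.convert",
        "--input", pb_path,
        "--inputs", tensor_name(in_name),
        "--outputs", ",".join(tensor_name(n) for n in out_names),
        "--output", onnx_path,
    ]
```

It could then be run with `subprocess.run(tf2onnx_command("models/resnet50.pb", in_tensor_name, out_tensor_names, "resnet50.onnx"), check=True)`, which avoids copying node names by hand.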

As the last step, create the TensorRT engine from the ONNX file:

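import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_runtime = trt.Runtime(TRT_LOGGER)

# Networks parsed from ONNX must use an explicit batch dimension.
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

def build_engine(onnx_path, shape=(1, 224, 224, 3)):

    """
    This is the function to create the TensorRT engine.
    Written against the TensorRT 7/8 Python API (builder config + build_engine).

    Args:
       onnx_path: Path to the ONNX file.
       shape: Shape of the input of the ONNX file.
    """
    with trt.Builder(TRT_LOGGER) as builder, \
            builder.create_network(EXPLICIT_BATCH) as network, \
            builder.create_builder_config() as config, \
            trt.OnnxParser(network, TRT_LOGGER) as parser:
        config.max_workspace_size = (256 << 20)  # 256 MiB
        # config.set_flag(trt.BuilderFlag.FP16)  # optional: FP16, as in the question
        with open(onnx_path, 'rb') as model:
            if not parser.parse(model.read()):
                # Report e.g. unsupported operations instead of failing silently.
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                return None
        network.get_input(0).shape = shape
        return builder.build_engine(network, config)

def save_engine(engine, file_name):
    buf = engine.serialize()
    with open(file_name, 'wb') as f:
        f.write(buf)

def load_engine(trt_runtime, plan_path):
    with open(plan_path, 'rb') as f:
        engine_data = f.read()
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    return engine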
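Step 4 (running inference) is not shown above. With the TensorRT Python API, inference typically means allocating one host and one device buffer per input/output binding and copying data in and out, for example with pycuda. The buffer sizes follow from the binding shapes and data types. Below is a stdlib-only sketch of that arithmetic, assuming static shapes and the 1x512x512x3 float32 input from the question; the output shape is a hypothetical single-channel mask.

```python
from functools import reduce
from operator import mul
from typing import Dict, Sequence

# Bytes per element for the data types assumed here.
DTYPE_SIZES: Dict[str, int] = {"float32": 4, "float16": 2, "int32": 4, "int8": 1}


def volume(shape: Sequence[int]) -> int:
    """Number of elements in a tensor with a fully static shape."""
    if any(d < 0 for d in shape):
        # Dynamic dimensions are reported as -1 and must be fixed first.
        raise ValueError(f"dynamic dimension in {tuple(shape)}; set a concrete shape first")
    return reduce(mul, shape, 1)


def binding_nbytes(shape: Sequence[int], dtype: str = "float32") -> int:
    """Size in bytes of a host/device buffer for one binding."""
    return volume(shape) * DTYPE_SIZES[dtype]


if __name__ == "__main__":
    inp = binding_nbytes((1, 512, 512, 3))  # 3,145,728 bytes
    out = binding_nbytes((1, 512, 512, 1))  # hypothetical 1-channel mask
    print(f"input: {inp / 2**20:.1f} MiB, output: {out / 2**20:.1f} MiB")
```

With pycuda, a byte count like this is what would be passed to `pycuda.driver.mem_alloc`, and a NumPy host array of the same shape and dtype would be copied to the device with `pycuda.driver.memcpy_htod`.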

I suggest you check out this PyTorch TRT UNet implementation.

Warmedover answered 16/3, 2021 at 7:58
