Deploy Semantic Segmentation Network (U-Net) with TensorRT (no upsampling support)

I am trying to deploy a trained U-Net with TensorRT. The model was trained using Keras (with Tensorflow as backend). The code is very similar to this one: https://github.com/zhixuhao/unet/blob/master/model.py

When I converted the model to UFF format using code like this:

import os
import uff

# idx, trt_fname and output_names are defined earlier in my script:
# idx is a model identifier, trt_fname is the frozen .pb file, and
# output_names is the list of output node names.
uff_fname = os.path.join("./models/", "model_" + idx + ".uff")
uff_model = uff.from_tensorflow_frozen_model(
    frozen_file=os.path.join('./models', trt_fname),
    output_nodes=output_names,
    output_filename=uff_fname,
)

I get the following warnings:

Warning: No conversion function registered for layer: ResizeNearestNeighbor yet.
Converting up_sampling2d_32_12/ResizeNearestNeighbor as custom op: ResizeNearestNeighbor
Warning: No conversion function registered for layer: DataFormatVecPermute yet.
Converting up_sampling2d_32_12/Shape-0-0-VecPermuteNCHWToNHWC-LayoutOptimizer as custom op: DataFormatVecPermute

I tried to avoid this by replacing the upsampling layer with bilinear-interpolation upsampling and with a transposed convolution, but the converter threw similar errors for those as well. I checked https://docs.nvidia.com/deeplearning/sdk/tensorrt-support-matrix/index.html and it seems none of these operations are supported yet.

I am wondering if there is any workaround for this problem. Is there another format/framework that TensorRT accepts and that has upsampling support? Or is it possible to replace the upsampling with some other supported operations?

I also saw somewhere that one can add custom operations to replace the unsupported ones in TensorRT, though I am not sure what that workflow looks like. It would also be really helpful if someone could point out an example of a custom layer.

Thank you in advance!

Nameplate answered 17/7, 2019 at 22:57 Comment(3)
Can you link the network? – Chancroid
It seems there are ResizeNearestNeighbor layers, which the converter treats as custom layers. – Budworth
> transpose convolution – TransposedConvolution2D is in the list of supported TensorFlow ops. – Decato

The warnings appear because these operations are not supported by TensorRT yet, as you already mentioned. Unfortunately there is no easy way to fix this. You either have to modify the graph (even after training) so that it uses only a combination of supported operations, or implement these operations yourself as custom layers.

However, there is a better way to run inference on other devices in C++: you can mix TensorFlow and TensorRT together. TensorRT will analyze the graph for ops that it supports and convert them to TensorRT nodes, and the rest of the graph will be handled by TensorFlow as usual. More information here. This approach is much faster to get working than rewriting the operations yourself. The only complicated part is building TensorFlow from source on your target device and generating the dynamic library tensorflow_cc. Recently there have been many guides and a lot of support for TensorFlow ports to various architectures, e.g. ARM.

Cypriot answered 23/7, 2019 at 3:56 Comment(2)
Yeah, building it is pretty easy, but one of the questions I've had deploying TensorFlow versus TensorRT is operating directly on GPU arrays. It seems like vanilla TensorFlow won't directly load GPU-mapped pointers. For my application this will mess things up. – Chancroid
I think TensorFlow supports direct GPU memory access, like here! – Cypriot

Update 09/28/2019

Nvidia released TensorRT 6.0.1 about two weeks ago and added a new API called "IResizeLayer". This layer supports "Nearest" interpolation and can thus be used to implement upsampling. No need to use custom layers/plugins any more!
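
For example, with the TensorRT 6 C++ API the whole custom plugin can be replaced by something along these lines. This is only a minimal sketch: network and input are placeholders for your INetworkDefinition and the tensor feeding the former UpSampling2D layer, and the implicit-batch CHW layout is assumed.

// Minimal sketch (TensorRT 6.x): 2x nearest-neighbor upsampling with the
// built-in IResizeLayer instead of a custom plugin.
nvinfer1::IResizeLayer* resize = network->addResize(*input);
const float scales[] = {1.0f, 2.0f, 2.0f};   // CHW: keep C, double H and W
resize->setScales(scales, 3);
resize->setResizeMode(nvinfer1::ResizeMode::kNEAREST);
nvinfer1::ITensor* upsampled = resize->getOutput(0);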

Original answer:

Thanks for all the answers and suggestions posted here!

In the end, we implemented the network directly in the TensorRT C++ API and loaded the weights from the .h5 model file. We haven't had the time to profile and polish the solution yet, but inference seems to be working correctly on the test images we fed in.

Here's the workflow we've adopted:

Step 1: Code the upsampling layer.

In our U-Net model, all the upsampling layers have a scaling factor of (2, 2) and they all use ResizeNearestNeighbor interpolation. Essentially, the pixel value at (x, y) in the original tensor goes to four pixels in the new tensor: (2x, 2y), (2x+1, 2y), (2x, 2y+1) and (2x+1, 2y+1). This can easily be coded up as a CUDA kernel function.
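
A minimal sketch of such a kernel (the kernel name and the CHW layout are our own choices, not anything prescribed by TensorRT):

// 2x nearest-neighbor upsampling, CHW layout, one thread per output element.
__global__ void upsampleNearest2x(const float* in, float* out,
                                  int channels, int inH, int inW)
{
    const int outH = inH * 2;
    const int outW = inW * 2;
    const int idx  = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= channels * outH * outW) return;

    const int ox = idx % outW;
    const int oy = (idx / outW) % outH;
    const int c  = idx / (outW * outH);

    // Output pixel (ox, oy) takes its value from input pixel (ox / 2, oy / 2),
    // which is exactly the (2x, 2y) ... (2x+1, 2y+1) mapping described above.
    out[idx] = in[(c * inH + oy / 2) * inW + ox / 2];
}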

Once we had the upsampling kernel, we needed to wrap it with the TensorRT API, specifically the IPluginV2Ext class. The developer reference describes which functions need to be implemented. I'd say enqueue() is the most important one, because the CUDA kernel gets executed there.
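
As a rough illustration, enqueue() ends up being little more than a kernel launch. This is a sketch only: UpsamplePlugin, mChannels, mHeight and mWidth are hypothetical names, with the input dimensions assumed to have been stored during plugin configuration, and upsampleNearest2x is the kernel sketched above.

// IPluginV2Ext inherits this enqueue() signature from IPluginV2 (TensorRT 5.x/6.x).
int UpsamplePlugin::enqueue(int batchSize, const void* const* inputs,
                            void** outputs, void* /*workspace*/,
                            cudaStream_t stream)
{
    const int inSize  = mChannels * mHeight * mWidth;
    const int outSize = inSize * 4;                      // 2x in both H and W
    const int threads = 256;
    const int blocks  = (outSize + threads - 1) / threads;

    for (int b = 0; b < batchSize; ++b)
    {
        const float* in = static_cast<const float*>(inputs[0]) + b * inSize;
        float* out      = static_cast<float*>(outputs[0]) + b * outSize;
        upsampleNearest2x<<<blocks, threads, 0, stream>>>(in, out,
                                                          mChannels, mHeight, mWidth);
    }
    return cudaGetLastError() == cudaSuccess ? 0 : -1;
}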

There are also plugin examples in the TensorRT samples folder, which were helpful for my version.

Step 2: Code the rest of the network using TensorRT API

The rest of the network should be quite straightforward: just call the various "addXxxLayer" functions on the TensorRT network definition.

One thing to keep in mind: depending on which version of TRT you are using, the way to add padding can be different. I think the newest version (5.1.5) lets you select the proper padding mode on the convolution layer created by addConvolution().

My model was trained using Keras, where the default padding behaviour is that the right and bottom get the extra padding if the total amount of padding is odd. Check this Stack Overflow link for details. There's a padding mode in 5.1.5 that represents this scheme.

If you are on an older version (5.1.2.2), you will need to add the padding as a separate layer before the convolution layer; that padding layer takes two parameters, pre-padding and post-padding. Both options are sketched below.
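
A rough sketch of the two options, assuming a 3x3 convolution with stride 1 (network, input, kWeights and bWeights are placeholders, and 64 output maps is an arbitrary example):

using namespace nvinfer1;

// TensorRT 5.1.5+: pick a padding mode on the convolution layer itself.
// kSAME_UPPER puts the extra row/column at the bottom/right, matching Keras.
IConvolutionLayer* conv = network->addConvolution(*input, 64, DimsHW{3, 3},
                                                  kWeights, bWeights);
conv->setStride(DimsHW{1, 1});
conv->setPaddingMode(PaddingMode::kSAME_UPPER);

// Older versions (e.g. 5.1.2.2): pad explicitly with a separate layer first.
// prePadding/postPadding let you reproduce the asymmetric split (extra padding
// at the bottom/right) when the total padding is odd; for 3x3, stride 1 it is
// simply {1, 1} on both sides.
IPaddingLayer* pad = network->addPadding(*input, DimsHW{1, 1}, DimsHW{1, 1});
IConvolutionLayer* conv2 = network->addConvolution(*pad->getOutput(0), 64,
                                                   DimsHW{3, 3}, kWeights, bWeights);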

Also, everything is NCHW in TensorRT.

Helpful sample:

  • TensorRT-5.1.2.2/samples/sampleMNISTAP

Step 3: Load the weights

TensorRT wants weights in the format [out_c, in_c, filter_h, filter_w], which is mentioned in an archived version of the documentation. Keras stores weights in the format [filter_h, filter_w, in_c, out_c].

We got a pure weights file by calling model.save_weights('weight.h5') in Python. We then read the weights into a NumPy array using h5py, transposed them, and saved the transposed weights as a new file. We also figured out the group and dataset names using h5py; this information was used when loading the weights into the C++ code with the HDF5 C++ API.
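
For reference, the index remapping that the transpose performs looks like this (written in C++ for consistency with the rest of the post; the function name and flat-buffer layout are our own):

// Convert a Keras conv-kernel buffer [filter_h, filter_w, in_c, out_c]
// into the [out_c, in_c, filter_h, filter_w] order that TensorRT expects.
void kerasToTrtWeights(const float* keras, float* trt,
                       int filterH, int filterW, int inC, int outC)
{
    for (int h = 0; h < filterH; ++h)
        for (int w = 0; w < filterW; ++w)
            for (int i = 0; i < inC; ++i)
                for (int o = 0; o < outC; ++o)
                {
                    const int src = ((h * filterW + w) * inC + i) * outC + o;
                    const int dst = ((o * inC + i) * filterH + h) * filterW + w;
                    trt[dst] = keras[src];
                }
}

In NumPy this is simply weights.transpose(3, 2, 0, 1).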

We compared the output layer by layer between the C++ code and the Python code. For our U-Net, all the activation maps are identical up to roughly the third block (after two poolings). After that, there is a tiny difference between pixel values; the absolute percentage error is on the order of 10^-8, so we don't think it's that bad. We are still in the process of polishing the C++ implementation.

Again, thanks for all the suggestions and answers we got in this post. Hope our solution can be helpful as well!

Nameplate answered 23/8, 2019 at 16:21 Comment(1)
Hey, just adding a comment here: you can use ConvTranspose2D with fixed weights and no learned bias to simulate bilinear interpolation. I did this for ONNX and compared it with a bilinear-interpolation custom layer; the deconv worked better and was also faster. – Firebrat

Hey, I've done something similar. I'd say the best way to tackle the issue is to export your model to .onnx with a good tool like this one; if you check the ONNX support matrix, Upsample is supported.

Then you can use https://github.com/onnx/onnx-tensorrt to convert the ONNX model to TensorRT. I've used this to convert a network that I trained in PyTorch and that had upsampling. The onnx-tensorrt repo is a bit more active, and if you check the PR tab you can see other people writing custom layers and fork from there.

Firebrat answered 26/7, 2019 at 13:17 Comment(3)
ps. Maybe you can post your model once you convert it to ONNX and I can parse it to TRT for you :) – Firebrat
Hi, thank you for answering. I just had a quick question. The TensorRT documentation says that the TensorRT engine is device-specific (devtalk.nvidia.com/default/topic/1030042/jetson-tx1/…). So for it to work, I will have to run this script on the deployment platform as well? Thank you! – Nameplate
Yes, you need to serialize it on the platform you will end up using. But onnx-trt builds on Jetson and Xavier. – Firebrat
