I tried the quantization tools in TensorFlow with a toy model. They did reduce the model size to about 25% of the original, but they also increased the execution time many times over.
The GPU is fully utilized when either model runs, so I am wondering what is wrong. I see two possibilities:
- TensorFlow's quantization tool doesn't utilize the floating-point compute cores on the GPU.
- Something is wrong with my deployment.
Any suggestions are welcome! Thanks!
The model I use is:
import tensorflow as tf

# `x` (the input placeholder) and `keep_prob` (the dropout placeholder) are
# defined outside this function; `weights` is a dict of weight/bias variables.
def dense_cnn_model(weights):
    def conv2d(x, W):
        return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

    def max_pool_2x2(x):
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1], padding='SAME')

    x_image = tf.reshape(x, [-1, 28, 28, 1])
    h_conv1 = tf.nn.relu(conv2d(x_image, weights["w_conv1"]) + weights["b_conv1"])
    h_pool1 = max_pool_2x2(h_conv1)
    h_conv2 = tf.nn.relu(conv2d(h_pool1, weights["w_conv2"]) + weights["b_conv2"])
    h_pool2 = max_pool_2x2(h_conv2)
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, weights["w_fc1"]) + weights["b_fc1"])
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
    y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, weights["w_fc2"]) + weights["b_fc2"],
                           name='softmax')
    return y_conv
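As a quick sanity check on the shapes (a sketch, not part of the model code): the 7*7*64 flatten size follows from the two 'SAME' convolutions preserving spatial size while each 2x2 max-pool halves it.

```python
# Each 'SAME' conv keeps the 28x28 spatial size; each 2x2 max-pool halves it.
h = w = 28
for _ in range(2):        # two conv + pool stages
    h, w = h // 2, w // 2
flat = h * w * 64         # 64 output channels from the second conv (w_conv2)
print(flat)               # 3136 == 7 * 7 * 64
```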
Using the quantization tool, the frozen graph was compressed from 13 MB to 3.2 MB:
-rw-rw-r-- 1 yonghu yonghu 3.2M Aug 3 22:27 quantified_const_kb.pb
-rw-rw-r-- 1 yonghu yonghu 13M Aug 3 22:22 unified_const_kb.pb
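The roughly 4x reduction is what eight-bit quantization predicts: each 32-bit float weight is stored as a single byte, plus a small min/max range per tensor. A rough back-of-the-envelope check:

```python
# 8-bit weights replace 32-bit floats, so the weight payload shrinks ~4x
# (the min/max ranges and graph structure add a little overhead on top).
original_mb = 13.0
expected_mb = original_mb * 8 / 32
print(expected_mb)  # 3.25 -- close to the observed 3.2M
```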
However, it becomes slower on a GeForce GTX 1080. The benchmarked performance of the original model is as follows:
I tensorflow/core/util/stat_summarizer.cc:218] 50 runs, avg 15.25 ms, 47 nodes defined 67 nodes observed
============ By run order =================
[start] [first] [avg] [%] [cdf%] [Op] [Name]
0.000 0.013 0.063 0.413% 0.413% _SOURCE
2.177 0.012 0.010 0.063% 0.476% w_conv1/read/_7__cf__7
2.193 0.009 0.007 0.046% 0.522% b_conv1/read/_6__cf__6
2.205 0.007 0.007 0.046% 0.567% w_conv2/read/_5__cf__5
2.215 0.007 0.007 0.044% 0.611% b_conv2/read/_4__cf__4
2.225 0.010 0.007 0.043% 0.654% b_fc1/read/_3__cf__3
2.237 0.008 0.006 0.042% 0.696% dropout/random_uniform/sub/_2__cf__2
2.247 0.008 0.006 0.040% 0.736% w_fc2/read/_1__cf__1
2.257 0.006 0.006 0.039% 0.774% b_fc2/read/_0__cf__0
2.266 0.010 0.006 0.042% 0.816% Const w_fc1
2.279 0.008 0.006 0.040% 0.857% Const Reshape/shape
2.288 0.001 0.002 0.014% 0.870% edge_67__recv_x_0:MEMCPYHtoD
2.290 0.007 0.006 0.038% 0.908% Const Reshape_1/shape
2.299 0.007 0.006 0.041% 0.949% Const dropout/random_uniform/min
2.308 0.008 0.006 0.042% 0.992% Identity w_fc1/read
2.434 0.013 0.012 0.082% 1.074% Reshape Reshape
2.452 523.172 10.644 69.788% 70.862% Conv2D Conv2D
310.531 0.005 0.047 0.308% 71.170% Conv2D:Conv2D
524.645 0.001 0.000 0.000% 71.170% Conv2D:Conv2D:MEMCPYHtoD
525.636 0.035 0.039 0.253% 71.424% Add add
525.667 0.004 0.005 0.032% 71.455% add:Add
525.677 0.020 0.031 0.206% 71.661% Relu Relu
525.694 0.003 0.004 0.024% 71.686% Relu:Relu
525.701 0.031 0.043 0.285% 71.971% MaxPool MaxPool
525.726 0.007 0.008 0.054% 72.025% MaxPool:MaxPool
525.735 0.962 0.192 1.258% 73.283% Conv2D Conv2D_1
525.751 0.005 0.133 0.874% 74.157% Conv2D_1:Conv2D
526.705 0.019 0.034 0.225% 74.382% Add add_1
526.730 0.015 0.029 0.190% 74.572% Relu Relu_1
526.749 0.005 0.005 0.036% 74.608% add_1:Add
526.750 0.021 0.039 0.253% 74.861% MaxPool MaxPool_1
526.756 0.003 0.004 0.024% 74.885% Relu_1:Relu
526.766 0.006 0.006 0.040% 74.925% MaxPool_1:MaxPool
526.775 0.006 0.008 0.055% 74.980% Reshape Reshape_1
526.784 144.271 2.923 19.166% 94.146% MatMul MatMul
670.941 0.001 0.000 0.000% 94.146% MatMul:MatMul:MEMCPYHtoD
671.038 0.070 0.089 0.585% 94.731% MatMul:MatMul
671.063 0.037 0.031 0.203% 94.935% Add add_2
671.104 0.019 0.027 0.178% 95.113% Relu Relu_2
671.110 0.005 0.006 0.041% 95.154% add_2:Add
671.121 0.003 0.003 0.023% 95.176% Relu_2:Relu
671.126 0.008 0.010 0.066% 95.242% Shape dropout/Shape
671.136 0.029 0.030 0.196% 95.438% Div dropout/Div
671.162 0.004 0.006 0.041% 95.479% dropout/Div:Div
671.167 0.021 0.029 0.193% 95.672% RandomUniform dropout/random_uniform/RandomUniform
671.185 0.005 0.008 0.051% 95.723% dropout/random_uniform/RandomUniform:RandomUniform
671.191 0.027 0.029 0.187% 95.910% Mul dropout/random_uniform/mul
671.215 0.003 0.004 0.023% 95.933% dropout/random_uniform/mul:Mul
671.221 0.018 0.027 0.176% 96.109% Add dropout/random_uniform
671.237 0.003 0.004 0.024% 96.133% dropout/random_uniform:Add
671.242 0.016 0.027 0.178% 96.311% Add dropout/add
671.256 0.003 0.003 0.022% 96.333% dropout/add:Add
671.261 0.024 0.026 0.169% 96.502% Floor dropout/Floor
671.283 0.004 0.004 0.028% 96.530% dropout/Floor:Floor
671.288 0.017 0.027 0.180% 96.710% Mul dropout/mul
671.303 0.003 0.004 0.023% 96.733% dropout/mul:Mul
671.308 0.019 0.034 0.223% 96.956% MatMul MatMul_1
671.325 0.017 0.023 0.149% 97.106% MatMul_1:MatMul
671.330 0.016 0.030 0.195% 97.300% Add add_3
671.345 0.007 0.009 0.060% 97.360% add_3:Add
671.349 0.177 0.125 0.822% 98.183% Softmax softmax
671.366 0.003 0.027 0.177% 98.360% softmax:Softmax
671.621 0.001 0.001 0.009% 98.368% edge_13_softmax:MEMCPYDtoH
671.732 0.004 0.057 0.375% 98.743% _SINK
18446744074384.223 0.001 0.001 0.006% 98.749% unknown:MEMCPYHtoD
18446744074384.363 0.004 0.190 1.246% 99.996% unknown
18446744074385.602 0.001 0.001 0.004% 100.000% unknown:MEMCPYDtoH
After quantization:
I tensorflow/core/util/stat_summarizer.cc:218] 50 runs, avg 99.44 ms, 114 nodes defined 83 nodes observed
============ By run order =================
[start] [first] [avg] [%] [cdf%] [Op] [Name]
0.000 0.039 0.158 0.159% 0.159% _SOURCE
0.111 0.018 0.010 0.010% 0.169% dropout/keep_prob/_3__cf__3
0.138 0.010 0.011 0.011% 0.180% dropout/random_uniform/min/_1__cf__1
0.154 0.009 0.009 0.009% 0.189% b_fc2/_0__cf__0
0.169 0.010 0.009 0.009% 0.198% Const w_conv1_quint8_const
0.184 0.008 0.008 0.008% 0.206% Const w_conv1_min
0.195 0.009 0.008 0.008% 0.214% Const w_conv1_max
0.208 0.056 0.009 0.009% 0.224% Const w_conv2_quint8_const
0.269 0.010 0.007 0.007% 0.231% Const w_conv2_min
0.283 0.009 0.007 0.007% 0.238% Const w_conv2_max
0.295 0.008 0.008 0.009% 0.247% Const w_fc1_quint8_const
0.307 0.013 0.007 0.007% 0.254% Const w_fc1_min
0.324 0.010 0.007 0.007% 0.261% Const w_fc1_max
0.338 0.010 0.007 0.007% 0.268% Const w_fc2_quint8_const
0.350 0.007 0.007 0.007% 0.275% Const w_fc2_min
0.360 0.007 0.007 0.007% 0.282% Const w_fc2_max
0.370 0.010 0.007 0.007% 0.289% b_conv1/_6__cf__6
0.392 0.009 0.008 0.008% 0.297% b_conv2/_5__cf__5
0.411 0.008 0.008 0.008% 0.305% b_fc1/_4__cf__4
3.380 0.017 0.013 0.013% 0.317% Const Reshape/shape
3.402 0.014 0.011 0.011% 0.328% Const Conv2D_eightbit_reshape_dims
3.419 0.010 0.012 0.013% 0.341% Const Conv2D_eightbit_reduction_dims
3.431 0.007 0.010 0.010% 0.351% Const Reshape_1/shape
34.110 0.020 0.016 0.016% 0.368% Reshape Reshape
34.159 352.617 7.132 7.172% 7.540% Sub dropout/random_uniform/sub
34.234 0.010 0.011 0.011% 7.551% Reshape Conv2D_eightbit_reshape_Reshape
34.249 352.581 7.113 7.153% 14.704% Min Conv2D_eightbit_min_Reshape
386.852 0.063 0.043 0.043% 14.747% Max Conv2D_eightbit_max_Reshape
387.104 0.070 0.057 0.058% 14.804% QuantizeV2 Conv2D_eightbit_quantize_Reshape
387.181 3.764 2.210 2.222% 17.027% QuantizedConv2D Conv2D_eightbit_quantized_conv
390.964 0.771 0.674 0.677% 17.704% QuantizeDownAndShrinkRange Conv2D_eightbit_quantize_down
391.742 0.681 0.583 0.586% 18.290% Dequantize Conv2D
392.608 0.086 0.064 0.064% 18.354% Add add
392.781 0.012 0.011 0.011% 18.365% Reshape Relu_eightbit_reshape_add
392.798 0.055 0.048 0.048% 18.413% Min Relu_eightbit_min_add
392.858 0.041 0.038 0.039% 18.452% Max Relu_eightbit_max_add
393.035 0.266 0.274 0.276% 18.728% QuantizeV2 Relu_eightbit_quantize_add
393.306 0.052 0.110 0.111% 18.838% QuantizedRelu Relu_eightbit_quantized
393.362 0.201 0.152 0.153% 18.991% QuantizedMaxPool MaxPool_eightbit_quantized
393.567 22.550 23.069 23.199% 42.190% QuantizedConv2D Conv2D_1_eightbit_quantized_conv
416.126 0.211 0.354 0.356% 42.546% QuantizeDownAndShrinkRange Conv2D_1_eightbit_quantize_down
416.343 0.127 0.266 0.268% 42.814% Dequantize Conv2D_1
416.577 0.035 0.058 0.058% 42.871% Add add_1
416.654 0.007 0.011 0.011% 42.882% Reshape Relu_1_eightbit_reshape_add_1
416.664 0.023 0.043 0.044% 42.926% Min Relu_1_eightbit_min_add_1
416.690 0.018 0.033 0.033% 42.959% Max Relu_1_eightbit_max_add_1
416.779 0.158 0.179 0.180% 43.140% QuantizeV2 Relu_1_eightbit_quantize_add_1
416.940 0.029 0.057 0.058% 43.197% QuantizedRelu Relu_1_eightbit_quantized
416.973 0.089 0.082 0.082% 43.279% QuantizedMaxPool MaxPool_1_eightbit_quantized
417.065 0.037 0.072 0.072% 43.352% Dequantize MaxPool_1
417.175 0.008 0.011 0.011% 43.363% Reshape Reshape_1
417.226 0.007 0.008 0.008% 43.371% Reshape MatMul_eightbit_reshape_Reshape_1
417.237 0.028 0.047 0.048% 43.419% Min MatMul_eightbit_min_Reshape_1
417.269 0.017 0.034 0.034% 43.453% Max MatMul_eightbit_max_Reshape_1
417.360 0.076 0.109 0.109% 43.562% QuantizeV2 MatMul_eightbit_quantize_Reshape_1
417.440 31.302 54.697 55.005% 98.567% QuantizedMatMul MatMul_eightbit_quantized_bias_add
448.748 0.022 0.033 0.033% 98.601% QuantizeDownAndShrinkRange MatMul_eightbit_quantize_down
448.773 0.016 0.024 0.024% 98.625% Dequantize MatMul
448.908 0.034 0.052 0.052% 98.677% Add add_2
448.980 0.006 0.008 0.009% 98.685% Reshape Relu_2_eightbit_reshape_add_2
448.990 0.022 0.036 0.036% 98.721% Min Relu_2_eightbit_min_add_2
449.015 0.017 0.027 0.027% 98.748% Max Relu_2_eightbit_max_add_2
449.103 0.032 0.038 0.038% 98.786% QuantizeV2 Relu_2_eightbit_quantize_add_2
449.139 0.013 0.014 0.014% 98.801% QuantizedRelu Relu_2_eightbit_quantized
449.156 0.016 0.023 0.023% 98.824% Dequantize Relu_2
449.180 0.007 0.008 0.008% 98.832% Shape dropout/Shape
449.215 0.092 0.086 0.086% 98.918% RandomUniform dropout/random_uniform/RandomUniform
449.292 0.090 0.080 0.080% 98.999% Div dropout/Div
449.314 0.105 0.053 0.053% 99.052% Mul dropout/random_uniform/mul
449.425 0.039 0.053 0.053% 99.105% Add dropout/random_uniform
449.469 0.054 0.046 0.046% 99.151% Add dropout/add
449.528 0.043 0.032 0.032% 99.183% Floor dropout/Floor
449.575 0.033 0.029 0.029% 99.212% Mul dropout/mul
449.724 0.011 0.010 0.010% 99.222% Reshape MatMul_1_eightbit_reshape_dropout/mul
449.740 0.050 0.042 0.042% 99.264% Min MatMul_1_eightbit_min_dropout/mul
449.795 0.037 0.031 0.032% 99.296% Max MatMul_1_eightbit_max_dropout/mul
449.986 0.085 0.070 0.070% 99.366% QuantizeV2 MatMul_1_eightbit_quantize_dropout/mul
450.077 0.525 0.367 0.369% 99.736% QuantizedMatMul MatMul_1_eightbit_quantized_bias_add
450.608 0.015 0.012 0.012% 99.747% QuantizeDownAndShrinkRange MatMul_1_eightbit_quantize_down
450.627 0.013 0.010 0.010% 99.757% Dequantize MatMul_1
450.765 0.055 0.051 0.051% 99.808% Add add_3
450.825 0.254 0.133 0.134% 99.942% Softmax softmax
451.200 0.006 0.058 0.058% 100.000% _SINK
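Pulling the dominant [avg] entries out of the two dumps above makes the regression concrete (numbers copied from the logs; the reading that the quantized kernels are the bottleneck is my interpretation, not something the profiler states):

```python
# Average per-op times (ms) taken from the [avg] columns of the two dumps.
float_heavy = {"Conv2D": 10.644, "MatMul": 2.923}
quant_heavy = {
    "Conv2D_1_eightbit_quantized_conv": 23.069,
    "MatMul_eightbit_quantized_bias_add": 54.697,
}
# The two quantized kernels alone account for ~78 ms of the 99.44 ms
# average run, versus ~15 ms for the entire float graph.
print(round(sum(quant_heavy.values()), 3))  # 77.766
```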