I tried the quantization tools in TensorFlow with a toy model. They did reduce the model size to about 25% of the original, but they also increased the execution time many times over.
The GPU is fully utilized when either model runs, so I am wondering what is wrong. I see two possibilities:
- TensorFlow's quantization tool doesn't utilize the floating-point compute cores on the GPU.
- Something is wrong with my deployment.
Any suggestions are welcome! Thanks!
The model I use is:
import tensorflow as tf

# `x` (the input placeholder) and `keep_prob` (the dropout placeholder) are
# defined outside this function; `weights` is a dict of weight/bias variables.
def dense_cnn_model(weights):
    def conv2d(x, W):
        return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

    def max_pool_2x2(x):
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1], padding='SAME')

    x_image = tf.reshape(x, [-1, 28, 28, 1])
    h_conv1 = tf.nn.relu(conv2d(x_image, weights["w_conv1"]) + weights["b_conv1"])
    h_pool1 = max_pool_2x2(h_conv1)
    h_conv2 = tf.nn.relu(conv2d(h_pool1, weights["w_conv2"]) + weights["b_conv2"])
    h_pool2 = max_pool_2x2(h_conv2)
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, weights["w_fc1"]) + weights["b_fc1"])
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
    y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, weights["w_fc2"]) + weights["b_fc2"],
                           name='softmax')
    return y_conv
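As a quick sanity check on the shapes (a sketch, not part of the model code): the 7*7*64 flatten size follows from the two 'SAME' convolutions preserving spatial size while each 2x2 max-pool halves it.

```python
# Each 'SAME' conv keeps the 28x28 spatial size; each 2x2 max-pool halves it.
h = w = 28
for _ in range(2):        # two conv + pool stages
    h, w = h // 2, w // 2
flat = h * w * 64         # 64 output channels from the second conv (w_conv2)
print(flat)               # 3136 == 7 * 7 * 64
```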
Using the quantization tool, the frozen graph was compressed from 13 MB to 3.2 MB:
-rw-rw-r-- 1 yonghu yonghu 3.2M Aug 3 22:27 quantified_const_kb.pb
-rw-rw-r-- 1 yonghu yonghu 13M Aug 3 22:22 unified_const_kb.pb
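The roughly 4x reduction is what eight-bit quantization predicts: each 32-bit float weight is stored as a single byte, plus a small min/max range per tensor. A rough back-of-the-envelope check:

```python
# 8-bit weights replace 32-bit floats, so the weight payload shrinks ~4x
# (the min/max ranges and graph structure add a little overhead on top).
original_mb = 13.0
expected_mb = original_mb * 8 / 32
print(expected_mb)  # 3.25 -- close to the observed 3.2M
```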
However, it becomes slower on a GeForce GTX 1080. The benchmarked performance of the original model is as follows:
I tensorflow/core/util/stat_summarizer.cc:218] 50 runs, avg 15.25 ms, 47 nodes defined 67 nodes observed
============ By run order =================
[start] [first] [avg] [%] [cdf%] [Op] [Name]
0.000 0.013 0.063 0.413% 0.413% _SOURCE
2.177 0.012 0.010 0.063% 0.476% w_conv1/read/_7__cf__7
2.193 0.009 0.007 0.046% 0.522% b_conv1/read/_6__cf__6
2.205 0.007 0.007 0.046% 0.567% w_conv2/read/_5__cf__5
2.215 0.007 0.007 0.044% 0.611% b_conv2/read/_4__cf__4
2.225 0.010 0.007 0.043% 0.654% b_fc1/read/_3__cf__3
2.237 0.008 0.006 0.042% 0.696% dropout/random_uniform/sub/_2__cf__2
2.247 0.008 0.006 0.040% 0.736% w_fc2/read/_1__cf__1
2.257 0.006 0.006 0.039% 0.774% b_fc2/read/_0__cf__0
2.266 0.010 0.006 0.042% 0.816% Const w_fc1
2.279 0.008 0.006 0.040% 0.857% Const Reshape/shape
2.288 0.001 0.002 0.014% 0.870% edge_67__recv_x_0:MEMCPYHtoD
2.290 0.007 0.006 0.038% 0.908% Const Reshape_1/shape
2.299 0.007 0.006 0.041% 0.949% Const dropout/random_uniform/min
2.308 0.008 0.006 0.042% 0.992% Identity w_fc1/read
2.434 0.013 0.012 0.082% 1.074% Reshape Reshape
2.452 523.172 10.644 69.788% 70.862% Conv2D Conv2D
310.531 0.005 0.047 0.308% 71.170% Conv2D:Conv2D
524.645 0.001 0.000 0.000% 71.170% Conv2D:Conv2D:MEMCPYHtoD
525.636 0.035 0.039 0.253% 71.424% Add add
525.667 0.004 0.005 0.032% 71.455% add:Add
525.677 0.020 0.031 0.206% 71.661% Relu Relu
525.694 0.003 0.004 0.024% 71.686% Relu:Relu
525.701 0.031 0.043 0.285% 71.971% MaxPool MaxPool
525.726 0.007 0.008 0.054% 72.025% MaxPool:MaxPool
525.735 0.962 0.192 1.258% 73.283% Conv2D Conv2D_1
525.751 0.005 0.133 0.874% 74.157% Conv2D_1:Conv2D
526.705 0.019 0.034 0.225% 74.382% Add add_1
526.730 0.015 0.029 0.190% 74.572% Relu Relu_1
526.749 0.005 0.005 0.036% 74.608% add_1:Add
526.750 0.021 0.039 0.253% 74.861% MaxPool MaxPool_1
526.756 0.003 0.004 0.024% 74.885% Relu_1:Relu
526.766 0.006 0.006 0.040% 74.925% MaxPool_1:MaxPool
526.775 0.006 0.008 0.055% 74.980% Reshape Reshape_1
526.784 144.271 2.923 19.166% 94.146% MatMul MatMul
670.941 0.001 0.000 0.000% 94.146% MatMul:MatMul:MEMCPYHtoD
671.038 0.070 0.089 0.585% 94.731% MatMul:MatMul
671.063 0.037 0.031 0.203% 94.935% Add add_2
671.104 0.019 0.027 0.178% 95.113% Relu Relu_2
671.110 0.005 0.006 0.041% 95.154% add_2:Add
671.121 0.003 0.003 0.023% 95.176% Relu_2:Relu
671.126 0.008 0.010 0.066% 95.242% Shape dropout/Shape
671.136 0.029 0.030 0.196% 95.438% Div dropout/Div
671.162 0.004 0.006 0.041% 95.479% dropout/Div:Div
671.167 0.021 0.029 0.193% 95.672% RandomUniform dropout/random_uniform/RandomUniform
671.185 0.005 0.008 0.051% 95.723% dropout/random_uniform/RandomUniform:RandomUniform
671.191 0.027 0.029 0.187% 95.910% Mul dropout/random_uniform/mul
671.215 0.003 0.004 0.023% 95.933% dropout/random_uniform/mul:Mul
671.221 0.018 0.027 0.176% 96.109% Add dropout/random_uniform
671.237 0.003 0.004 0.024% 96.133% dropout/random_uniform:Add
671.242 0.016 0.027 0.178% 96.311% Add dropout/add
671.256 0.003 0.003 0.022% 96.333% dropout/add:Add
671.261 0.024 0.026 0.169% 96.502% Floor dropout/Floor
671.283 0.004 0.004 0.028% 96.530% dropout/Floor:Floor
671.288 0.017 0.027 0.180% 96.710% Mul dropout/mul
671.303 0.003 0.004 0.023% 96.733% dropout/mul:Mul
671.308 0.019 0.034 0.223% 96.956% MatMul MatMul_1
671.325 0.017 0.023 0.149% 97.106% MatMul_1:MatMul
671.330 0.016 0.030 0.195% 97.300% Add add_3
671.345 0.007 0.009 0.060% 97.360% add_3:Add
671.349 0.177 0.125 0.822% 98.183% Softmax softmax
671.366 0.003 0.027 0.177% 98.360% softmax:Softmax
671.621 0.001 0.001 0.009% 98.368% edge_13_softmax:MEMCPYDtoH
671.732 0.004 0.057 0.375% 98.743% _SINK
18446744074384.223 0.001 0.001 0.006% 98.749% unknown:MEMCPYHtoD
18446744074384.363 0.004 0.190 1.246% 99.996% unknown
18446744074385.602 0.001 0.001 0.004% 100.000% unknown:MEMCPYDtoH
After quantization:
I tensorflow/core/util/stat_summarizer.cc:218] 50 runs, avg 99.44 ms, 114 nodes defined 83 nodes observed
============ By run order =================
[start] [first] [avg] [%] [cdf%] [Op] [Name]
0.000 0.039 0.158 0.159% 0.159% _SOURCE
0.111 0.018 0.010 0.010% 0.169% dropout/keep_prob/_3__cf__3
0.138 0.010 0.011 0.011% 0.180% dropout/random_uniform/min/_1__cf__1
0.154 0.009 0.009 0.009% 0.189% b_fc2/_0__cf__0
0.169 0.010 0.009 0.009% 0.198% Const w_conv1_quint8_const
0.184 0.008 0.008 0.008% 0.206% Const w_conv1_min
0.195 0.009 0.008 0.008% 0.214% Const w_conv1_max
0.208 0.056 0.009 0.009% 0.224% Const w_conv2_quint8_const
0.269 0.010 0.007 0.007% 0.231% Const w_conv2_min
0.283 0.009 0.007 0.007% 0.238% Const w_conv2_max
0.295 0.008 0.008 0.009% 0.247% Const w_fc1_quint8_const
0.307 0.013 0.007 0.007% 0.254% Const w_fc1_min
0.324 0.010 0.007 0.007% 0.261% Const w_fc1_max
0.338 0.010 0.007 0.007% 0.268% Const w_fc2_quint8_const
0.350 0.007 0.007 0.007% 0.275% Const w_fc2_min
0.360 0.007 0.007 0.007% 0.282% Const w_fc2_max
0.370 0.010 0.007 0.007% 0.289% b_conv1/_6__cf__6
0.392 0.009 0.008 0.008% 0.297% b_conv2/_5__cf__5
0.411 0.008 0.008 0.008% 0.305% b_fc1/_4__cf__4
3.380 0.017 0.013 0.013% 0.317% Const Reshape/shape
3.402 0.014 0.011 0.011% 0.328% Const Conv2D_eightbit_reshape_dims
3.419 0.010 0.012 0.013% 0.341% Const Conv2D_eightbit_reduction_dims
3.431 0.007 0.010 0.010% 0.351% Const Reshape_1/shape
34.110 0.020 0.016 0.016% 0.368% Reshape Reshape
34.159 352.617 7.132 7.172% 7.540% Sub dropout/random_uniform/sub
34.234 0.010 0.011 0.011% 7.551% Reshape Conv2D_eightbit_reshape_Reshape
34.249 352.581 7.113 7.153% 14.704% Min Conv2D_eightbit_min_Reshape
386.852 0.063 0.043 0.043% 14.747% Max Conv2D_eightbit_max_Reshape
387.104 0.070 0.057 0.058% 14.804% QuantizeV2 Conv2D_eightbit_quantize_Reshape
387.181 3.764 2.210 2.222% 17.027% QuantizedConv2D Conv2D_eightbit_quantized_conv
390.964 0.771 0.674 0.677% 17.704% QuantizeDownAndShrinkRange Conv2D_eightbit_quantize_down
391.742 0.681 0.583 0.586% 18.290% Dequantize Conv2D
392.608 0.086 0.064 0.064% 18.354% Add add
392.781 0.012 0.011 0.011% 18.365% Reshape Relu_eightbit_reshape_add
392.798 0.055 0.048 0.048% 18.413% Min Relu_eightbit_min_add
392.858 0.041 0.038 0.039% 18.452% Max Relu_eightbit_max_add
393.035 0.266 0.274 0.276% 18.728% QuantizeV2 Relu_eightbit_quantize_add
393.306 0.052 0.110 0.111% 18.838% QuantizedRelu Relu_eightbit_quantized
393.362 0.201 0.152 0.153% 18.991% QuantizedMaxPool MaxPool_eightbit_quantized
393.567 22.550 23.069 23.199% 42.190% QuantizedConv2D Conv2D_1_eightbit_quantized_conv
416.126 0.211 0.354 0.356% 42.546% QuantizeDownAndShrinkRange Conv2D_1_eightbit_quantize_down
416.343 0.127 0.266 0.268% 42.814% Dequantize Conv2D_1
416.577 0.035 0.058 0.058% 42.871% Add add_1
416.654 0.007 0.011 0.011% 42.882% Reshape Relu_1_eightbit_reshape_add_1
416.664 0.023 0.043 0.044% 42.926% Min Relu_1_eightbit_min_add_1
416.690 0.018 0.033 0.033% 42.959% Max Relu_1_eightbit_max_add_1
416.779 0.158 0.179 0.180% 43.140% QuantizeV2 Relu_1_eightbit_quantize_add_1
416.940 0.029 0.057 0.058% 43.197% QuantizedRelu Relu_1_eightbit_quantized
416.973 0.089 0.082 0.082% 43.279% QuantizedMaxPool MaxPool_1_eightbit_quantized
417.065 0.037 0.072 0.072% 43.352% Dequantize MaxPool_1
417.175 0.008 0.011 0.011% 43.363% Reshape Reshape_1
417.226 0.007 0.008 0.008% 43.371% Reshape MatMul_eightbit_reshape_Reshape_1
417.237 0.028 0.047 0.048% 43.419% Min MatMul_eightbit_min_Reshape_1
417.269 0.017 0.034 0.034% 43.453% Max MatMul_eightbit_max_Reshape_1
417.360 0.076 0.109 0.109% 43.562% QuantizeV2 MatMul_eightbit_quantize_Reshape_1
417.440 31.302 54.697 55.005% 98.567% QuantizedMatMul MatMul_eightbit_quantized_bias_add
448.748 0.022 0.033 0.033% 98.601% QuantizeDownAndShrinkRange MatMul_eightbit_quantize_down
448.773 0.016 0.024 0.024% 98.625% Dequantize MatMul
448.908 0.034 0.052 0.052% 98.677% Add add_2
448.980 0.006 0.008 0.009% 98.685% Reshape Relu_2_eightbit_reshape_add_2
448.990 0.022 0.036 0.036% 98.721% Min Relu_2_eightbit_min_add_2
449.015 0.017 0.027 0.027% 98.748% Max Relu_2_eightbit_max_add_2
449.103 0.032 0.038 0.038% 98.786% QuantizeV2 Relu_2_eightbit_quantize_add_2
449.139 0.013 0.014 0.014% 98.801% QuantizedRelu Relu_2_eightbit_quantized
449.156 0.016 0.023 0.023% 98.824% Dequantize Relu_2
449.180 0.007 0.008 0.008% 98.832% Shape dropout/Shape
449.215 0.092 0.086 0.086% 98.918% RandomUniform dropout/random_uniform/RandomUniform
449.292 0.090 0.080 0.080% 98.999% Div dropout/Div
449.314 0.105 0.053 0.053% 99.052% Mul dropout/random_uniform/mul
449.425 0.039 0.053 0.053% 99.105% Add dropout/random_uniform
449.469 0.054 0.046 0.046% 99.151% Add dropout/add
449.528 0.043 0.032 0.032% 99.183% Floor dropout/Floor
449.575 0.033 0.029 0.029% 99.212% Mul dropout/mul
449.724 0.011 0.010 0.010% 99.222% Reshape MatMul_1_eightbit_reshape_dropout/mul
449.740 0.050 0.042 0.042% 99.264% Min MatMul_1_eightbit_min_dropout/mul
449.795 0.037 0.031 0.032% 99.296% Max MatMul_1_eightbit_max_dropout/mul
449.986 0.085 0.070 0.070% 99.366% QuantizeV2 MatMul_1_eightbit_quantize_dropout/mul
450.077 0.525 0.367 0.369% 99.736% QuantizedMatMul MatMul_1_eightbit_quantized_bias_add
450.608 0.015 0.012 0.012% 99.747% QuantizeDownAndShrinkRange MatMul_1_eightbit_quantize_down
450.627 0.013 0.010 0.010% 99.757% Dequantize MatMul_1
450.765 0.055 0.051 0.051% 99.808% Add add_3
450.825 0.254 0.133 0.134% 99.942% Softmax softmax
451.200 0.006 0.058 0.058% 100.000% _SINK
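Pulling the dominant [avg] entries out of the two dumps above makes the regression concrete (numbers copied from the logs; the reading that the quantized kernels are the bottleneck is my interpretation, not something the profiler states):

```python
# Average per-op times (ms) taken from the [avg] columns of the two dumps.
float_heavy = {"Conv2D": 10.644, "MatMul": 2.923}
quant_heavy = {
    "Conv2D_1_eightbit_quantized_conv": 23.069,
    "MatMul_eightbit_quantized_bias_add": 54.697,
}
# The two quantized kernels alone account for ~78 ms of the 99.44 ms
# average run, versus ~15 ms for the entire float graph.
print(round(sum(quant_heavy.values()), 3))  # 77.766
```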