I have a TensorFlow model on which I would like to perform post-training quantization. The software side is in C# and the device side will be in C++ (where I will use tflite). It would be nice to have the same quantized model on both the software and the instrument. Is there a way to optimize a TensorFlow model in a fashion similar to what happens during TensorFlow Lite conversion:
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
There does not appear to be a straightforward way of performing a similar optimization while keeping it a Keras model (.h5 or .pb). From what I have read on TensorFlow's page discussing post-training quantization:
These [post-training quantization] techniques can be performed on an already-trained float TensorFlow model and applied during TensorFlow lite conversion. These techniques are enabled as options in the TensorFlow Lite converter.
It sounds like the 'techniques' are bottled within the TFLite converter.
Looking at the tf.lite.Optimize documentation, it appears that the DEFAULT optimization quantizes the model weights. From my reading, one of the things it does is convert the weights from 32-bit float to 8-bit integer - I'm not sure what else is going on.
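To make concrete what I mean by the float-to-int8 weight conversion, here is a NumPy sketch of the kind of symmetric quantization I believe dynamic-range quantization applies to weights (simplified to a single per-tensor scale; my understanding is that TFLite actually uses per-output-channel scales for conv/dense weights):

```python
import numpy as np

def quantize_weights(w):
    # Map the float range [-max|w|, +max|w|] onto the int8 range [-127, 127]
    # with a single scale and zero_point = 0 (symmetric quantization).
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3)).astype(np.float32)
q, scale = quantize_weights(w)

print(q.dtype)                          # int8
# Rounding bounds the reconstruction error by scale / 2 per element:
print(np.max(np.abs(w - q * scale)))
```

So after conversion the model stores `q` (int8) and `scale` instead of the original float32 tensor, which is where the ~4x size reduction comes from.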
From the third answer in this Stack Overflow question, TOCO used to have a way of doing a tflite -> pb file conversion, but that has been discontinued since TF 1.12.
There is a cross-platform .NET wrapper for the Google TensorFlow Lite library by Emgu for those who would like to just use the tflite version. That is always a possibility, but it would be nice to do it right in Python and get the same conversion without needing another .NET package.
Is there an easier way to do this that I haven't seen yet? If all that changes is the weights, could you simply copy the weights from the tflite model back into the h5/pb model (so the float weights are swapped out for quantized ones)?
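For the swap idea, my understanding is you would have to dequantize first, since Keras layers expect float32 tensors. A hypothetical NumPy sketch, where `q_weights`, `scale`, and `zero_point` stand in for values that would be read out of the .tflite flatbuffer (e.g. via tf.lite.Interpreter's get_tensor_details / get_tensor) before being pushed back with layer.set_weights():

```python
import numpy as np

# Placeholder values standing in for a tensor read out of a .tflite file.
q_weights = np.array([[-128, 0, 64], [127, -5, 12]], dtype=np.int8)
scale, zero_point = 0.02, 0

# TFLite's affine dequantization: real_value = scale * (q - zero_point)
dequantized = scale * (q_weights.astype(np.float32) - zero_point)
print(dequantized.dtype)   # float32

# Caveat: the original float32 weights are not recoverable exactly -- you only
# get back the rounded values, so the resulting Keras model would match the
# tflite model's weights, not the original float model's.
```

So the swap seems possible in principle, but it only reproduces the quantized weights in float form; it does not reproduce whatever else the converter changed (e.g. any op-level transformations).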