These are the data types of the output tensor of the function tf.quantization.quantize(); they correspond to the function's argument T.
Shown below is the underlying formula, which converts/quantizes a tensor from one data type (e.g. float32) to another (tf.qint8, tf.quint8, tf.qint32, tf.qint16, tf.quint16):
out[i] = (in[i] - min_range) * range(T) / (max_range - min_range)
if T == qint8: out[i] -= (range(T) + 1) / 2.0
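For example, here is a minimal sketch of quantizing a float32 tensor to tf.qint8; the tensor values and the min/max range are purely illustrative:

    import tensorflow as tf

    # A float32 tensor to quantize; values and range chosen for illustration
    x = tf.constant([-1.0, 0.0, 1.0, 2.0], dtype=tf.float32)

    # Quantize to 8-bit signed integers; returns (output, output_min, output_max)
    q_vals, q_min, q_max = tf.quantization.quantize(
        x, min_range=-1.0, max_range=2.0, T=tf.qint8)

    print(q_vals.dtype)  # tf.qint8 -- each element occupies one byte
    print(q_vals)        # the quantized integer values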
The quantized tensor can then be passed to functions such as tf.nn.quantized_conv2d, whose inputs are quantized tensors as explained above.
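A rough sketch of that flow (the shapes and ranges here are assumptions for illustration; note that tf.nn.quantized_conv2d also needs the min/max ranges of both its input and its filter):

    import tensorflow as tf

    # Float input image and filter; shapes and value ranges are illustrative only
    inp = tf.random.uniform([1, 4, 4, 1], 0.0, 1.0)
    flt = tf.random.uniform([2, 2, 1, 1], 0.0, 1.0)

    # Quantize both to quint8 before the quantized convolution
    q_inp, in_min, in_max = tf.quantization.quantize(inp, 0.0, 1.0, tf.quint8)
    q_flt, f_min, f_max = tf.quantization.quantize(flt, 0.0, 1.0, tf.quint8)

    # Convolve in the quantized domain; the result is qint32 plus its range
    out, out_min, out_max = tf.nn.quantized_conv2d(
        q_inp, q_flt,
        min_input=in_min, max_input=in_max,
        min_filter=f_min, max_filter=f_max,
        strides=[1, 1, 1, 1], padding="SAME")

    print(out.dtype)  # tf.qint32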
TL;DR: to answer your question, yes, they are actually stored as 8 bits (for qint8) in memory.
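You can check the per-element storage size through the dtype's size attribute (in bytes); both the quantized type and the plain integer type occupy a single byte per element:

    import tensorflow as tf

    # Size of one element in bytes: qint8 and int8 both take one byte
    print(tf.qint8.size)  # 1
    print(tf.int8.size)   # 1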
You can find more information about this topic in the links below:
https://www.tensorflow.org/api_docs/python/tf/quantization/quantize
https://www.tensorflow.org/api_docs/python/tf/nn/quantized_conv2d
https://www.tensorflow.org/lite/performance/post_training_quantization
Comment: tf.qint8 vs tf.int8. Why is another data type needed? When should one use one over the other? – Acie