TFRecord vs RecordIO

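RecordIO and TFRecord are the same in the sense that they serve the same purpose: packing many records into one sequential file for faster reading. They differ in what goes inside a record, though. TFRecord files usually carry serialized Protocol Buffers messages (tf.train.Example) as payloads, while MXNet's RecordIO stores raw byte records behind its own binary header rather than protobuf.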

It seems to me that RecordIO is more of an umbrella term: a format used to store a large amount of data in one file for faster reading. Some products adopt "RecordIO" as the actual name, but in TensorFlow they decided to use a specific word for it - TFRecord. That's why some people call TFRecord a "TensorFlow-flavored RecordIO format".

There is no single RecordIO format as such. People from Apache Mesos, who also call their format RecordIO, say: "Since there is no formal specification of the RecordIO format, there tend to be slight incompatibilities between RecordIO implementations". And their RecordIO format is different from the one MXNet uses: MXNet puts a "magic number" at the beginning of each record, and I don't see one in the Mesos format.

So, on a structural level, TensorFlow's TFRecord and MXNet's RecordIO are different file formats, i.e. you shouldn't expect MXNet to be able to read TFRecord files or vice versa. But on a logical level they serve the same purpose and can be considered similar.
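
To make the structural difference concrete, here is a minimal sketch with two toy framings: one that prefixes each record with a length only (roughly the TFRecord idea) and one that prefixes each record with a magic number and a length (roughly the MXNet idea). This is an illustration under stated assumptions, not either real format: the real TFRecord layout also stores checksums, the real MXNet layout also has flag bits and padding, the function names are made up for this example, and the magic value is the one I recall from MXNet's underlying library, so treat it as an assumption.

```python
import io
import struct

# Assumed magic value for illustration; the real formats have more fields.
MAGIC = 0xCED7230A


def write_length_prefixed(stream, payload):
    """Toy TFRecord-like framing: 8-byte little-endian length, then payload."""
    stream.write(struct.pack("<Q", len(payload)))
    stream.write(payload)


def read_length_prefixed(stream):
    """Yield payloads from a length-prefixed stream; ValueError if malformed."""
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of file
        if len(header) < 8:
            raise ValueError("truncated header")
        (length,) = struct.unpack("<Q", header)
        payload = stream.read(length)
        if len(payload) != length:
            raise ValueError("truncated record")
        yield payload


def write_magic_prefixed(stream, payload):
    """Toy MXNet-like framing: 4-byte magic, 4-byte length, then payload."""
    stream.write(struct.pack("<II", MAGIC, len(payload)))
    stream.write(payload)


def read_magic_prefixed(stream):
    """Yield payloads from a magic-prefixed stream; ValueError if malformed."""
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of file
        if len(header) < 8:
            raise ValueError("truncated header")
        magic, length = struct.unpack("<II", header)
        if magic != MAGIC:
            raise ValueError("bad magic number")
        payload = stream.read(length)
        if len(payload) != length:
            raise ValueError("truncated record")
        yield payload


if __name__ == "__main__":
    buf = io.BytesIO()
    for item in (b"first", b"second"):
        write_length_prefixed(buf, item)
    buf.seek(0)
    try:
        list(read_magic_prefixed(buf))
    except ValueError as err:
        print("cross-read rejected:", err)
```

Both framings carry the same payloads, but a reader written for one rejects a file written by the other, which is the same reason MXNet can't open a TFRecord file directly.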
