TFRecord vs RecordIO

TensorFlow Object Detection API prefers TFRecord file format. MXNet and Amazon Sagemaker seem to use RecordIO format. How are these two binary file formats different, or are they the same thing?

Broach answered 9/11, 2018 at 4:4

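RecordIO and TFRecord are the same in the sense that they serve the same purpose: packing many records into one sequential file for faster reading. TFRecord payloads are usually serialized Protocol Buffers messages (tf.train.Example), and SageMaker's built-in algorithms use a protobuf-in-RecordIO encoding, but MXNet's own RecordIO packs records with a plain binary header rather than Protocol Buffers.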

It seems to me that RecordIO is more of an umbrella term: a format used to store a huge chunk of data in one file for faster reading. Some products adopt "RecordIO" as the actual name, but TensorFlow decided to use a specific word for it: TFRecord. That's why some people call TFRecord the "TensorFlow-flavored RecordIO format".

There is no single RecordIO format as such. People from Apache Mesos, who also call their format RecordIO, say: "Since there is no formal specification of the RecordIO format, there tend to be slight incompatibilities between RecordIO implementations". And their RecordIO format is different from the one MXNet uses: I don't see a "magic number" at the beginning of each record.
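To make that difference concrete, here is a rough Python sketch of how one record might be framed in each flavor. It assumes the Mesos layout is the record length as a decimal string, a newline, then the data, and that MXNet (through dmlc-core) writes a 4-byte magic number 0xced7230a, then a 4-byte field whose lower 29 bits hold the length, then the data padded to a 4-byte boundary. The function names are made up for this example, the byte order is assumed to be little-endian, and the sketch ignores the case where the payload itself contains the magic number.

```python
import struct

# Assumed value of the magic number used by MXNet / dmlc-core RecordIO.
MXNET_MAGIC = 0xCED7230A


def frame_mesos(payload: bytes) -> bytes:
    """Mesos-style framing: decimal length, newline, then the raw data."""
    return str(len(payload)).encode("ascii") + b"\n" + payload


def frame_mxnet(payload: bytes) -> bytes:
    """MXNet-style framing (simplified): magic, length word, data, padding."""
    # Upper 3 bits are a continuation flag (0 = complete record),
    # lower 29 bits are the payload length.
    lrecord = (0 << 29) | len(payload)
    # Pad the payload with zero bytes up to a 4-byte boundary.
    padding = b"\x00" * (-len(payload) % 4)
    return struct.pack("<II", MXNET_MAGIC, lrecord) + payload + padding


mesos_bytes = frame_mesos(b"hello")
mxnet_bytes = frame_mxnet(b"hello")

print(mesos_bytes)        # text-like header, no magic number
print(mxnet_bytes.hex())  # binary header that starts with the magic number
```

Same payload, same name "RecordIO", but a reader written for one layout would not make sense of the other.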

So, at the structural level, TensorFlow's TFRecord and MXNet's RecordIO are different file formats, i.e. you shouldn't expect MXNet to be able to read TFRecord and vice versa. But at the logical level they serve the same purpose and can be considered similar.
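For comparison, a TFRecord record can be sketched in pure Python as well. This assumes the layout described in the TensorFlow documentation: an 8-byte little-endian length, a 4-byte masked CRC32C of the length, the data, and a 4-byte masked CRC32C of the data. The helper names and the MXNET_MAGIC constant are my own for illustration; in practice you would use tf.io.TFRecordWriter rather than hand-rolled framing.

```python
import struct

# Assumed MXNet RecordIO magic number, used here only for the comparison.
MXNET_MAGIC = 0xCED7230A


def _make_crc32c_table():
    """Build the lookup table for CRC32C (Castagnoli, reflected polynomial)."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data: bytes) -> int:
    """Rotate right by 15 bits and add a constant, as TFRecord does."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def frame_tfrecord(payload: bytes) -> bytes:
    length = struct.pack("<Q", len(payload))
    return (
        length
        + struct.pack("<I", masked_crc32c(length))
        + payload
        + struct.pack("<I", masked_crc32c(payload))
    )


record = frame_tfrecord(b"hello")
first_word = int.from_bytes(record[:4], "little")
# An MXNet-style reader would look for its magic number here and not find it.
print(first_word == MXNET_MAGIC)
```

The record starts with its length and carries checksums, with no magic number at all, so an MXNet-style reader has nothing to latch onto.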

Croatian answered 9/11, 2018 at 19:2
