Detect endianness of binary file data

Recently I was (again) reading about endianness. I know how to identify the endianness of the host, as there are lots of posts on SO, and I have also seen this, which I think is a pretty good resource.

However, one thing I would like to know is how to detect the endianness of an input binary file. For example, I am reading a binary file (using C++) as follows:

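#include <cstring>
#include <fstream>
using namespace std;

ifstream mydata("mydata.raw", ios::binary);

short value;
char buf[sizeof(short)];
int dataCount = 0;

short myDataMat[DATA_DIMENSION][DATA_DIMENSION];
// Read one short at a time; stop when the matrix is full or the file ends.
while (dataCount < DATA_DIMENSION * DATA_DIMENSION && mydata.read(buf, sizeof(buf)))
{
    // The bytes are copied as-is, so they are interpreted in the host's byte order.
    memcpy(&value, buf, sizeof(value));
    myDataMat[dataCount / DATA_DIMENSION][dataCount % DATA_DIMENSION] = value;
    dataCount++;
}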

I would like to know how I can detect the endianness of the data in mydata.raw, and whether endianness affects this program at all.

Additional Information:

  • I am only manipulating the data in myDataMat using mathematical operations; no pointer or bitwise operations are done on the data.
  • My machine (host) is little endian.
Lucre answered 17/6, 2016 at 8:49 Comment(3)
You cannot detect the endianness of a binary file. Just use htons etc. when writing data to the file and ntohs etc. when reading it. Cordiality
In short: You cannot. Telltale
"I like to know how I can detect the endianness..." You cannot detect it. Either the file itself contains some indication of the endianness it uses, or you're out of luck. If you read the bytes 0x2a 0x00, you cannot determine whether they mean 42 (little endian) or 10752 (big endian). Tagore

It is impossible to "detect" the endianness of data in general, just as it is impossible to detect whether the data is an array of 4-byte integers or twice as many 2-byte integers. Without any knowledge of the representation, raw data is just a mass of meaningless bits.

However, with some extra knowledge about the data representation, it becomes possible. Some examples:

  • Most file formats mandate a particular endianness, in which case this is never a problem.
  • Unicode text files may optionally start with a byte order mark. The same idea can be used by other data representations.
  • Some file formats contain a checksum. You can guess one endianness and, if the checksum does not match, try again with the other. A checksum is unlikely to match under the wrong interpretation of the data.
  • Sometimes you can make guesses based on the data. Is the temperature outside 33'554'432 degrees, or maybe 2? You can pick the endianness that yields sane data. Of course, this type of guesswork fails miserably when the aliens invade and start melting our planet.
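
The last kind of guesswork could look roughly like the sketch below. It assumes the file holds signed 16-bit samples (as in the question) and that you know a plausible value range for them. The names ByteOrder, decode16 and guessByteOrder are made up for this illustration and do not come from any library. It decodes every pair of bytes both ways and picks the interpretation that yields more in-range values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ByteOrder { Little, Big, Unknown };

// Assemble a signed 16-bit value from two raw bytes under the given order.
// This works the same on any host, because it never reinterprets memory.
inline std::int16_t decode16(const unsigned char* p, ByteOrder order) {
    assert(p != nullptr);
    const std::uint16_t u = (order == ByteOrder::Little)
        ? static_cast<std::uint16_t>(p[0] | (p[1] << 8))
        : static_cast<std::uint16_t>((p[0] << 8) | p[1]);
    return static_cast<std::int16_t>(u);
}

// Heuristic only: count how many samples fall inside [lo, hi] under each
// byte order and return the order with more plausible samples.
// A tie (including empty input) is reported as Unknown.
inline ByteOrder guessByteOrder(const std::vector<unsigned char>& raw,
                                int lo, int hi) {
    std::size_t okLittle = 0;
    std::size_t okBig = 0;
    for (std::size_t i = 0; i + 1 < raw.size(); i += 2) {
        const int le = decode16(&raw[i], ByteOrder::Little);
        const int be = decode16(&raw[i], ByteOrder::Big);
        if (le >= lo && le <= hi) ++okLittle;
        if (be >= lo && be <= hi) ++okBig;
    }
    if (okLittle > okBig) return ByteOrder::Little;
    if (okBig > okLittle) return ByteOrder::Big;
    return ByteOrder::Unknown;
}
```

For example, if the samples are temperatures expected to lie between -100 and 100, calling guessByteOrder(bytes, -100, 100) on the bytes 02 00 2a 00 favours little endian (2 and 42) over big endian (512 and 10752). The result is only a guess, and data that looks sane both ways comes back as Unknown.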
Fluorometer answered 17/6, 2016 at 9:21 Comment(0)

You can't tell.

The endianness transformation is essentially an operator E(x) on a number x such that x = E(E(x)). Both x and E(x) are valid numbers, so you cannot know "which way round" the elements are in your file.
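
To make E concrete for 16-bit values, here is a minimal sketch. The names swap16 and swapAll are hypothetical helpers written for this answer, not standard functions. Swapping twice gives back the original value, which is exactly why the bytes alone cannot tell you whether a swap is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// E(x) for a 16-bit value: exchange the high and low bytes.
constexpr std::uint16_t swap16(std::uint16_t x) {
    return static_cast<std::uint16_t>((x << 8) | (x >> 8));
}

// Apply E to every element of a buffer in place, e.g. after reading a file
// that is known (from its format, not from the bytes) to use the byte order
// opposite to the host's.
inline void swapAll(std::uint16_t* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        data[i] = swap16(data[i]);
    }
}
```

With this, swap16(0x2a00) is 0x002a (42) and swap16(swap16(x)) == x for every x. If you do know from elsewhere that the file uses the opposite byte order, running swapAll once over the values after reading them is enough.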

Kildare answered 17/6, 2016 at 8:54 Comment(0)
