What is *(int*)&data[18] actually doing in this code?

I came across this syntax for reading a BMP file in C++:

#include <fstream>
int main() {
    std::ifstream in("filename.bmp", std::ifstream::binary);
    in.seekg(0, in.end);
    std::streamsize size = in.tellg();
    in.seekg(0);
    unsigned char * data = new unsigned char[size];
    // std::ifstream::read expects a char*, not an unsigned char*
    in.read((char *)data, size);

    int width = *(int*)&data[18];
    // omitted remainder for minimal example
}

and I don't understand what the line

int width = *(int*)&data[18];

is actually doing. Why doesn't a simple cast from unsigned char * to int, int width = (int)data[18];, work?

Implosion answered 4/12, 2019 at 23:57 Comment(13)
It is taking the memory address of data[18], treating it as a pointer to an integer, then dereferencing it. Basically, treating it as a number. This seems like UB though, since data is only size 1 - Kape
@Kape &data[18] is size 8 on 64-bit and 4 on 32-bit, so I think this is only undefined behavior if that ends up causing a read past the end of data, no? - Sabbatical
"What is *(int*)&data[18] actually doing in this code?" Violating the Strict Aliasing Rule, so it could be doing absolutely anything. - Skit
@WilliamMiller, unless I'm misreading, data is allocated as an array of 1 unsigned char. I think it should have been new unsigned char[size] - Kape
@Kape I totally missed that, I wonder if it's a typo - Sabbatical
@Kape that was a typo, thanks for pointing it out - Implosion
Thanks for cleaning that up, @Implosion. Unfortunately it's still Undefined Behaviour. You can view any object as an array of characters, but the reverse is not true. You can only convert an array of characters to another type if certain conditions are met, and in this case they probably aren't. Consider using int width; memcpy(&width, &data[18], sizeof (width)); to ensure correct alignment and praying that there are no endian issues. - Skit
@Skit thanks for that explanation, I'm confused why this works if it's undefined behavior? (it does get the correct value for width as far as I can tell) - Implosion
Undefined Behaviour can work; it's just not guaranteed to or even be consistent. In this case the writer is making a bunch of assumptions about how the CPU works and what the CPU will let it get away with. Here they're probably only taking a small performance hit on the CPUs that will allow misaligned accesses. This code will, as @WilliamMiller hints at in his answer, fail hilariously on a system with a non-32 bit int. That makes int32_t width; memcpy(&width, &data[18], sizeof (width)); a better idea than my previous suggestion. The size is fixed. - Skit
Undefined behavior can sometimes appear to work. It's just not guaranteed. - Enviable
@Skit that makes more sense, so int32_t width; memcpy(&width, &data[18], sizeof(width)); is implementation agnostic but *(int*)&data[18] will fail if int isn't 32 bits? - Implosion
Yes, but *(int*)&data[18] will also fail on CPUs that require a 32 bit number to be aligned to a 32 bit address (some CPUs will allow misaligned data, but access it much more slowly). Assuming that data is aligned to whatever size data the CPU prefers (usually 32 or 64 bits), data[18] will not be, because 18 is not evenly divisible by 4 (32 bits in bytes). It will also fail if the CPU is big endian and the byte order is backwards. - Skit
Off topic, but not quite: https://learn.microsoft.com/en-us/windows/win32/gdi/bitmap-storage. - Essary

Note

As @user4581301 indicated in the comments, this depends on the implementation and will fail in many instances. And, as @NathanOliver-Reinstate Monica and @ChrisMM pointed out, this is Undefined Behavior and the result is not guaranteed.

According to the bitmap header format, the width of the bitmap in pixels is stored as a signed 32-bit integer beginning at byte offset 18. The syntax

int width = *(int*)&data[18];

reads the four bytes at offsets 18 through 21, inclusive (assuming a 32-bit int), and interprets them as an integer.

How?

  • &data[18] gets the address of the unsigned char at index 18
  • (int*) casts that address from unsigned char* to int*, so that a dereference reads sizeof(int) bytes instead of a single byte
  • the leading * dereferences the resulting int* to get the int value it points to

So basically, it takes the address of data[18] and reads the bytes at that address as if they were an integer.

Why doesn't a simple cast to `int` work?

sizeof(data[18]) is 1, because unsigned char is one byte (0-255). sizeof(&data[18]), the size of the pointer itself, is typically 4 on a 32-bit system and 8 on a 64-bit system, but the pointer's size is not what gets read. What matters is the pointed-to type: casting to (int*) and dereferencing reads sizeof(int) bytes, 4 on most platforms, which are the 4 bytes at offsets 18 through 21, inclusive. A simple cast from unsigned char to int also yields a 4-byte int, but it carries only the one byte of information from data[18]; the bytes at offsets 19 through 21 are never read. This is illustrated by the following example:

#include <iostream>
#include <bitset>
#include <string>

int main() {
    // Zero-initialized stand-in for the file buffer
    unsigned char data[32] = {};

    // Populate 18-21 with a recognizable pattern for demonstration
    std::bitset<8> _bits(std::string("10011010"));
    unsigned long bits = _bits.to_ulong();
    for (int ii = 18; ii < 22; ii ++) {
        data[ii] = static_cast<unsigned char>(bits);
    }

    std::cout << "data[18]                    -> 1 byte  " 
        << std::bitset<32>(data[18]) << std::endl;
    std::cout << "*(unsigned short*)&data[18] -> 2 bytes " 
        << std::bitset<32>(*(unsigned short*)&data[18]) << std::endl;
    std::cout << "*(int*)&data[18]            -> 4 bytes " 
        << std::bitset<32>(*(int*)&data[18]) << std::endl;
}

Output:

data[18]                    -> 1 byte  00000000000000000000000010011010
*(unsigned short*)&data[18] -> 2 bytes 00000000000000001001101010011010
*(int*)&data[18]            -> 4 bytes 10011010100110101001101010011010
Sabbatical answered 5/12, 2019 at 0:1 Comment(2)
You should note that this cast is UB. There is no integer there so synthesizing one is illegal. - Wallboard
@NathanOliver-ReinstateMonica Good point, I'm glad others noticed that - Sabbatical
