Dereferencing this pointer gives me -46, but I am not sure why
This is a program I ran:

#include <stdio.h>

int main(void)
{
    int y = 1234;
    char *p = &y;
    int *j = &y;
    printf("%d %d\n", *p, *j);
}

I am slightly confused about the output. What I'm seeing is:

-46 1234

I wrote this program as an experiment and wasn't sure what it was going to output. I was expecting possibly one byte from y.

What is happening "behind-the-scenes" here? How does dereferencing p give me -46?

As pointed out by others, I need an explicit cast to avoid a constraint violation. I am not changing that line from char *p = &y; to char *p = (char *)&y; so that I do not invalidate the answers below.

This program does not invoke any undefined behaviour, as pointed out here.

Stern answered 16/1, 2016 at 10:46 Comment(0)

If you have something like

int x = 1234;
int *p = &x;

then dereferencing p correctly reads all the bytes of the integer, because p was declared as a pointer to int. The compiler knows how many bytes to read from the pointed-to type, i.e. sizeof(int). The size of int is typically 4 bytes on 32/64-bit platforms, but it is machine-dependent, which is why the compiler relies on the type's size rather than a fixed number.

For your code

 int y = 1234;
 char *p = &y;
 int *j  = &y;

the pointer p points to y, but we declared it as a pointer to char, so dereferencing it will read only one byte (a char is always exactly 1 byte). 1234 in binary is represented as

        00000000 00000000 00000100 11010010

Now, if your machine is little-endian, it stores the bytes in reverse order, least-significant first:

        11010010 00000100 00000000 00000000

11010010 is at address 00 (a hypothetical address), 00000100 is at address 01, and so on.

BE:      00   01   02   03
       +----+----+----+----+   
    y: | 00 | 00 | 04 | d2 |
       +----+----+----+----+


LE:      00   01   02   03
       +----+----+----+----+
    y: | d2 | 04 | 00 | 00 |
       +----+----+----+----+

(In Hexadecimal)

So if you now dereference pointer p, it reads only the first byte, 11010010. The output is -46 if plain char is signed, or 210 if it is unsigned; according to the C standard, the signedness of plain char is implementation-defined (on this machine it is signed).

On your PC, negative numbers are represented in two's complement, so the most-significant bit is the sign bit; the leading 1 marks the value as negative: 11010010 = -128 + 64 + 16 + 2 = -46. If instead you dereference pointer j, it reads all the bytes of the int, since we declared it as a pointer to int, and the output is 1234.

In general, if you declare a pointer as int *j, then *j reads sizeof(int) bytes (4 here, but machine-dependent). The same goes for char or any other type: dereferencing reads as many bytes as the pointed-to type occupies, and char occupies exactly 1 byte.

As others have pointed out, you need an explicit cast to char *, since char *p = &y; is a constraint violation (char * and int * are not compatible types). Write char *p = (char *)&y; instead.

Logogram answered 16/1, 2016 at 11:52 Comment(4)
sizeof(int) does not necessarily == 4.Foreordain
Your example is missing a cast to char *, as we can only implicitly convert to void *. It would be worth mentioning that only [[un]signed] char * is allowed to alias other types via pointer/reference, in order to read their object representation as a series of chars. We are not allowed, e.g. to declare a char[4] (or whatever our sizeof(int) is), fill it with values, and try to read it via an int *, as that breaks assumptions about aliasing, alignment, and others.Foreordain
A pointer of any char type - char, unsigned char, signed char - can alias any other type. But the converse isn't true: we can't legally take a bunch of chars & read them as another type e.g. int. Some compilers let you do it; that doesn't mean it's legal. Trying to read an object's memory as-if it contains an incompatible type causes undefined behaviour, so the compiler can produce an error, work exactly as you expect, or do anything. More so, where both pointers are in scope, aliasing rules mean the illegal attempt to reinterpret can be optimised away to something you don't wantForeordain
@SurajJain This answer is mostly correct. I would correct the following: Generally char is of 1 byte: it is always 1 byte. Generally size of int is 4 bytes: true on 32/64-bit platforms (but not, for example, on 8-bit platforms). Also, as @JohnBode says in his answer, char *p = &y; is invalid and a cast is required.Procrastinate

There are a couple of issues with the code as written.

First of all, you are invoking undefined behavior by trying to print the numeric representation of a char object using the %d conversion specifier:

Online C 2011 draft, §7.21.6.1, subclause 9:

If a conversion specification is invalid, the behavior is undefined.282) If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.

Yes, objects of type char are promoted to int when passed to variadic functions; but printf is special: for the output to be well-defined, the type of the argument and the conversion specifier must match up. To print the numeric value of a char with %d, or of an unsigned char with %u, %o, or %x, you must use the hh length modifier as part of the conversion spec:

printf( "%hhd ", *p );

The second issue is that the line

char *p = &y;

is a constraint violation - char * and int * are not compatible types, and may have different sizes and/or representations1, 2. Thus, you must explicitly cast the source to the target type:

char *p = (char *) &y;

The one exception to this rule occurs when one of the operands is void *; then the cast isn't necessary.

Having said all that, I took your code and added a utility that dumps the address and contents of objects in the program. Here's what y, p, and j look like on my system (SLES-10, gcc 4.1.2):

       Item        Address   00   01   02   03
       ----        -------   --   --   --   --
          y 0x7fff1a7e99cc   d2   04   00   00    ....

          p 0x7fff1a7e99c0   cc   99   7e   1a    ..~.
            0x7fff1a7e99c4   ff   7f   00   00    ....

          j 0x7fff1a7e99b8   cc   99   7e   1a    ..~.
            0x7fff1a7e99bc   ff   7f   00   00    ....

I'm on an x86 system, which is little-endian, so it stores multi-byte objects starting with the least-significant byte at the lowest address:

BE:      A   A+1  A+2  A+3
       +----+----+----+----+
    y: | 00 | 00 | 04 | d2 |
       +----+----+----+----+
LE:     A+3  A+2  A+1   A

On a little-endian system, the addressed byte is the least-significant byte, which in this case is 0xd2 (210 unsigned, -46 signed).

In a nutshell, you're printing the signed, decimal representation of that single byte.

As for the broader question, the type of the expression *p is char and the type of the expression *j is int; the compiler simply goes by the type of the expression. The compiler keeps track of all objects, expressions, and types as it translates your source to machine code. So when it sees the expression *j, it knows that it's dealing with an integer value and generates machine code appropriately. When it sees the expression *p, it knows it's dealing with a char value.


  1. Admittedly, almost all modern desktop systems that I know of use the same representations for all pointer types, but for more oddball embedded or special-purpose platforms, that may not be true.
  2. § 6.2.5, subclause 28.

Canorous answered 27/12, 2016 at 18:0 Comment(7)
@SurajJain: The *p = &y line is a constraint violation (both operands must be of compatible types), so the compiler is required to issue a diagnostic. As to what will happen if you leave the cast off, that depends on a lot of things. If you're working on a platform where all pointer types have the same size and representation, then your code will most likely work as intended. If pointer sizes and representations differ, then you could see anything from garbled data to a runtime error (or nothing at all).Canorous
@SurajJain: As for dasblinkenlight's answer...I tend to take an absolutist view when it comes to matching up conversions and arguments in printf calls, mainly because I've been bitten by that in the past. If you're passing a char argument, then you should use either %c or %hhd (or %hho or %hhx or %hhu for unsigned char). Yes, because promotions are a thing in variadic functions, you should be able to print a char value using %d, but I will argue that such code is less than clear.Canorous
This answer currently says: you are invoking undefined behavior by trying to print the numeric representation of a char object using the %d conversion specifier — which is, to be polite about it, hogwash. There is absolutely nothing undefined about it. There is implementation-defined behaviour, but that's not undefined. It is implementation-defined whether plain char is signed or unsigned, which affects the int value that the char is converted to, but that's all.Otoole
The arguments after the format string to printf() undergo the default promotion rules (C11 §6.5.2.2 Function calls, ¶6-7) because of the ellipsis in the function prototype.Otoole
@John Bode, It looks like you made a mistake in the first part of your question. Could you fix it so it can be accepted?Absently
@Absently He did not fix it.Logogram
Typo in my comment: s/your question/your answerAbsently

(Please note this answer refers to the original form of the question, which asked how the program knew how many bytes to read, etc. I'm keeping it around on that basis, despite the rug having been pulled out from under it.)

A pointer refers to a location in memory that contains a particular object and must be incremented/decremented/indexed with a particular stride size, reflecting the sizeof the pointed type.

The observable value of the pointer itself (e.g. through std::cout << ptr) need not reflect any recognisable physical address, nor does ++ptr need to increment said value by 1, sizeof(*ptr), or anything else. A pointer is just a handle to an object, with an implementation-defined bit representation. That representation doesn't and shouldn't matter to users. The only thing for which users should use the pointer is to... well, point to stuff. Talk of its address is nonportable and only useful in debugging.

Anyway, simply, the compiler knows how many bytes to read/write because the pointer is typed, and that type has a defined sizeof, representation, and mapping to physical addresses. So, based on that type, operations on ptr will be compiled to appropriate instructions in order to calculate the real hardware address (which again, need not correspond to the observable value of ptr), read the right sizeof number of memory 'bytes', add/subtract the right number of bytes so it points at the next object, etc.

Foreordain answered 9/8, 2016 at 10:35 Comment(0)

First, read the warning, which says: warning: initialization from incompatible pointer type [enabled by default] char *p = &y;

which means you should cast explicitly to avoid the constraint violation (as pointed out by @JohnBode):

char *p = (char *)&y;

and

int y = 1234;

here y is a local variable, which will be stored in the stack section of memory. On a typical Linux/x86 machine, integers are stored in little-endian byte order. Assume the 4 bytes of memory reserved for y run from 0x100 to 0x104:

    -------------------------------------------------
    | 0000 0000 | 0000 0000 | 0000 0100 | 1101 0010 |
    -------------------------------------------------
    0x104      0x103       0x102       0x101       0x100
                                                    y
                                                    p
                                                    j

As shown above, j and p both point to the same address, 0x100. But when the compiler evaluates *p, since p is a pointer to char (signed by default on this platform), it interprets the sign bit of that single byte. Here the sign bit is 1, so one thing is certain: the printed value will be negative.

A sign bit of 1 means a negative number, and negative numbers are stored in memory as two's complement. So:

    actual           => 1101 0010 (1st byte)
    one's complement => 0010 1101
                                +1
                        ----------
                         0010 1110 => 46; since the sign bit was 1, it prints -46

While printing, if you use the %u format specifier (which prints the unsigned equivalent), the sign bit is not specially interpreted; whatever data is in the byte simply gets printed as an unsigned value.

Finally,

printf("%d\n", *j);

In the above statement, dereferencing j, which is an int pointer (signed by default), reads all four bytes and checks bit 31 for the sign; that bit is 0, so the output is the positive number 1234.

Levo answered 25/12, 2017 at 16:27 Comment(0)
