Please include an example with the explanation.
Reviewing the basic terminology
It's usually good enough - unless you're programming assembly - to envisage a pointer containing a numeric memory address, with 1 referring to the second byte in the process's memory, 2 the third, 3 the fourth and so on...
- What happened to 0 and the first byte? Well, we'll get to that later - see null pointers below.
- For a more accurate definition of what pointers store, and how memory and addresses relate, see "More about memory addresses, and why you probably don't need to know" at the end of this answer.
When you want to access the data/value in the memory that the pointer points to - the contents of the address with that numerical index - then you dereference the pointer.
Different computer languages have different notations to tell the compiler or interpreter that you're now interested in the pointed-to object's (current) value - I focus below on C and C++.
A pointer scenario
Consider in C, given a pointer such as p below...
const char* p = "abc";
...four bytes with the numerical values used to encode the letters 'a', 'b', 'c', and a 0 byte to denote the end of the textual data, are stored somewhere in memory and the numerical address of that data is stored in p. This way of encoding text in memory, with a terminating 0 byte, is known as ASCIIZ.
For example, if the string literal happened to be at address 0x1000 and p were a 32-bit pointer at 0x2000, the memory content would be:
Memory Address (hex)    Variable name    Contents
1000                                     'a' == 97 (ASCII)
1001                                     'b' == 98
1002                                     'c' == 99
1003                                     0
...
2000-2003               p                1000 hex
Note that there is no variable name/identifier for address 0x1000, but we can indirectly refer to the string literal using a pointer storing its address: p.
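The four-byte claim can be checked directly. Below is a minimal sketch (the function names are hypothetical, not from the answer): sizeof applied to the literal counts the terminating 0 byte, and reading p[3] through the pointer finds it. The addresses 0x1000 and 0x2000 above are only illustrative and will differ on a real system, so the sketch does not print them.

```c
#include <stddef.h>

/* Hypothetical helper: "abc" occupies 3 letters + 1 terminating 0 byte. */
size_t literal_size(void)
{
    return sizeof "abc";   /* 4, not 3 */
}

/* Hypothetical helper: reads byte i (0 to 3 only) of the literal through a pointer. */
char literal_byte(size_t i)
{
    const char* p = "abc";
    return p[i];           /* p[3] is the terminating '\0' */
}
```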
Dereferencing the pointer
To refer to the characters p points to, we dereference p using one of these notations (again, for C):
assert(*p == 'a'); // The first character at address p will be 'a'
assert(p[1] == 'b'); // p[1] actually dereferences a pointer created by adding
// p and 1 times the size of the things to which p points:
// In this case they're char which are 1 byte in C...
assert(*(p + 1) == 'b'); // Another notation for p[1]
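Since C defines a[b] as *((a) + (b)), the indexing and pointer-arithmetic notations above are interchangeable. A small sketch (notations_agree is a hypothetical name) that reads the same char three ways, including the legal but unusual i[p] form:

```c
#include <stdbool.h>

/* Hypothetical helper: reads the char at index i of p three ways and
   reports whether all three notations produced the same value. */
bool notations_agree(const char* p, int i)
{
    char via_index   = p[i];       /* indexing notation */
    char via_arith   = *(p + i);   /* explicit pointer arithmetic + dereference */
    char via_swapped = i[p];       /* means *(i + p): same address, odd spelling */
    return via_index == via_arith && via_arith == via_swapped;
}
```

This only checks that the expressions yield the same value; whether a particular compiler emits identical instructions for them is a separate question the sketch does not address.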
You can also move pointers through the pointed-to data, dereferencing them as you go:
++p; // Increment p so it's now 0x1001
assert(*p == 'b'); // p == 0x1001 which is where the 'b' is...
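Moving a pointer and dereferencing it as you go is how C code typically walks an ASCIIZ string. As a sketch, here is a hand-rolled equivalent of the Standard strlen (count_chars is a hypothetical name): it advances a copy of the pointer until it finds the 0 byte, then subtracts the start address.

```c
#include <stddef.h>

/* Hypothetical helper: counts the chars before the terminating 0 byte. */
size_t count_chars(const char* p)
{
    const char* q = p;       /* walk a copy so p still marks the start */
    while (*q != '\0')       /* dereference: read the char q points to */
        ++q;                 /* advance by one char (1 byte) */
    return (size_t)(q - p);  /* pointer difference = number of chars passed */
}
```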
If you have some data that can be written to, then you can do things like this:
int x = 2;
int* p_x = &x; // Put the address of the x variable into the pointer p_x
*p_x = 4; // Change the memory at the address in p_x to be 4
assert(x == 4); // Check x is now 4
Above, you must have known at compile time that you would need a variable called x, and the code asks the compiler to arrange where it should be stored, ensuring the address will be available via &x.
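A common reason to take an address with & is to let a function modify the caller's variables, since C passes arguments by value. A minimal sketch (swap_ints is a hypothetical name):

```c
/* Hypothetical helper: exchanges the ints that a and b point to.
   Each *a / *b dereferences a pointer to read or write the caller's int. */
void swap_ints(int* a, int* b)
{
    int tmp = *a;   /* read the value a points to */
    *a = *b;        /* copy b's pointee into a's pointee */
    *b = tmp;       /* store the saved value into b's pointee */
}
```

Called as swap_ints(&x, &y), it exchanges the caller's x and y; a version taking plain int parameters could only swap its own copies.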
Dereferencing and accessing a structure data member
In C, if you have a variable that is a pointer to a structure with data members, you can access those members using the -> dereferencing operator:
typedef struct X { int i_; double d_; } X;
X x;
X* p = &x;
p->d_ = 3.14159; // Dereference and access data member x.d_
(*p).d_ *= -1; // Another equivalent notation for accessing x.d_
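The same notations work when the struct pointer is passed to a function, which is where -> is most often seen. This sketch repeats the X type from above; scale_x is a hypothetical helper:

```c
typedef struct X { int i_; double d_; } X;

/* Hypothetical helper: updates the caller's X in place through a pointer. */
void scale_x(X* p, int factor)
{
    p->i_ *= factor;     /* same as (*p).i_ *= factor */
    (*p).d_ *= factor;   /* same as p->d_ *= factor */
}
```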
Multi-byte data types
To use a pointer, a computer program also needs some insight into the type of data that is being pointed at - if that data type needs more than one byte to represent, then the pointer normally points to the lowest-numbered byte in the data.
So, looking at a slightly more complex example:
double sizes[] = { 10.3, 13.4, 11.2, 19.4 };
double* p = sizes;
assert(p[0] == 10.3); // Knows to look at all the bytes in the first double value
assert(p[1] == 13.4); // Actually looks at bytes from address p + 1 * sizeof(double)
// (sizeof(double) is almost always eight bytes)
++p; // Advance p by sizeof(double)
assert(*p == 13.4); // The double at memory beginning at address p has value 13.4
*(p + 2) = 29.8; // Change sizes[3] from 19.4 to 29.8
// Note earlier ++p and + 2 here => sizes[3]
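To see that pointer arithmetic counts in elements rather than bytes, one can convert to char*, whose arithmetic is in single bytes. A sketch with hypothetical function names:

```c
#include <stddef.h>

/* Hypothetical helper: how many bytes apart are p and p + 1 for a double*?
   Converting both to char* makes the subtraction count single bytes. */
ptrdiff_t bytes_per_step_double(void)
{
    double sizes[] = { 10.3, 13.4, 11.2, 19.4 };
    double* p = sizes;
    return (char*)(p + 1) - (char*)p;   /* equals sizeof(double) */
}

/* Hypothetical helper: subtracting two double* counts elements, not bytes. */
ptrdiff_t elements_between(void)
{
    double sizes[] = { 10.3, 13.4, 11.2, 19.4 };
    return &sizes[3] - &sizes[0];       /* 3, whatever sizeof(double) is */
}
```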
Pointers to dynamically allocated memory
Sometimes you don't know how much memory you'll need until your program is running and sees what data is thrown at it... then you can dynamically allocate memory using malloc. It is common practice to store the address in a pointer...
int* p = (int*)malloc(sizeof(int)); // Get some memory somewhere...
*p = 10; // Dereference the pointer to the memory, then write a value in
fn(*p); // Call a function, passing it the value at address p
(*p) += 3; // Change the value, adding 3 to it
free(p); // Release the memory back to the heap allocation library
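The snippet above omits error handling. A slightly fuller sketch, assuming the element count n is only known at run time (sum_first_n is a hypothetical name), checks malloc's result for NULL before dereferencing and frees the memory when done:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates n ints, writes 1..n through the pointer,
   reads them back to sum them, then frees the memory.
   Returns -1 if the allocation cannot be made. */
long sum_first_n(size_t n)
{
    if (n == 0)
        return 0;                      /* nothing to allocate */
    if (n > SIZE_MAX / sizeof(int))
        return -1;                     /* n * sizeof(int) would overflow */
    int* p = malloc(n * sizeof *p);    /* no cast needed in C */
    if (p == NULL)
        return -1;                     /* never dereference a failed allocation */
    for (size_t i = 0; i < n; ++i)
        p[i] = (int)(i + 1);           /* write via the pointer */
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += p[i];                 /* read via the pointer */
    free(p);                           /* release back to the heap */
    return total;
}
```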
In C++, memory allocation is normally done with the new operator, and deallocation with delete:
int* p = new int(10); // Memory for one int with initial value 10
delete p;
p = new int[10]; // Memory for ten ints with unspecified initial value
delete[] p;
p = new int[10](); // Memory for ten ints that are value initialised (to 0)
delete[] p;
See also C++ smart pointers below.
Losing and leaking addresses
Often a pointer may be the only indication of where some data or buffer exists in memory. If ongoing use of that data/buffer is needed, or the ability to call free() or delete to avoid leaking the memory, then the programmer must operate on a copy of the pointer...
char* p;
if (asprintf(&p, "name: %s", name) >= 0) { // Common but non-Standard printf-on-heap; returns -1 on failure
    // Replace non-printable characters with underscores...
    for (char* q = p; *q; ++q)
        if (!isprint((unsigned char)*q))
            *q = '_';
    printf("%s\n", p); // Only q was modified, so p still holds the allocated address
    free(p);
}
...or carefully orchestrate reversal of any changes...
const size_t n = ...;
p += n;
...
p -= n; // Restore earlier value...
free(p);
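asprintf is not part of Standard C. A portable sketch of the same idea (make_label is a hypothetical name) sizes the buffer with snprintf(NULL, 0, ...), allocates it with malloc, and again walks a copy q so that p still holds the address free() needs:

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: builds "name: <name>" on the heap, replacing
   non-printable chars with '_'. Returns NULL on failure; otherwise the
   caller must free() the result. */
char* make_label(const char* name)
{
    int len = snprintf(NULL, 0, "name: %s", name);  /* measure only */
    if (len < 0)
        return NULL;
    char* p = malloc((size_t)len + 1);              /* +1 for the 0 byte */
    if (p == NULL)
        return NULL;
    snprintf(p, (size_t)len + 1, "name: %s", name);
    for (char* q = p; *q; ++q)                      /* p itself never moves */
        if (!isprint((unsigned char)*q))
            *q = '_';
    return p;
}
```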
C++ smart pointers
In C++, it's best practice to use smart pointer objects to store and manage the pointers, automatically deallocating the pointed-to objects when the smart pointers' destructors run. Since C++11 the Standard Library provides two, unique_ptr for when there's a single owner for an allocated object...
{
std::unique_ptr<T> p{new T(42, "meaning")};
call_a_function(p);
// The function above might throw, so delete here is unreliable, but...
} // p's destructor's guaranteed to run "here", calling delete
...and shared_ptr for shared ownership (using reference counting)...
{
auto p = std::make_shared<T>(3.14, "pi");
number_storage1.may_add(p); // Might copy p into its container
number_storage2.may_add(p); // Might copy p into its container
} // p's destructor will only delete the T if neither may_add copied it
Null pointers
In C, NULL and 0 - and additionally in C++ nullptr - can be used to indicate that a pointer doesn't currently hold the memory address of a variable, and shouldn't be dereferenced or used in pointer arithmetic. For example:
const char* p_filename = NULL; // Or "= 0", or "= nullptr" in C++
int c;
while ((c = getopt(argc, argv, "f:")) != -1)
switch (c) {
case 'f': p_filename = optarg; break;
}
if (p_filename) // Only NULL converts to false
... // Only get here if -f flag specified
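Many Standard functions use a null pointer to mean "nothing found", so their result must be checked before it is dereferenced. A sketch using strchr (char_after_colon and its '?' convention are hypothetical):

```c
#include <string.h>

/* Hypothetical helper: returns the char after the first ':' in s,
   or '?' when s contains no ':' and strchr returns NULL. */
char char_after_colon(const char* s)
{
    const char* colon = strchr(s, ':');
    if (colon == NULL)    /* equivalently: if (!colon) */
        return '?';       /* must not dereference a null pointer */
    return colon[1];      /* safe: at worst this is s's terminating '\0' */
}
```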
In C and C++, just as inbuilt numeric types don't necessarily default to 0, nor bools to false, pointers are not always set to NULL. All these are set to 0/false/NULL when they're static variables or (C++ only) direct or indirect member variables of static objects or their bases, or undergo zero initialisation (e.g. in C++, new T(); zero-initialises T's members, including pointers, unless T has a user-provided default constructor, whereas new T; does not).
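A sketch of the static case in C (the names are hypothetical): pointers with static storage duration start out null without any initialiser. An automatic local pointer without an initialiser is deliberately not shown, because reading its indeterminate value is not safe.

```c
#include <stddef.h>

static int* file_scope_ptr;          /* static storage: starts as NULL */

/* Hypothetical helper: returns a function-local static pointer, never assigned. */
int* get_static_local(void)
{
    static int* local_static_ptr;    /* also zero-initialised to NULL */
    return local_static_ptr;
}

/* Hypothetical helper: returns the never-assigned file-scope pointer. */
int* get_file_scope(void)
{
    return file_scope_ptr;
}
```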
Further, when you assign 0, NULL or nullptr to a pointer, the bits in the pointer are not necessarily all reset: the pointer may not contain "0" at the hardware level, or refer to address 0 in your virtual address space. The compiler is allowed to store something else there if it has reason to, but whatever it does, if you come along and compare the pointer to 0, NULL, nullptr or another pointer that was assigned any of those, the comparison must work as expected. So, below the source code at the compiler level, "NULL" is potentially a bit "magical" in the C and C++ languages...
More about memory addresses, and why you probably don't need to know
More strictly, initialised pointers store a bit-pattern identifying either NULL or an (often virtual) memory address.
The simple case is where this is a numeric offset into the process's entire virtual address space; in more complex cases the pointer may be relative to some specific memory area, which the CPU may select based on CPU "segment" registers or some manner of segment id encoded in the bit-pattern, and/or looking in different places depending on the machine code instructions using the address.
For example, an int* properly initialised to point to an int variable might, after casting to a float*, access memory in "GPU" memory quite distinct from the memory where the int variable is; then, once cast to and used as a function pointer, it might point into further distinct memory holding machine opcodes for the program (with the numeric value of the int* effectively a random, invalid pointer within these other memory regions).
3GL programming languages like C and C++ tend to hide this complexity, such that:
- If the compiler gives you a pointer to a variable or function, you can dereference it freely (as long as the variable's not destructed/deallocated meanwhile) and it's the compiler's problem whether e.g. a particular CPU segment register needs to be restored beforehand, or a distinct machine code instruction used.
- If you get a pointer to an element in an array, you can use pointer arithmetic to move anywhere else in the array, or even to form an address one-past-the-end of the array that's legal to compare with other pointers to elements in the array (or that have similarly been moved by pointer arithmetic to the same one-past-the-end value); again in C and C++, it's up to the compiler to ensure this "just works".
- Specific OS functions, e.g. shared memory mapping, may give you pointers, and they'll "just work" within the range of addresses that makes sense for them.
- Attempts to move legal pointers beyond these boundaries, or to cast arbitrary numbers to pointers, or use pointers cast to unrelated types, typically have undefined behaviour, so should be avoided in higher level libraries and applications, but code for OSes, device drivers, etc. may need to rely on behaviour left undefined by the C or C++ Standard, that is nevertheless well defined by their specific implementation or hardware.
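The one-past-the-end rule is what makes the common begin/end loop legal. A sketch (sum_range is a hypothetical name): end may be formed and compared against, but is never dereferenced.

```c
/* Hypothetical helper: sums the ints in [begin, end), where end may be
   the one-past-the-end address of an array. */
int sum_range(const int* begin, const int* end)
{
    int total = 0;
    for (const int* p = begin; p != end; ++p)   /* compare against end... */
        total += *p;                            /* ...but only dereference p < end */
    return total;
}
```

For an array a of n ints, sum_range(a, a + n) visits every element.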
...p[1] and *(p + 1) identical? That is, do p[1] and *(p + 1) generate the same instructions? – Cheryle
...[2] because someone commented that it was wrong and I missed the ++p myself then - I'd been right the first time. Will revert. Thanks! – Impress
...p at 2000-2003?! Why isn't it just 2000 containing 1000 hex? – Lap
...p is just 2000: if you had another pointer to p, it would have to store 2000 in its four or eight bytes. Hope that helps! Cheers. – Impress
...u contains an array arr, both gcc and clang will recognize that the lvalue u.arr[i] might access the same storage as other union members, but will not recognize that lvalue *(u.arr+i) might do so. I'm not sure whether the authors of those compilers think that the latter invokes UB, or that the former invokes UB but they should process it usefully anyway, but they clearly view the two expressions as different. – Pesce
...memcpy). GCC and clang have their -fno-strict-aliasing support. I can't tell from your description whether arr or the "other union members" in your scenario are character arrays, or whether you're using -fno-strict-aliasing, but regardless it does seem bizarre for one notation to work and not the other. Maybe worth making a question about it? Cheers – Impress
...-fno-strict-aliasing isn't particularly onerous. – Impress
...-fstrict-aliasing mode. The Standard is useless in that regard, since even an ordinary struct member access lvalue invokes UB. It's possible to use -fno-strict-aliasing to be sure, but if the design of gcc makes it necessary to use -fno-strict-aliasing to be certain code will work then any effort the authors put into aliasing optimizations is wasted and could be better spent elsewhere. – Pesce
...-fno-strict-aliasing, and minimise regressions between releases. In this space, I believe C++20 is still expected to add bit_cast, which will hopefully replace some of the reinterpret_cast and union based type punning UBs. "even an ordinary struct member access lvalue invokes UB" - could you elaborate or provide a link re this one? Cheers – Impress
...struct S {int x;} s;, the type of lvalue s.x is int, but the only types of lvalue that can be used to access the stored value of a struct S are struct S and character types. I think the authors of the Standard thought it sufficiently obvious that any non-broken compiler should handle cases like s.x that they didn't need to explicitly specify it, but they probably thought the same about a number of access patterns that the authors of gcc and clang seem to take pride in refusing to support without -fno-strict-aliasing. – Pesce

Dereferencing a pointer means getting the value that is stored in the memory location pointed to by the pointer. The operator * is used to do this, and is called the dereferencing operator.
int a = 10;
int* ptr = &a;
printf("%d", *ptr); // With *ptr I'm dereferencing the pointer.
// Which means, I am asking the value pointed at by the pointer.
// ptr is pointing to the location in memory of the variable a.
// In a's location, we have 10. So, dereferencing gives this value.
// Since we have indirect control over a's location, we can modify its content using the pointer. This is an indirect way to access a.
*ptr = 20; // Now a's content is no longer 10, and has been modified to 20.
...[] also dereferences a pointer (a[b] is defined to mean *(a + b)). – Antependium
...someUnion.array[i] and *(someUnion.array+i) differently, recognizing accesses upon the former, but not the latter, as accesses to the union object. – Pesce
...someUnion.array[i] or *(someUnion.array + i), you a) name an array (someUnion.array), you b) decay that array into a pointer, and you c) access the bytes behind that pointer. Step b) may be omitted if someUnion.array is not an array itself but rather a pointer to some dynamic memory, but the rest is identical in all cases. – Antependium
...if (u.arr1[i]) u.arr2[j]=1; return u.arr[i], gcc and clang will return the value that u.arr1[i] holds when the function returns. Change the code to if (*(u.arr1+i)) *(u.arr2+j)=1; return *(u.arr1+i); and they'll return the value that u.arr1[i] held when the condition was tested, even if the write to u.arr2[j] modified u.arr1[i]. – Pesce
...return u.arr[i] is firmly within UB territory after executing u.arr2[j] = 1. After all, you are reading a union member after writing to another one. The compiler could decide to return a pink elephant instead. So it's of little use to muse over what it actually returned... – Antependium

In simple words, dereferencing means accessing the value at the memory location to which the pointer points.
A pointer is a "reference" to a value, much like a library call number is a reference to a book. "Dereferencing" the call number is physically going through and retrieving that book.
int a=4 ;
int *pA = &a ;
printf( "The REFERENCE/call number for the variable `a` is %p\n", (void*)pA ) ; // %p expects a void*
// The * causes pA to DEREFERENCE... `a` via "callnumber" `pA`.
printf( "%d\n", *pA ) ; // prints 4..
If the book isn't there, the librarian starts shouting, shuts the library down, and a couple of people are set to investigate the cause of a person going to find a book that isn't there.
Code and explanation from Pointer Basics:
The dereference operation starts at the pointer and follows its arrow over to access its pointee. The goal may be to look at the pointee state or to change the pointee state. The dereference operation on a pointer only works if the pointer has a pointee -- the pointee must be allocated and the pointer must be set to point to it. The most common error in pointer code is forgetting to set up the pointee. The most common runtime crash because of that error in the code is a failed dereference operation. In Java the incorrect dereference will be flagged politely by the runtime system. In compiled languages such as C, C++, and Pascal, the incorrect dereference will sometimes crash, and other times corrupt memory in some subtle, random way. Pointer bugs in compiled languages can be difficult to track down for this reason.
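#include <stdlib.h>

int main(void) {
int* x; // Allocate the pointer x
x = malloc(sizeof(int)); // Allocate an int pointee,
// and set x to point to it
if (x == NULL) return 1; // malloc may fail; never dereference NULL
*x = 42; // Dereference x to store 42 in its pointee
free(x); // Release the pointee
return 0;
}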
I think all the previous answers are wrong, as they state that dereferencing means accessing the actual value. Wikipedia gives the correct definition instead: https://en.wikipedia.org/wiki/Dereference_operator
It operates on a pointer variable, and returns an l-value equivalent to the value at the pointer address. This is called "dereferencing" the pointer.
That said, we can dereference the pointer without ever accessing the value it points to. For example:
char *p = NULL;
*p;
We dereferenced the NULL pointer without accessing its value. Or we could do:
p1 = &(*p);
sz = sizeof(*p);
Again, dereferencing, but never accessing the value. Such code will NOT crash: the crash happens when you actually access the data by an invalid pointer. However, unfortunately, according to the standard, dereferencing an invalid pointer is undefined behaviour (with a few exceptions), even if you don't try to touch the actual data.
So in short: dereferencing the pointer means applying the dereference operator to it. That operator just returns an l-value for your future use.
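Two cases in C where * is applied to a null pointer without any access to memory, sketched with hypothetical function names: the operand of sizeof is not evaluated (variable length arrays aside), and the C Standard specifies that &*E behaves as if both operators were omitted, even when E is a null pointer. This is C-specific; the sketch makes no claim about C++.

```c
#include <stddef.h>

/* Hypothetical helper: sizeof only inspects the type of *p; p is never read through. */
size_t size_of_pointee(void)
{
    char* p = NULL;
    return sizeof *p;      /* sizeof(char), i.e. 1 */
}

/* Hypothetical helper: in C, &*p is treated as plain p, so no access occurs. */
int addr_of_deref_is_same(void)
{
    char* p = NULL;
    char* p1 = &*p;        /* neither & nor * is evaluated */
    return p1 == p;
}
```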
int arr[5]={0};int (*p)[5] = &arr Why is *p the address of the array? I know *p is of type int [5], but p stores the address of the array. Since *p dereferences p, I thought it would be 0, the value of arr[0], because that's what is stored at the address indicated by p. – Spirt
...*p; causes undefined behaviour. Although you are right that dereferencing does not access the value per se, the code *p; does access the value. – Thaw
...operator = is what accesses the value, and in *p; there is no operator =, only a dereferencing op. It doesn't access the value. Some compilers provide the "volatile dummy read" feature that inserts the fetch if p was marked volatile. – Roberts
...* operator is undefined." - with footnote "Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer [...]" So *p on a null pointer has undefined behaviour - all bets are off. That could manifest in many unfortunate ways - not just an attempt to read from address 0, but - for example - removing code unconditionally executing *p on NULL, based on the (legal) deduction it can't be meant to run given it has undefined behaviour. – Impress
...&(*p) though - the C Standard defines that as equivalent to p, explicitly even for NULL: "Thus, &*E is equivalent to E (even if E is a null pointer)" – Impress

int *p; would define a pointer to an integer, and *p would dereference that pointer, meaning that it would actually retrieve the data that p points to. – Ample
*p does not retrieve the data that p points to. Instead it designates the memory location. That expression could then go on to be used to either store new data in, or retrieve data from, or nothing. – Thaw