Placement of the asterisk in pointer declarations

I've recently decided that I just have to finally learn C/C++, and there is one thing I do not really understand about pointers, or more precisely, their definition.

How about these examples:

  1. int* test;
  2. int *test;
  3. int * test;
  4. int* test,test2;
  5. int *test,test2;
  6. int * test,test2;

Now, to my understanding, the first three cases are all doing the same: Test is not an int, but a pointer to one.

The second set of examples is a bit more tricky. In case 4, both test and test2 will be pointers to an int, whereas in case 5, only test is a pointer, whereas test2 is a "real" int. What about case 6? Same as case 5?

Gasiform answered 7/10, 2008 at 21:2 Comment(10)
In C/C++ white spaces don't change meaning.Wedlock
7. int*test;?Seawright
+1 because I'd only thought to ask about 1 - 3. Reading this question taught me something about 4 - 6 that I'd never thought of.Ethylethylate
@Wedlock That is true 99% of the time, but not always. Off the top of my head, there was the pre-C++11 requirement to put a space between the closing angle brackets of nested templates. In Foo<Bar<char>> the >> had to be written > > so as not to be treated as a right-shift.Auk
@Auk You are right, that's a rather old comment. There are multiple situations when a space will change meaning, for example, the increment ++ operator cannot be split by a space, identifiers cannot be split by a space (and the result can be still legal for the compiler but with undefined runtime behavior). The exact situations are very difficult to define considering the syntax mess that C/C++ is.Wedlock
@Wedlock yes those cases you mention now should be rather obvious, I meant that there are a few non-obvious cases where a space is required. Anyway I just wanted to make that note for the record, it wasn't meant as criticism to your comment.Auk
@JinKwon: Whitespace is only necessary to separate tokens that can't otherwise be distinguished. Since * is not part of any identifier (it's a token all on its own), no whitespace is necessary to separate out int, *, and test. It will still be parsed as int (*test).Pushing
I don't understand why people keep saying this is "just aesthetics" or "style" or "a matter of opinion". The fact that int* test,test2; doesn't do what you would expect implies that it is wrong, a result of misunderstanding the language, and that int *test,test2; is correct.Diffluent
...and that int* test; int test2; is correct.Maitland
I guess that was another reason that smart pointers were created. To avoid the asterisk at all.Salomon

4, 5, and 6 are the same thing, only test is a pointer. If you want two pointers, you should use:

int *test, *test2;

Or, even better (to make everything clear):

int* test;
int* test2;
Romulus answered 7/10, 2008 at 21:4 Comment(9)
So Case 4 is actually a death-trap then? Is there any specification or further reading that explains why int* test,test2 only makes the first variable a pointer?Gasiform
@ Michael Stum It's C++ so do you really think there is a logical explanation?Bedchamber
Read K&R (The C Programming Language). It explains all this very clearly.Nightclub
Noted, pretty much everyone recommends that book; I'll grab a copy off Amazon then. But now at least I remember why I was so reluctant to pick up C/C++ in the past :-)Gasiform
Cases 4, 5 and 6 are "death-traps". This is one reason why many C/C++ style guides suggest only one declaration per statement.Thinskinned
Whitespace is insignificant to a C compiler (ignoring the preprocessor). So no matter how many spaces there are or aren't between the asterisk and its surroundings, it has exactly the same meaning.Fermata
@MichaelStum Death trap? int *p, i, (*f)(); is a perfectly clear declaration. Just know your language before you use it, and name your variables appropriately (int p, *i; is a trap, but so is int p; int *i;).Savarin
@Peter-ReinstateMonica If someone feels "X" is not clear, you stating "X is perfectly clear" has zero meaning and even less impact on reality. You gain NOTHING from cramming all of int *p, i, (*f)(); in one line. It also doesn't matter if you "know your language" - you're writing the code for someone five years from now who may very well NOT know the language fully.Polyvinyl
@AndrewHenle Zero meaning? Implicitly it means at least, obviously, "perfectly clear to me". That I feel that worth mentioning, in turn, implies that I think the fact that I understand it brings at least some value to the discussion: To one user it is unclear, to another user it is clear. The source code is identical, the user is not. The reason it is unclear, therefore, may be with the user, not the code. Now we can talk about which user is more representative for the general coder population, whether the grammar is dangerous, etc. -- but it certainly has meaning and impact.Savarin

White space around asterisks has no significance. All three mean the same thing:

int* test;
int *test;
int * test;

The "int *var1, var2" is an evil syntax that is just meant to confuse people and should be avoided. It expands to:

int *var1;
int var2;
Pressurecook answered 7/10, 2008 at 21:5 Comment(3)
the space before or after the asterisk is just a matter of aesthetics. However, the Google Coding standard goes with int *test (google-styleguide.googlecode.com/svn/trunk/…). Just be consistentSecrecy
@SebastianRaschka The Google C++ Style Guide explicitly allows either asterisk placement. Perhaps it has changed since you read it.Whack
@JaredBeck google.github.io/styleguide/…Diffluent

Many coding guidelines recommend that you only declare one variable per line. This avoids any confusion of the sort you had before asking this question. Most C++ programmers I've worked with seem to stick to this.


A bit of an aside I know, but something I found useful is to read declarations backwards.

int* test;   // test is a pointer to an int

This starts to work very well, especially when you start declaring const pointers and it gets tricky to know whether it's the pointer that's const, or whether it's the thing the pointer is pointing at that is const.

int* const test; // test is a const pointer to an int

int const * test; // test is a pointer to a const int ... but many people write this as  
const int * test; // test is a pointer to an int that's const
Fabre answered 7/10, 2008 at 21:13 Comment(2)
While the "one variable per line" seems useful, we still have not completely solved the situation where the asterisk is more to the left, or more to the right. I am quite sure that in code out in the wild one variant prevails; a bit like some countries drive on the right side, and the others drive the wrong direction, such as the UK. ;-)Whoso
Unfortunately, from my adventures into the wild I see plenty of both styles. In my team we now use clang-format with a style we've agreed on. This at least means all of the code our team produces has the same style for where the whitespace goes.Fabre

There are three pieces to this puzzle.

The first piece is that whitespace in C and C++ is normally not significant beyond separating adjacent tokens that are otherwise indistinguishable.

During the preprocessing stage, the source text is broken up into a sequence of tokens - identifiers, punctuators, numeric literals, string literals, etc. That sequence of tokens is later analyzed for syntax and meaning. The tokenizer is "greedy" and will build the longest valid token that's possible. If you write something like

inttest;

the tokenizer only sees two tokens - the identifier inttest followed by the punctuator ;. It doesn't recognize int as a separate keyword at this stage (that happens later in the process). So, for the line to be read as a declaration of an integer named test, we have to use whitespace to separate the identifier tokens:

int test;

The * character is not part of any identifier; it's a separate token (punctuator) on its own. So if you write

int*test;

the compiler sees 4 separate tokens - int, *, test, and ;. Thus, whitespace is not significant in pointer declarations, and all of

int *test;
int* test;
int*test;
int     *     test;

are interpreted the same way.


The second piece to the puzzle is how declarations actually work in C and C++¹. Declarations are broken up into two main pieces - a sequence of declaration specifiers (storage class specifiers, type specifiers, type qualifiers, etc.) followed by a comma-separated list of (possibly initialized) declarators. In the declaration

unsigned long int a[10]={0}, *p=NULL, f(void);

the declaration specifiers are unsigned long int and the declarators are a[10]={0}, *p=NULL, and f(void). The declarator introduces the name of the thing being declared (a, p, and f) along with information about that thing's array-ness, pointer-ness, and function-ness. A declarator may also have an associated initializer.

The type of a is "10-element array of unsigned long int". That type is fully specified by the combination of the declaration specifiers and the declarator, and the initial value is specified with the initializer ={0}. Similarly, the type of p is "pointer to unsigned long int", and again that type is specified by the combination of the declaration specifiers and the declarator, and is initialized to NULL. And the type of f is "function returning unsigned long int" by the same reasoning.

This is key - there is no "pointer-to" type specifier, just like there is no "array-of" type specifier, just like there is no "function-returning" type specifier. We can't declare an array as

int[10] a;

because the operand of the [] operator is a, not int. Similarly, in the declaration

int* p;

the operand of * is p, not int. But because the indirection operator is unary and whitespace is not significant, the compiler won't complain if we write it this way. However, it is always interpreted as int (*p);.

Therefore, if you write

int* p, q;

the operand of * is p, so it will be interpreted as

int (*p), q;

Thus, all of

int *test1, test2;
int* test1, test2;
int * test1, test2;

do the same thing - in all three cases, test1 is the operand of * and thus has type "pointer to int", while test2 has type int.

Declarators can get arbitrarily complex. You can have arrays of pointers:

T *a[N];

you can have pointers to arrays:

T (*a)[N];

you can have functions returning pointers:

T *f(void);

you can have pointers to functions:

T (*f)(void);

you can have arrays of pointers to functions:

T (*a[N])(void);

you can have functions returning pointers to arrays:

T (*f(void))[N];

you can have functions returning pointers to arrays of pointers to functions returning pointers to T:

T *(*(*f(void))[N])(void); // yes, it's eye-stabby.  Welcome to C and C++.

and then you have signal:

void (*signal(int, void (*)(int)))(int);

which reads as

       signal                             -- signal
       signal(                 )          -- is a function taking
       signal(                 )          --   unnamed parameter
       signal(int              )          --   is an int
       signal(int,             )          --   unnamed parameter
       signal(int,      (*)    )          --   is a pointer to
       signal(int,      (*)(  ))          --     a function taking
       signal(int,      (*)(  ))          --       unnamed parameter
       signal(int,      (*)(int))         --       is an int
       signal(int, void (*)(int))         --     returning void
     (*signal(int, void (*)(int)))        -- returning a pointer to
     (*signal(int, void (*)(int)))(   )   --   a function taking
     (*signal(int, void (*)(int)))(   )   --     unnamed parameter
     (*signal(int, void (*)(int)))(int)   --     is an int
void (*signal(int, void (*)(int)))(int);  --   returning void
    

and this just barely scratches the surface of what's possible. But notice that array-ness, pointer-ness, and function-ness are always part of the declarator, not the type specifier.

One thing to watch out for - const can modify both the pointer type and the pointed-to type:

const int *p;  
int const *p;

Both of the above declare p as a pointer to a const int object. You can assign a new value to p, setting it to point to a different object:

const int x = 1;
const int y = 2;

const int *p = &x;
p = &y;

but you cannot write to the pointed-to object:

*p = 3; // constraint violation, the pointed-to object is const

However,

int * const p;

declares p as a const pointer to a non-const int; you can write to the thing p points to

int x = 1;
int y = 2;
int * const p = &x;

*p = 3;

but you can't set p to point to a different object:

p = &y; // constraint violation, p is const

Which brings us to the third piece of the puzzle - why declarations are structured this way.

The intent is that the structure of a declaration should closely mirror the structure of an expression in the code ("declaration mimics use"). For example, let's suppose we have an array of pointers to int named ap, and we want to access the int value pointed to by the i'th element. We would access that value as follows:

printf( "%d", *ap[i] );

The expression *ap[i] has type int; thus, the declaration of ap is written as

int *ap[N]; // ap is an array of pointer to int, fully specified by the combination
            // of the type specifier and declarator

The declarator *ap[N] has the same structure as the expression *ap[i]. The operators * and [] behave the same way in a declaration that they do in an expression - [] has higher precedence than unary *, so the operand of * is ap[N] (it's parsed as *(ap[N])).

As another example, suppose we have a pointer to an array of int named pa and we want to access the value of the i'th element. We'd write that as

printf( "%d", (*pa)[i] );

The type of the expression (*pa)[i] is int, so the declaration is written as

int (*pa)[N];

Again, the same rules of precedence and associativity apply. In this case, we don't want to dereference the i'th element of pa, we want to access the i'th element of what pa points to, so we have to explicitly group the * operator with pa.

The *, [] and () operators are all part of the expression in the code, so they are all part of the declarator in the declaration. The declarator tells you how to use the object in an expression. If you have a declaration like int *p;, that tells you that the expression *p in your code will yield an int value. By extension, it tells you that the expression p yields a value of type "pointer to int", or int *.


So, what about things like cast and sizeof expressions, where we use things like (int *) or sizeof (int [10]) or things like that? How do I read something like

void foo( int *, int (*)[10] );

There's no declarator, aren't the * and [] operators modifying the type directly?

Well, no - there is still a declarator, just with an empty identifier (known as an abstract declarator). If we represent an empty identifier with the symbol λ, then we can read those things as (int *λ), sizeof (int λ[10]), and

void foo( int *λ, int (*λ)[10] );

and they behave exactly like any other declaration. int *[10] represents an array of 10 pointers, while int (*)[10] represents a pointer to an array.


And now the opinionated portion of this answer. I am not fond of the C++ convention of declaring simple pointers as

T* p;

and consider it bad practice for the following reasons:

  1. It's not consistent with the syntax;
  2. It introduces confusion (as evidenced by this question, all the duplicates to this question, questions about the meaning of T* p, q;, all the duplicates to those questions, etc.);
  3. It's not internally consistent - declaring an array of pointers as T* a[N] is asymmetrical with use (unless you're in the habit of writing * a[i]);
  4. It cannot be applied to pointer-to-array or pointer-to-function types (unless you create a typedef just so you can apply the T* p convention cleanly, which...no);
  5. The reason for doing so - "it emphasizes the pointer-ness of the object" - is spurious. It cannot be applied to array or function types, and I would think those qualities are just as important to emphasize.

In the end, it just indicates confused thinking about how the two languages' type systems work.

There are good reasons to declare items separately; working around a bad practice (T* p, q;) isn't one of them. If you write your declarators correctly (T *p, q;) you are less likely to cause confusion.

I consider it akin to deliberately writing all your simple for loops as

i = 0;
for( ; i < N; ) 
{ 
  ... 
  i++; 
}

Syntactically valid, but confusing, and the intent is likely to be misinterpreted. However, the T* p; convention is entrenched in the C++ community, and I use it in my own C++ code because consistency across the code base is a good thing, but it makes me itch every time I do it.


¹ I will be using C terminology - the C++ terminology is a little different, but the concepts are largely the same.

Pushing answered 13/10, 2020 at 16:20 Comment(2)
This is the best answer to this question. It should be higher voted.Checkerboard
This means that the same goes for a reference declaration: int &ref = x;Paco

Use the "Clockwise Spiral Rule" to help parse C/C++ declarations:

There are three simple steps to follow:

  1. Starting with the unknown element, move in a spiral/clockwise direction; when encountering the following elements, replace them with the corresponding English statements:

    [X] or []: Array X size of... or Array undefined size of...

    (type1, type2): function passing type1 and type2 returning...

    *: pointer(s) to...

  2. Keep doing this in a spiral/clockwise direction until all tokens have been covered.
  3. Always resolve anything in parentheses first!

Also, declarations should be in separate statements when possible (which is true the vast majority of times).

Thinskinned answered 7/10, 2008 at 21:27 Comment(6)
That looks daunting and quite horrible, sorry to say.Bedchamber
it does, but it seems quite a good explanation for some of the more complicated constructsGasiform
@d03boy: There's no question - C/C++ declarations can be a nightmare.Thinskinned
The "spiral" doesn't make any sense, much less the "clockwise". I'd rather name it the "right-left rule", since the syntax doesn't make you look right-bottom-left-top, only right-left.Dissemble
I learned this as the "right-left-right" rule. C++ folks often like to pretend all the type information is on the left, which leads to the int* x; style rather than the traditional int *x; style. Of course, the spacing doesn't matter to the compiler, but it does affect the humans. Denial of the actual syntax leads to style rules that can annoy and confound readers.Maleate
It's neither "spiral" nor "right-left" nor any other specific pattern: It's simply applying the operators according to parentheses, precedence and respective evaluation order (left-to-right or right-to-left) just like in the corresponding expression which yields the built-in type to the left. Where is your spiral or left-right in int *arr[1][2][3][4]??Savarin

As others mentioned, 4, 5, and 6 are the same. Often, people use these examples to make the argument that the * belongs with the variable instead of the type. While it's an issue of style, there is some debate as to whether you should think of and write it this way:

int* x; // "x is a pointer to int"

or this way:

int *x; // "*x is an int"

FWIW I'm in the first camp, but the reason others make the argument for the second form is that it (mostly) solves this particular problem:

int* x,y; // "x is a pointer to int, y is an int"

which is potentially misleading; instead you would write either

int *x,y; // it's a little clearer what is going on here

or if you really want two pointers,

int *x, *y; // two pointers

Personally, I say keep it to one variable per line, then it doesn't matter which style you prefer.

Interrogate answered 31/12, 2011 at 0:46 Comment(3)
This is bogus: what do you call int *MyFunc(void)? Is *MyFunc a function returning an int? No. Obviously we should write int* MyFunc(void), and say MyFunc is a function returning an int*. So to me this is clear: the C and C++ grammar parsing rules are simply wrong for variable declaration. They should have included pointer qualification as part of the shared type for the whole comma sequence.Idyllist
But *MyFunc() is an int. The problem with the C syntax is mixing prefix and postfix syntax - if only postfix was used, there would be no confusion.Shawm
The first camp fights the language's syntax, leading to confusing constructs like int const* x;, which I find as misleading as a * x+b * y.Maleate
#include <type_traits>

std::add_pointer<int>::type test, test2;
Semicolon answered 15/9, 2012 at 16:27 Comment(2)
#include <windows.h> LPINT test, test2;Martainn
For those who are wondering: in this case both test and test2 are of type int*.Metaphase

In 4, 5 and 6, test is always a pointer and test2 is not a pointer. White space is (almost) never significant in C++.

Toothed answered 7/10, 2008 at 21:5 Comment(0)

The rationale in C is that you declare the variables the way you use them. For example

char *a[100];

says that *a[42] will be a char. And a[42] a char pointer. And thus a is an array of char pointers.

This is because the original compiler writers wanted to use the same parser for expressions and declarations. (Not a very sensible reason for a language design choice.)

Ecdysiast answered 1/1, 2016 at 20:29 Comment(4)
Yet writing char* a[100]; also deduces that *a[42]; will be a char and a[42]; a char pointer.Asseveration
Well, we all deduce the same conclusions, only the order is varying.Ecdysiast
Quote: "says that *a[42] will be a char. And a[42] a char pointer". Are you sure it is not the other way around?Courtneycourtrai
If you prefer the other way, say a[42] is a char pointer, and *a[42] is a char.Ecdysiast

I would say that the initial convention was to put the star on the pointer name side (the right side of the declaration).

You can follow the same rules, but it's not a big deal if you put stars on the type side. Remember that consistency is important, so always put the star on the same side, regardless of which side you have chosen.

Cramped answered 28/9, 2017 at 8:48 Comment(1)
Well - the parser appears to allow either variant, but if Dennis and Linus say it should be on the right side, that is quite compelling. But still, we kind of lack some rationale, and then also the explanation why this is done. It's a bit like tab versus space situation - except that one got solved, because people who use spaces rather than tabs, make more money, according to StackOverflow ... :-)Whoso

In my opinion, the answer is BOTH, depending on the situation. Generally, IMO, it is better to put the asterisk next to the pointer name, rather than the type. Compare e.g.:

int *pointer1, *pointer2; // Fully consistent, two pointers
int* pointer1, pointer2;  // Inconsistent -- because only the first one is a pointer, the second one is an int variable
// The second case is unexpected, and thus prone to errors

Why is the second case inconsistent? Because e.g. int x,y; declares two variables of the same type but the type is mentioned only once in the declaration. This creates a precedent and expected behavior. And int* pointer1, pointer2; is inconsistent with that because it declares pointer1 as a pointer, but pointer2 is an integer variable. Clearly prone to errors and, thus, should be avoided (by putting the asterisk next to the pointer name, rather than the type).

However, there are some exceptions where you might not be able to put the asterisk next to an object name (and where it matters where you put it) without getting an undesired outcome, for example:

MyClass *volatile MyObjName

void test (const char *const p) // const value pointed to by a const pointer

Finally, in some cases, it might be arguably clearer to put the asterisk next to the type name, e.g.:

void* ClassName::getItemPtr () {return &item;} // Clear at first sight

Courtneycourtrai answered 2/5, 2019 at 18:19 Comment(0)

This is more of an addendum to @John Bode’s answer, which is a beautiful piece of writing.

As Bode has alluded to, much of the current confusion in C over the placement of the unary operator * in a pointer declaration has a C++ origin.

It is best illustrated by the following paragraph from Jens Gustedt’s Modern C (remember G. is a co-editor of the ISO C Standard):

Please note that the * character plays two different roles in the definition of double_swap. In a declaration, it creates a new type (a pointer type), whereas in an expression it dereferences the object to which a pointer refers. To help distinguish these two usages of the same symbol, we usually flush the * to the left with no blanks in between if it modifies a type (such as double*) and to the right if it dereferences a pointer (*p0).

This is a perversion of K&R, who stated that the use of * in a pointer declaration ‘is intended as a mnemonic’, but becomes easier to understand when one realises M. Gustedt has a background in C++.

Ectomy answered 1/8, 2023 at 7:14 Comment(1)
This does not provide an answer to the question. Once you have sufficient reputation you will be able to comment on any post; instead, provide answers that don't require clarification from the asker. - From ReviewAffaire

The pointer is a modifier to the type. It's best to read them right to left in order to better understand how the asterisk modifies the type. 'int *' can be read as 'pointer to int'. In multiple declarations you must specify that each variable is a pointer, or it will be created as a standard variable.

1,2 and 3) Test is of type (int *). Whitespace doesn't matter.

4,5 and 6) Test is of type (int *). Test2 is of type int. Again whitespace is inconsequential.

Spillman answered 7/10, 2008 at 21:7 Comment(0)

I have always preferred to declare pointers like this:

int* i;

I read this to say "i is of type int-pointer". You can get away with this interpretation if you only declare one variable per declaration.

It is an uncomfortable truth, however, that this reading is wrong. The C Programming Language, 2nd Ed. (p. 94) explains the opposite paradigm, which is the one used in the C standards:

The declaration of the pointer ip,

int *ip;

is intended as a mnemonic; it says that the expression *ip is an int. The syntax of the declaration for a variable mimics the syntax of expressions in which the variable might appear. This reasoning applies to function declarations as well. For example,

double *dp, atof(char *);

says that in an expression *dp and atof(s) have values of type double, and that the argument of atof is a pointer to char.

So, by the reasoning of the C language, when you declare

int* test, test2;

you are not declaring two variables of type int*; you are stating that the two expressions *test and test2 both have type int, with no guarantee that *test refers to an int actually allocated in memory.

A compiler is perfectly happy to accept the following:

int *ip, i;
i = *ip;

because in the C paradigm, the compiler is only expected to keep track of the type of *ip and i. The programmer is expected to keep track of the meaning of *ip and i. In this case, ip is uninitialized, so it is the programmer's responsibility to point it at something meaningful before dereferencing it.

Icaria answered 21/3, 2022 at 18:23 Comment(1)
Interestingly, the declaration int *ip = 0, i = 0 initializes ip = (int*) 0 and i = (int) 0, so the expression syntax mimicking doesn't extend to the assignment operatorIcaria

A good rule of thumb by which a lot of people seem to grasp these concepts: in C++, a lot of semantic meaning is derived from the left-binding of keywords or identifiers.

Take for example:

int const bla;

The const applies to the "int" word. The same is with pointers' asterisks, they apply to the keyword left of them. And the actual variable name? Yup, that's declared by what's left of it.

Polyphonic answered 7/10, 2008 at 21:50 Comment(1)
This doesn't answer the question. Worse, if we try to infer an answer from it, then it implies the asterisk binds to the type at its left, which as everyone else has said, is false. It binds to the single variable name at its right.Interrelated
