Why do C and C++ support memberwise assignment of arrays within structs, but not generally?

I understand that memberwise assignment of arrays is not supported, such that the following will not work:

int num1[3] = {1,2,3};
int num2[3];
num2 = num1; // "error: invalid array assignment"

I just accepted this as fact, figuring that the aim of the language is to provide an open-ended framework, and let the user decide how to implement something such as the copying of an array.

However, the following does work:

struct myStruct { int num[3]; };
struct myStruct struct1 = {{1,2,3}};
struct myStruct struct2;
struct2 = struct1;

The array num[3] is member-wise assigned from its instance in struct1, into its instance in struct2.

Why is member-wise assignment of arrays supported for structs, but not in general?

edit: Roger Pate's comment in the thread std::string in struct - Copy/assignment issues? seems to point in the general direction of the answer, but I don't know enough to confirm it myself.

edit 2: Many excellent responses. I choose Luther Blissett's because I was mostly wondering about the philosophical or historical rationale behind the behavior, but James McNellis's reference to the related spec documentation was useful as well.

Prom answered 9/8, 2010 at 3:10 Comment(4)
I'm making this have both C and C++ as tags, because this originates from C. Also, good question.Argueta
It might be worth noting that a long time ago in C, structure assignment was not generally possible and you had to use memcpy() or similar.Willi
Just a little FYI... boost::array (boost.org/doc/libs/release/doc/html/array.html) and now std::array (en.cppreference.com/w/cpp/container/array) are STL-compatible alternatives to the messy old C arrays. They support copy-assignment.Leitao
@EmileCormier And they are -- tada! -- structures around arrays.Hartebeest

Here's my take on it:

The Development of the C Language offers some insight into the evolution of the array type in C:

I'll try to outline the array thing:

C's forerunners B and BCPL had no distinct array type, a declaration like:

auto V[10] (B)
or 
let V = vec 10 (BCPL)

would declare V to be an (untyped) pointer which is initialized to point to an unused region of 10 "words" of memory. B already used * for pointer dereferencing and had the [] shorthand notation: V[i] meant *(V+i), just as in C/C++ today. However, V is not an array, it is still a pointer which has to point to some memory. This caused trouble when Dennis Ritchie tried to extend B with struct types. He wanted arrays to be part of the structs, like in C today:

struct {
    int inumber;
    char name[14];
};

But with the B/BCPL concept of arrays as pointers, this would have required the name field to contain a pointer which had to be initialized at runtime to a memory region of 14 bytes within the struct. The initialization/layout problem was eventually solved by giving arrays a special treatment: the compiler would track the location of arrays in structures, on the stack, etc. without actually requiring the pointer to the data to materialize, except in expressions which involve the arrays. This treatment allowed almost all B code to still run and is the source of the "arrays convert to pointer if you look at them" rule. It is a compatibility hack, which turned out to be very handy, because it allowed arrays of open size etc.

And here's my guess why arrays can't be assigned: since arrays were pointers in B, you could simply write:

auto V[10];
V=V+5;

to rebase an "array". This was now meaningless, because the base of an array variable was not an lvalue anymore. So this assignment was disallowed, which helped to catch the few programs that did this rebasing on declared arrays. And then this notion stuck: as arrays were never designed to be first-class citizens of the C type system, they were mostly treated as special beasts which become pointers if you use them. And from a certain point of view (which ignores that C arrays are a botched hack), disallowing array assignment still makes some sense: an open array or an array function parameter is treated as a pointer without size information. The compiler doesn't have the information to generate an array assignment for them, and the pointer assignment was required for compatibility reasons. Introducing array assignment for the declared arrays would have introduced bugs through spurious assignments (is a=b a pointer assignment or an elementwise copy?) and other trouble (how do you pass an array by value?) without actually solving a problem - just make everything explicit with memcpy!

/* Example of how array assignment would make things even weirder in C/C++,
   if we don't want to break existing code.
   It's actually better to leave things as they are...
*/
typedef int vec[3];

void f(vec a, vec b) 
{
    vec x,y; 
    a=b; // pointer assignment
    x=y; // NEW! element-wise assignment
    a=x; // pointer assignment
    x=a; // NEW! element-wise assignment
}
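
Since the hypothetical x=y above does not compile in real C or C++, here is a minimal sketch of the explicit memcpy route the answer recommends. The names copy_ints and demo_explicit_copy are made up for illustration, and the block is written as C++ only so it can use static_assert; the same memcpy call works in C.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: inside the function, dst and src are plain pointers
// with no size information, so the element count is passed separately.
void copy_ints(int* dst, const int* src, std::size_t n)
{
    assert(dst != nullptr && src != nullptr);
    std::memcpy(dst, src, n * sizeof *dst);
}

// Hypothetical demo: copies one declared array into another.
void demo_explicit_copy()
{
    int num1[3] = {1, 2, 3};
    int num2[3] = {0, 0, 0};

    // For a declared array the size is still part of its type.
    static_assert(sizeof num1 == 3 * sizeof(int), "array type carries its size");

    // num2 = num1;  // would not compile: invalid array assignment
    copy_ints(num2, num1, sizeof num1 / sizeof num1[0]);

    assert(num2[0] == 1 && num2[1] == 2 && num2[2] == 3);
}
```

The element count is computed at the call site, where the array type is still visible; once the arrays have decayed to pointers inside copy_ints, that information is gone.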

This didn't change when a revision of C in 1978 added struct assignment ( http://cm.bell-labs.com/cm/cs/who/dmr/cchanges.pdf ). Even though records were distinct types in C, it was not possible to assign them in early K&R C. You had to copy them member-wise or with memcpy, and you could pass only pointers to them as function parameters. Assignment (and parameter passing) was now simply defined as the memcpy of the struct's raw memory, and since this couldn't break existing code it was readily adopted. As an unintended side effect, this implicitly introduced some kind of array assignment, but this happened somewhere inside a structure, so it couldn't really introduce problems with the way arrays were used.
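
To make that side effect concrete, here is a small sketch. Vec3, sum_after_bump and demo_struct_assignment are hypothetical names chosen for this example. It shows that a struct wrapping an array can be both assigned and passed by value, which a bare array cannot.

```cpp
#include <cassert>

// Hypothetical wrapper type: its only member is a fixed-size array.
struct Vec3 {
    int v[3];
};

// The struct is taken by value, so the embedded array is copied and the
// modification below is not visible to the caller.
int sum_after_bump(Vec3 a)
{
    a.v[0] += 100;
    return a.v[0] + a.v[1] + a.v[2];
}

// Hypothetical demo: struct assignment copies the array contents.
void demo_struct_assignment()
{
    Vec3 s1 = {{1, 2, 3}};
    Vec3 s2 = {{0, 0, 0}};

    s2 = s1;        // copies all three ints
    s1.v[0] = 42;   // a later change to s1 does not affect s2

    assert(s2.v[0] == 1 && s2.v[1] == 2 && s2.v[2] == 3);
}
```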

Phagocytosis answered 9/8, 2010 at 7:16 Comment(4)
It's too bad C didn't define a syntax e.g. int[10] c; to make the lvalue c behave as an array of ten items, rather than as a pointer to the first item of a ten-item array. There are a few situations where it's useful to be able to create a typedef that allocates space when used for a variable, but passes a pointer when used as a function argument, but the inability to have a value of array type is a significant semantic weakness in the language.Inositol
Instead of saying "pointer which has to point to some memory", the important point is that the pointer itself has to be stored in memory like a regular pointer. This does come across in your later explanation, but I think that highlights the key difference better. (In modern C, the name of an array variable does refer to a block of memory, so that's not the difference. It's that the pointer itself is not logically stored anywhere in the abstract machine.)Incult
See C's aversion to arrays for a nice summary of the history.Incult
Very good explanation. Makes absolute sense. Just to express the difference between an array and a struct containing an array in slightly different words: with one exception, defined structs (with or without arrays) are usually used as complete types (i.e. complete with respect to their size), whereas arrays are often used open-ended (in both directions) for the very historical reasons you pointed out. That means it's fairly safe to assume how to copy structs (even ones containing arrays). (The exception is the flexible array member, which is allowed at the very end.)Stockbreeder

Concerning the assignment operators, the C++ standard says the following (C++03 §5.17/1):

There are several assignment operators... all require a modifiable lvalue as their left operand

An array is not a modifiable lvalue.

However, assignment to a class type object is defined specially (§5.17/4):

Assignment to objects of a class is defined by the copy assignment operator.

So, we look to see what the implicitly-declared copy assignment operator for a class does (§12.8/13):

The implicitly-defined copy assignment operator for class X performs memberwise assignment of its subobjects. ... Each subobject is assigned in the manner appropriate to its type:
...
-- if the subobject is an array, each element is assigned, in the manner appropriate to the element type
...

So, for a class type object, arrays are copied correctly. Note that if you provide a user-declared copy assignment operator, you cannot take advantage of this, and you'll have to copy the array element-by-element.
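
As a rough sketch of that last point, the hypothetical class Counted below has a user-provided copy assignment operator, so the array member has to be copied explicitly (here with std::copy). The class name and the assignments counter are illustrative only.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical class with a user-provided copy assignment operator.
struct Counted {
    int num[3];
    int assignments;  // how many times this object has been assigned to

    Counted& operator=(const Counted& other)
    {
        if (this != &other) {
            // num = other.num;  // would not compile: arrays are not assignable
            std::copy(other.num, other.num + 3, num);
            ++assignments;
        }
        return *this;
    }
};

// Hypothetical demo: the explicit element-wise copy does the work that
// the implicitly-defined operator would otherwise have done.
void demo_user_provided_assignment()
{
    Counted a = {{1, 2, 3}, 0};
    Counted b = {{0, 0, 0}, 0};

    b = a;

    assert(b.num[0] == 1 && b.num[1] == 2 && b.num[2] == 3);
    assert(b.assignments == 1);
}
```

The self-assignment check is there because std::copy requires that the destination not start inside the source range.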


The reasoning is similar in C (C99 §6.5.16/2):

An assignment operator shall have a modifiable lvalue as its left operand.

And §6.3.2.1/1:

A modifiable lvalue is an lvalue that does not have array type... [other constraints follow]

In C, assignment is much simpler than in C++ (§6.5.16.1/2):

In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.

For assignment of struct-type objects, the left and right operands must have the same type, so the value of the right operand is simply copied into the left operand.
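
On the C++ side, the same distinction can be checked at compile time with the standard type traits. This is only a sketch: Wrapper and the two helper functions are hypothetical names, and std::is_copy_assignable tests whether the assignment expression is well-formed, which is related to but not the same as the standard's "modifiable lvalue" wording.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical struct wrapping an array.
struct Wrapper {
    int num[3];
};

// A bare array type is not copy-assignable...
static_assert(!std::is_copy_assignable<int[3]>::value,
              "arrays cannot be assigned");

// ...but a class type containing one is, via the implicit operator=.
static_assert(std::is_copy_assignable<Wrapper>::value,
              "structs containing arrays can be assigned");

// Runtime-visible versions of the same checks.
bool array_is_assignable()   { return std::is_copy_assignable<int[3]>::value; }
bool wrapper_is_assignable() { return std::is_copy_assignable<Wrapper>::value; }
```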

Butane answered 9/8, 2010 at 3:42 Comment(6)
Why are arrays immutable? Or rather, why isn't assignment defined specially for arrays like it is when it's in a class-type?Argueta
@GMan: That's the more interesting question, isn't it. For C++ the answer is probably "because that's how it is in C," and for C, I'd guess it's just due to how the language evolved (i.e., the reason is historical, not technical), but I wasn't alive when most of that took place, so I'll leave it to someone more knowledgeable to answer that part :-P (FWIW, I can't find anything in the C90 or C99 rationale documents).Butane
Does anyone know where the definition of "modifiable lvalue" is in the C++03 standard? It should be in §3.10. The index says it is defined on that page, but it's not. The (non-normative) note at §8.3.4/5 says "Objects of array types cannot be modified, see 3.10," but §3.10 does not once use the word "array."Butane
@James: I was just doing the same. It seems to refer to a removed definition. And yea, I've always wanted to know the real reason behind it all, but it seems a mystery. I've heard things like "prevent people from being inefficient by accidentally assigning arrays", but that's ridiculous.Argueta
@GMan, James: There recently was a discussion on comp.lang.c++ groups.google.com/group/comp.lang.c++/browse_frm/thread/… if you missed it and are still interested. Apparently it's not because an array isn't a modifiable lvalue (an array certainly is an lvalue and all non-const lvalues are modifiable), but because = requires an rvalue on the RHS and an array can't be an rvalue! The lvalue-to-rvalue conversion is forbidden for arrays, replaced with lvalue-to-pointer. static_cast isn't any better at making an rvalue because it's defined in the same terms.Breakwater
@Potatoswatter: Thanks a lot for the link (I don't usually check comp.lang.c++, since most of the interesting discussion is on .moderated and comp.std.c++). That makes a lot of sense and is quite clearly stated in 4.1/1.Butane

In this link: http://www2.research.att.com/~bs/bs_faq2.html there's a section on array assignment:

The two fundamental problems with arrays are that

  • an array doesn't know its own size
  • the name of an array converts to a pointer to its first element at the slightest provocation

And I think this is the fundamental difference between arrays and structs. An array variable is a low-level data element with limited self-knowledge. Fundamentally, it's a chunk of memory and a way to index into it.

So, the compiler can't tell the difference between int a[10] and int b[20].

Structs, however, do not have the same ambiguity.

Ziguard answered 9/8, 2010 at 3:40 Comment(4)
That page talks about passing arrays to functions (which cannot be done, so it's just a pointer, which is what he means when he says it loses its size). That has nothing to do with assigning arrays to arrays. And no, an array variable isn't just "really" a pointer to the first element, it's an array. Arrays are not pointers.Argueta
Thanks for the comment, but when I read that section of the article, he says up front that an array does not know its own size, then uses an example where arrays are passed as arguments to illustrate that fact. So, when arrays are passed as arguments, did they lose the information about their size, or did they never have the information to begin with? I assumed the latter.Ziguard
The compiler can tell the difference between two different-sized arrays - try printing sizeof(a) vs. sizeof(b) or passing a to void f(int (&)[20]);.Funerary
It's important to understand that each array size constitutes its own type. The rules for parameter passing ensure that you can write poor man's "generic" functions that take array arguments of any size, at the expense of needing to pass the size separately. If that were not the case (and in C++ you can -- and must! -- define reference parameters to arrays of specific size), you'd need a specific function for each different size, clearly nonsense. I wrote about it in another post.Hartebeest

I know everyone who answered is an expert in C/C++, but I think this is the primary reason.

num2 = num1;

Here you are trying to change the base address of the array, which is not permissible.

And of course, struct2 = struct1;

Here, object struct1 is assigned to another object.

Painkiller answered 9/8, 2010 at 12:29 Comment(3)
And assigning structs will eventually assign the array member, which begs the exact same question. Why is one allowed and not the other, when it's an array in both situations?Argueta
Agreed. But the first one is prevented by the compiler (num2=num1). The second one is not prevented by the compiler. That makes a huge difference.Painkiller
If arrays were assignable, num2 = num1 would be perfectly well behaved. Each element of num2 would have the same value as the corresponding element of num1.Lateritious

Another reason no further efforts were made to beef up arrays in C is probably that array assignment would not be that useful. Even though it can be easily achieved in C by wrapping the array in a struct (and the struct's address can simply be cast to the array's address, or even to the address of the array's first element, for further processing), this feature is rarely used. One reason is that arrays of different sizes are incompatible, which limits the benefits of assignment or, relatedly, of passing to functions by value.

Most functions with array parameters in languages where arrays are first-class types are written for arrays of arbitrary size. The function then usually iterates over the given number of elements, information that the array provides. (In C the idiom is, of course, to pass a pointer and a separate element count.) A function which accepts an array of just one specific size is not needed as often, so not much is missed. (This changes when you can leave it to the compiler to generate a separate function for any occurring array size, as with C++ templates; this is the reason why std::array is useful.)
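
A minimal sketch of that template point follows. The helper names sum and demo_std_array are hypothetical. The compiler deduces N and generates one instantiation of sum per array size, and std::array supports assignment between objects of the same size.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: N is deduced from the argument, so the compiler
// generates a separate function for each array size that is used.
template <std::size_t N>
int sum(const int (&a)[N])
{
    int total = 0;
    for (std::size_t i = 0; i < N; ++i) {
        total += a[i];
    }
    return total;
}

// Hypothetical demo: std::array is a struct around an array, so it is
// copy-assignable as long as both sides have the same size.
void demo_std_array()
{
    std::array<int, 3> a = {{1, 2, 3}};
    std::array<int, 3> b = {{0, 0, 0}};

    b = a;  // element-wise copy

    assert(b[0] == 1 && b[1] == 2 && b[2] == 3);

    // std::array<int, 4> c;
    // c = a;  // would not compile: a different size is a different type
}
```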

Hartebeest answered 13/7, 2019 at 15:21 Comment(0)
