Array of non-contiguous objects

#include <iostream>
#include <cstring>
#include <type_traits> // std::is_standard_layout, std::is_trivial
// This struct is not guaranteed to occupy contiguous storage
// in the sense of the C++ Object model (§1.8.5):
struct separated { 
  int i; 
  separated(int a, int b){i=a; i2=b;} 
  ~separated(){i=i2=-1;} // nontrivial destructor --> not trivially copyable
  private: int i2;       // different access control --> not standard layout
};
int main() {
  static_assert(not std::is_standard_layout<separated>::value,"sl");
  static_assert(not std::is_trivial<separated>::value,"tr");
  separated a[2]={{1,2},{3,4}};
  std::memset(&a[0],0,sizeof(a[0]));
  std::cout<<a[1].i;    
  // No guarantee that the previous line outputs 3.
}
// compiled with Debian clang version 3.5.0-10, C++14 standard
// (outputs 3) 
  1. What is the rationale behind weakening standard guarantees to the point that this program may show undefined behaviour?

  2. The standard says: "An object of array type contains a contiguously allocated non-empty set of N subobjects of type T." [dcl.array] §8.3.4. If objects of type T do not occupy contiguous storage, how can an array of such objects do so?

edit: removed possibly distracting explanatory text

Acculturation answered 30/9, 2016 at 12:42 Comment(12)
What do you mean by "the object does not occupy contiguous storage"? Are you talking about the padding that could be in between the member variables? - Virulent
In mathematics these are called 'sparse arrays'; maybe a Google search for that term will help you? Sorry, my English is too poor to help further. - Electrolyte
For your first question: Because no one wants to design C++ around C stuff like memset. C structs need to work with memset for compatibility, the rest does not really matter. - Raffia
Where is this from? Have you run it and not gotten 3? There is a comment that says "No guarantee that ..." but I don't know who is asserting that. - Early
It is just that the standard does not positively assert that the program outputs 3. Even if there were no problem with memset (one could use placement new and char arrays only), the two ints could reside in different address spaces ("sequences of contiguous bytes" in standardese). - Acculturation
Your logic is flawed: trivially copyable or standard-layout implies contiguous bytes of storage. 1.8.5 does not forbid non-standard-layout types from being placed contiguously in memory. - Nero
@knivil: I haven't said: "It is forbidden." I said: "It is not guaranteed." - Acculturation
The call to memset() in your example only zeroes the very first element of the array. If you want to zero the whole array, you should use memset(a, 0, sizeof a). - Trochanter
But I intended to zero a[0] only. The question is: If I do it in this way, may I inadvertently write into a[1] on machines that are slightly more exotic than a desktop PC? - Acculturation
The rationale is that the standard does not want to constrain implementations of complicated classes. - Antakiya
@JoachimPileborg the standard permits parts of the storage required to implement an object to be in completely separate memory regions (e.g. vtables). - Antakiya
There are many good reasons besides object non-contiguity why memsetting a "complex" object should be UB. - Slipsheet

1. This is an instance of Occam's razor as adopted by the dragons that actually write compilers: Do not give more guarantees than needed to solve the problem, because otherwise your workload will double without compensation. Sophisticated classes adapted to fancy hardware or to historic hardware were part of the problem. (As hinted at by BaummitAugen and M.M in the comments.)

2. (contiguous=sharing a common border, next or together in sequence)

First, it is not that objects of type T either always or never occupy contiguous storage. There may be different memory layouts for the same type within a single binary.

[class.derived] §10 (8): A base class subobject might have a layout different from ...

This would be enough to lean back and be satisfied that what is happening on our computers does not contradict the standard. But let's amend the question. A better question would be:

Does the standard permit arrays of objects that do not occupy contiguous storage individually, while at the same time every two successive subobjects share a common border?

If so, this would influence heavily how char* arithmetic relates to T* arithmetic.

Depending on whether you read the standard quote in the question as meaning that only the subobjects share a common border, or that the bytes within each subobject share a common border as well, you may arrive at different conclusions.

Assuming the first, you find that 'contiguously allocated' or 'stored contiguously' may simply mean &a[n]==&a[0] + n (§23.3.2.1), which is a statement about subobject addresses that would not imply that the array resides within a single sequence of contiguous bytes.

If you assume the stronger version, you may arrive at the 'element offset==sizeof(T)' conclusion brought forward in "T* versus char* pointer arithmetic". That would also imply that one could force otherwise possibly non-contiguous objects into a contiguous layout by declaring them as T t[1]; instead of T t;

Now how to resolve this mess? There is a fundamentally ambiguous definition of the sizeof() operator in the standard that seems to be a relic of the time when, at least per architecture, type roughly equaled layout, which is not the case any more. (How does placement new know which layout to create?)

When applied to a class, the result [of sizeof()] is the number of bytes in an object of that class including any padding required for placing objects of that type in an array. [expr.sizeof] §5.3.3 (2)

But wait, the amount of required padding depends on the layout, and a single type may have more than one layout. So we're bound to add a grain of salt and take the minimum over all possible layouts, or do something equally arbitrary.

Finally, the array definition would benefit from a disambiguation in terms of char* arithmetic, in case this is the intended meaning. Otherwise, the answer to question 1 applies accordingly.


A few remarks related to now-deleted answers and comments: As discussed in "Can technically objects occupy non-contiguous bytes of storage?", non-contiguous objects actually exist. Furthermore, naively memsetting a subobject may invalidate unrelated subobjects of the containing object, even for perfectly contiguous, trivially copyable objects:

#include <iostream>
#include <cstring>
#include <type_traits> // std::is_trivial, std::is_standard_layout
struct A {
  private: int a;
  public: short i;
};
struct B :  A {
  short i;
};
int main()
{
   static_assert(std::is_trivial<A>::value , "A not trivial.");
   static_assert(not std::is_standard_layout<A>::value , "sl.");
   static_assert(std::is_trivial<B>::value , "B not trivial.");
   B object;
   object.i=1;
   std::cout<< object.B::i;
   std::memset((void*)&(A&)object ,0,sizeof(A));
   std::cout<<object.B::i;
}
// outputs 10 (i.e. 1, then 0) with g++/clang++, C++11, Debian 8, amd64

Therefore, it is conceivable that the memset in the question post might zero a[1].i, such that the program would output 0 instead of 3.

There are few occasions where one would use memset-like functions with C++ objects at all. (Normally, destructors of subobjects will fail blatantly if you do that.) But sometimes one wishes to scrub the contents of an 'almost-POD' class in its destructor, and this might be the exception.

Acculturation answered 5/10, 2016 at 6:20 Comment(12)
Since one can place an object in a suitably aligned character array of an appropriate size, it seems that yes, one at least "could force otherwise possibly non-contiguous objects into a contiguous layout", regardless of one's interpretation of pointer arithmetic. - Slipsheet
Further, since one can manually call a destructor and then forcibly place a new object in the now-empty storage location, it seems that an implementation has no choice but to use the same contiguous layout for all most-derived objects of the same type. - Slipsheet
@n.m. I guess you mean placement-new, but the "non-contiguous" layout remains: there may be parts of the object that are not placed within the buffer. A vtable is a common example of this. - Antakiya
How can a single type have multiple layouts (on one compiler+OS+architecture)? - Wrac
@Antakiya A vtable is not a part of an object by any stretch of imagination. It typically exists before the object is created and after it is destroyed, and is shared between many objects of the same type. If you call a vtable "a part of an object", call a function "a part of a function pointer". - Slipsheet
@n.m. nevertheless, that's what is meant when the standard says that non-standard-layout objects might not occupy contiguous storage. - Antakiya
@Antakiya Really? Citation needed. - Slipsheet
@n.m. I don't have a citation but have seen the topic discussed before. - Antakiya
@Antakiya I believe some people indeed view the vtable as a part of the object. This makes no sense whatsoever to me. If this was the intent of the standard behind the wording in question, then IMO the reasoning was faulty and the wording needs to be revised. Another reason for it is that a vtable has no size as far as sizeof is concerned, so there's no reason to view it as "occupying storage". - Slipsheet
@n.m. The storage covered by the sizeof refers to contiguous storage from the first byte of the object, so it cannot be what is meant by non-contiguous storage. - Antakiya
@M.M. "The storage covered by the sizeof refers to contiguous storage from the first byte of the object" The standard says nothing like that. "The sizeof operator yields the number of bytes in the object representation of its operand." That's it. - Slipsheet
@n.m. OK [padding] - Antakiya
