Allocating less memory than the specified size of a pointer-to-array

Asked 31/10, 2021 at 15:6 Answered 31/10, 2021 at 18:49

c malloc language-lawyer pointer-to-array

In C, is it "legal" to under-allocate memory to a pointer-to-array if we then only access elements that fall within the allocated memory? Or does this invoke undefined behavior?

int (*foo)[ 10 ];                  //Pointer to array of 10 ints
foo = malloc( sizeof( int ) * 5 ); //Under-allocation! 
                                   //Only enough memory for 5 ints
//Now we only ever access (*foo)[ 0 - 4 ]

If this, in and of itself, is not undefined behavior, could accessing another, unrelated object whose address happens to fall within the unallocated part of the array cause a strict-aliasing violation?

Tenaculum answered 31/10, 2021 at 15:6 Comment(5)

Hmm. Interestingly, the code analyser for clang-cl warns about your malloc line: warning GED7FF984: Result of 'malloc' is converted to a pointer of type 'int [10]', which is incompatible with sizeof operand type 'int' [clang-analyzer-unix.MallocSizeof]. And a similar warning for foo = malloc(sizeof(int[5])); (but not when using 10 in that). – Churchlike 31/10, 2021 at 15:33

The native MSVC code analyser doesn't give a warning. But note that the presence/absence of a diagnostic (even by clang) doesn't (dis)prove UB. – Churchlike 31/10, 2021 at 15:36

I think technically it is undefined. Consider int (*foo)[16] and setting the first 12 elements to some value. By operating on *foo, you will have told the compiler there is an int [16] object there, meaning 64 bytes have been reserved for it. So the compiler might decide to use a nice fast AVX-512 instruction to store to the first 12 elements. That instruction also stores to the last 4 elements, but the compiler figures this is okay because their values are indeterminate, so it can make them anything it wants. Then the compiler is overwriting unallocated memory. It may even be unmapped. – Zeppelin 31/10, 2021 at 15:36

@EricPostpischil in the 80s there were lots of universities where students learned C using interpreters of C. Nowadays it is common to say C compiler, as if the language and the compiler would be the same :). – Darling 1/11, 2021 at 11:25

There is no operating on *foo. – Mcminn 18/11, 2021 at 19:40

This is undefined behavior.

foo is supposed to point to an object (or the first of an array of objects) of type int[10]. This is an object of array type, which is defined in section 6.2.5p20 of the C standard:

An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type. The element type shall be complete whenever the array type is specified. Array types are characterized by their element type and by the number of elements in the array. An array type is said to be derived from its element type, and if its element type is T , the array type is sometimes called ‘‘array of T ’’. The construction of an array type from an element type is called ‘‘array type derivation’’

The important part is the opening phrase, "contiguously allocated nonempty set of objects". An int[10] is therefore a contiguously allocated set of 10 objects of type int.

You don't allocate enough space, so the expression *foo, which has type int[10], accesses an object of that type, but doing so reads past the end of the allocated memory.

Door answered 31/10, 2021 at 15:36 Comment(4)

This being a [language-lawyer] question, it would be appropriate to include citations to the sections of the spec that support this position. – Superhighway 31/10, 2021 at 15:38

@JohnBollinger: I added the language-lawyer tag because it seemed in the nature of the question, but OP ought to confirm or disavow their interest. – Zeppelin 31/10, 2021 at 15:40

@EricPostpischil I think the language-lawyer tag fits here :) I'm interested in this specific point and what the standards say about it, not finding a solution to a tangentially related problem. – Tenaculum 31/10, 2021 at 15:44

I object: the expression *foo, which has type int[10], accesses an object of that type no more than the expression bar does given int bar[10], namely not at all, since *foo and bar alike are implicitly converted to a pointer to the first element before any access could take place. – Mcminn 18/11, 2021 at 19:58

As @dbush describes in his answer, an array is defined to be a contiguously allocated non-empty set of objects of the element type (C17 6.2.5/20). Clearly, then, malloc( sizeof( int ) * 5 ) does not allocate enough space for an int[10].

But I found it difficult to formally support the last part of that answer, claiming that the size differential makes (for example) (*foo)[4] have undefined behavior. That conclusion seems plausible, but where does the standard actually say so?

One of the main problems here is that (dynamically) allocated objects have no declared type, only, under some circumstances, an effective type determined by how they are and have been accessed. (C17 6.5/6 and footnote 88). We do know that on success, malloc(n) returns a pointer to an object of size n (C17 7.22.3.4/2), but how do we attribute undefined behavior specifically to the association with that object of an effective type describing objects of size larger than n?

I ultimately decided that the best way to connect the dots is as follows. Suppose that o is an allocated object of size n, T is a complete type having sizeof(T) > n, and o is read or written via an lvalue of type T. Then paragraph 6.5/6 attributes effective type T to object o, but because o's size is insufficient, we must conclude that its representation constitutes a trap representation of type T (C17 3.19.4). Paragraph 6.2.6.1/5 then reiterates the definition of "trap representation" and gets us to where we want to go:

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.

(Emphasis added.)

Superhighway answered 31/10, 2021 at 18:49 Comment(18)

This is the correct answer. The undefined behavior comes from a trap representation. – Darling 1/11, 2021 at 10:31

I disagree that this qualifies as a trap representation. Just because a read or write extends past allocated memory doesn't mean that the representation read/written is a trap representation. Assuming that an int is twos complement with no padding bits, there's no trap representation whether or not access goes into unallocated space. – Door 1/11, 2021 at 15:40

@dbush, it is not a trap representation in any generalized sense. In fact, there is no such thing as a trap representation independent of a data type. But if the representation of the allocated object is interpreted as a value of type T, then it is necessarily a trap representation of that type (I claim) because it is too small to be otherwise. Else we need to invent a third category, neither valid representation nor trap representation, and we also need a different foundation in the spec for the access having UB. – Superhighway 1/11, 2021 at 15:50

It is not about the read or write in particular; it is about the size of the object being accessed. – Superhighway 1/11, 2021 at 15:54

I am also trying to understand. A trap representation happens when you put invalid data into a CPU register, not into memory. The bits corresponding to the last 5 integer values of the array type are random and can trigger traps if they are placed in some registers. So it is like having an array with uninitialised values. This is a trap representation. On the other hand, it is like accessing a memory location beyond the array dynamically allocated with malloc. Which one of these possibilities stands? – Darling 1/11, 2021 at 17:5

@alinsoar, I am not suggesting that individual ints have trap representations in this case. I am saying that the representation of the overall array is a trap representation. – Superhighway 1/11, 2021 at 17:38

@JohnBollinger I understand this but I don't think you can move everything in a register, by using the full array as an lvalue. The trap representation is a concept linked to the concept of CPU register. Maybe the C designers thought about vector machines as well? Can a vector machine trigger an error when you put an invalid array in a given register? – Darling 1/11, 2021 at 17:44

@alinsoar, registers have nothing to do with it. They are not a C-language concept. Even the register keyword does not depend on CPU registers, and the C definition of "trap representation" certainly does not depend in any way on CPU registers. – Superhighway 1/11, 2021 at 18:21

@JohnBollinger The register keyword has nothing to do here. It is about the concept of lvalue. This is linked to the concept of register. In all known architectures the traps are triggered when you store some data in some registers. Some system trap is triggered. In the c language there is no lvalue for the full array. So any array operation won't trigger a trap. But in this case it is meaningful to talk about trap representation for an array. – Darling 1/11, 2021 at 19:56

Again, @alinsoar, the C language has no concept of a CPU register. Furthermore, although C's concept of a trap representation may be inspired by the behavior of various hardware, it is not tied to hardware in any way. In particular, it is not defined in terms of causing the processor to perform a trap. "Trap Representation" is a defined term in C, so it is the specification's own definition that governs. Definitions or common use of the same term in other contexts are not directly relevant. – Superhighway 1/11, 2021 at 20:19

@JohnBollinger from practical viewpoint I expected the C trap representation to be linked to some invalid value that comes into some CPU registers. To what concretely is that linked? – Darling 1/11, 2021 at 21:25

@alinsoar, as this answer already says, the C definition "trap representation" is given in C17 section 3.19.4. That is: "an object representation that need not represent a value of the object type". That's it. You will recognize that that is reiterated at the beginning of paragraph 6.2.6.1/5, quoted above. If by "concretely linked" you are asking about an association with hardware architecture or behavior then there is no concrete link. Neither is there a concrete link for much of anything else in the specification. Creating such links is the role of implementations. – Superhighway 1/11, 2021 at 21:40

@JohnBollinger I think you are talking about option b, here: open-std.org/jtc1/sc22/wg14/www/docs/… – Darling 3/11, 2021 at 2:44

@alinsoar, no not really. The proposal you linked is about changing the sections of the spec that I'm relying upon. I'm talking about what the spec says now. – Superhighway 4/11, 2021 at 12:35

@JohnBollinger aha, thank you. Actually the topic of trap representations was in my focus a few times in the past and I still do not have a clear idea about it, even if I have read quite a lot about it. Can you link me some story-like article that is clearly explained ? – Darling 4/11, 2021 at 12:48

Sorry, @alinsoar, I have no such article to recommend to you. But I also don't understand the problem. As long as you are careful to maintain the distinction between the C language meaning of "trap representation" and other, related meanings, there really isn't much to the C definition. A trap representation of a type T is a bit pattern that does not represent a value of type T. That is, the bit pattern is invalid in some way with respect to T. – Superhighway 4/11, 2021 at 14:6

Earlier, you were describing bit patterns that cause a processor to perform a trap under certain circumstances. These are also called "trap representations", but in a different sense, not the C language sense. On specific machines, there may well be overlap between the values that are trap representations in one sense and those that are trap representations in the other sense, but it is important to avoid construing the two senses as equivalent. – Superhighway 4/11, 2021 at 14:10

It seems to me that this best way to connect the dots is a dead end where it says "read or written via an lvalue of type T", as in the case at hand T is an array type. – Mcminn 18/11, 2021 at 20:10
