Does malloc() allocate a contiguous block of memory?
R

14

40

I have a piece of code written by a very old-school programmer :-). It goes something like this:

typedef struct ts_request
{ 
  ts_request_buffer_header_def header; 
  char                         package[1]; 
} ts_request_def; 

ts_request_def* request_buffer = 
malloc(sizeof(ts_request_def) + (2 * 1024 * 1024));

The programmer is basically relying on a buffer overflow concept. I know the code looks dodgy, so my questions are:

  1. Does malloc always allocate a contiguous block of memory? If the blocks are not contiguous, this code will fail big time.

  2. Will free(request_buffer) free all the bytes allocated by malloc, i.e. sizeof(ts_request_def) + (2 * 1024 * 1024), or only sizeof(ts_request_def) bytes, the size of the structure?

  3. Do you see any evident problems with this approach? I need to discuss this with my boss and would like to point out any loopholes.

Romance answered 9/3, 2009 at 7:16 Comment(2)
Is this not the same pattern as https://mcmap.net/q/333479/-how-to-include-a-dynamic-array-inside-a-struct-in-c Pibroch
"the blocks" -- This question assumes that malloc (and free) can distinguish the addends of its argument and produce two "blocks" because there's a + in the calculation, which is obviously absurd.Rockrose
U
55

To answer your numbered points.

  1. Yes.
  2. All the bytes. malloc and free don't know or care about the type of the object, just the size.
  3. It is strictly speaking undefined behaviour, but a common trick supported by many implementations. See below for other alternatives.

Since ISO/IEC 9899:1999 (informally C99), the C standard has allowed flexible array members.

An example of this would be:

#include <stdlib.h>

int main(void)
{
    struct { size_t x; char a[]; } *p;
    p = malloc(sizeof *p + 100);
    if (p)
    {
        /* You can now access up to p->a[99] safely */
        p->a[99] = '\0';
        free(p);
    }
    return 0;
}

This now-standardized feature allows you to avoid the common, but non-standard, implementation extension that you describe in your question. Strictly speaking, declaring a fixed-size array member and accessing beyond its bounds is undefined behaviour, but many implementations document and encourage it.
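Applied to the struct from the question, a flexible-array version might look like the sketch below. The real layout of ts_request_buffer_header_def is not shown in the question, so a placeholder definition is assumed here, and ts_request_create is a hypothetical helper name.

```c
#include <stdint.h>
#include <stdlib.h>

/* Placeholder: the real header layout is not shown in the question. */
typedef struct {
    size_t package_len;
} ts_request_buffer_header_def;

typedef struct ts_request {
    ts_request_buffer_header_def header;
    char package[];               /* C99 flexible array member */
} ts_request_def;

/* Hypothetical helper: allocate a request with room for package_len bytes.
   Returns NULL if the total size would overflow or malloc fails. */
ts_request_def *ts_request_create(size_t package_len)
{
    if (package_len > SIZE_MAX - sizeof(ts_request_def))
        return NULL;

    ts_request_def *req = malloc(sizeof *req + package_len);
    if (req != NULL)
        req->header.package_len = package_len;
    return req;
}
```

As with the original trick, a single free(req) releases the header and the package bytes together.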

Furthermore, gcc allows zero-length arrays as an extension. Zero-length arrays are illegal in standard C, but gcc introduced this feature before C99 gave us flexible array members.

In response to a comment, I will explain why the snippet below is technically undefined behaviour. The section numbers I quote refer to C99 (ISO/IEC 9899:1999).

struct {
    char arr[1];
} *x;
x = malloc(sizeof *x + 1024);
x->arr[23] = 42;

Firstly, 6.5.2.1#2 shows a[i] is identical to (*((a)+(i))), so x->arr[23] is equivalent to (*((x->arr)+(23))). Now, 6.5.6#8 (on the addition of a pointer and an integer) says:

"If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."

For this reason, because x->arr[23] is not within the array, the behaviour is undefined. You might still think that it's okay because the malloc() implies the array has now been extended, but this is not strictly the case. Informative Annex J.2 (which lists examples of undefined behaviour) provides further clarification with an example:

An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).
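One approach that is often suggested for staying clear of the out-of-range subscript is to do the arithmetic on a character pointer to the start of the whole allocated block, rather than on the one-element array. The sketch below assumes that reading of the standard is acceptable; struct blob, blob_alloc and blob_byte are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct blob {
    size_t len;
    char arr[1];
};

/* Allocate a blob with `extra` bytes of storage after the struct.
   Returns NULL if the total size would overflow or malloc fails. */
struct blob *blob_alloc(size_t extra)
{
    if (extra > SIZE_MAX - sizeof(struct blob))
        return NULL;

    struct blob *b = malloc(sizeof *b + extra);
    if (b != NULL)
        b->len = extra + 1;       /* at least arr[0] plus the extra bytes */
    return b;
}

/* Address of payload byte i, computed from the start of the allocation
   instead of by subscripting arr beyond its declared size. */
char *blob_byte(struct blob *b, size_t i)
{
    return (char *)b + offsetof(struct blob, arr) + i;
}
```

Here *blob_byte(x, 23) = 42 would take the place of x->arr[23] = 42. The caller is still responsible for keeping i within the allocated size.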

Underlying answered 9/3, 2009 at 8:40 Comment(7)
+1, for the flexible and zero-length arrays. You could maybe also add that the benefit of this practice is that you save the memory for one pointer and reduce it to only one (expensive) allocation.Weirdo
I disagree about undefined behaviour. malloc() is guaranteed to return a contiguous block of memory, so you can safely access memory beyond the struct using either pointer arithmetic or array indexing; according to the standard they are the same. So it is defined behaviour.Eve
@qrdl: The standard specifically disallows accessing beyond the array. I have edited my post to explain why it's undefined.Underlying
@Chris: This may be nitpicking, but as I understand it malloc will allocate contiguous virtual memory space, but the actual physical memory which backs it may not be contiguous. At least that's how it seems to work in Linux, AFAIK.Catfish
@Robert S. Barnes: You are not incorrect, but the physical layout is entirely irrelevant to the C standard. It only matters that it appears contiguous to the program when accessed in a well-defined manner. It's equally true and irrelevant to point out that the memory might not be contiguous because it may span several pieces of silicon.Underlying
For char types this is not UB.Longstanding
The array size is declared to be 1. Therefore, using any array index other than 0 is undefined behaviour. Therefore, the compiler can assume that every array index is 0.Haematoblast
H
12

3 - That's a pretty common C trick to allocate a dynamic array at the end of a struct. The alternative would be to put a pointer into the struct and allocate the array separately, not forgetting to free it too. That the size is fixed at 2 MB seems a bit unusual, though.
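For comparison, the pointer-based alternative could be sketched as follows. The type and function names (ts_request_ptr, ts_request_ptr_create, ts_request_ptr_destroy) are made up for this illustration and do not come from the question.

```c
#include <stdlib.h>

/* Hypothetical two-allocation layout: the payload lives in its own block. */
typedef struct {
    size_t package_len;
    char *package;
} ts_request_ptr;

ts_request_ptr *ts_request_ptr_create(size_t package_len)
{
    ts_request_ptr *req = malloc(sizeof *req);
    if (req == NULL)
        return NULL;

    /* malloc(0) may legally return NULL, so always ask for at least 1 byte. */
    req->package = malloc(package_len ? package_len : 1);
    if (req->package == NULL) {
        free(req);                /* do not leak the struct on failure */
        return NULL;
    }
    req->package_len = package_len;
    return req;
}

void ts_request_ptr_destroy(ts_request_ptr *req)
{
    if (req != NULL) {
        free(req->package);       /* the second block must be freed explicitly */
        free(req);
    }
}
```

This costs two allocations and an extra pointer per request, which is what the single-block trick avoids.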

Hayner answered 9/3, 2009 at 7:41 Comment(2)
Thanks a lot for your comments. Basically, we receive data from a socket. We do not know the exact size we are going to receive and have capped it at 2 MB. The data we receive is copied into this structure. This change was made because it was the one with the minimum impact.Romance
@unknown (google), if the size is fixed, you can also change the array size from 1 to your fixed size. This trick only makes sense for arrays with variable lengths.Weirdo
E
9

This is a standard C trick, and it isn't more dangerous than any other buffer.

If you are trying to show your boss that you are smarter than the "very old school programmer", this code isn't a case for you. Old school is not necessarily bad. It seems the "old school" guy knows enough about memory management ;)

Eve answered 9/3, 2009 at 8:23 Comment(0)
E
8

1) Yes, it does, or malloc will fail if there isn't a large enough contiguous block available. (On failure, malloc returns a NULL pointer.)

2) Yes, it will. The internal memory allocator keeps track of the amount of memory allocated for that pointer value and frees all of it.

3) It's a bit of a language hack, and its use is a bit dubious. It's still subject to buffer overflows as well; it may just take attackers slightly longer to find a payload that will cause one. The cost of the 'protection' is also pretty hefty (do you really need >2 MB per request buffer?). It's also very ugly, although your boss may not appreciate that argument :)

Elanorelapid answered 9/3, 2009 at 7:23 Comment(0)
H
5

I don't think the existing answers quite get to the essence of this issue. You say the old-school programmer is doing something like this:

typedef struct ts_request
{ 
  ts_request_buffer_header_def header; 
  char                         package[1]; 
} ts_request_def;

ts_request_def* request_buffer = 
malloc(sizeof(ts_request_def) + (2 * 1024 * 1024));

I think it's unlikely he's doing exactly that, because if that's what he wanted to do, he could do it with simpler equivalent code that doesn't need any tricks:

typedef struct ts_request
{ 
  ts_request_buffer_header_def header; 
  char                         package[2*1024*1024 + 1]; 
} ts_request_def;

ts_request_def* request_buffer = 
malloc(sizeof(ts_request_def));

I'll bet that what he's really doing is something like this:

typedef struct ts_request
{ 
  ts_request_buffer_header_def header; 
  char                         package[1]; // effectively package[x]
} ts_request_def;

ts_request_def* request_buffer = 
malloc( sizeof(ts_request_def) + x );

What he wants to achieve is allocation of a request with a variable package size x. It is of course illegal to declare a struct member array's size with a variable, so he is getting around this with a trick. It looks to me as if he knows what he's doing; the trick is well towards the respectable and practical end of the C trickery scale.

Hogg answered 10/3, 2009 at 23:19 Comment(0)
G
3

As for #3, without more code it's hard to answer. I don't see anything wrong with it, unless it's happening a lot. I mean, you don't want to allocate 2 MB chunks of memory all the time. You also don't want to do it needlessly, e.g. if you only ever use 2 KB.

The fact that you don't like it for some reason isn't sufficient to object to it, or justify completely re-writing it. I would look at the usage closely, try to understand what the original programmer was thinking, look closely for buffer overflows (as workmad3 pointed out) in the code that uses this memory.

There are lots of common mistakes that you may find. For example, does the code check to make sure malloc() succeeded?
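As an illustration of the kind of check to look for, the sketch below copies incoming data into a buffer only if it fits. package_copy is a hypothetical helper, not something from the original code base.

```c
#include <stddef.h>
#include <string.h>

/* Copy len bytes into a buffer of known capacity.
   Returns 0 on success, -1 if a pointer is NULL or the data does not fit. */
int package_copy(char *dst, size_t capacity, const void *src, size_t len)
{
    if (dst == NULL || src == NULL || len > capacity)
        return -1;
    memcpy(dst, src, len);
    return 0;
}
```

In the question's code, dst would presumably be request_buffer->package and capacity the 2 * 1024 * 1024 bytes that were added to the allocation, with request_buffer itself checked against NULL right after the malloc call.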

Grilled answered 9/3, 2009 at 7:28 Comment(0)
T
3

Whether there is an exploit (question 3) really depends on the interface to this structure of yours. In context this allocation might make sense, and without further information it is impossible to say whether it's secure or not.
But if you mean problems with allocating more memory than the structure needs, this is by no means bad C design (I wouldn't even say it's THAT old school... ;) )
Just a final note here: the point of having a char[1] is that the terminating NUL will always fit in the declared struct, meaning there can be 2 * 1024 * 1024 characters in the buffer, and you don't have to account for the NUL with a "+1". It might look like a small thing, but I just wanted to point it out.

Treehopper answered 9/3, 2009 at 7:37 Comment(2)
Also, the standard doesn't allow arrays of size 0, though some compilers do.Hayner
No he can't; a char * would address memory somewhere else completely, instead of contiguous with the structure. For C99, the proper declaration for this is a flexible-size array "char package[]". But pretty much any compiler supporting that also supports the GNU extension for size 0.Tinnitus
L
3

I've seen and used this pattern frequently.

Its benefit is that it simplifies memory management and thus reduces the risk of memory leaks. All it takes is a single free of the malloc'ed block. With a secondary buffer, you'd need two calls to free. However, one should define and use a destructor function to encapsulate this operation, so you can always change its behaviour later, such as switching to a secondary buffer or adding operations to be performed when deleting the structure.

Access to array elements is also slightly more efficient but that is less and less significant with modern computers.

The code will also work correctly if memory alignment in the structure changes between compilers, which happens quite frequently.

The only potential problem I see is if the compiler permutes the order of storage of the member variables because this trick requires that the package field remains last in the storage. I don't know if the C standard prohibits permutation.

Note also that the allocated buffer will most probably be bigger than required: by at least one byte (the declared package[1]), plus any trailing padding bytes.
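The slack can be made visible by comparing the sizeof-based size with one based on offsetof. This is only a sketch: the header type below is a placeholder, since the real one is not shown in the question, and the two function names are invented for the comparison.

```c
#include <stddef.h>

/* Placeholder header: the real one is not shown in the question. */
typedef struct {
    int id;
} ts_request_buffer_header_def;

typedef struct ts_request {
    ts_request_buffer_header_def header;
    char package[1];
} ts_request_def;

/* Size computed the way the question does: whole struct plus n bytes. */
size_t size_with_sizeof(size_t n)
{
    return sizeof(ts_request_def) + n;
}

/* Minimal size for n package bytes: everything before package, plus n. */
size_t size_with_offsetof(size_t n)
{
    return offsetof(ts_request_def, package) + n;
}
```

The difference between the two is sizeof(ts_request_def) - offsetof(ts_request_def, package), which is never less than one byte. Allocating fewer than sizeof(ts_request_def) bytes for such a struct is itself questionable, so the offsetof form is shown to illustrate the overhead rather than as a recommendation.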

Lilylivered answered 9/3, 2009 at 8:29 Comment(1)
The C standard requires members to be in the order you put them in the struct. However, it's undefined behaviour for reasons I explained in my answer.Underlying
A
3

Yes. malloc returns only a single pointer - how could it possibly tell a requester that it had allocated multiple discontiguous blocks to satisfy a request?

Abrego answered 11/3, 2009 at 0:47 Comment(2)
Right, that is the job for the OS and virtual memory through the MMU. The actual physical blocks of RAM are likely all over the place.Missymist
"void *malloc(size_t size); The malloc() function allocates size bytes and returns a pointer to one of them." Ok, I have made that up :)Introgression
S
2

I would like to add that not only is it common, but I might also call it standard practice, because the Windows API is full of such uses.

Check the very common BITMAP header structure for example.

http://msdn.microsoft.com/en-us/library/aa921550.aspx

The last member, the RGB quad, is an array of size 1, which depends on exactly this technique.

Sheela answered 9/3, 2009 at 8:38 Comment(0)
S
2

In response to your second question.

free always releases all of the allocated memory in a single shot.

int *i = malloc(1024 * 2);

/* free(i + 1); would be undefined behaviour: free must be given exactly the pointer malloc returned */

free(i); /* releases the whole 2 KB block */
Subdebutante answered 13/3, 2009 at 7:31 Comment(0)
E
2

This common C trick is also explained in this StackOverflow question (Can someone explain this definition of the dirent struct in solaris?).

Extern answered 13/3, 2009 at 7:37 Comment(0)
C
1

The answer to questions 1 and 2 is yes.

About ugliness (i.e. question 3): what is the programmer trying to do with that allocated memory?

Chenab answered 9/3, 2009 at 7:22 Comment(0)
B
0

The thing to realize here is that malloc does not see the calculation being made in this:

malloc(sizeof(ts_request_def) + (2 * 1024 * 1024));

It's the same as

  size_t sz = sizeof(ts_request_def) + (2 * 1024 * 1024);
  malloc(sz);

You might think that it's allocating 2 chunks of memory, and in your mind they are "the struct" and "some buffers". But malloc doesn't see that at all.

Beverle answered 5/9, 2017 at 21:46 Comment(0)
