What's the point of VLA anyway?
Asked Answered
H

6

43

I understand what variable length arrays are and how they are implemented. This question is about why they exist.

We know that VLAs are only allowed within function blocks (or prototypes) and that they basically cannot be anywhere but on the stack (assuming the normal implementation): C11, 6.7.6.2-2:

If an identifier is declared as having a variably modified type, it shall be an ordinary identifier (as defined in 6.2.3), have no linkage, and have either block scope or function prototype scope. If an identifier is declared to be an object with static or thread storage duration, it shall not have a variable length array type.

Let's take a small example:

void f(int n)
{
    int array[n];
    /* etc */
}

there are two cases that need to be taken care of:

  • n <= 0: f has to guard against this, otherwise the behavior is undefined: C11, 6.7.6.2-5 (emphasis mine):

    If the size is an expression that is not an integer constant expression: if it occurs in a declaration at function prototype scope, it is treated as if it were replaced by *; otherwise, each time it is evaluated it shall have a value greater than zero. The size of each instance of a variable length array type does not change during its lifetime. Where a size expression is part of the operand of a sizeof operator and changing the value of the size expression would not affect the result of the operator, it is unspecified whether or not the size expression is evaluated.

  • n > stack_space_left / element_size: There is no standard way of finding how much stack space is left (since there is no such thing as stack so long as the standard is concerned). So this test is impossible. Only sensible solution is to have a predefined maximum possible size for n, say N, to make sure stack overflow doesn't occur.

In other words, the programmer must make sure 0 < n <= N for some N of choice. However, the program should work for n == N anyway, so one might as well declare the array with constant size N rather than variable length n.

I am aware that VLAs were introduced to replace alloca (as also mentioned in this answer), but in effect they are the same thing (allocate variable size memory on stack).

So the question is why did alloca and consequently VLA exist and why weren't they deprecated? The only safe way to use VLAs seem to me to be with a bounded size in which case taking a normal array with the maximum size is always a viable solution.

Hagridden answered 20/3, 2014 at 10:40 Comment(25)
alloca isn't in standard. And VLA became optional in C11. Both are unsafe, but large compile-time-const-sized arrays aren't safe too.Pleasantry
@presiuslitelsnoflek, yes stack overflows can be obtained in many ways. Obviously infinite recursion is not prevented by a bounded size on variable length arrays. but unbounded size for VLA means possible stack overflow. I don't understand how that is related to The only safe way to use VLAs seem to me to be with a bounded size not being true.Hagridden
@keltar, thanks but I am aware of that. The question was why did they use alloca in the first place? and why despite such problems people kept using it to the point that it found its way into the standard?Hagridden
n >= 0 can be solved by making n a size_t (which is what you should use for sizes anyway). If the caller fills in a negative number then the subsequent mayhem is their responsibility.Bialystok
They? Who's they? alloca is great for small buffers, e.g. printf - I personally don't want it to use mallocs or other heap allocation.Pleasantry
@keltar, if no one used alloca, there wouldn't have been any reason for it to become standard (in the form of VLA), so there must have been enough people that continued using it. Also, alloca is great for small buffers, means you can guarantee that the buffers are small (if you can't, then they are not small buffers anymore), i.e. you can guarantee there is an upper bound. If you can do that, why not get a fixed size array of that size? Lastly, I do have an implementation of fprintf and I don't see why you would want to have VLA in its implementation.Hagridden
@Kninnug, yes I should have made it size_t (the standard uses int in its examples and I just used int without much thinking). Nevertheless, size_t still doesn't guard against n == 0, so you need a check anyway, be it if (n == 0) fail or if (n <= 0) fail.Hagridden
@keltar, just to be clear, I am not saying usages of VLA should be replaced by malloc. That would be folly.Hagridden
@Hagridden you don't actually know length of resulting formatted string; it could be done with iterating upon static-sized buffer, but it is tricky. As for reasons for alloca - in case of recursion it would reduce stack usage so you may perform more calls before reaching stack limit; reading input from file that is known to be short; whatever. You always have limitations; you don't have stack guards for const-sized arrays too, but you don't asking about them :-). I personally prefer alloca because it is explicitly states what programmer meant to do.Pleasantry
What's really wrong with a 0-sized VLA anyway?Pulcheria
@keltar, you don't need to know the length of the resulting string. You output char by char which gets buffered at a lower level. I.e. fputc writes to a buffer and fprintf uses fputc. Regarding recursion, VLA doesn't make sense. If your input is large, you can recurse less and if it's small you can recurse more? The requirement is often the other way around.Hagridden
This one suggests VLA can have certain performance advantages sometimes: programmers.stackexchange.com/questions/190546/…Jardine
Forced to choose between a stack buffer overflow and a segfault, many programmers appear to favor their program failing on a buffer overflow. That's a significant problem, buffer overflows are quite undiagnosable and a serious security problem, VLAs try to address that.Windowsill
@harold, I'd think it would be the fact that the 0-size array can't contain anything. If you access any index of such an array, it causes undefined behavior. So, what's the point of having one?Hagridden
@Hagridden to avoid having to special-case it - for example, it's quite natural to allocate something with n elements and then iterate over n elements, if n = 0 that's still no problem. And anyway "what's the point" doesn't argue strongly in favor of banning something.Pulcheria
@HansPassant, can you expand a bit? If you need such large memory that you risk stack overflow, you'd have to use malloc anyway. I mean, you don't choose between stack overflow and segfault. Both are bad. You can do without either.Hagridden
@Hagridden more like "you can call it with small data even if you're already very deep in recursion (caused by other functions)". C is very close to hardware, it is great when it allows you do something even if you wouldn't actually use it. But it isn't always good if it is implicit, sadly.Pleasantry
@harold, yes I understand and I do agree. But probably that complicates things somewhat. Take this for example int array[N] = {0}. If the standard asks N to be strictly positive, all is well. If it allows N to be zero, it has to then create a corner case saying in such a case initialization is not possible. That was for fixed-size arrays (C11, 6.7.6.2-1). Probably due to symmetry, they added the same rule for VLA.Hagridden
Without VLA, alloca() and recursion, an upper bound on stack depth may be determined before the program is even run. Thus adequate stack space may be assigned on program invocation. Recursion prevents simple analysis on the needed stack depth. Yet recursion is allowed (in many environments). IMO, use of VLA fundamentally incurs no more risk than recursion. If code does not want the risks of VLA, forbid recursion too.Caracul
@chux, I had been actually thinking about it. To be honest, unless you can make sure the recursive function is tail recursive, it's not usable for any reasonably large data set. Many of the graph algorithms for example (a prime example of recursive functions) end up being implemented as non-recursive for this reason. In functional languages, it's of utmost importance to make the functions tail-recursive. So no, recursion is not such a great tool either. However, recursion comes naturally with the way functions are called. Forbidding them is rather artificial.Hagridden
Another common case of (theoretically) recursive functions are parsers, but they also end up being implemented as non-recursive in any serious parser.Hagridden
@chux, Recursion would be useful if the problem is inherently limited. For example, in a GUI, you won't have a huge recursion over widgets, simply because no one is that crazy to design such a GUI. But in general, recursion is not much of a good solution.Hagridden
Forbidding recursion in not artificial. Common to bar them in embedded designs. But my main point is not the merits/failings of recursion, but that recursion and VLA fundamentally challenge the stack in a similar fashion.Caracul
technically, VLA's don't have to be on the stackBailsman
There isn't robust security model around VLAs and your code is exposed to hard-to-diagnose bugs. One of the well experienced C project is Linux kernel, which is free of VLAs since 2018; kernel devs decided to wipe clean all VLA usages because the headaches (security, performance-inconsistencies etc.) overweigh the benefits. Note: LLVM community has also pushed back on VLA, so clang only supports them to the extent of C99, not beyond that (VLA fields in struct etc. which GCC supports). With these kinds of discussions and cases brought to light, committee decided to make VLA optional in ISO C11.Stinkhorn
C
94

For reasons that are not entirely clear to me, almost every time the topic of C99 VLA pops up in a discussion, people start talking predominantly about the possibility of declaring run-time-sized arrays as local objects (i.e. creating them "on the stack"). This is rather surprising and misleading, since this facet of VLA functionality - support for local array declarations - happens to be a rather auxiliary, secondary capability provided by VLA. It does not really play any significant role in what VLA can do. Most of the time, the matter of local VLA declarations and their accompanying potential pitfalls is forced into the foreground by VLA critics, who use it as a "straw man" intended to derail the discussion and bog it down among barely relevant details.

The essence of VLA support in C is, first and foremost, a revolutionary qualitative extension of the language's concept of type. It involves the introduction of such fundamentally new kind of types as variably modified types. Virtually every important implementation detail associated with VLA is actually attached to its type, not to the VLA object per se. It is the very introduction of variably modified types into the language that makes up the bulk of the proverbial VLA cake, while the ability to declare objects of such types in local memory is nothing more than a insignificant and fairly inconsequential icing on that cake.

Consider this: every time one declares something like this in one's code

/* Block scope */
int n = 10;
...
typedef int A[n];
...
n = 5; /* <- Does not affect `A` */

size-related characteristics of the variably modified type A (e.g. the value of n) are finalized at the exact moment when the control passes over the above typedef-declaration. Any changes in the value of n made further down the line (below this declaration of A) don't affect the size of A. Stop for a second and think about what it means. It means that the implementation is supposed to associate with A a hidden internal variable, which will store the size of the array type. This hidden internal variable is initialized from n at run time when the control passes over the declaration of A.

This gives the above typedef-declaration a rather interesting and unusual property, something we haven't seen before: this typedef-declaration generates executable code (!). Moreover, it doesn't just generate executable code, it generates critically important executable code. If we somehow forget to initialize the internal variable associated with such typedef-declaration, we'll end up with a "broken"/uninitialized typedef alias. The importance of that internal code is the reason why the language imposes some unusual restrictions on such variably modified declarations: the language prohibits passing control into their scope from outside of their scope

/* Block scope */
int n = 10;
goto skip; /* Error: invalid goto */

typedef int A[n];

skip:;

Note once again that the above code does not define any VLA arrays. It simply declares a seemingly innocent alias for a variably modified type. Yet, it is illegal to jump over such typedef-declaration. (We are already familiar with such jump-related restrictions in C++, albeit in other contexts).

A code-generating typedef, a typedef that requires run-time initialization is a significant departure from what typedef is in the "classic" language. (It also happens to pose a significant hurdle of the way of adoption of VLA in C++.)

When one declares an actual VLA object, in addition to allocating the actual array memory the compiler also creates one or more hidden internal variables, which hold the size(s) of the array in question. One has to understand that these hidden variables are associated not with the array itself, but rather with its variably modified type.

One important and remarkable consequence of this approach is as follows: the additional information about array size, associated with a VLA, is not built directly into the object representation of the VLA. It is actually stored besides the array, as "sidecar" data. This means that object representation of a (possibly multidimensional) VLA is fully compatible with object representation of an ordinary classic compile-time-sized array of the same dimensionality and the same sizes. For example

void foo(unsigned n, unsigned m, unsigned k, int a[n][m][k]) {}
void bar(int a[5][5][5]) {}

int main(void)
{
  unsigned n = 5;
  int vla_a[n][n][n];
  bar(a);

  int classic_a[5][6][7];
  foo(5, 6, 7, classic_a); 
}

Both function calls in the above code are perfectly valid and their behavior is fully defined by the language, despite the fact that we pass a VLA where a "classic" array is expected, and vice versa. Granted, the compiler cannot control the type compatibility in such calls (since at least one of the involved types is run-time-sized). However, if desired, the compiler (or the user) has everything necessary to perform the run-time check in debug version of code.

(Note: As usual, parameters of array type are always implicitly adjusted into parameters of pointer type. This applies to VLA parameter declarations exactly as it applies to "classic" array parameter declarations. This means that in the above example parameter a actually has type int (*)[m][k]. This type is unaffected by the value of n. I intentionally added a few extra dimensions to the array to maintain its dependence on run-time values.)

Compatibility between VLA and "classic" arrays as function parameters is also supported by the fact that the compiler does not have to accompany a variably modified parameter with any additional hidden information about its size. Instead, the language syntax forces the user to pass this extra information in the open. In the above example the user was forced to first include parameters n, m and k into function parameter list. Without declaring n, m and k first, the user would not have been able to declare a (see also the above note about n). These parameters, explicitly passed into the function by the user, will bring over the information about the actual sizes of a.

For another example, by taking advantage of VLA support we can write the following code

#include <stdio.h>
#include <stdlib.h>

void init(unsigned n, unsigned m, int a[n][m])
{
  for (unsigned i = 0; i < n; ++i)
    for (unsigned j = 0; j < m; ++j)
      a[i][j] = rand() % 100;
}

void display(unsigned n, unsigned m, int a[n][m])
{
  for (unsigned i = 0; i < n; ++i)
    for (unsigned j = 0; j < m; ++j)
      printf("%2d%s", a[i][j], j + 1 < m ? " " : "\n");
  printf("\n");
}

int main(void) 
{
  int a1[5][5] = { 42 }; 
  display(5, 5, a1);
  init(5, 5, a1);
  display(5, 5, a1);

  unsigned n = rand() % 10 + 5, m = rand() % 10 + 5;
  int (*a2)[n][m] = malloc(sizeof *a2);
  init(n, m, *a2);
  display(n, m, *a2);
  free(a2);
}

This code is intended to draw your attention to the following fact: this code makes heavy use of valuable properties of variably modified types. It is impossible to implement elegantly without VLA. This is the primary reason why these properties are desperately needed in C to replace the ugly hacks that were used in their place previously. Yet at the same time, not even a single VLA is created in local memory in the above program, meaning that this popular vector of VLA criticism is not applicable to this code at all.

Basically, the two last examples above is a concise illustration of what the point of VLA support is.

Chiropteran answered 12/1, 2019 at 20:9 Comment(12)
Given the obvious utility of VLA types, it is frustrating that they were made optional with C11.Arsenite
Thanks for this. The points about VLA's type were enlightening. The examples were also helpful to remind of the problem with writing "generic matrix math functions" and realize that VLAs are the answer.Hagridden
This answer demonstrates the large implications for the C type system that VLAs have. See stackoverflow.com/a/21519062 for further discussion of this.Suite
@Suite So, they're too difficult for MS and C++? ;-)Unclear
@AndrewHenle In the sense that VLAs are incompatible with existing C++ features. C++ relies on types way more than C does. What does it even mean for overload resolution to fail at runtime?Mender
@Mender Microsoft has a C compiler, too, that's not standard-compliant. Note that this question does not have a C++ tag.Unclear
@AndrewHenle Since C11 it is allowed not to implement VLAs. And I was responding to your comment that it was "too difficult" for C++.Mender
@Mender Now explain Microsoft and C99. This century is almost 1/4 over, and Microsoft still doesn't even support last century's C standardUnclear
@AndrewHenle Microsoft thought it was a better business decision to not conform to the standard than to implement VLAs.Mender
@Mender And the reason they concocted for that is dispelled by this very answer. Care to venture what century Microsoft will finally pretend they're supporting C23 with its mandatory reintroduction of VLAs? That addresses Microsoft's stated concerns with C99 VLAs?Unclear
Let us continue this discussion in chat.Mender
Your second to last example: void foo(unsigned n,... does not compile.Resolutive
H
25

Looking at the comments and the answers, it seems to me that VLAs are useful when you know that normally your input is not too big (similar to knowing your recursion is probably not too deep), but you don't actually have an upper bound, and you would generally ignore the possible stack overflow (similar to ignoring them with recursion) hoping they don't happen.

It may actually be not an issue altogether either, for example if you have unlimited stack size.

That said, here's another use for them I have found which doesn't actually allocate memory on stack, but makes working with dynamic multi-dimensional arrays easier. I'll demonstrate by a simple example:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t n, m;

    scanf("%zu %zu", &n, &m);

    int (*array)[n][m] = malloc(sizeof *array);

    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < m; ++j)
            (*array)[i][j] = i + j;

    free(array);
    return 0;
}
Hagridden answered 19/1, 2015 at 10:32 Comment(2)
Why size_t i = 0; in the 4 loops?Secund
@0People, size_t because n and m are size_t (because they refer to a size). Is there anything else that's not clear?Hagridden
M
15

Despite of all the points you mentioned about VLA, the best part of VLA is that the compiler automatically handles the storage management and the complexities of index calculations of arrays whose bounds are not compile-time constants.
If you want local dynamic memory allocation then the only option is VLA.

I think this could be the reason that VLA is adopted in C99 (optional on C11).


One thing I want to clear that is there are some remarkable differences between alloca and VLA. This post points out the differences:

  • The memory alloca() returns is valid as long as the current function persists. The lifetime of the memory occupied by a VLA is valid as long as the VLA's identifier remains in scope.
  • You can alloca() memory in a loop for example and use the memory outside the loop, a VLA would be gone because the identifier goes out of scope when the loop terminates.
Macrocosm answered 20/3, 2014 at 10:50 Comment(7)
Actually, rather than “VLA is adopted in C99 and later”, the history of VLA is that is was adopted in C99 and made optional in C11.Feet
It also correctly handles sizeof, which is arguably either good or bad thing.Pleasantry
@PascalCuoq; I am not sure about C11.Macrocosm
@Macrocosm See en.wikipedia.org/wiki/…Feet
@haccks, C11, 6.7.6.2-4 says ... Variable length arrays are a conditional feature that implementations need not support. Back to my question, can you name an example where VLA can be safely used (making sure it can't cause stack overflow, at least with a reasonable ulimit setting) and an array with constant size couldn't replace it?Hagridden
@Shahbaz; Yes. Got it now :)Macrocosm
@Hagridden If you have a recursive algorithm which operates only on a part of the given buffer and due to some reasons must copy it. Then the needed stack space would be datalength * recursiondepth, while with VLAs it can (and will) be less, allowing for deeper recursion.Austroasiatic
V
9

Your argument seems to be that since one has to bound check the size of the VLA, why not just allocate the maximum size and be done with the runtime allocation.

That argument overlooks the fact that memory is a limited resource in the system, shared between many processes. Memory wastefully allocated in one process is not available to any other (or perhaps it is, but at the expense of swapping to disk).

By the same argument we would not need to malloc an array at run time when we could statically allocate the maximum size that could be needed. In the end heap exhaustion is only slightly preferable to stack overflow.

Vishinsky answered 20/3, 2014 at 11:13 Comment(5)
Ok that somewhat makes sense. There are two differences that I think are significant. One is that stack size is often very small, while heap is quite large. On microcontrollers, often heap doesn't exist and you have strict bounds on data size, so VLAs on microcontrollers aren't used anyway. In other words, wasting some stack size is not a big deal, while wasting half of the heap is.Hagridden
Second is that if you run out of heap memory, malloc nicely returns NULL and you can take action upon it. You can nicely tell the user e.g. that the operation is not possible or gracefully save some states, give back resources and return to a previous state that is good. With stack overflow you just die, which is bad.Hagridden
malloc can return non-NULL and then cause SIGSEGV when memory mapping will fail, so it isn't too reliable.Pleasantry
@Hagridden that is probably why VLAs are optional in more recent standards. In a server / desktop the stack size can grow (in a single threaded program, possibly until it hits the heap).Vishinsky
This argument doesn't fly on any modern operating system. You allocate virtual memory, not RAM. Over-allocating doesn't cost anything.Windowsill
C
3

Stack allocation (a so VLA allocation) is VERY fast, just requires a quick modification to the stack pointer (typically a single CPU instuction). No need for expensive heap allocation/deallocation.

But, why not just use a constant size array instead?

Let suppose you are writing a high performance code, and you need a variable size buffer, let say between 8 and 512 elements. You can just declare a 512 elements array, but if most of the times you only require 8 elements then overallocating can affect the performance due to affecting the cache locality in the stack memory. Now imagine this function has to be called millions of times.

Another example, imagine your function (with a local VLA) is recursive, you know beforehand that in any moment the total size of all recursively allocated VLAs is limited (i.e. the arrays have variable size, but the sum of all sizes is bounded). In this case, if you use the maximun possible size as fixed local array size you may allocate much more memory than otherwise required, making your code slower (due to cache misses) and even causing stack overflows.

Cabstand answered 5/1, 2021 at 3:17 Comment(1)
The problem with these examples is that they are theoretical (and often made up to exaggerate the perceived benefit of the feature). Like what high performance code works with variable but bounded sized data and requires an array allocation? What recursive algorithm works with variable sized arrays, yet has guarantees that there won't be a stack overflow regardless of input size?Hagridden
M
3

VLAs do not have to allocate any memory or only stack memory. They are very handy in many aspects of programming.

Some examples

  1. Used as function parameters.
int foo(size_t cols, int (*array)[cols])
{
    //access as normal 2D array
    prinf("%d", array[5][6]);
    /* ... */
}
  1. Allocate 2D(or more) array dynamically
inr foo(size_t rows, size_t cols)
{
    int (*array)[cols] = malloc(rows * sizeof(*array));
    /* ... */
    //access as normal 2D array
    prinf("%d", array[5][6]);
    /* ... */
Mortician answered 7/1, 2022 at 20:16 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.