How come std::initializer_list is allowed to not specify size AND be stack allocated at the same time?
Asked Answered
S

4

20

I understand from here that std::initializer_list doesn't need to allocate heap memory. Which is very strange to me since you can take in an std::initializer_list object without specifying the size whereas for arrays you always need to specify the size. This is although initializer lists internally almost the same as arrays (as the post suggests).

What I have a hard time wrapping my head around is that with C++ as a statically typed language, the memory layout (and size) of every object must be fixed at compile time. Thus, every std::array is another type and we just spawn those types from a common template. But for std::initializer_list, this rule apparently doesn't apply, as the memory layout (while it can be derived from the arguments passed to its constructor) doesn't need to be considered for a receiving function or constructor. This makes sense to me only if the type heap allocates memory and only reserves storage to manage that memory. Then the difference would be much like std::array and std::vector, where for the later you also don't need to specify size.

Yet std::initializer_list doesn't use heap allocations, as my tests show:

#include <string>
#include <iostream>

void* operator new(size_t size)
{
    std::cout << "new overload called" << std::endl;    
    return malloc(size);
}


template <typename T>
void foo(std::initializer_list<T> args)
{
    for (auto&& a : args)
    std::cout << a << std::endl;
}

int main()
{
    foo({2, 3, 2, 6, 7});

    // std::string test_alloc = "some string longer than std::string SSO";
}

How is this possible? Can I write a similar implementation for my own type? That would really save me from blowing my binary whenever I play the compile time orchestra.

EDIT: I should note that the question I wanted to ask is not how the compiler knows what size it should instantiate the initializer list with (as can be achieved via template argument deduction) but how it is then not a different type from all the other instantiations of an initializer list (hence why you can pass initializer lists of different sizes to the same function).

Stylite answered 25/5, 2022 at 12:7 Comment(5)
It's possible because the compiler makes it so. std::initializer_list cannot be implemented in standard C++.Biosphere
The size is fixed at compile-time though because you can only construct and use it when the number of elements is known.Latchet
The values you store in the std::initializer_list are like simple arrays under the hood. If you check the generated asm for your example (godbolt.org/z/vEoc46Pn9), you can see that your array is in the binary. You cannot implement it since std::initializer_list is a special class "bounded" to the compiler. Just like constexpr construt_at, you cannot implement that neither...Reactivate
It's similar to how you don't need to specify the size in int a[] = {1,2,3}; - the compiler knows.Dulsea
Think about how const char s[] = "Hello World"; works the same way and s decays to a simple pointer. Or const char *s = "Hello World";.Remediable
S
23

The thing is, std::initializer_list does not hold the objects inside itself. When you instantiate it, compiler injects some additional code to create a temporary array on the stack and stores pointers to that array inside the initializer_list. For what its worth, an initializer_list is nothing but a struct with two pointers (or a pointer and a size):

template <class T>
class initializer_list {
private:
  T* begin_;
  T* end_;
public:
  size_t size() const { return end_ - begin_; }
  T const* begin() const { return begin_; }
  T const* end() const { return end_; }

  // ...
};

When you do:

foo({2, 3, 4, 5, 6});

Conceptually, here is what is happening:

int __tmp_arr[5] {2, 3, 4, 5, 6};
foo(std::initializer_list{arr, arr + 5});

One minor difference being, the life-time of the array does not exceed that of the initializer_list.

Suzisuzie answered 25/5, 2022 at 12:16 Comment(0)
A
17

... whereas for arrays you always need to specify the size ...

You mean like for

int a[] = {2, 3, 2, 6, 7};

?

What I have a hard time wrapping my head around is that with C++ as a statically typed language, the memory layout (and size) of every must be fixed at compile time.

The size of the initializer list is just as fixed at compile time as the size of the array above - it's fixed because you wrote the braced expression {2, 3, 2, 6, 7} out explicitly before compiling.

How is this possible? Can I write a similar implementation for my own type?

You can't intercept the parsing of a braced-init-list, no. As you can see, the rules for handling list initialization are pretty specific.

However, the std::initializer_list is intended to be lightweight so you can use it directly. As the other answer says, you can consider it as a normal implicitly-sized array with an implicit conversion to a range-like view.

Ankney answered 25/5, 2022 at 12:20 Comment(0)
M
9

Just to add a few random musings of my own, std::initializer_list is actually an interesting animal - it's a kind of chimera, part STL, part compiler construct.

If you look in the appropriate STL header file, you will find a definition for it that defines the API. But the actual implementation is, in effect, built into the compiler, so when you write, say:

std::initializer_list <int> l = { 1, 2, 3, 4, 5 };

The compiler goes "a-ha! An initializer list (and a list of int's, to boot), I know what that is, I'll construct one". And so it does. There's no code to do that in the STL itself.

To put it another way, to the compiler, std::initializer_list is, in part, a native type. Only it's not, not completely, and as such it is unique a member of a very select club (please see comments).

Mathematician answered 25/5, 2022 at 12:27 Comment(2)
It's certainly special in this regard, but not unique. std::type_info is also constructed by special compiler magic. The new std::source_location is a bit different, because a static member function is used to request the magic, but it's still compiler magic.Retarder
@BenVoigt Interesting, thank you. The compiler obviously recognises these identifiers and pulls rabbits out of a hat when it sees them, just as it does when, for example, you write int x = 42;. Tweaked my answer to reflect your input.Mathematician
S
2

One misconception you seem to have is that you need a size known at compile time to allocate on the stack. In fact, there is no technical reason to disallow objects with size determined at runtime from on the stack. In fact, C explicitly allows it with Variable Length Arrays. There's another question on this topic, although it asks about C.

While I don't know of a sane C++ way to actually do manual stack allocation (alloca() being a bad idea and not really a C++ thing), nothing is stopping the compiler from doing it for you.

Stack allocation is brutally simple - others might correct me, but to my knowledge it boils down to simply increasing the value in the stack pointer register.

Smolensk answered 26/5, 2022 at 11:41 Comment(5)
I thought of that too, basically std::initializer list is using VLA's behind the scene, right? So what about the stackoverflow implication issues regarding the initializer list?Stylite
@Stylite not really VLAs, just allocating stuff on the stack. No need to be so specific. Also, what do you mean by "stack overflow implication"? There's numerous ways to overflow the stack, and no language is safe from them. At least not as long as they allow recursion.Smolensk
I mean that if you were to initialize an object using a huge initializer list with a lot of strings, then you could run into the risk of overflowing the stack.Stylite
And? Like I said, that's always a risk. Nothing different between initalizer lists and large arrays. Not to mention that the stack size is dependent on the OS. On Windows it's 1 MiB. On Linux it's usually 8 or 10 MiB, but it can be changed with an appropriate system call. Also, I don't think compilers actually check any sort of estimated stack size during compilation (there's an option to generate the data in GCC, but few use it).Smolensk
There is a technical reason to disallow VLAs: To skip a bunch of bounds checks for the stack pointer, a stack is followed by a page of protected memory. When a program attempts to access the protected area, you get a stack overflow. VLAs allow you to move the stack pointer past the protected area.Varietal

© 2022 - 2024 — McMap. All rights reserved.