Struct hack equivalent in C++
B

6

24

The struct hack from C90 and C99, in which the last member of a struct is an array of length 0, is well known, and with the introduction of flexible array members in C99 we even got a standardized way of writing it with []. Unfortunately, C++ provides no such construct, and (at least with Clang 3.4) compiling a struct with either [0] or [] yields a warning with --std=c++11 -pedantic:

$ cat test.cpp 
struct hack {
  char filler;
  int things[0];
};
$ clang++ --std=c++11 -pedantic test.cpp
test.cpp:3:14: warning: zero size arrays are an extension [-Wzero-length-array]
  int things[0];

and similarly

$ cat test.cpp 
struct fam {
  char filler;
  int things[];
};
$ clang++ --std=c++11 -pedantic test.cpp
test.cpp:3:7: warning: flexible array members are a C99 feature [-Wc99-extensions]
  int things[];

My question then is this: say that I want to have a struct that contains an array of variable size as the last item in C++. What is the right thing to do given a compiler that supports both? Should I go with the struct hack [0] (which is a compiler extension), or the FAM [] (which is a C99 feature)? As far as I understand it, either will work, but I am trying to figure out which is the lesser evil.

Also, before people start suggesting keeping an int* to a separately allocated piece of memory in the struct instead, that is not a satisfactory answer. I want to allocate a single piece of memory to hold both my struct and the array elements. Using a std::vector falls into the same category. If you wonder why I don't want to use a pointer instead, R..'s answer to another question gives a good overview.

There have been some similar questions elsewhere, but none give an answer to this particular question.

Bimonthly answered 29/11, 2013 at 16:47 Comment(12)
I am not so sure what you are asking, the linked questions clearly say that there is no such thing in standard c++, so whatever you try is implementation defined, and as long as it works on all the implementations you care about then, pick whatever you want.Chalcocite
Use int things[1]; ? For implementations where the struct hack is no issue, there's no inherent problem if there's padding, and accessing the elements later on is perfectly fine. You only have a tad more work when allocating it if you do not want to waste the space of 1 element + the possible padding.Carmen
@nos: How do you access later elements without violating pointer arithmetic rules?Trojan
@Chalcocite I recognize that neither feature is valid C++ without extensions. What I am after is an argument for which of the two is better given that your compiler supports both?Bimonthly
what is the reason for:" I want to allocate a single piece of memory to hold both my struct and the array elements.". You don't tell us about that, and that's the real reason you don't want to use C++ std::vector.Ist
#19442048Unpack
@BenVoigt You use an implementation which supports the struct hack, that does allow you to go past the array, into the extra allocated space. Naturally, if your c++ implementation does not guarantee that, or you need to conform to standard c++, it's a no go. This seems somewhat reasonable, since the OP is anyway asking about non-conforming C++ features.Carmen
What is the filler member? Does this struct only contain a dynamic array?Dosage
The filler argument is just there to avoid the compiler complaining about a zero-size struct in the simplified example struct. It could have been an arbitrary number of other fields of various types, but that would have unnecessarily complicated the question.Bimonthly
@Jonhoo In practice, I hope that one of the members contains the number of additional elements.Gruel
@JamesKanze Yes, you're right. My point was only that the struct is not only a dynamic array, it also contains other fields. The length obviously has to be stored somewhere for a practical solution as you say.Bimonthly
Can't you use a pointer at the end, allocate a block that is bigger than necessary and placement new the struct into the block? the pointer can point to its own address plus oneMacomber
G
12

You can get more or less the same effect using a member function and a reinterpret_cast:

int* buffer() { return reinterpret_cast<int*>(this + 1); }

This has one major defect: it doesn't guarantee correct alignment. For example, something like:

struct Hack
{
    char size;
    int* buffer() { return reinterpret_cast<int*>(this + 1); }
};

is likely to return a mis-aligned pointer. You can work around this by putting the data in the struct in a union with the type whose pointer you are returning. If you have C++11, you can declare:

struct alignas(alignof(int)) Hack
{
    char size;
    int* buffer() { return reinterpret_cast<int*>(this + 1); }
};

(I think. I've never actually tried this, and I could have some details of the syntax wrong.)
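
The alignment claim can be checked at compile time. This is only a sketch: the names Unaligned and Aligned are made up for illustration, and it uses the shorter alignas(int) spelling, which C++11 accepts as equivalent to alignas(alignof(int)).

```cpp
#include <cstddef>

// Hypothetical names, for illustration only.
struct Unaligned {
    char size;
    int* buffer() { return reinterpret_cast<int*>(this + 1); }
};

struct alignas(int) Aligned {
    char size;
    int* buffer() { return reinterpret_cast<int*>(this + 1); }
};

// A lone char member needs no padding, so this + 1 is a single byte past
// the start of the object and is generally not an int boundary.
static_assert(sizeof(Unaligned) == 1, "a lone char member needs no padding");

// alignas(int) raises the alignment, and sizeof is always a multiple of the
// alignment, so this + 1 lands on an int boundary.
static_assert(alignof(Aligned) == alignof(int), "alignment raised to that of int");
static_assert(sizeof(Aligned) % alignof(int) == 0, "size padded to a multiple of alignof(int)");
```

The checks only cover layout; whether reading ints through the returned pointer is well defined still depends on how the memory behind it was allocated.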

This idiom has a second important defect: it does nothing to ensure that the size field corresponds to the actual size of the buffer, and worse, there is no real way of using new here. To correct this, somewhat, you can define a class specific operator new and operator delete:

struct alignas(alignof(int)) Hack
{
    void* operator new( size_t, size_t n );
    void operator delete( void* );
    Hack( size_t n );
    char size;
    int* buffer() { return reinterpret_cast<int*>(this + 1); }
};

The client code will then have to use placement new to allocate:

Hack* hack = new (20) Hack(20);

The client still has to repeat the size, but he cannot ignore it.

There are also techniques which can be used to prevent creating instances which aren't allocated dynamically, etc., to end up with something like:

struct alignas(alignof(int)) Hack
{
private:
    void operator delete( void* p )
    {
        ::operator delete( p );
    }
    //  ban all but dynamic lifetime (and also inheritance, member, etc.)
    ~Hack() = default;

    //  ban arrays
    void* operator new[]( size_t ) = delete;
    void operator delete[]( void* p ) = delete;
public:
    Hack( size_t n );
    void* operator new( size_t, size_t n )
    {
        return ::operator new( sizeof(Hack) + n * sizeof(int) );
    }
    char size;
    //  Since dtor is private, we need this.
    void deleteMe() { delete this; }
    int* buffer() { return reinterpret_cast<int*>(this + 1); }
};

Given the fundamental dangers of such a class, it is debatable if so many protective measures are necessary. Even with them, it's really only usable by someone who fully understands all of the constraints, and is carefully paying attention. In all but extreme cases, in very low level code, you'd just make the buffer a std::vector<int> and be done with it. In all but the lowest level code, the difference in performance would not be worth the risk and effort.
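
One way to avoid making the client repeat the size is to hide the allocation behind a static factory function. The sketch below is an assumption-laden illustration, not the class above: FlexHack, create, destroy and demo_sum are hypothetical names, the size is stored as a std::size_t rather than a char, and it relies on the same reinterpret_cast trick, with the same caveats.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical variant: the size is passed once, to create().
struct FlexHack {
    static FlexHack* create(std::size_t n) {
        // One allocation holds the header followed by n ints.
        void* raw = ::operator new(sizeof(FlexHack) + n * sizeof(int));
        FlexHack* h = new (raw) FlexHack(n);
        int* p = h->buffer();
        for (std::size_t i = 0; i < n; ++i) new (p + i) int(0);  // zero-initialize
        return h;
    }
    void destroy() {
        this->~FlexHack();
        ::operator delete(this);
    }
    std::size_t size() const { return size_; }
    int* buffer() { return reinterpret_cast<int*>(this + 1); }

private:
    explicit FlexHack(std::size_t n) : size_(n) {}
    ~FlexHack() = default;  // private: instances can only live on the heap
    std::size_t size_;
};

// The trailing ints start at this + 1, so the header size must keep them aligned.
static_assert(sizeof(FlexHack) % alignof(int) == 0, "buffer would be misaligned");

// Usage: fill the buffer with 0..n-1 and sum it back.
int demo_sum(std::size_t n) {
    FlexHack* h = FlexHack::create(n);
    assert(h->size() == n);
    for (std::size_t i = 0; i < h->size(); ++i) h->buffer()[i] = static_cast<int>(i);
    int sum = 0;
    for (std::size_t i = 0; i < h->size(); ++i) sum += h->buffer()[i];
    h->destroy();
    return sum;
}
```

Because the constructor and destructor are private, the stored size and the allocated size cannot drift apart through client code.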

EDIT:

As a point of example, g++'s implementation of std::basic_string uses something very similar to the above, with a struct containing a reference count, the current size and the current capacity (three size_t), followed directly by the character buffer. And since it was written long before C++11 and alignas/alignof, something like std::basic_string<double> will crash on some systems (e.g. a Sparc). (While technically a bug, most people do not consider this a critical problem.)

Gruel answered 29/11, 2013 at 17:22 Comment(3)
Why + 1? Wouldn't it be better to replace it with sizeof(Hack)? And new (20) Hack(20) is very error-prone, because you have to type the same value twice; a better solution would be a newMe(int) function that returns new (20) Hack(20).Plourde
@Plourde Good point: make the ctor and operator new private as well, and provide a static member function for the new as well. (As an excuse, I'll point out that it has been something like 20 years since I last used this technique. In practice, I would avoid it except in special cases like std::basic_string, where its use is an implementation detail, and well hidden from anyone other than the author.)Gruel
alignas(alignof(T)) can be abbreviated to alignas(T) per [dcl.align]/3.Microcline
E
10

This is C++, so templates are available:

template <int N>
struct hack {
    int filler;
    int thing [N];
};

Casting between different pointers to different instantiations will be the difficult issue, then.
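
A small compile-time sketch of why the instantiations are awkward to pass around. It repeats the template so that it stands alone, and the last check assumes the usual layout with no padding between int members.

```cpp
#include <type_traits>

template <int N>
struct hack {
    int filler;
    int thing[N];
};

// Each N yields a distinct, unrelated type...
static_assert(!std::is_same<hack<2>, hack<3>>::value, "different instantiations differ");
// ...with no implicit pointer conversion between them.
static_assert(!std::is_convertible<hack<3>*, hack<2>*>::value, "pointers do not convert");
// The array is stored inline, so the object grows with N.
static_assert(sizeof(hack<3>) == sizeof(hack<2>) + sizeof(int), "one extra int");
```

A function that should accept any size therefore has to be a template itself, or take a pointer to the first element together with a length.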

Edington answered 29/11, 2013 at 17:1 Comment(4)
+1, though std::size_t would be a better choice for the template parameter type.Makeup
Very neat solution, but it requires N to be known at compile-time, which is not necessarily the case. Say I have a struct that holds a double for every user on a machine; that's something you can only determine at runtime.Bimonthly
@Jonhoo In fact, between the requirement that N be a constant, and the fact that each different size is a totally unrelated type, which means that you can't simply pass pointers around to it, this really doesn't solve the problem. It can be useful in some special cases, but that's about it.Gruel
Unfortunately no, it doesn't solve the problem. It is a neat way of solving this if the values for N are known at compile-time though, as it also enforces type safety!Bimonthly
N
8

The first thing that comes to mind is DON'T: don't write C in C++. In 99.99% of the cases this hack is not needed, won't make any noticeable improvement in performance over just holding a std::vector, and will complicate your life and that of the other maintainers of the project in which you deploy this.

If you want a standard-compliant approach, provide a wrapper type that dynamically allocates a chunk of memory large enough to contain the hack (minus the array) plus N*sizeof(int) for the equivalent of the array (don't forget to ensure proper alignment). The class would have accessors that map the members and the array elements to the correct location in memory.

Ignoring alignment and the boilerplate code needed to make the interface nice and the implementation safe:

template <typename T>
class DataWithDynamicArray {
   void *ptr;
   int* array() {
      return reinterpret_cast<int*>(static_cast<char*>(ptr)+sizeof(T)); // align!
   }
public:
   DataWithDynamicArray(int size) : ptr() {
      ptr = malloc(sizeof(T) + sizeof(int)*size); // force correct alignment
      new (ptr) T();
   }
   ~DataWithDynamicArray() { 
      static_cast<T*>(ptr)->~T();
      free(ptr);
   }
// copy, assignment...
   int& operator[](int pos) {
       return array()[pos];
   }
   T& data() {
      return *static_cast<T*>(ptr);
    }
};

struct JustSize { int size; };
DataWithDynamicArray<JustSize> x(10);
x.data().size = 10;
for (int i = 0; i < 10; ++i) {
    x[i] = i;
}

Now I would really not implement it that way (I would avoid implementing it at all!!), as for example the size should be a part of the state of DataWithDynamicArray...

This answer is provided only as an exercise, to explain that the same thing can be done without extensions, but beware this is just a toy example that has many issues including but not limited to exception safety or alignment (and yet is better than forcing the user to do the malloc with the correct size). The fact that you can does not mean that you should, and the real question is whether you need this feature and whether what you are trying to do is a good design at all or not.
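
The alignment that the toy class skips could be handled by rounding the array offset up to the alignment of int. The following is a minimal sketch under that assumption; align_up, array_offset, total_size and CharHeader are hypothetical helpers, not part of the class above.

```cpp
#include <cstddef>

// Hypothetical helper: smallest multiple of `align` that is >= `offset`.
// `align` must be a power of two, which every alignof() result is.
constexpr std::size_t align_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

// Byte offset of the trailing int array after a header of type T.
template <typename T>
constexpr std::size_t array_offset() {
    return align_up(sizeof(T), alignof(int));
}

// Total bytes for a header of type T followed by n ints.
template <typename T>
constexpr std::size_t total_size(std::size_t n) {
    return array_offset<T>() + n * sizeof(int);
}

struct CharHeader { char tag; };  // sizeof is 1, so padding is needed

static_assert(array_offset<CharHeader>() == alignof(int),
              "array starts at the first int boundary");
static_assert(array_offset<int>() == sizeof(int),
              "no padding when the header is already int-aligned");
```

In the class above, malloc would then be asked for total_size<T>(size) bytes, and array() would add array_offset<T>() instead of sizeof(T). Since malloc returns storage aligned for any fundamental type, the ints end up correctly aligned as long as T itself needs no stricter alignment than malloc provides.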

Newly answered 29/11, 2013 at 17:41 Comment(5)
This is fairly similar to @JamesKanze's answer, except that it also adds a level of indirection, which I'd like to avoid. I like the fact that it maintains the array index semantics though. Another thing this discards is the fact that having [] or [0] in the struct informs anyone reading the code that there will in fact be data following the struct that is directly related to the struct (it even has a name).Bimonthly
@Jonhoo If I understand his solution correctly, his class DataWithDynamicArray is actually a wrapper around the pointer, and not the class itself.Gruel
@Jonhoo: This solution is very similar to James' even if less developed. Had I read his, I would not have added this one. There is no additional indirection (at least in the generated code), the main difference is that users in the James' approach have a nicer interface to access the set of initial members, but my solution can be used with any set of initial members (or different versions of it) while his would require writing a different type for each case. That and of course the fact that the solution here is only half bakedDosage
"In 99.99% of the cases this hack is not needed" - I disagree; if you are smart enough to know what structure hack is then you are smart enough to know when to use it. I would estimate the % of time a structure hack is useful to someone who thinks of it as a solution to a particular problem is 99.99%.Stereochromy
@hoodaticus: there are many features in the language, and many of them are for compatibility with previous versions of the language or C. Some of the older features have safer replacements and should be avoided. That's what the comment says. The fact that you know what it does does not mean that this is the proper tool for the problem. I can drive a screw with a hammer, but a screwdriver is better and an electric screwdriver is better still (unless you have strict constraints on the torque, in which case you need something else...) You need to know the tools, but that does not mean you should use allDosage
M
4

If you really feel the need to use a hack, why not just use

struct hack {
  char filler;
  int things[1];
};

followed by

hack_p = (struct hack*)malloc(sizeof(struct hack)+(N-1)*sizeof(int));

Or don't even bother about the -1 and live with a little extra space.
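
As a sketch of the size computation, offsetof can be used in place of the N-1 arithmetic; for this particular struct on typical platforms the two give the same number. hack_bytes and make_hack are hypothetical helper names, and indexing past things[0] still relies on the implementation tolerating the struct hack, as discussed in the comments on the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct hack {
    char filler;
    int things[1];
};

// Bytes needed for a hack whose trailing array holds n ints (n >= 1).
// offsetof already accounts for the padding between filler and things.
// For a struct with trailing padding, take the larger of this and sizeof.
std::size_t hack_bytes(std::size_t n) {
    return offsetof(hack, things) + n * sizeof(int);
}

// Allocates the struct and its n array elements as one block.
// Returns a null pointer if malloc fails; release with std::free.
hack* make_hack(std::size_t n) {
    return static_cast<hack*>(std::malloc(hack_bytes(n)));
}
```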

Marya answered 29/11, 2013 at 20:28 Comment(1)
oops, just saw that @Carmen suggested the same thing in comments.... But I don't understand the objections.Marya
M
3

C++ does not have the concept of "flexible arrays". The only way to have a flexible array in C++ is to use a dynamic array - which leads you to use int* things. You will need a size parameter if you are attempting to read this data from a file so that you can create the appropriate sized array (or use a std::vector and just keep reading until you reach the end of the stream).

The "flexible array" hack keeps the spatial locality (that is, it keeps the allocated memory in a contiguous block with the rest of the structure), which you lose when you are forced to use dynamic memory. There isn't really an elegant way around that (e.g. you could allocate a large buffer, but you would have to make it large enough to hold any number of elements you wanted, and if the actual data being read in was smaller than the buffer, there would be wasted space allocated).

Also, before people start suggesting keeping an int* to a separately allocated piece of memory in the struct instead, that is not a satisfactory answer. I want to allocate a single piece of memory to hold both my struct and the array elements. Using a std::vector also falls into the same category.

A non-standard extension is not going to work when you move to a compiler that does not support it. If you keep to the standard (e.g. avoid using compiler-specific hacks), you are less likely to run into these types of issues.

Makeup answered 29/11, 2013 at 16:52 Comment(19)
No, it's not equivalent to a pointer. The hack (in C) allows you to treat the final member as an arbitrary-sized array, without an extra level of indirection, as long as you've allocated enough memory for it. There is no equivalent hack in standard C++.Sculpt
Not the down voter but it is not the same as R..'s answer here covers.Insinuation
@MikeSeymour I think you are reading more into what I was saying than what I meant. I'll try to reword it to be more clear.Makeup
@ShafikYaghmour: The relevant points here are (1) spatial locality and (2) number of allocations and (3) whether space is taken up for a pointer... completely different from the points R's answer discusses.Trojan
@ZacHowland: I was just commenting on the first sentence - an array is not equivalent to a pointer. Now you've removed that sentence, the answer is fine.Sculpt
@BenVoigt huh? The answer covers 2 out of 3 of the aspects you mention.Insinuation
#19442048Unpack
@AlecTeal There are some non-standard extensions that support things like int i[0] as members of structs/classes.Makeup
@ZacHowland identity is very important, I linked that previous answer to remind us of that.Unpack
I've edited the question to make it somewhat clearer what I'm asking. Basically, I'm not so much interested in what I could do instead, but rather which approach would be more sensible given a compiler that has extension support for both.Bimonthly
@Jonhoo In general, the more sensible option is to stick to the one that is closest to the standard. Using non-standard extensions may make your life easier in the short term, only to cause you massive headaches down the road. That said, there are times where office politics can (sadly) overrule coding practices.Makeup
@MikeSeymour: But in this particular case (not the general one) it is equivalent to a dynamic array of int if you interpret the same value as filler and the rest as the array members. You just need to provide a couple of helper functions (member functions) that do the magic for you.Dosage
A pointer is not the way you'd solve this in C++; you'd use std::vector.Gruel
@JamesKanze I recommended that as well.Makeup
@ZacHowland I wholeheartedly agree, but in this case there isn't really a standard alternative. Using a pointer or std::vector is not so much an alternative as it is a completely different approach to the problem. I use them happily elsewhere in my code, but for the particular problem I am working on at the moment, using something like [] makes for a far cleaner solution.Bimonthly
@Jonhoo The reason there is not a standard alternative is because even in C this is a non-standard extension. You are kind of setting yourself up for an XY problem ("I want to solve it this way ... but how can I do that in C++"). The best solution would be to step back and think about what exactly you are designing. In most cases, you'll find that trying to force an approach on a problem will be like trying to fit a square peg in a round hole (it can be done, but it is messy); the better approach is usually to find the round peg.Makeup
Flexible array members are a standardized part of C99, not an extension, as far as I've understood it?Bimonthly
@Jonhoo Correct, it was added in C99 (before that it was a non-standard extension - sorry for the confusion). But it is still not valid in standard C++ (and likely never will be).Makeup
It's more likely the standard will evolve to embrace the ways the language is used. People do not exist for standards; standards exist for people. This isn't Java, after all.Stereochromy
H
1

There is at least one advantage for flexible array members over zero length arrays when the compiler is clang.

struct Strukt1 {
    int fam[];
    int size;
};

struct Strukt2 {
    int fam[0];
    int size;
};

Here clang will error if it sees Strukt1 but won't error if it instead sees Strukt2. gcc and icc accept either without errors and msvc errors in either case. gcc does error if the code is compiled as C.

The same applies for this similar but less obvious example:

struct Strukt3 {
    int size;
    int fam[];
};

struct Strukt4 {
    Strukt3 s3;
    int i;
};
Hayes answered 9/11, 2015 at 14:22 Comment(0)
