Compile-time re-arrangement of data members?
Asked Answered
P

2

6

I was wondering about a possible way to make memory layout of a class to be more effective in templated code. As far as I know, Standard mandates data members of a class to be laid out in memory on order of their declaration. There might be possible padding done by the compiler to align the data members adding unnecessarily to the size of the class. The idea is to re-arrange data members declarations at compile time to minimize such padding. I did some searching, but couldn't find any info (most of the time people discuss packing compiler directives, which is not quite the same as I see it).

First, please consider the following (trivial, but repetitive and ugly) code (same code on ideone.com) (questions are below the code, feel free to skip right down to them):

#include <iostream>
#include <cstdint>

namespace so
{
template <typename Ta, typename Tb, typename Tc, std::size_t =
    ((sizeof(Ta) >= sizeof(Tb)) && (sizeof(Tb) >= sizeof(Tc))) ? 10 :
    ((sizeof(Ta) >= sizeof(Tc)) && (sizeof(Tc) >= sizeof(Tb))) ? 11 :
    ((sizeof(Tb) >= sizeof(Ta)) && (sizeof(Ta) >= sizeof(Tc))) ? 20 :
    ((sizeof(Tb) >= sizeof(Tc)) && (sizeof(Tc) >= sizeof(Ta))) ? 21 :
    ((sizeof(Tc) >= sizeof(Ta)) && (sizeof(Ta) >= sizeof(Tb))) ? 30 :
    ((sizeof(Tc) >= sizeof(Tb)) && (sizeof(Tb) >= sizeof(Ta))) ? 31 : 0>
struct foo {};

template <typename Ta, typename Tb, typename Tc>
struct foo<Ta, Tb, Tc, 10>
{
  Ta a;
  Tb b;
  Tc c;
  foo(Ta _a, Tb _b, Tc _c) : a{_a}, b{_b}, c{_c} {}
};

template <typename Ta, typename Tb, typename Tc>
struct foo<Ta, Tb, Tc, 11>
{
  Ta a;
  Tc c;
  Tb b;
  foo(Ta _a, Tb _b, Tc _c) : a{_a}, c{_c}, b{_b} {}
};

template <typename Ta, typename Tb, typename Tc>
struct foo<Ta, Tb, Tc, 20>
{
  Tb b;
  Ta a;
  Tc c;
  foo(Ta _a, Tb _b, Tc _c) : b{_b}, a{_a}, c{_c} {}
};

template <typename Ta, typename Tb, typename Tc>
struct foo<Ta, Tb, Tc, 21>
{
  Tb b;
  Tc c;
  Ta a;
  foo(Ta _a, Tb _b, Tc _c) : b{_b}, c{_c}, a{_a} {}
};

template <typename Ta, typename Tb, typename Tc>
struct foo<Ta, Tb, Tc, 30>
{
  Tc c;
  Ta a;
  Tb b;
  foo(Ta _a, Tb _b, Tc _c) : c{_c}, a{_a}, b{_b} {}
};

template <typename Ta, typename Tb, typename Tc>
struct foo<Ta, Tb, Tc, 31>
{
  Tc c;
  Tb b;
  Ta a;
  foo(Ta _a, Tb _b, Tc _c) : c{_c}, b{_b}, a{_a} {}
};

template <typename Ta, typename Tb, typename Tc>
struct bar: public foo<Ta, Tb, Tc>
{
 private:
  using base = foo<Ta, Tb, Tc>;
 public:
  bar() = default;
  using base::base;
};

template <typename Ta, typename Tb, typename Tc>
struct foobar
{
  Ta a;
  Tb b;
  Tc c;
  foobar() = default;
  foobar(Ta _a, Tb _b, Tc _c) : a{_a}, b{_b}, c{_c} {}
};
} //namespace so

int main()
{
 so::bar<std::uint16_t, std::uint32_t, std::uint16_t> bar{1, 2, 3};
 so::foobar<std::uint16_t, std::uint32_t, std::uint16_t> foobar{1, 2, 3};

 std::cout << sizeof(bar) << "\t" << sizeof(foobar) << std::endl;

 std::cout << bar.a << " : " << bar.b << " : " << bar.c << std::endl;
 std::cout << foobar.a << " : " << foobar.b << " : " << foobar.c << std::endl;

 return (0);
}

Program output:

8   12
1 : 2 : 3
1 : 2 : 3

Questions:

  1. Is there some well-known, compiler-independent way of solving such thing (Boost, maybe)?
  2. If no, is there some compiler-specific directives that will do such thing automatically (without data misalignment as with GCC's __atribute__((packed)))?
  3. Can this be done in a more generic way (possibly using variadic templates)?

Thanks in advance!

Pebbly answered 24/9, 2013 at 7:9 Comment(3)
Hint: I believe you could solve the repetition issue by using inheritance, which opens the door to variadic templates, however this will heavily rely on the compiler and the code may not be as elegant (aka, you will have to use a function to access the element, such as get<0>(foo)). I would also note that you are only looking at the sizes, and not the alignments, on some architectures double is 8 bytes long, but aligned on 4 bytes boundaries, so your computation might be sub-optimal.Goodin
@MatthieuM. Do you mean sort of recursive inheritance as we unroll the parameter pack, and use some kind of get<> to navigate through it? Alignment can be dealt with alignof()/std::alignment_of<>, I imagine - thanks for pointing that out.Pebbly
Actually, my main issue was that unfortunately you cannot use pack expansion in the attributes (annoying) and there are various ways to deal with it... and then I realized I already reviewed those ways when checking how std::tuple was implemented and that instead of trying to re-implement them using inheritance tricks I could simply reuse std::tuple :DGoodin
G
5

I believe I have a relatively simple variadic template solution.

The implementation will require a couple helpers though, so I will present it backward so you can get the gist of it first.

template <typename... Args>
class OptimizedLayout {
public:
    template <size_t I>
    auto at() -> decltype(std::get<Index<I>::value>(_storage)) {
        return std::get<Index<I>::value>(_storage);
    }

    template <size_t I>
    auto at() const -> decltype(std::get<Index<I>::value>(_storage)) {
        return std::get<Index<I>::value>(_storage);
    }

private:
    using Indexed = /**/; // pairs of sorted Args (by decreasing size)
                          // and their original index in the argument list

    using Storage = /*std::tuple<Indexed.first ...>*/;

    template <size_t I>
    using Index = /*index of element of Indexed whose .second is I*/;

    Storage _storage;
}; // class OptimizedLayout

The main benefit here is that changing how to pack elements will only affect how Indexed is defined, so you can easily improve on the algorithm. For now, I will just provide an equivalent of your template.


Disclaimer: the following code is untested, it may not even compile, let alone produce the correct result.

I. Generating indices.

The explanation can be found on the lounge, we can reuse it to generate a pack of pairs (type, index). To sort it we are going to use a MPL algorithm, so it is simpler to produce the pack as a MPL vector.

template <std::size_t... Is>
struct indices {};

template <std::size_t N, std::size_t... Is>
struct build_indices
  : build_indices<N-1, N-1, Is...> {};

template <std::size_t... Is>
struct build_indices<0, Is...> { using type = indices<Is...>; };

template <typename Tuple, typename Indices>
struct IndexedImpl;

template <typename... T, size_t... I>
struct IndexedImpl< std::tuple<T...>, indices<I...> > {
    using type = boost::mpl::vector< std::pair<T, I>... >;
};

template <typename Tuple>
using Indexed =
    IndexedImpl<Tuple, typename build_indices<std::tuple_size<Tuple>::value>::type>;

II. Sorting

In order to sort we are going to use the MPL sort algorithm, which operates on types.

struct GreaterSize {
    template <typename T, typename U>
    struct apply {
         using type = boost::mpl::bool_<sizeof(T) > sizeof(U)>;
    };
};

template <typename T>
struct TupleInserter {
    using state = T;

    template <typename Seq, typename E>
    struct apply;
};

template <typename T>
template <typename... Args, typename E>
struct TupleInserter<T>::apply<std::tuple<Args...>, E> {
    using type = std::tuple<Args..., E>;
};

template <typename Tuple>
using SortedSequence = boost::mpl::sort<
    typename Indexed<Tuple>::type,
    GreaterSize,
    TupleInserter
>;

III. Computing the Storage class

Now, we only need to compute the storage class which is done by extracting the first element of each pair. Interestingly, pattern matching can really assist here.

template <typename T>
struct TupleFirstExtractor;

template <typename... T, size_t... I>
struct TupleFirstExtractor<std::tuple<std::pair<T, I>...>> {
    using type = std::tuple<T...>;
}; 

IV. Computing the Index solver

template <typename Tuple, size_t Needle, size_t Acc>
struct IndexFinderImpl;

template <typename H, size_t h, typename... Tail, size_t Needle, size_t Acc>
struct IndexFinderImpl<std::tuple<std::pair<H,h>, Tail...>, Needle, Acc>:
    IndexFinderImpl<std::tuple<Tail...>, Needle, Acc+1> {};

template <typename H, typename... Tail, size_t Needle, size_t Acc>
struct IndexFinderImpl<std::tuple<std::pair<H, Needle>, Tail...>, Needle, Acc>:
    std::integral_constant<size_t, Acc> {};

V. Putting it all together

And now we wire up everything:

template <typename... Args>
class OptimizedLayout {
public:
    template <size_t I>
    auto at() -> decltype(std::get<Index<I>::value>(_storage)) {
        return std::get<Index<I>::value>(_storage);
    }

    template <size_t I>
    auto at() const -> decltype(std::get<Index<I>::value>(_storage)) {
        return std::get<Index<I>::value>(_storage);
    }

private:
    using Indexed = typename SortedSequence<std::tuple<Args...>>::type;

    using Storage = typename TupleFirstExtractor<Indexed>::type;

    template <size_t I>
    using Index = IndexFinderImpl<Indexed, I, 0>;

    Storage _storage;
}; // class OptimizedLayout

Hint: I advise using a specialized namespace to hold all the helpers. Whilst they could be defined within the template, it is easier to define them outside since they do not depend on Args..., however you would need to isolate them to avoid clashes with other parts of your program.

Goodin answered 24/9, 2013 at 8:39 Comment(5)
+1 Thanks for such a detailed break down. I am going to look into it as soon as I have a bit more time and try to implement it fully. I think, after that I should be able to update the question and accept your answer. And once again, thanks for pointing out the deficiency of sizeof() use in this case. You are absolutely right, it should be alignof() instead.Pebbly
@PetrBudnik: note that after reading R. Martinho Fernandes' blog serie I realized that there is no guarantee that std::tuple lays the elements in the order you pass them. So I would actually recommend that you take his implementation; though I'll let this answer here because it is small enough to be easily groked.Goodin
Yes, I read the part comparing libc++ and libstdc++ std::tuple's layout (and about GCC 4.7.2 bug). But you made a comprehensive, swift post, covering the logic of it in detail and giving a few useful references. And I actually like that it leaves room for me to try things. I just want to give it justice by actually doing it (and maybe asking a few questions in the process).Pebbly
But, really, "you got me at" alignof(). Made me think about something like char[10] passed as a template parameter and that, in fact, your "sizeof() might be sub-optimal" line from the original comment was far more than generous ;).Pebbly
@PetrBudnik: Actually, most of the times you can get away with ordering the elements by decreasing size, so even though it's not perfect, it is still pretty reliable.Goodin
O
2

Take a look at this series of blog posts by R. Martinho Fernandes : Link.

It outlines optimal packing of tuples. You can use such a "packed" tuple as the data storage for you class, and provide accessors hiding the get<0>() style tuple element access.

Olcott answered 24/9, 2013 at 8:27 Comment(4)
Very interesting serie, however a quick outline of the challenges/solutions would be nice to flesh out this answer.Goodin
yes, well, I'm not the author of those posts, and I wouldn't like to present the solution as mine... maybe I should've posted this as a comment instead...Olcott
I certainly don't mind it being an answer, it was a most interesting read and actually points toward a flaw in my own answer: I had blindly assumed that std::tuple would be layed out in order which is unfortunately not portable.Goodin
+1 Interesting read, thanks for sharing. Going to go through the details as soon as I have a bit more time.Pebbly

© 2022 - 2024 — McMap. All rights reserved.