How to reduce boilerplate currently necessary for serialization

Our software is abstracting away hardware, and we have classes that represent this hardware's state and have lots of data members for all properties of that external hardware. We need to regularly update other components about that state, and for that we send protobuf-encoded messages via MQTT and other messaging protocols. There are different messages that describe different aspects of the hardware, so we need to send different views of the data of those classes. Here's a sketch:

struct some_data {
  Foo foo;
  Bar bar;
  Baz baz;
  Fbr fbr;
  // ...
};

Let's assume we need to send one message containing foo and bar, and one containing bar and baz. Our current way of doing this involves a lot of boilerplate:

struct foobar {
  Foo foo;
  Bar bar;
  foobar(const Foo& foo, const Bar& bar) : foo(foo), bar(bar) {}
  bool operator==(const foobar& rhs) const {return foo == rhs.foo && bar == rhs.bar;}
  bool operator!=(const foobar& rhs) const {return !(*this == rhs);}
};

struct barbaz {
  Bar bar;
  Baz baz;
  barbaz(const Bar& bar, const Baz& baz) : bar(bar), baz(baz) {}
  bool operator==(const barbaz& rhs) const {return bar == rhs.bar && baz == rhs.baz;}
  bool operator!=(const barbaz& rhs) const {return !(*this == rhs);}
};

template<> struct serialization_traits<foobar> {
  static SerializedFooBar encode(const foobar& fb) {
    SerializedFooBar sfb;
    sfb.set_foo(fb.foo);
    sfb.set_bar(fb.bar);
    return sfb;
  }
};

template<> struct serialization_traits<barbaz> {
  static SerializedBarBaz encode(const barbaz& bb) {
    SerializedBarBaz sbb;
    sbb.set_bar(bb.bar);
    sbb.set_baz(bb.baz);
    return sbb;
  }
};

This can then be sent:

void send(const some_data& data) {
  send_msg( serialization_traits<foobar>::encode(foobar(data.foo, data.bar)) );
  send_msg( serialization_traits<barbaz>::encode(barbaz(data.bar, data.baz)) );
}

Given that the data sets to be sent are often much larger than two items, that we need to decode that data, too, and that we have tons of these messages, there is a lot more boilerplate involved than what's in this sketch. So I have been searching for a way to reduce this. Here's a first idea:

typedef std::tuple< Foo /* 0 foo */
                  , Bar /* 1 bar */
                  > foobar;
typedef std::tuple< Bar /* 0 bar */
                  , Baz /* 1 baz */
                  > barbaz;
// yay, we get comparison for free!

template<>
struct serialization_traits<foobar> {
  static SerializedFooBar encode(const foobar& fb) {
    SerializedFooBar sfb;
    sfb.set_foo(std::get<0>(fb));
    sfb.set_bar(std::get<1>(fb));
    return sfb;
  }
};

template<>
struct serialization_traits<barbaz> {
  static SerializedBarBaz encode(const barbaz& bb) {
    SerializedBarBaz sbb;
    sbb.set_bar(std::get<0>(bb));
    sbb.set_baz(std::get<1>(bb));
    return sbb;
  }
};

void send(const some_data& data) {
  send_msg( serialization_traits<foobar>::encode(std::tie(data.foo, data.bar)) );
  send_msg( serialization_traits<barbaz>::encode(std::tie(data.bar, data.baz)) );
}

I got this working, and it cuts the boilerplate considerably. (Not in this small example, but once you imagine a dozen data points being encoded and decoded, dropping all those repeated listings of data members makes a big difference.) However, this has two disadvantages:

  1. This relies on Foo, Bar, and Baz being distinct types. If they are all int, we need to add a dummy tag type to the tuple (see the sketch after this list).

    This can be done, but it makes the whole idea considerably less appealing.

  2. What used to be variable names in the old code becomes comments and numbers in the new code. That's pretty bad: a bug that confuses two members is likely to be present in the encoding as well as in the decoding, so it can't be caught by simple unit tests; catching it needs test components created through other technologies (i.e. integration tests).

    I have no idea how to fix this.
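
Here is roughly what that dummy-tag workaround could look like. This is only a sketch: the tagged<> wrapper and the *_tag structs are made-up names, not part of our framework. A pleasant side effect is that the tag name documents which member a tuple element belongs to, which softens problem #2 a little. (Spelled with std::tuple like the sketch above; for us it would be std::tr1::tuple.)

#include <tuple>

struct foo_tag {};
struct bar_tag {};

// wraps a value and distinguishes it from other members of the same type by its tag
template <class Tag, class T>
struct tagged {
  T value;
  explicit tagged(const T& v) : value(v) {}
  bool operator==(const tagged& rhs) const { return value == rhs.value; }
  bool operator!=(const tagged& rhs) const { return !(*this == rhs); }
};

// two int members stay distinguishable because the tags differ
typedef std::tuple< tagged<foo_tag, int>
                  , tagged<bar_tag, int>
                  > foobar;

// encoding then reads e.g.:
//   sfb.set_foo(std::get<0>(fb).value);
//   sfb.set_bar(std::get<1>(fb).value);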

Has anybody a better idea how to reduce the boilerplate for us?

Note:

  • For the time being, we're stuck with C++03. Yes, you read that right. For us, it's std::tr1::tuple. No lambda. And no auto either.
  • We have tons of code employing those serialization traits. We cannot throw away the whole scheme and do something completely different. I am looking for a solution to simplify future code fitting into the existing framework. Any idea that requires us to re-write the whole thing will very likely be dismissed.
Prosser answered 14/5, 2018 at 20:22 Comment(31)
Sounds like you want to write a program that reads a file in a simple language and then generates all the C++ boilerplate for you, that you then compile. Code generator for the win. A simple yacc/bison parser with a simple grammar may even do.Derek
@JesperJuhl: This is indeed one of the solutions we have been looking at. I'd rather find a solution in C++, though, than add yet another code generator to our build process that people will have to maintain when the current programmers have all long since retired...Prosser
Since the messages are protobuf-encoded, why not generate the code with protobuf?Precipitation
@Jens: Do you have any pointers on how to write a protobuf backend? However, this was one of the many simplifications of this question: Our messages are mostly protobuf-encoded. Currently. We already have a few projects where we send JSON messages. And who knows what we'll do in the future...Prosser
"... maintain when the current programmers will all long since have retired ..." - You are one of those "current programmers". When you have retired, you no longer need to care ;-) (please note the winking smiley).Derek
@Prosser I never needed to write a backend. What do you want to achieve with a custom backend? Messages can be serialized and deserialized to/from e.g. streams and then be sent with any messaging library you want. I used it with zeroMQ. If I wanted JSON as a format I would use a JSON library.Precipitation
You could automate the generation of the boilerplate with macros. Is that unacceptable?Golanka
@Jesper: I love what we do. :-)Prosser
@Prosser I know the feeling :)Derek
@Jens: Assume SerializedFooBar to be a protobuf-generated type. Now what?Prosser
@Prosser Maybe I don't understand the question, but you can obj.SerializeToOstream(&output) or obj.ParseFromIstream(&input) any SerializedFooBar obj;. What do you expect?Precipitation
@jhx: Macros aren't nice, but when they reduce boilerplate, they're acceptable.Prosser
@Jens: This question is about the translation between specific sets of data members and protobuf-generated types. (Or other serializable types.) These serializable types are then indeed serialized according to their spec.Prosser
Does your preprocessor understand variadic macro syntax? Are you able to use P99?Golanka
@jxh: Unfortunately not.Prosser
How is it that your system decided not to create a custom encoder for some_data itself?Golanka
@jxh: As I wrote, there might be any number of messages sent carrying data copied from different members of some_data. But that's simplified. There are also messages that combine members of different classes into one message.Prosser
Are you able to change the SerializedXY class? For instance, could you have Serialised<X,Y> instead? If so, what interface would you like this class to have? I have thought of a possible solution using boost fusion... are you allowed to use boost?Historicity
Umm, one thing I seem to recall is that by design, a Protobuf message payload can be constructed from concatenations of payloads. Can you exploit that somehow?Durwood
Why do you even need to combine the messages, e.g. why are foo and bar combined into a foobar message? Are these data sets somehow related? To me it looks more like the boilerplate is just an unnecessary combination of data?! Also, are you able to slightly alter the data structures (struct Foo, Bar etc.), e.g. add a function to them? And how do you decode the messages?Hutchison
@jhx: Can you be more specific? What system? What encoder?Prosser
@linuxfever: The Serialized... classes are generated. I have no control over them at all.Prosser
@IwillnotexistIdonotexist: I dunno. You tell me!Prosser
@user1810087: How data is structured in our applications is a result of implementation decisions. How the published data is structured is a result of project-specific decisions. Often, this data needs to correspond to external interfaces we have no control over. We are (correctly, IMO) requested to abstract away our internal architecture and translate between that and many different external demands.Prosser
I see. Can you elaborate a bit on problem #1? If all members are int, std::get<0>, std::get<1>... will still fetch the right element, right? Also, for problem #2, why can't we write our own get<T> function where T can be the name of your type (Foo, Bar, etc)?Historicity
The problem is that serialization_traits<foo> and serialization_traits<bar> are the same type, when foo and bar are tuples with the same lists of types. That those type lists are semantically different doesn't matter for the syntax.Prosser
It looks like you are basically trying to write a generalization layer that will abstract away the numerous SerializedXY types, allowing access to them through a simplified interface (your choice being the specializations of serialization_traits). Is that a fair summary? From this perspective, the stuff you've tried is useful as an example of what you are looking for, but more useful would be information about these SerializedXY types. Could you add to your question some information about them, such as how they are generated (why this cannot change) and what their public interface is?Groff
@JaMiT: The Serialized... types are generated from some IDL. Currently, most of them are protobuf-generated while some are JSON-containers, but that might change. We want to keep them out of our code, as we have little control over what they look like and because they might change. That's one of the reasons for this translation layer for copying between our internal data and them.Prosser
There is a difference between "little control" and "no control". If you had no control, then perhaps one day the SerializedFooBar class would rename its set_foo member to simply foo while the other SerializedFooY classes retain set_foo. Is this a possibility? If so, that is critical information in that it invalidates some approaches. If not, that is some information about their public interface (as I requested). What do you have control over? What can be assumed about them? Why do they have this nice uniform SerializedXY naming scheme? Is that subject to outside changes?Groff
I would like to acknowledge that "generated from some IDL" does address one of the clarifications I requested. However, I requested the clarification be put into the question, not buried in the comments... (While I'm commenting, I'll add: who generates these type definitions?)Groff
@JaMiT: Well, the Serialized... types are generated from our IDL, so we do have some control. However, they aren't generated directly (currently we generate protobuf files, from which C++ is generated) and their exact interface depends on the interface which might change any time. So, yes, set_foo() might change. That's why we have the serialization traits, after all: they are supposed to isolate our code from those external interfaces. I just want it a little bit more declarative, and less repetitive.Prosser

I will build on your proposed solution, but use boost::fusion::tuples instead (assuming that is allowed). Let's assume your data types are

struct Foo{};
struct Bar{};
struct Baz{};
struct Fbr{};

and your data is

struct some_data {
    Foo foo;
    Bar bar;
    Baz baz;
    Fbr fbr;
};

From the comments, I understand that you have no control over the SerialisedXYZ classes but they do have a certain interface. I will assume that something like this is close enough(?):

struct SerializedFooBar {

    void set_foo(const Foo&){
        std::cout << "set_foo in SerializedFooBar" << std::endl;
    }

    void set_bar(const Bar&){
        std::cout << "set_bar in SerializedFooBar" << std::endl;
    }
};

// another protobuf-generated class
struct SerializedBarBaz {

    void set_bar(const Bar&){
        std::cout << "set_bar in SerializedBarBaz" << std::endl;
    }

    void set_baz(const Baz&){
        std::cout << "set_baz in SerializedBarBaz" << std::endl;
    }
};

We can now reduce the boilerplate and limit it to one typedef per datatype-permutation and one simple overload for each set_XXX member of the SerializedXYZ class, as follows:

typedef boost::fusion::tuple<Foo, Bar> foobar;
typedef boost::fusion::tuple<Bar, Baz> barbaz;
//...

template <class S>
void serialized_set(S& s, const Foo& v) {
    s.set_foo(v);
}

template <class S>
void serialized_set(S& s, const Bar& v) {
    s.set_bar(v);
}

template <class S>
void serialized_set(S& s, const Baz& v) {
    s.set_baz(v);
}

template <class S>
void serialized_set(S& s, const Fbr& v) {
    s.set_fbr(v);
}
//...

The good thing now is that you do not need to specialise your serialization_traits anymore. The following makes use of the boost::fusion::fold function, which I assume is OK to use in your project:

template <class SerializedX>
class serialization_traits {

    struct set_functor {

        // lets boost::result_of deduce fold's return type under C++03
        typedef SerializedX& result_type;

        template <class V>
        SerializedX& operator()(SerializedX& s, const V& v) const {
            serialized_set(s, v);
            return s;
        }
    };

public:

    template <class Tuple>
    static SerializedX encode(const Tuple& t) {
        SerializedX s;
        boost::fusion::fold(t, s, set_functor());
        return s;
    }
};

And here are some examples of how it works. Notice that if someone tries to tie a data member from some_data that is not compliant with the SerializedXYZ interface, the compiler will inform you about it:

void send_msg(const SerializedFooBar&){
    std::cout << "Sent SerializedFooBar" << std::endl;
}

void send_msg(const SerializedBarBaz&){
    std::cout << "Sent SerializedBarBaz" << std::endl;
}

void send(const some_data& data) {
  send_msg( serialization_traits<SerializedFooBar>::encode(boost::fusion::tie(data.foo, data.bar)) );
  send_msg( serialization_traits<SerializedBarBaz>::encode(boost::fusion::tie(data.bar, data.baz)) );
//  send_msg( serialization_traits<SerializedFooBar>::encode(boost::fusion::tie(data.foo, data.baz)) ); // compiler error; SerializedFooBar has no set_baz member
}

int main() {

    some_data my_data;
    send(my_data);
}

Code here

EDIT:

Unfortunately, this solution does not tackle problem #1 of the OP. To remedy this, we can define a series of tags, one for each of your data members and follow a similar approach. Here are the tags, along with the modified serialized_set functions:

struct foo_tag{};
struct bar1_tag{};
struct bar2_tag{};
struct baz_tag{};
struct fbr_tag{};

template <class S>
void serialized_set(S& s, const some_data& data, foo_tag) {
    s.set_foo(data.foo);
}

template <class S>
void serialized_set(S& s, const some_data& data, bar1_tag) {
    s.set_bar1(data.bar1);
}

template <class S>
void serialized_set(S& s, const some_data& data, bar2_tag) {
    s.set_bar2(data.bar2);
}

template <class S>
void serialized_set(S& s, const some_data& data, baz_tag) {
    s.set_baz(data.baz);
}

template <class S>
void serialized_set(S& s, const some_data& data, fbr_tag) {
    s.set_fbr(data.fbr);
}

The boilerplate is again limited to one serialized_set per data member and scales linearly, similarly to the original version of this answer. Here is the modified serialization_traits:

// the serialization_traits doesn't need specialization anymore :)
template <class SerializedX>
class serialization_traits {

    class set_functor {

        const some_data& m_data;

    public:

        typedef SerializedX& result_type;

        set_functor(const some_data& data)
        : m_data(data){}

        template <class Tag>
        SerializedX& operator()(SerializedX& s, Tag tag) const {
            serialized_set(s, m_data, tag);
            return s;
        }
    };

public:

    template <class Tuple>
    static SerializedX encode(const some_data& data, const Tuple& t) {
        SerializedX s;
        boost::fusion::fold(t, s, set_functor(data));
        return s;
    }
};

and here is how it works:

void send(const some_data& data) {

    send_msg( serialization_traits<SerializedFooBar>::encode(data,
    boost::fusion::make_tuple(foo_tag(), bar1_tag())));

    send_msg( serialization_traits<SerializedBarBaz>::encode(data,
    boost::fusion::make_tuple(baz_tag(), bar1_tag(), bar2_tag())));
}

Updated code here

Historicity answered 13/6, 2018 at 17:4 Comment(7)
This looks very good to me. Something like the idea to use boost::fusion and its fold() was exactly what I was after. Your serialized_set() functions reduce the boilerplate to the absolute minimum (invoke set_foo()) with very little syntactic overhead to wrap those calls generically. I am not yet sure whether this solution scales recursively, and since I am currently up to my neck in two other things I will have to test this later. But since you even provided a link to compiled code, I have very few qualms about giving you the award. Thank you for your time and effort!Prosser
I am very glad I could help and really hope you can find this helpful. The bounty reward would be nice, but it's not the end of the world if you can't make it :)Historicity
From what I can see, this only looks at the members' types, and thus fails when there are two members of the same type. (Imagine bar and baz being both of the same type.) That's #1 in my question. (See here.)Prosser
@sbi: no problem. I'm at work now, but I'll have a look in the evening. The example you gave is helpful; hopefully there is an easy fix. speak soonHistoricity
@sbi: ok, back at it. Do you have any control over your some_data class?Historicity
Let us continue this discussion in chat.Historicity
So the discussion in the chat came down to this, which I am happy enough with.Prosser

In my opinion, the best all-around solution is an external C++ code generator in a scripting language. It has the following advantages:

  • Flexibility: it allows you to change the generated code at any time. This is extremely good for several sub-reasons:

    • Readily fix bugs in all old supported releases.
    • Use new C++ features if you move to C++11 or later in the future.
    • Generate code for a different language. This is very, very useful (especially if your organization is big and/or you have many users). For instance, you could output a small scripting library (e.g. a Python module) that can be used as a CLI tool to interface with the hardware. In my experience, this was very well received by hardware engineers.Derek
    • Generate GUI code (or GUI descriptions, e.g. in XML/JSON; or even a web interface) -- useful for people using the final hardware and testers.
    • Generation of other kinds of data. For instance, diagrams, statistics, etc. Or even the protobuf descriptions themselves.
  • Maintenance: it will be easier to maintain than a C++ solution. Even if it is written in a different language, it is typically easier to learn that language than to have a new C++ developer dive into C++ template metaprogramming (especially in C++03).

  • Performance: it can easily reduce the compilation time of the C++ side (since you can output very simple C++ -- even plain C). Of course, the generator may offset this advantage. In your case, this may not apply, since it looks like you cannot change the client code.

I have used that approach in a couple of projects/systems and it turned out quite nicely. Especially the different ways of using the hardware (C++ lib, Python lib, CLI, GUI...) tend to be much appreciated.


Side note: if part of the generation requires parsing already existing C++ code (e.g. headers with the data types to be serialized, as in the OP's case with the Serialized types), then a very nice solution is to use LLVM/clang's tooling to do so.

In a particular project I worked on, we had to serialize dozens of C++ types automatically (which were subject to change at any time by users). We managed to generate the code for it automatically just by using the clang Python bindings, and integrated that into the build process. While the Python bindings did not expose all the AST details (at the time, at least), they were enough to generate the required serialization code for all our types (which included templated classes, containers, etc.).
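
To give a flavor of what this looks like, here is a minimal sketch that walks the fields of the structs found in a header using libclang's C API (our generator actually used the Python bindings, so this is not its real code). The header name and the printed output are made up for illustration; a real generator would emit the serialization wrappers instead of printing:

#include <clang-c/Index.h>
#include <iostream>

// visitor called for every AST node; prints type and name of each data member
static CXChildVisitResult visit(CXCursor cursor, CXCursor /*parent*/, CXClientData /*data*/) {
    if (clang_getCursorKind(cursor) == CXCursor_FieldDecl) {
        CXString type = clang_getTypeSpelling(clang_getCursorType(cursor));
        CXString name = clang_getCursorSpelling(cursor);
        std::cout << clang_getCString(type) << " " << clang_getCString(name) << "\n";
        clang_disposeString(type);
        clang_disposeString(name);
    }
    return CXChildVisit_Recurse;
}

int main() {
    CXIndex index = clang_createIndex(0, 0);
    CXTranslationUnit tu = clang_parseTranslationUnit(
        index, "some_data.h", 0, 0, 0, 0, CXTranslationUnit_None);
    if (tu) {
        clang_visitChildren(clang_getTranslationUnitCursor(tu), visit, 0);
        clang_disposeTranslationUnit(tu);
    }
    clang_disposeIndex(index);
    return 0;
}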

Stadia answered 14/5, 2018 at 20:56 Comment(15)
#50338943Prosser
@sbi: I see. Well, this answer tries to give you an overview of why this route can be better than a C++ solution. In any case, rejecting a tool because there are too many tools (or build steps) already in the project is not a very solid argument (in my opinion) -- the more complex a system is, the more involved managing its complexity is. Trying to fit all the complexity into the C++03 code when it could easily be outside of it is not a good idea long-term, in my experience :-)Stadia
We do use several code generation tools already. All of them are a hassle. When someone new comes to the team, they have to install all kinds of arbitrary tools that are stored in different repos (like a pip repo for python stuff) than our code. Or some developer updates a tool, but forgets to update that one Jenkins build slave, making Jenkins job fail seemingly randomly... Essentially, everything that prevents you from doing a three-step bootstrap (1. check out code, 2. install build tool, 3. make a release) will sooner or later cause trouble. The less, the better.Prosser
@sbi: That is an infrastructure problem, the tools are not at fault. Don't get me wrong, I fully agree with your last point: building should be as automated and deterministic as possible; otherwise you end up with a mess and human errors. However, a code generator is an easy step to automate in a build script -- nobody should be running it manually nor care that it is there. Especially a code generator that only requires information available to the C++ compiler and not external data.Stadia
@sbi: (cont'd). I have experienced your pain a few times, so I understand how you feel. In the end, the solution was to restructure the system, and after that is done, everything is a breeze. But it has to be done, and typically many organizations do not have the resources to do so (or don't acknowledge it as a big issue, which may be fair or not). Otherwise, individual devs/teams may start to make workarounds or suboptimal choices in their workflow to avoid the root problem.Stadia
This isn't about manually running code generators. This is about having to install them, having to install all dependencies, having to keep them updated.Prosser
@sbi: I am sorry, but both are the same issue. Running the tools implies setting them up. Otherwise, you cannot reliably reproduce builds, run tests, CI/CD, etc.Stadia
Write your codegen in (for example) Python. Check in the codegen script(s) alongside the rest of the code (or the rest of the build system). No external tools, no installation.Haematopoiesis
@sbi: No need for "sigh"s here. If you don't want your questions answered, don't ask them. As simple as that :-) Now, ignoring your unneeded aggressiveness, you are disregarding a good solution (validated by its success in huge, complex projects) because a "majority there" dislikes "installing tools". That simply makes no sense at all (setting up your dev. area should be automated, as explained) and it is not even a valid argument (technical solutions don't care what you "like"). If you can't use this solution, ignore it; but don't try to downplay it because you dislike "installing tools".Stadia
@Useless: This misses that any non-trivial Python program needs other packages, which need to be installed on any machine they're supposed to run.Prosser
@Acorn: I have no idea what I wrote anymore, but I don't know why you are even keeping discussing it. I have given my opinion on the matter, and you failed to provide convincing counter-arguments. What's wrong with letting it rest?Prosser
@sbi: The answers in StackOverflow are not meant to help you in particular, but everyone reading. In other words: we are not here to convince you of anything. Therefore, no one needs to provide "convincing counter-arguments" to your opinion. Now, if you question an answer on technical grounds, then I am supposed to answer your concerns, as I did.Stadia
@Prosser - IME, you're factually wrong. I've written perfectly functional Python codegen before with no external packages, generating multiple language bindings for a pre-existing wire protocol. The codegen scripts were checked-in alongside the source, ran as part of the build, and it wasn't even difficult.Haematopoiesis
@Useless: Agreed. Exactly same experience here: Python codegen, checked-in, running on build, using whatever Python version was packaged by the Linux distribution. Even worked in OS X with their bundled Python. Actually, Python is a very nice platform to write entire build systems in (e.g. Meson, SCons...).Stadia
@Useless: Shrug. The tools here use external packages, because developers clearly preferred to not to have to write their own parsers. And while I am no Python expert at all, I know that the Python art of importing the whole universe has even found its way into an XKCD, so I assume what I see in our tools might be the norm. YMMV, and you are free to disagree, of course. Here, however, people dislike having to write their own parser frameworks as much as about having to remember every last Jenkins build slave when they update some tool.Prosser

What you want is something that's tuple-like but not an actual tuple. Assuming that all tuple_like classes implement tie() which basically just ties their members, here's my hypothetical code:

template<typename T> struct tuple_like {
    bool operator==(const T& rhs) const {
        // CRTP: the derived class T is the one providing tie()
        return static_cast<const T&>(*this).tie() == rhs.tie();
    }
    bool operator!=(const T& rhs) const {
        return !operator==(rhs);
    }
};
template<typename T, typename Serialised> struct serialised_tuple_like : tuple_like<T> {
};
template<typename T, typename Serialised>
struct serialization_traits<serialised_tuple_like<T, Serialised> > { // note the "> >" for C++03
    static Serialised encode(const T& bb) {
        Serialised s;
        s.tie() = bb.tie();
        return s;
    }
};

As long as both sides implement an appropriate tie(), this should be fine. If the source or destination classes aren't directly in your control, I recommend defining an inheriting class that implements tie() and using that. For merging multiple classes, define a helper class that implements tie() in terms of its members (see the sketch below).
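
For instance, such a helper might look roughly like this. It is only a sketch, reusing the hypothetical tuple_like above and the question's some_data; foobar_view is a made-up name, and the exact TR1 header location varies by toolchain:

#include <tr1/tuple>

struct foobar_view : tuple_like<foobar_view> {
    Foo foo;
    Bar bar;
    explicit foobar_view(const some_data& d) : foo(d.foo), bar(d.bar) {}
    // ties the members so comparison (and an encode() written against tie()) stays generic
    std::tr1::tuple<const Foo&, const Bar&> tie() const {
        return std::tr1::tie(foo, bar);
    }
};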

Tousle answered 14/5, 2018 at 21:53 Comment(1)
Mhmm. Using an inheriting class sounds like it would solve the tagging problem for tuples which are syntactically equal but semantically different, so that's a good idea (+1 from me). However, I cannot add a tie() to Serialized, as it is a (protobuf-generated) class which is not under my control. (Basically, serialization_traits<>::encode() is this tie() function, which brings us back to problem #2 in my question.) Do you see any way to get around this?Prosser

If your boilerplate really is just a bunch of plain old data structs with trivial comparison operators you could probably get away with some macros.

#define POD2(NAME, T0, N0, T1, N1) \
struct NAME { \
    T0 N0; \
    T1 N1; \
    NAME(const T0& N0, const T1& N1) \
        : N0(N0), N1(N1) {} \
    bool operator==(const NAME& rhs) const { return N0 == rhs.N0 && N1 == rhs.N1; } \
    bool operator!=(const NAME& rhs) const { return !operator==(rhs); } \
};

Usage would look like:

POD2(BarBaz, Bar, bar, Baz, baz)

template <>
struct serialization_traits<BarBaz> {
    static SerializedBarBaz encode(const BarBaz& bb) {
        SerializedBarBaz sbb;
        sbb.set_bar(bb.bar);
        sbb.set_baz(bb.baz);
        return sbb;
    }
};

You would need N macros, where N is the number of different member counts you have, but that would be a one-time upfront cost; a three-member variant is sketched below.
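
For illustration, the three-member variant would look like this (same pattern, written once):

#define POD3(NAME, T0, N0, T1, N1, T2, N2) \
struct NAME { \
    T0 N0; \
    T1 N1; \
    T2 N2; \
    NAME(const T0& N0, const T1& N1, const T2& N2) \
        : N0(N0), N1(N1), N2(N2) {} \
    bool operator==(const NAME& rhs) const \
        { return N0 == rhs.N0 && N1 == rhs.N1 && N2 == rhs.N2; } \
    bool operator!=(const NAME& rhs) const { return !operator==(rhs); } \
};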

Alternatively you could leverage tuples to do a lot of the heavy lifting for you like you suggested. Here I've created a "NamedTuple" template for naming the getters of the tuple.

#define NAMED_TUPLE2_T(N0, N1) NamedTuple##N0##N1

#define NAMED_TUPLE2(N0, N1) \
template <typename T0, typename T1> \
struct NAMED_TUPLE2_T(N0, N1) { \
    typedef std::tuple<T0, T1> TupleType; \
    const typename std::tuple_element<0, TupleType>::type& N0() const { return std::get<0>(tuple_); } \
    const typename std::tuple_element<1, TupleType>::type& N1() const { return std::get<1>(tuple_); } \
    NAMED_TUPLE2_T(N0, N1)(const std::tuple<T0, T1>& tuple) : tuple_(tuple) {} \
    bool operator==(const NAMED_TUPLE2_T(N0, N1)& rhs) const { return tuple_ == rhs.tuple_; } \
    bool operator!=(const NAMED_TUPLE2_T(N0, N1)& rhs) const { return !operator==(rhs); } \
    private: \
        TupleType tuple_; \
}; \
typedef NAMED_TUPLE2_T(N0, N1)

Usage:

NAMED_TUPLE2(foo, bar)<int, int> FooBar;

template <>
struct serialization_traits<FooBar> {
    static SerializedFooBar encode(const FooBar& fb) {
        SerializedFooBar sfb;
        sfb.set_foo(fb.foo());
        sfb.set_bar(fb.bar());
        return sfb;
    }
};
Ivory answered 10/6, 2018 at 5:3 Comment(1)
Unfortunately, I have no control over the Serialized... types. They are generated.Prosser

Have you considered a slightly different approach? Rather than having a separate FooBar and BarBaz representation, consider a FooBarBaz similar to

message FooBarBaz {
  optional Foo foo = 1;
  optional Bar bar = 2;
  optional Baz baz = 3;
}

And then in your application code, you could take advantage of it like:

FooBarBaz foo;
foo.set_foo(...);
FooBarBaz bar;
bar.set_bar(...);
FooBarBaz baz;
baz.set_baz(...);
FooBarBaz foobar = foo;
foobar.MergeFrom(bar);
FooBarBaz barbaz = bar;
barbaz.MergeFrom(baz);

Alternatively, you could take advantage of the protobuf encoding and concatenate the serialized messages. (The protobuf objects above aren't actually serialized yet; you'd get the serialized strings by calling one of the serialization methods, e.g. SerializeToString, on them.)

// assume string_foo is the actual serialized foo from above, likewise string_bar
string serialized_foobar = string_foo + string_bar;
string serialized_barbaz = string_bar + string_baz;

FooBarBaz barbaz;
barbaz.ParseFromString(serialized_barbaz);

This does assume that you can move most of your APIs away from explicit sets of fields and toward common messages with optional fields, so that you only send what you need. You may want to wrap up the edges of your system to assert that the fields required for a particular process are set before attempting to use them, but it might lead to less boilerplate elsewhere. The string-concatenation trick (sketched more fully below) can also be handy in cases where you're passing through a system that doesn't actually care what's in the messages.
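
For concreteness, here is a sketch of the full concatenation round trip, assuming FooBarBaz is the protobuf-generated class for the message definition above (the function name is made up):

#include <string>

// merging two partially-filled messages by concatenating their wire-format bytes
FooBarBaz merge_by_concatenation(const FooBarBaz& foo_only, const FooBarBaz& bar_only) {
    std::string string_foo, string_bar;
    foo_only.SerializeToString(&string_foo);  // wire-format bytes of the foo-only message
    bar_only.SerializeToString(&string_bar);  // wire-format bytes of the bar-only message

    // parsing the concatenation of two serialized messages of the same type
    // is equivalent to merging them field by field
    FooBarBaz foobar;
    foobar.ParseFromString(string_foo + string_bar);
    return foobar;
}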

Cockchafer answered 20/6, 2018 at 18:54 Comment(2)
I have no control over the messages. Currently, protobuf is the most-used encoding scheme, but JSON is also used. And who knows what we will use next year. No, the whole point of the serialization traits is to decouple our code from that.Prosser
I guess what I was reading is that you have some sort of system state, and some sort of partial system states. My suggestion was around using an incomplete system state to represent the partial states rather than having multiple overlapping definitions that are essentially partial states. Protobuf has some features that make that possible (and they could still be buried in the encoder implementations). That doesn't work quite as well once you get json involved, though there are protobuf to json libraries out there. In theory, you could use protobuf internally, serialize with reflection.Cockchafer
