Parsing binary message stream in C/C++

Asked 20/1, 2011 at 15:55 Answered 2/7, 2015 at 23:2

I'm writing a decoder for a binary protocol (Javad GRIL protocol). It consists of about a hundred messages, with data in the following format:

struct MsgData {
    uint8_t num;
    float x, y, z;
    uint8_t elevation;
    ...
};

The fields are ANSI-encoded binary numbers which follow each other with no gaps. The simplest way to parse such messages is to cast an input array of bytes to the appropriate type. The problem is that the data in the stream is packed, i.e. unaligned.

On x86 this can be solved by using #pragma pack(1). However, that won't work on some other platforms or will incur performance overhead due to further work with misaligned data.
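For illustration, a minimal sketch of the packed-struct approach. It assumes the message has only the five fields shown above (the elided ones are left out), a 4-byte float, and a wire byte order equal to the host byte order. PackedMsgData and parsePacked are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)   // supported by MSVC, GCC and Clang; not part of standard C++
struct PackedMsgData {
    std::uint8_t num;
    float x, y, z;
    std::uint8_t elevation;
};
#pragma pack(pop)

// With packing in effect there is no padding, so the struct matches the wire layout.
static_assert(sizeof(PackedMsgData) == 2 * sizeof(std::uint8_t) + 3 * sizeof(float),
              "PackedMsgData must not contain padding");

// Hypothetical helper: copies the raw bytes into the packed struct.
PackedMsgData parsePacked(const unsigned char* bytes, std::size_t size)
{
    assert(bytes != nullptr && size >= sizeof(PackedMsgData));
    PackedMsgData msg;
    std::memcpy(&msg, bytes, sizeof msg);  // copy instead of casting the buffer pointer
    return msg;
}
```

Every later access to msg.x, msg.y and msg.z is then a potentially misaligned access that the compiler has to handle, which is where the overhead on stricter platforms comes from.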

Another way is to write a specific parse function for each message type, but as I've mentioned, the protocol includes about a hundred messages.

Yet another alternative is to use something like the Perl unpack() function and store the message format somewhere. Say, we can #define MsgDataFormat "CfffC" and then call unpack(pMsgBody, MsgDataFormat). This is much shorter but still error-prone and redundant. Moreover, the format can be more complicated because messages can contain arrays, so the parser will be slow and complex.
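To make the idea concrete, here is a rough sketch of what such an unpack() could look like for the two format characters used above. Field and this unpack() signature are hypothetical, only 'C' (unsigned byte) and 'f' (float) are handled, and the wire byte order is assumed to equal the host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical field value: holds either an unsigned byte or a float.
struct Field {
    char kind;          // 'C' = unsigned 8-bit integer, 'f' = 32-bit float
    std::uint8_t u8;
    float f32;
};

// Hypothetical unpack(): walks the format string and copies each field out of
// the packed buffer, so no misaligned pointer is ever dereferenced.
std::vector<Field> unpack(const unsigned char* data, std::size_t size, const char* format)
{
    assert(data != nullptr && format != nullptr);
    std::vector<Field> out;
    std::size_t ofs = 0;
    for (const char* p = format; *p != '\0'; ++p) {
        Field f = { *p, 0, 0.0f };
        switch (*p) {
        case 'C':
            if (size - ofs < sizeof f.u8) throw std::out_of_range("buffer too short");
            f.u8 = data[ofs];
            ofs += sizeof f.u8;
            break;
        case 'f':
            if (size - ofs < sizeof f.f32) throw std::out_of_range("buffer too short");
            std::memcpy(&f.f32, data + ofs, sizeof f.f32);
            ofs += sizeof f.f32;
            break;
        default:
            throw std::invalid_argument("unknown format character");
        }
        out.push_back(f);
    }
    return out;
}
```

The result still has to be copied into the actual MsgData members by hand, which is the redundancy I'd like to avoid.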

Is there any common and effective solution? I've read this post and Googled around but didn't find a better way to do it.

Maybe C++ has a solution?

Sportsmanship answered 20/1, 2011 at 15:55 Comment(5)

I suppose using tuple types to define the messages, you could write function templates that iterate over tuple members and invoke the appropriate extraction function for whatever type you're using. However, I can't come up with an idea for automagically converting from these tuples to structs. – Dowable 20/1, 2011 at 16:5

Assuming you're using MSVC++, #pragma pack(1) should work even on other platforms. The packing is implemented in terms of bit shifts and masks, not operating system alignment fixups. – Whiney 20/1, 2011 at 16:11

Your data is packed, i.e. unaligned, so the only correct way is byte-wise access, such as the unpack approach suggested by @larsmans. – Flow 20/1, 2011 at 16:23

@Dowable I can't figure it out either :) @Billy unfortunately I'm writing for QNX and other exotic platforms. – Sportsmanship 20/1, 2011 at 16:49

I have played with this for a while and updated my answer with a full-blown version that seems to do what you want. HTH. – Dowable 21/1, 2011 at 21:29

Ok, the following compiles for me with VC10 and with GCC 4.5.1 (on ideone.com). I think all this needs of C++1x is <tuple>, which should be available (as std::tr1::tuple) in older compilers as well.

It still needs you to type some code for each member, but that is very minimal code. (See my explanation at the end.)

#include <iostream>
#include <tuple>

typedef unsigned char uint8_t;
typedef unsigned char byte_t;

struct MsgData {
    uint8_t num;
    float x;
    uint8_t elevation;

    static const std::size_t buffer_size = sizeof(uint8_t)
                                         + sizeof(float) 
                                         + sizeof(uint8_t);

    std::tuple<uint8_t&,float&,uint8_t&> get_tied_tuple()
    {return std::tie(num, x, elevation);}
    std::tuple<const uint8_t&,const float&,const uint8_t&> get_tied_tuple() const
    {return std::tie(num, x, elevation);}
};

// needed only for test output
inline std::ostream& operator<<(std::ostream& os, const MsgData& msgData)
{
    os << '[' << static_cast<int>(msgData.num) << ' ' 
       << msgData.x << ' ' << static_cast<int>(msgData.elevation) << ']';
    return os;
}

namespace detail {

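    // overload the following two for types that need special treatment
    // (bytes are copied one at a time because the packed buffer may be misaligned
    // for T, so dereferencing a reinterpret_cast'ed T* would be undefined behaviour)
    template<typename T>
    const byte_t* read_value(const byte_t* bin, T& val)
    {
        byte_t* dst = reinterpret_cast<byte_t*>(&val);
        for (std::size_t i = 0; i < sizeof(T); ++i)
            dst[i] = bin[i];
        return bin + sizeof(T);
    }
    template<typename T>
    byte_t* write_value(byte_t* bin, const T& val)
    {
        const byte_t* src = reinterpret_cast<const byte_t*>(&val);
        for (std::size_t i = 0; i < sizeof(T); ++i)
            bin[i] = src[i];
        return bin + sizeof(T);
    }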

    template< typename MsgTuple, unsigned int Size = std::tuple_size<MsgTuple>::value >
    struct msg_serializer;

    template< typename MsgTuple >
    struct msg_serializer<MsgTuple,0> {
        static const byte_t* read(const byte_t* bin, MsgTuple&) {return bin;}
        static byte_t* write(byte_t* bin, const MsgTuple&)      {return bin;}
    };

    template< typename MsgTuple, unsigned int Size >
    struct msg_serializer {
        static const byte_t* read(const byte_t* bin, MsgTuple& msg)
        {
            return read_value( msg_serializer<MsgTuple,Size-1>::read(bin, msg)
                             , std::get<Size-1>(msg) );
        }
        static byte_t* write(byte_t* bin, const MsgTuple& msg)
        {
            return write_value( msg_serializer<MsgTuple,Size-1>::write(bin, msg)
                              , std::get<Size-1>(msg) );
        }
    };

    template< class MsgTuple >
    inline const byte_t* do_read_msg(const byte_t* bin, MsgTuple msg)
    {
        return msg_serializer<MsgTuple>::read(bin, msg);
    }

    template< class MsgTuple >
    inline byte_t* do_write_msg(byte_t* bin, const MsgTuple& msg)
    {
        return msg_serializer<MsgTuple>::write(bin, msg);
    }
}

template< class Msg >
inline const byte_t* read_msg(const byte_t* bin, Msg& msg)
{
    return detail::do_read_msg(bin, msg.get_tied_tuple());
}

template< class Msg >
inline byte_t* write_msg(byte_t* bin, const Msg& msg)
{
    // returns a writable pointer so the caller can append the next message
    return detail::do_write_msg(bin, msg.get_tied_tuple());
}

int main()
{
    byte_t buffer[MsgData::buffer_size];

    std::cout << "buffer size is " << MsgData::buffer_size << '\n';

    MsgData msgData;
    std::cout << "initializing data...";
    msgData.num = 42;
    msgData.x = 1.7f;
    msgData.elevation = 17;
    std::cout << "data is now " << msgData << '\n';
    write_msg(buffer, msgData);

    std::cout << "clearing data...";
    msgData = MsgData();
    std::cout << "data is now " << msgData << '\n';

    std::cout << "reading data...";
    read_msg(buffer, msgData);
    std::cout << "data is now " << msgData << '\n';

    return 0;
}

For me this prints

buffer size is 6
initializing data...data is now [42 1.7 17]
clearing data...data is now [0 0 0]
reading data...data is now [42 1.7 17]

(I've shortened your MsgData type to only contain three data members, but this was just for testing.)

For each message type, you need to define its buffer_size static constant and two get_tied_tuple() member functions, one const and one non-const, both implemented in the same way. (Of course, these could just as well be non-members, but I tried to keep them close to the list of data members they are tied to.)
For some types (like std::string) you will need to add special overloads of those detail::read_value() and detail::write_value() functions.
The rest of the machinery stays the same for all message types.

With full C++1x support you might be able to get rid of having to fully type out the explicit return types of the get_tied_tuple() member functions, but I haven't actually tried this.
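A sketch of what that could look like with C++14 return type deduction (so it needs a compiler in C++14 mode or later). The example() function is only there to illustrate that the deduced tuples still hold references to the members.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

struct MsgData {
    std::uint8_t num;
    float x;
    std::uint8_t elevation;

    // C++14: the return type is deduced from std::tie, i.e.
    // std::tuple<std::uint8_t&, float&, std::uint8_t&>.
    auto get_tied_tuple()       { return std::tie(num, x, elevation); }
    // In the const overload the members are const, so the tuple holds const references.
    auto get_tied_tuple() const { return std::tie(num, x, elevation); }
};

// Illustration only: writing through the tied tuple changes the struct itself.
inline void example()
{
    MsgData m = { 1, 2.5f, 3 };
    std::get<0>(m.get_tied_tuple()) = 42;
    assert(m.num == 42);

    const MsgData& cm = m;
    assert(std::get<1>(cm.get_tied_tuple()) == 2.5f);
}
```

The two overloads still have to be written out, but the member list now appears only once per overload instead of twice.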

Dowable answered 20/1, 2011 at 16:39 Comment(1)

Nice example for using tuple... makes for quite a nice syntax. C++11 rocks. All the better that you provide the full source on ideone.com! – Mcdougall 26/7, 2012 at 20:39

My solution for parsing binary input is to use a Reader class, so per message entry you can define what is read, and the reader can check for overruns, underruns, and so on.

In your case:

msg.num = Reader.getChar();
msg.x = Reader.getFloat();
msg.y = Reader.getFloat();
msg.z = Reader.getFloat();
msg.elevation = Reader.getChar();

It still is a lot of work and error-prone, but at least it helps checking for errors.
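A minimal sketch of what such a Reader might look like. This is not my production class, just an illustration: the class layout, remaining() and parseMsgData() are made up for the example, only the five fields from the question are handled, and the wire byte order is assumed to equal the host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical Reader: a cursor over a byte buffer that refuses to read past the end.
class Reader {
public:
    Reader(const unsigned char* data, std::size_t size)
        : data_(data), size_(size), pos_(0)
    {
        assert(data != nullptr || size == 0);
    }

    std::uint8_t getChar()  { return get<std::uint8_t>(); }
    float        getFloat() { return get<float>(); }

    // Bytes not consumed yet; non-zero after parsing indicates an underrun.
    std::size_t remaining() const { return size_ - pos_; }

private:
    template <typename T>
    T get()
    {
        if (remaining() < sizeof(T))
            throw std::out_of_range("message truncated");  // overrun
        T value;
        std::memcpy(&value, data_ + pos_, sizeof(T));  // safe for unaligned data
        pos_ += sizeof(T);
        return value;
    }

    const unsigned char* data_;
    std::size_t size_;
    std::size_t pos_;
};

struct MsgData {
    std::uint8_t num;
    float x, y, z;
    std::uint8_t elevation;
};

MsgData parseMsgData(Reader& reader)
{
    MsgData msg;
    msg.num = reader.getChar();
    msg.x = reader.getFloat();
    msg.y = reader.getFloat();
    msg.z = reader.getFloat();
    msg.elevation = reader.getChar();
    return msg;
}
```

Byte swapping, if the protocol needs it, would go inside get() so the message structs stay untouched.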

Sogdian answered 20/1, 2011 at 16:10 Comment(5)

"Reader Class" == std::istream or std::streambuf. – Whiney 20/1, 2011 at 16:12

@Billy: so it is. I've been using the Reader class for a while, so I never had a use for a more standard system. Well spotted. – Sogdian 20/1, 2011 at 16:15

Yeah, but this is what I call "writing a specific parsing routine for every message" :) – Sportsmanship 20/1, 2011 at 16:51

+1 Nice. I like how you display that data is read to each member, rather than reading the whole structure. I also like how the reader can handle Endianess without needing to modify the receiving structure. – Delphiadelphic 20/1, 2011 at 17:57

@gaga: sure this is message specific, but if you have the messages defined somewhere, say in a header file, you could write a script which generates something like the above for you with that header file as input. – Coif 20/1, 2011 at 19:55

Simple answer is no: if the message is in a specific binary format that cannot simply be cast, you have no choice but to write a parser for it. If you have the message descriptions (say, XML or some other easily parsed description), why don't you generate the parsing code automatically from that description? It won't be as fast as a cast, but generating the code will be a damn sight faster than writing each message parser by hand...

Univalent answered 20/1, 2011 at 16:8 Comment(0)

I don't think you can avoid writing a specific parsing routine for every message in pure C++ (without using a pragma).

If all your messages are simple, POD, C-like structures, I think the easiest solution would be to write a code generator: put your structs in a header without other C++ stuff and write a simple parser (a Perl/Python/bash script using a couple of regular expressions should be enough), or look for an existing one, that is able to find the variable names in any message; then use it to automatically generate the reading code for each message, like this:

YourStreamType & operator>>( YourStreamType &stream, MsgData &msg ) {
    stream >> msg.num >> msg.x >> msg.y >> msg.z >> msg.elevation;
    return stream;
}

Overload YourStreamType's operator>> for every basic type your messages contain and you should be done:

MsgData msg;
your_stream >> msg;
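As a rough sketch of how the pieces could fit together: ByteStream below is a hypothetical stand-in for YourStreamType, a single template covers all arithmetic field types, and the wire byte order is assumed to equal the host byte order. Only the last function is what the generator would have to emit per message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for YourStreamType: a cursor over a packed byte buffer.
struct ByteStream {
    const unsigned char* data;
    std::size_t size;
    std::size_t pos;
    bool ok;   // cleared when a read would run past the end
};

// One overload covers every arithmetic field type (uint8_t, float, ...).
template <typename T>
typename std::enable_if<std::is_arithmetic<T>::value, ByteStream&>::type
operator>>(ByteStream& stream, T& value)
{
    assert(stream.pos <= stream.size);
    if (!stream.ok || stream.size - stream.pos < sizeof(T)) {
        stream.ok = false;   // leave value untouched on failure
        return stream;
    }
    std::memcpy(&value, stream.data + stream.pos, sizeof(T));  // unaligned-safe
    stream.pos += sizeof(T);
    return stream;
}

struct MsgData {
    std::uint8_t num;
    float x, y, z;
    std::uint8_t elevation;
};

// The part a generator script would emit for each message type.
ByteStream& operator>>(ByteStream& stream, MsgData& msg)
{
    return stream >> msg.num >> msg.x >> msg.y >> msg.z >> msg.elevation;
}
```

After reading, checking stream.ok tells you whether the buffer was long enough for the whole message.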

Composite answered 21/1, 2011 at 22:11 Comment(0)

You can always align your memory yourself:

uint8_t msg[TOTAL_SIZE_OF_THE_PARTS_OF_MsgData];

As sizeof(MsgData) returns the size of MsgData + padding bytes, you can calculate

enum { TOTAL_SIZE_OF_THE_PARTS_OF_MsgData = 
    2*sizeof(uint8_t)+
    3*sizeof(float)+sizeof(THE_OTHER_FIELDS)
};

Using enums for such constants is a well proven concept on several machines.

Read a binary message into the msg array. Later you can cast the values into the MsgData values:

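unsigned ofs = 0;
MsgData M;
M.num = *(uint8_t*)(&msg[ofs]);   // dereference the cast pointer, not the address itself
ofs += sizeof(M.num);
M.x = *(float*)(&msg[ofs]);       // misaligned read: works on x86, may fault on other platforms
ofs += sizeof(M.x);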

and so forth...

or use memcpy if you don't like the type casts:

memcpy(&M.x,&msg[ofs],sizeof(M.x)); ...
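The memcpy variant can be wrapped in a small helper so the offset bookkeeping is written only once. A sketch, assuming the message consists of just the five fields shown in the question and that the wire byte order equals the host byte order; read_field, parse_msg_data and MSG_DATA_WIRE_SIZE are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies one field out of the packed buffer and advances the offset.
template <typename T>
void read_field(const std::uint8_t* msg, std::size_t size, std::size_t& ofs, T& field)
{
    assert(ofs <= size && size - ofs >= sizeof(T));  // caller guarantees enough bytes
    std::memcpy(&field, msg + ofs, sizeof(T));       // works for any alignment
    ofs += sizeof(T);
}

struct MsgData {
    std::uint8_t num;
    float x, y, z;
    std::uint8_t elevation;
};

// Size of the fields on the wire, without the padding that sizeof(MsgData) includes.
enum { MSG_DATA_WIRE_SIZE = 2 * sizeof(std::uint8_t) + 3 * sizeof(float) };

MsgData parse_msg_data(const std::uint8_t* msg, std::size_t size)
{
    MsgData M;
    std::size_t ofs = 0;
    read_field(msg, size, ofs, M.num);
    read_field(msg, size, ofs, M.x);
    read_field(msg, size, ofs, M.y);
    read_field(msg, size, ofs, M.z);
    read_field(msg, size, ofs, M.elevation);
    return M;
}
```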

Catamaran answered 2/7, 2015 at 23:2 Comment(0)
