Is there a way to enforce specific endianness for a C or C++ struct?
B

11

35

I've seen a few questions and answers regarding the endianness of structs, but they were about detecting the endianness of a system, or converting data between the two byte orders.

What I would like to know, however, is whether there is a way to enforce a specific endianness for a given struct. Are there any good compiler directives or other simple solutions, besides rewriting the whole thing with a lot of macros that manipulate bit-fields?

A general solution would be nice, but I would be happy with a specific gcc solution as well.

Edit:

Thank you for all the comments pointing out why it's not a good idea to enforce endianness, but in my case that's exactly what I need.

A large amount of data is generated by a specific processor (which will never ever change; it's an embedded system with custom hardware), and it has to be read by a program (which I am working on) running on an unknown processor. Byte-wise evaluation of the data would be horribly troublesome because it consists of hundreds of different types of structs, which are huge and deep: most of them have many layers of other huge structs inside.

Changing the software for the embedded processor is out of the question. The source is available, this is why I intend to use the structs from that system instead of starting from scratch and evaluating all the data byte-wise.

This is why I need to tell the compiler which endianness it should use; it doesn't matter how efficient it will be.

It does not have to be a real change in endianness. Even if it's just an interface, and physically everything is handled in the processor's own endianness, it's perfectly acceptable to me.

Brickey answered 18/7, 2011 at 11:21 Comment(14)
I can't see the point of this. The only time you care is when marshalling and unmarshalling the data; in all other instances you'll want the platform-specific endianness. So why force a convoluted conversion scheme on all uses of a given struct? Why not isolate it purely to the marshalling/unmarshalling bits? - Hamamelidaceous
Endianness is not a property of the struct but a property of the architecture that is running the code. While you could theoretically force the in-memory representation to be of a given endianness, that would force conversions from platform to struct endianness in all reads and writes to each field for something that is not observable from the outside. You should only convert formats when dealing with the outside world. - Kao
@Nim: I read a lot of data from an embedded system, which has hundreds of different structs, many of them having deep layers of other huge structs inside. As I have the source of the embedded system, I have the code of all those structs. This is why it would be much easier to just use them, because reading and evaluating the data byte-wise would take a very long time. The size of the data and number of different structs is huge, so even conversion is better avoided. - Brickey
@David: I deal with the outside world. I know enforcing endianness is not the cleanest thing to do, but in this specific case, with this specific hardware that's exactly what I need. - Brickey
@Brickey: where does your code run? On the same processor or outside? If it's on the same processor, you don't have to care; if outside, the mechanism of transmitting the data from the embedded system to yours should deal with the conversions... And locally you'll work in the format of the processor where you execute the code... - Hamamelidaceous
@Nim: no, my code does not run on the same processor, this is why I would like to enforce the endianness, because I know the endianness of the processor which created the data, and that processor will not change, as it is an embedded system running on a very specific hardware. Actually, the processor which created the data is the one that will never change, but the processor where my code runs will possibly change. - Brickey
And therefore all your "business logic" needs to operate in the endianness of the local processor. Now imagine you need the following operation: some_struct.int_value += 10. You would have to implement some hackery-pokery to get that to work in a specific endianness if your processor doesn't match it. Now imagine doing this everywhere in your code; it'll be a horrible mess. If you are really worried about performance, address this in your marshalling/unmarshalling code... - Hamamelidaceous
@vsz: I don't think you understand what Nim is trying to say. You can make the interface of your application be that of the embedded processor, and that is fine, but that only means that whenever your application interacts with that processor it will have to translate from your internal representation to that of the processor. For all of the internal representations you should use whatever the native platform logic has, and just serialize and deserialize in the interface. - Kao
@David: exactly that serialization is the problem. The code on the embedded processor can not be changed, and what it does is simply dumping everything with memcpy into char arrays. Having the source of it only means that I at least don't have to reverse-engineer its structure, but not that I am able (or am allowed) in any way to alter it. - Brickey
@vsz: OK, back to basics: you do not want to change the endianness in your processor, you only want to be able to borrow the structs that are in the other processor's endianness and translate them. That is called serialization and it is quite different from what your question asks for. For that you will need to define the structures, read from network/disk and apply the endianness conversion once when reading, then use it in the current processor format and convert once while writing back to disk/network. For the whole time, in memory it will have the local endianness. - Kao
Your question should be something along the lines of "I have processor X with endianness Y from which I want to read a set of structs and work with them. How can I perform that conversion?" rather than "I want to enforce a particular endianness in my structs." - Kao
I know what serialization means, and I used it myself in the past. However, there are problems with it in this case: I don't have a nice serialization on the transmitter side which I can conveniently de-serialize on the receiving side. The problem with converting the data is that it's huge and diverse. A struct, which has another struct inside, which has arrays of different structs, each having other arrays, variables and structs inside, and so on, having arrays and variables of different sizes all around. Writing a conversion tool for that thing would take quite a lot of time. - Brickey
It does not matter if it's not a real change in endianness. Even if it's just an interface and the real data is stored physically with the processor's own endianness, it would be a solution. - Brickey
Good question. Sometimes it would be very nice to have an attribute for specifying endianness for structs and members. Something like: __attribute__ ((endianness (BIG_ENDIAN))) for gcc. Many network protocols use big-endian (= network byte order), so protocol sources have lots of ntohs(), htonl(), etc. calls for making conversions. If there are bit-fields in structs, then the code will be even uglier (see struct ip in "netinet/ip.h"). - Alf
L
26

The way I usually handle this is like so:

#include <arpa/inet.h> // for ntohs() etc.
#include <stdint.h>

class be_uint16_t {
public:
        be_uint16_t() : be_val_(0) {
        }
        // Transparently cast from uint16_t
        be_uint16_t(const uint16_t &val) : be_val_(htons(val)) {
        }
        // Transparently cast to uint16_t
        operator uint16_t() const {
                return ntohs(be_val_);
        }
private:
        uint16_t be_val_;
} __attribute__((packed));

Similarly for be_uint32_t.
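
The 32-bit counterpart could look roughly like the following. This is a minimal sketch, not the answerer's actual code: it assumes a POSIX system (for htonl()/ntohl()) and GCC or Clang (for __attribute__((packed))), and be_uint32_example() is a hypothetical helper added only to show the resulting byte layout.

```cpp
#include <arpa/inet.h> // htonl(), ntohl()
#include <cassert>
#include <cstdint>
#include <cstring>

// Sketch of a 32-bit counterpart of be_uint16_t: the value is stored
// big-endian and converted to/from host order on every access.
class be_uint32_t {
public:
    be_uint32_t() : be_val_(0) {}
    // Implicit conversion from a host-order uint32_t
    be_uint32_t(uint32_t val) : be_val_(htonl(val)) {}
    // Implicit conversion to a host-order uint32_t
    operator uint32_t() const { return ntohl(be_val_); }
private:
    uint32_t be_val_; // always big-endian in memory
} __attribute__((packed));

// Hypothetical helper, for illustration only: inspects the stored bytes.
inline void be_uint32_example() {
    be_uint32_t x = 0x11223344u;
    unsigned char bytes[sizeof x];
    std::memcpy(bytes, &x, sizeof x);
    // Most significant byte comes first, whatever the host byte order is.
    assert(bytes[0] == 0x11 && bytes[3] == 0x44);
    // Reading the value back yields the host-order number again.
    assert(static_cast<uint32_t>(x) == 0x11223344u);
}
```

Taking the constructor argument by value instead of by const reference is a small stylistic deviation from be_uint16_t; either works for a built-in integer.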

Then you can define your struct like this:

struct be_fixed64_t {
    be_uint32_t int_part;
    be_uint32_t frac_part;
} __attribute__((packed));

The point is that the compiler will almost certainly lay out the fields in the order you write them, so all you are really worried about is big-endian integers. The be_uint16_t object is a class that knows how to convert itself transparently between big-endian and machine-endian as required. Like this:

be_uint16_t x = 12;
x = x + 1; // Yes, this actually works
write(fd, &x, sizeof(x)); // writes 13 to file in big-endian form

In fact, if you compile that snippet with any reasonably good C++ compiler, you should find it emits a big-endian "13" as a constant.

With these objects, the in-memory representation is big-endian. So you can create arrays of them, put them in structures, etc. But when you go to operate on them, they magically cast to machine-endian. This is typically a single instruction on x86, so it is very efficient. There are a few contexts where you have to cast by hand:

be_uint16_t x = 37;
printf("x == %u\n", (unsigned)x); // Fails to compile without the cast

...but for most code, you can just use them as if they were built-in types.

Largehearted answered 18/7, 2011 at 14:51 Comment(3)
+1 for the first solution which is general and actually manipulates endianness. If I don't find anything simpler, I think I'll use your suggestion. Thanks! - Brickey
My only suggestion would be to perhaps consider declaring it a struct rather than a class. Since you explicitly specify the accessibility of all members, the two are formally equivalent. However, a struct has the connotation of being light-weight, which is definitely a goal of be_uint32_t. It also has some connotation of normally being used by value rather than by reference or by pointer, which is also true of be_uint32_t. On the other hand, be_uint32_t has no public fields, which is also an implication of struct. - Asquith
@Kevin: Yeah, I usually reserve "struct" for classes with all-public fields and no non-trivial functions. I suppose it is debatable whether the casting operators are "trivial". - Largehearted
J
26

A bit late to the party but with current GCC (tested on 6.2.1 where it works and 4.9.2 where it's not implemented) there is finally a way to declare that a struct should be kept in X-endian byte order.

The following test program:

#include <stdio.h>
#include <stdint.h>

struct __attribute__((packed, scalar_storage_order("big-endian"))) mystruct {
    uint16_t a;
    uint32_t b;
    uint64_t c;
};


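int main(void) {
    struct mystruct bar = {.a = 0xaabb, .b = 0xff0000aa, .c = 0xabcdefaabbccddee};

    FILE *f = fopen("out.bin", "wb");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    /* Write the struct exactly as it is laid out in memory */
    size_t written = fwrite(&bar, sizeof(struct mystruct), 1, f);
    fclose(f);
    return written == 1 ? 0 : 1;
}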

creates a file "out.bin" which you can inspect with a hex editor (e.g. hexdump -C out.bin). If the scalar_storage_order attribute is supported, it will contain the expected 0xaabbff0000aaabcdefaabbccddee in this order and without holes. Sadly, this is of course very compiler-specific.

Janijania answered 5/12, 2016 at 12:20 Comment(2)
The fact it's compiler-specific is typical for new features. I'm not sure this is a good idea. By making endianness invisible, I'd be worried that a programmer won't even think about endianness when it is important. For example, adding a checksum to an existing data transfer protocol that sends structures similar to the code you posted. - Obduliaobdurate
I'm not sure. I like having this as a feature of the type; I wonder, though, what happens if people stick this on random structs that might end up written out. That said, I think that for how low-level C is, the standards committee has done a terrible job of actually giving people dependable ways to specify important implementation requirements. IMHO, having features like "packed" structs in the standard should be more important than supporting super quirky hardware; let the fringe stuff be hard instead of the common case. - Janijania
T
4

With GCC, try using one of these pragmas:

#pragma scalar_storage_order big-endian     // store in big-endian format
#pragma scalar_storage_order little-endian  // store in little-endian format
#pragma scalar_storage_order default        // store in your machine's default endianness

Read more here

Tadd answered 4/7, 2020 at 8:20 Comment(0)
E
3

No, I don't think so.

Endianness is an attribute of the processor that indicates whether integers are stored most significant byte first or least significant byte first; it is not an attribute of the compiler.

The best you can do is write code which is independent of any byte order.
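
One common way to write such byte-order-independent code is to assemble values from individual bytes with shifts, which give the same result on any host. The sketch below assumes the data is big-endian; load_be32, store_be32 and byte_order_example are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Decode a 32-bit big-endian value from raw bytes. Shifts operate on
// values, not on memory layout, so this works on any host byte order.
inline uint32_t load_be32(const unsigned char *p) {
    return (static_cast<uint32_t>(p[0]) << 24) |
           (static_cast<uint32_t>(p[1]) << 16) |
           (static_cast<uint32_t>(p[2]) << 8) |
            static_cast<uint32_t>(p[3]);
}

// Encode a 32-bit value as four big-endian bytes.
inline void store_be32(unsigned char *p, uint32_t v) {
    p[0] = static_cast<unsigned char>((v >> 24) & 0xFF);
    p[1] = static_cast<unsigned char>((v >> 16) & 0xFF);
    p[2] = static_cast<unsigned char>((v >> 8) & 0xFF);
    p[3] = static_cast<unsigned char>(v & 0xFF);
}

// Illustration only.
inline void byte_order_example() {
    const unsigned char wire[4] = {0x12, 0x34, 0x56, 0x78};
    assert(load_be32(wire) == 0x12345678u);

    unsigned char out[4];
    store_be32(out, 0xAABBCCDDu);
    assert(out[0] == 0xAA && out[3] == 0xDD);
}
```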

Excel answered 18/7, 2011 at 11:35 Comment(2)
Just to nitpick, floating point registers also have a bit order. - Peirsen
@KerrekSB, just to nitpick, endianness is about byte order, not bit order. Most significant bit is always on the left in a byte, regardless of endianness. - Screamer
C
1

No, there's no such capability. If it existed, compilers would have to generate extra, inefficient code, so C++ just doesn't support it.

The usual C++ way to deal with serialization (which I assume is what you're trying to solve) is to let the struct remain in memory in the exact layout desired and do the serialization in such a way that endianness is preserved upon deserialization.

Corned answered 18/7, 2011 at 12:5 Comment(0)
E
1

I am not sure if the following can be modified to suit your purposes, but where I work, we have found the following to be quite useful in many cases.

When endianness is important, we use two different data structures. One represents the data as it is expected to arrive. The other is how we want it to be represented in memory. Conversion routines are then developed to switch between the two.

The workflow operates thusly ...

  1. Read the data into the raw structure.
  2. Convert the "raw structure" to the "in memory version"
  3. Operate only on the "in memory version"
  4. When done operating on it, convert the "in memory version" back to the "raw structure" and write it out.
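
The four steps above can be sketched as follows. Everything here is hypothetical: the RawSample and Sample types, their two fields, and the assumption that the incoming data is big-endian are made up for illustration and are not taken from the answer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical "raw structure": plain bytes, so it has no padding and
// no dependence on the host byte order. Assumed big-endian on the wire.
struct RawSample {
    unsigned char id[2];    // big-endian uint16_t
    unsigned char value[4]; // big-endian uint32_t
};
static_assert(sizeof(RawSample) == 6, "raw layout must have no padding");

// Hypothetical "in memory version": native integers.
struct Sample {
    uint16_t id;
    uint32_t value;
};

inline Sample from_raw(const RawSample &r) {
    Sample s;
    s.id = static_cast<uint16_t>((r.id[0] << 8) | r.id[1]);
    s.value = (static_cast<uint32_t>(r.value[0]) << 24) |
              (static_cast<uint32_t>(r.value[1]) << 16) |
              (static_cast<uint32_t>(r.value[2]) << 8) |
               static_cast<uint32_t>(r.value[3]);
    return s;
}

inline RawSample to_raw(const Sample &s) {
    RawSample r;
    r.id[0] = static_cast<unsigned char>(s.id >> 8);
    r.id[1] = static_cast<unsigned char>(s.id & 0xFF);
    r.value[0] = static_cast<unsigned char>((s.value >> 24) & 0xFF);
    r.value[1] = static_cast<unsigned char>((s.value >> 16) & 0xFF);
    r.value[2] = static_cast<unsigned char>((s.value >> 8) & 0xFF);
    r.value[3] = static_cast<unsigned char>(s.value & 0xFF);
    return r;
}

// Illustration only: walks through the four workflow steps.
inline void workflow_example() {
    const unsigned char input[6] = {0x00, 0x2A, 0x00, 0x00, 0x01, 0x00};
    RawSample raw;
    std::memcpy(&raw, input, sizeof raw); // 1. read into the raw structure
    Sample s = from_raw(raw);             // 2. convert to the in-memory version
    assert(s.id == 42 && s.value == 256);
    s.value += 10;                        // 3. operate on native values only
    RawSample out = to_raw(s);            // 4. convert back before writing out
    assert(out.value[2] == 0x01 && out.value[3] == 0x0A);
}
```

Declaring the raw fields as byte arrays is a design choice of this sketch: it sidesteps both padding and alignment, at the cost of one conversion routine per struct.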

We find this decoupling useful because (but not limited to) ...

  1. All conversions are located in one place only.
  2. Fewer headaches about memory alignment issues when working with the "in memory version".
  3. It makes porting from one arch to another much easier (fewer endian issues).

Hopefully this decoupling can be useful to your application too.

Esquimau answered 18/7, 2011 at 14:35 Comment(1)
Thank you for the answer, we actually use similar strategies. The problem in this case is that the structs are so big, numerous and complicated that writing conversion routines would take up a lot of time. If only there was a good conversion tool which automatically converts C structures to a specific endianness! Well, a compiler directive would also be nice, but a good interface would be sufficient. However, I was not able to find any. - Brickey
M
1

A possible innovative solution would be to use a C interpreter like Ch and force the endian coding to big.

Malraux answered 18/7, 2011 at 14:40 Comment(0)
R
1

Boost provides endian buffers for this.

For example:

#include <boost/endian/buffers.hpp>
#include <boost/static_assert.hpp>

using namespace boost::endian;

struct header {
    big_int32_buf_t     file_code;
    big_int32_buf_t     file_length;
    little_int32_buf_t  version;
    little_int32_buf_t  shape_type;
};
BOOST_STATIC_ASSERT(sizeof(header) == 16U);
Rainey answered 1/10, 2015 at 9:34 Comment(0)
B
0

There is a data representation for this called XDR. Have a look at it. http://en.wikipedia.org/wiki/External_Data_Representation

Though it might be a little too much for your Embedded System. Try searching for an already implemented library that you can use (check license restrictions!).

XDR is generally used in network systems, since they need a way to move data in an endianness-independent way, though nothing says that it cannot be used outside of networks.

Bisulcate answered 18/7, 2011 at 11:22 Comment(0)
W
0

Maybe not a direct answer, but having a read through this question can hopefully answer some of your concerns.

Woodworth answered 18/7, 2011 at 11:28 Comment(0)
M
0

You could make the structure a class with getters and setters for the data members. The getters and setters are implemented with something like:

int getSomeValue( void ) const {
#if defined( BIG_ENDIAN )
    return _value;
#else
    return convert_to_little_endian( _value );
#endif
}

void setSomeValue( int newValue) {
#if defined( BIG_ENDIAN )
    _value = newValue;
#else
    _value = convert_to_big_endian( newValue );
#endif
}

We do this sometimes when we read a structure in from a file - we read it into a struct and use this on both big-endian and little-endian machines to access the data properly.
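
A compilable variant of this idea might look like the sketch below. It makes several assumptions that are not in the answer: the stored field is a big-endian uint32_t, the compiler is GCC or Clang (which predefine __BYTE_ORDER__ and provide __builtin_bswap32), and the host is either little- or big-endian. The names Record and swap_if_little_endian are hypothetical. The sketch tests __BYTE_ORDER__ rather than defined(BIG_ENDIAN) because some system headers (glibc's <endian.h>, for example) define BIG_ENDIAN as a constant on every host, so a plain defined() check can be true on little-endian machines too.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if !defined(__BYTE_ORDER__)
#error "this sketch assumes GCC or Clang, which predefine __BYTE_ORDER__"
#endif

// Convert between big-endian storage and host order. A byte swap is its
// own inverse, so one function serves both the getter and the setter.
inline uint32_t swap_if_little_endian(uint32_t v) {
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return v;                    // host already matches the stored format
#else
    return __builtin_bswap32(v); // assumes a little-endian host
#endif
}

// Hypothetical record whose field is kept big-endian in memory.
class Record {
public:
    uint32_t getSomeValue() const { return swap_if_little_endian(value_); }
    void setSomeValue(uint32_t v) { value_ = swap_if_little_endian(v); }
private:
    uint32_t value_ = 0; // stored big-endian
};

// Illustration only.
inline void accessor_example() {
    Record r;
    r.setSomeValue(0x01020304u);
    unsigned char bytes[sizeof r];
    std::memcpy(bytes, &r, sizeof r);
    assert(bytes[0] == 0x01 && bytes[3] == 0x04); // big-endian in memory
    assert(r.getSomeValue() == 0x01020304u);      // host order through the getter
}
```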

Magnus answered 18/7, 2011 at 14:36 Comment(0)
