Is it guaranteed that the padding bits of "zeroed" structure will be zeroed in C?

This statement in the article puzzled me:

C permits an implementation to insert padding into structures (but not into arrays) to ensure that all fields have a useful alignment for the target. If you zero a structure and then set some of the fields, will the padding bits all be zero? According to the results of the survey, 36 percent were sure that they would be, and 29 percent didn't know. Depending on the compiler (and optimization level), it may or may not be.

It was not completely clear, so I turned to the standard. ISO/IEC 9899, §6.2.6.1, states:

When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.

Also in §6.7.2.1:

The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

Then I remembered that I had recently implemented a kind of hack that used the undeclared part of a byte occupied by bit-fields. It was something like:

/* This struct is always allocated on the heap and is zeroed. */
struct some_struct {
  /* initial part ... */
  enum {
    ONE,
    TWO,
    THREE,
    FOUR,
  } some_enum:8;
  unsigned char flag:1;
  unsigned char another_flag:1;
  unsigned int size_of_smth;
  /* ... remaining part */
};

The structure was not under my control, so I couldn't change it, but I urgently needed to pass some information through it. So I calculated the address of the corresponding byte like this:

unsigned char *ptr = (unsigned char *)&some->size_of_smth - 1; /* byte just before size_of_smth */
*ptr |= 0xC0; /* set flags */

Later, I checked the flags the same way.

I should also mention that the target compiler and platform were fixed, so this is not a cross-platform concern. Still, the following questions remain:

  1. Can I rely on the padding bits of a struct (on the heap) still being zero after memset/kzalloc/whatever and after subsequent use of the struct? (This post does not cover the topic in terms of the standard and of guarantees for further use of the struct.) And what about a struct zeroed on the stack with = {0}?

  2. If so, does that mean I can safely use the "unnamed"/"undeclared" part of a bit-field to pass information for my own purposes anywhere in C (different platforms, compilers, ...)? (Assuming I know for sure that nobody crazy is trying to store anything in this byte.)

Frag answered 6/10, 2018 at 23:28 Comment(5)
You can probably be sure that no compiler is going to mask off unused bit-fields in a byte and deliberately leave them uninitialised or, when writing to a particular bit, do anything differently depending on whether 2 or 8 bits are defined. Note: memset sets the entire range without any care for what it contains. – Honolulu
Although inconvenient, you could create a union of the struct and n characters, where n is the size of the structure. You could do a one-time check for n == sizeof(struct) to be sure, then zero out the n-character array when desired. – Ethanol
@WeatherVane In general you're right. However, there are situations in which you have to use such "dirty hacks" but also need to be sure of them from the standard's point of view. I certainly understand that it will work correctly in my specific case. – Frag
Nope, the spec gives the optimizer writer enough rope to substitute a memset on a small struct with a few faster assignments and hang you. – Inflationary
There is no guarantee anywhere about "where" the padding bits will be inserted in the structure, except that they will not be before the first member (as the address of the struct is the address of the first member). Beyond that, you cannot rely on padding even being covered by a memset of the struct. While you can experiment on a single machine with a single compiler to determine the implementation's behavior, there is no guarantee it would apply to any other machine or compiler. – Armes

The short answer to your first question is "no".

While an appropriate call of memset(), such as memset(&some_struct_instance, 0, sizeof(struct some_struct)), will set all bytes in the structure to zero, that change is not required to persist after "some use" of some_struct_instance, such as setting any of its members.

So, for example, there is no guarantee that some_struct_instance.some_enum = THREE (i.e. storing a value into a member) will leave any padding bits in some_struct_instance unchanged. The only requirement in the standard is that the values of other members of the structure are unaffected. However, the compiler may (in emitted object code or machine instructions) implement the assignment using some set of bitwise operations, and it is allowed to take shortcuts that don't leave the padding bits alone (e.g. by not emitting instructions that would otherwise ensure the padding bits are unaffected).

Even worse, a simple assignment like some_struct_instance = some_other_struct_instance (which, by definition, is the storing of a value into some_struct_instance) comes with no guarantees about the values of padding bits. It is not guaranteed that the padding bits in some_struct_instance will be set to the same bitwise values as the padding bits in some_other_struct_instance, nor is there a guarantee that the padding bits in some_struct_instance will be unchanged. This is because the compiler is allowed to implement the assignment by whatever means it deems most "efficient" (e.g. copying memory verbatim, some set of member-wise assignments, or whatever) but, since the values of padding bits after the assignment are unspecified, it is not required to ensure the padding bits are unchanged.

If you get lucky, and fiddling with the padding bits works for your purpose, it will not be because of any support in the C standard. It will be because of the good graces of the compiler vendor (e.g. choosing to emit a set of machine instructions that ensure padding bits are not changed). And, practically, there is no guarantee that the compiler vendor will keep doing things the same way: for example, your code that relies on such a thing may break when the compiler is updated, when you choose different optimisation settings, or whatever.

Since the answer to your first question is "no", there is no need to answer your second question. However, philosophically, if you are trying to store data in padding bits of a structure, it is reasonable to assert that someone else - crazy or not - may potentially attempt to do the same thing, but using an approach that messes up the data you are attempting to pass around.

Quadrireme answered 7/10, 2018 at 4:19 Comment(0)

From the first words of the article quoted in the question:

C permits an implementation to insert padding into structures (but not into arrays) to ensure that all fields have a useful alignment ...

These words mean that, in order to optimize (for speed, probably, but also to satisfy architecture restrictions on data/address buses), the compiler can make use of hidden, unused bits or bytes. NOT USED, because they would be forbidden or costly to address.

This also implies that those bytes or bits should not be visible from a programming perspective, and it should be considered a programming error to try to access that hidden data.

About that added data, the standard says that its content is "unspecified", and there is really no better way to state what an implementation can do with it. Think of bit-field declarations, where you can declare integers of any bit width: no ordinary hardware permits reading or writing memory in chunks smaller than 8 bits, so the CPU will always read or write at least 8 bits (sometimes even more). Why should a compiler (an implementation) take care to do something useful with those other bits, which the programmer said he does not care about? It makes no sense: the programmer didn't give a name to some memory, but then he wants to manipulate it?

The padding bytes between fields are pretty much the same matter: those added bytes are necessary, but the programmer is not interested in them, and he SHOULD NOT change his mind later!

Of course, one can study an implementation and arrive at a conclusion like "padding bytes will always be zeroed" or something similar. This is risky (are you sure they will always, always be zeroed?) but, more importantly, it is totally useless: if you need more data in a structure, simply declare it! Then you will never have a problem, even when porting the source to different platforms or implementations.

Haunted answered 7/10, 2018 at 4:57 Comment(2)
Fully agree with your statements. However, this is a situation in which there is no possibility of changing the data structure. – Frag
@Frag To be honest, I thought about those cases too, but decided to stay quiet... :-) – Haunted

It is reasonable to start with the expectation that what is listed in the standard is correctly implemented. You're looking for further assurances for a particular architecture. Personally, if I could find documented details about that particular architecture, I would be reassured; if not, I would be cautious.

What counts as "cautious" would depend on how confident I needed to be. For example, building a detailed test set and running it periodically on my target architecture would give me a reasonable degree of confidence, but it's all about how much risk you want to take. If it's really, really important, stick to what the standard guarantees you; if it's less so, test and see if you can get enough confidence for what you need.

Neoarsphenamine answered 6/10, 2018 at 23:36 Comment(4)
Summing up: we have no guarantees from the standard. Therefore, alas, it's necessary to study the compiler's behavior carefully, and to recheck the compiler docs every time you upgrade it. Am I correct? – Frag
You definitely cannot rely on the compiler not emitting code that modifies padding bits when an adjacent field is modified, because doing so may be faster than preserving the value of those bits. I don't think any amount of testing will help, because there is no way to know what set of circumstances might make such an optimisation feasible. – Grantee
@Grantee "You definitely cannot rely … I don't think any amount of testing will help" This depends on how certain you need to be. If you need to be absolutely, completely certain, you're right: you can't rely on it. If you can afford to be less than absolutely certain, and you are prepared to treat it as a risk to be managed, then proportionate testing will help you reach the level of confidence you need. If this approach, say, halves your execution cost and testing shows that one time in ten thousand it paints a pixel the wrong colour, or produces a slightly longer journey (cont.) – Neoarsphenamine
for your satnav, you might decide it's worth taking that calculated risk. – Neoarsphenamine

© 2022 - 2024 — McMap. All rights reserved.