I confess that the idea that I can lay a struct
over a locally defined array in this way is frankly exotic.
I still maintain that C99 and all subsequent standards permit it.
In fact it's very arguable that, members being objects in themselves, the first bullet point in 6.5 paragraph 7 allows it:
a type compatible with the effective type of the object
I think that's M.M's point.
Looking at the problem the other way, notice that it's absolutely legitimate (in a strictly conforming environment) to alias the member sp->x as an object in its own right.
In the context of the code in my OP, consider a function with the prototype
void doit(int* ip,s* sp);
The following call is expected to behave logically:
doit(&(sp->x),sp);
NB: Program logic may, of course, not behave as desired. For example, if doit increments sp->x until it exceeds *ip then there's a problem! However, what is not allowed in a conforming compiler is for the outcome to be corrupted by artifacts of the optimizer ignoring the potential for aliasing.
I maintain that C would be all the weaker if the language required me to code:
int temp=sp->x;
doit(&temp,sp);
sp->x=temp;
Imagine all the cases where any call to any function has to be policed for the potential aliasing access to any part of the structures being passed. Such a language would probably be unusable.
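A minimal sketch of the doit() situation described above. The struct layout and the function bodies are assumptions made for illustration (only the names doit, ip, sp and x come from the discussion); the point is just that ip may legitimately alias sp->x, so a conforming compiler has to re-read *ip after the store to sp->x.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct: only the member name x comes from the discussion. */
typedef struct {
    int before;
    int x;
    int after;
} s;

/* Illustrative body (an assumption): store to sp->x, then report how much
   *ip changed. ip and &sp->x have the same type, so they may alias and
   *ip must be re-read after the store. */
int doit(int *ip, s *sp) {
    assert(ip != NULL && sp != NULL);
    int start = *ip;
    sp->x += 1;
    return *ip - start;
}

/* The copy-out/copy-back workaround: doit no longer sees the member
   itself, and the final assignment overwrites doit's increment. */
int doit_via_temp(s *sp) {
    int temp = sp->x;
    int delta = doit(&temp, sp);
    sp->x = temp;
    return delta;
}
```

With obj.x starting at 10, doit(&obj.x, &obj) returns 1 because the store is visible through ip, whereas doit_via_temp(&obj) returns 0 and leaves x unchanged. So the workaround is not only clumsy, it also changes what the call means.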
Obviously a hard-optimizing (i.e. non-compliant) compiler might make a complete hash of doit() if it doesn't recognize that ip might be an alias of a member in the middle of sp.
That's irrelevant to this discussion.
Setting out when a compiler can (and cannot) make such assumptions is the reason the standard needs to set very precise parameters around aliasing: it gives the optimizer some conditions to discount. In a low-level language such as C it could be reasonable (even desirable) to say that a suitably aligned pointer to an accessible, valid bit pattern can be used to access a value.
It is absolutely established that sp->x in my OP refers to a properly aligned location holding a valid unsigned int.
The intelligent concern is whether the compiler/optimizer agrees that this is a legitimate way to access that location, or may disregard it as undefined behavior.
As the doit() example shows, it's absolutely established that a structure can be broken down and treated as individual objects which merely happen to have a special relationship.
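As a small illustration of "breaking down", here is a sketch that reaches a member through its byte offset instead of the -> operator. The struct S mirrors the one in the program at the bottom of this answer; the helper name member_b is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the S used in the program at the bottom of this answer. */
typedef struct {
    int a;
    int b;
    double c;
} S;

/* Hypothetical helper: compute the address of member b from the start of
   the structure. The result points at an int object, so accessing it
   through an int* uses a type compatible with its effective type. */
int *member_b(S *sp) {
    assert(sp != NULL);
    return (int *)((char *)sp + offsetof(S, b));
}
```

The pointer returned compares equal to &sp->b, and a store through it is visible when the member is later read through the structure.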
This question appears to be about the circumstances when a set of members that happen to have that special relationship can have a structure 'laid over them'.
I think most people will agree that the program at the bottom of this answer performs valid, worthwhile functionality that, if associated with some I/O library, could 'abstract' a great deal of the work required to read and write structures.
You might think there's a better way of doing it, but I'm not expecting many people to think it's an unreasonable approach.
It operates by exactly that means - it builds a structure member by member then accesses it through that structure.
I suspect some of the people who object to the code in the OP are more relaxed about this.
Firstly, it operates on memory allocated from the free-store as 'un-typed' universally aligned storage.
Secondly, it builds a whole structure. In the OP I'm pointing out that the rules (at least appear to) permit you to line up bits of a structure, and that so long as you only dereference those bits everything is OK.
I somewhat share that attitude. I think the OP is slightly perverse and stretches the language in a poorly written corner of the standard. Not something to put your shirt on.
However, I absolutely think it would be a mistake to forbid the technique below, as that would rule out a logically very valid approach that recognizes structures can be built up from objects just as much as broken down into them.
I will say that something like this is the only thing I could come up with where this sort of approach seems worthwhile. On the other hand, if you can't pull data apart and/or put it together then you quickly start to break the notion that C structures are POD: the possibly padded sum of their parts, nothing more, nothing less.
#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>

typedef enum {
    is_int, is_double //NB:TODO: support more types but this is a toy.
} type_of;

//This function allocates and 'builds' an array based on a provided set of types, offsets and sizes.
//It's a stand-in for some function that (say) reads structures from a file and builds them according to a provided
//recipe.
int buildarray(void**array,const type_of* types,const size_t* offsets,size_t mems,size_t sz,size_t count){
    const size_t asize=count*sz;
    char*const data=malloc(asize==0?1:asize);
    if(data==NULL){
        return 1;//Allocation failure.
    }
    int input=1;//Dummy...
    const char*end=data+asize;//One past end. Make const for safety!
    for(char*curr=data;curr<end;curr+=sz){
        for(size_t i=0;i<mems;++i){
            char*mem=curr+offsets[i];
            switch(types[i]){
                case is_int:
                    *((int*)mem)=input++;//Dummy...Populate from file...
                    break;
                case is_double:
                    *((double*)mem)=((double)input)+((double)input)/10.0;//Dummy...Populate from file...
                    ++input;
                    break;
                default:
                    free(data);//Better than returning an incomplete array. Should not leak even on error conditions.
                    return 2;//Invalid type!
            }
        }
    }
    if(array!=NULL){
        *array=data;
    }else{
        free(data);//Just for fun apparently...
    }
    return 0;
}

typedef struct {
    int a;
    int b;
    double c;
} S;

int main(void) {
    const type_of types[]={is_int,is_int,is_double};
    const size_t offsets[]={offsetof(S,a),offsetof(S,b),offsetof(S,c)};
    S* array=NULL;
    const size_t size=4;
    int err=buildarray((void **)&array,types,offsets,3,sizeof(S),size);
    if(err!=0){
        return EXIT_FAILURE;
    }
    for(size_t i=0;i<size;++i){
        printf("%zu: %d %d %f\n",i,array[i].a,array[i].b,array[i].c);
    }
    free(array);
    return EXIT_SUCCESS;
}
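The program above only puts a structure together. As a rough sketch of the opposite direction (pulling one apart by the same kind of recipe), something like the following could be used. The helper sum_members is hypothetical and not part of the program; type_of and S are repeated here only so the fragment compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { is_int, is_double } type_of; /* repeated from the program above */

typedef struct {
    int a;
    int b;
    double c;
} S; /* repeated from the program above */

/* Hypothetical helper: read every member named by the recipe and add it to
   a running total. Returns 0 on success and 2 on an unknown type, mirroring
   the error code buildarray uses for that case. */
int sum_members(const void *record, const type_of *types,
                const size_t *offsets, size_t mems, double *total) {
    assert(record != NULL && total != NULL);
    const char *base = record;
    double sum = 0.0;
    for (size_t i = 0; i < mems; ++i) {
        const char *mem = base + offsets[i];
        switch (types[i]) {
        case is_int:
            sum += *(const int *)mem;    /* read the member as an int */
            break;
        case is_double:
            sum += *(const double *)mem; /* read the member as a double */
            break;
        default:
            return 2;                    /* invalid type in the recipe */
        }
    }
    *total = sum;
    return 0;
}
```

Each read goes through a pointer whose type matches the member being read, which is the same member-by-member access the builder relies on.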
I think it's an interesting tension.
C is intended to be that low-level high-level language, giving the programmer almost direct access to machine operations and memory.
That means the programmer can fulfill the arbitrary demands of hardware devices and write highly efficient code.
However, if the programmer is given absolute control, such as my point about an 'if it fits it's OK' approach to aliasing, then the optimizer gets its game spoilt.
So, weirdly, it's worth holding a little bit of performance back to return a dividend from the optimizer.
Section 6.5 of the C99 standard tries (and doesn't entirely succeed) to set that boundary out.
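For contrast, here are two accesses that sit comfortably on the permitted side of that boundary, as a sketch rather than anything taken from the OP: copying an object representation with memcpy, and inspecting bytes through a character type. The helper names are hypothetical, and the code assumes float and uint32_t have the same size (true on common platforms, but not guaranteed by the standard).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: obtain the bits of a float without ever reading the
   float object through an integer lvalue. Assumes sizeof(float) == 4. */
uint32_t float_bits(float f) {
    uint32_t bits = 0;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Hypothetical helper: character types may be used to access the bytes of
   any object, so this read is allowed whatever obj really points at. */
unsigned char byte_at(const void *obj, size_t i) {
    return ((const unsigned char *)obj)[i];
}
```

Neither function gives the optimizer anything to complain about; the contested cases are the ones where an lvalue of one non-character type is used on storage that was established as another.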
gcc and -fstrict-aliasing. The gcc documentation says that all levels have various degrees of false positives and negatives and so cannot really be used as a reliable indication that the code does/does not violate strict aliasing. The checks fail on many trivial examples. – Linus
X * and a Y * (incompatible) point to overlapping memory locations then they can't both be used to access any sub-object. However it seems to me to be very clear from the wording chosen by the standards that if X and Y both contain a member of the same type, then it is not an aliasing violation to access that member via X and via Y. – Kalindi