What is the effective type of an object written by memset?

Code 1:

unsigned int *p = malloc(sizeof *p);
memset(p, 0x55, sizeof *p);

unsigned int u = *p;

Code 2:

void *d = malloc(50);
*(double *)d = 1.23;
memset(d, 0x55, 50);

unsigned int u = *(unsigned int *)d;

In each case, what effect does memset have on the effective type of the object in the malloc'd space, and consequently, is initializing u correct or a strict aliasing violation?

The definition of effective type (C11 6.5/6) is:

The effective type of an object for an access to its stored value is the declared type of the object, if any. If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.

However, it is unclear whether memset behaves like writing through an lvalue of character type, or something else. The description of memset (7.24.6.1) is not very illuminating:

The memset function copies the value of c (converted to an unsigned char) into each of the first n characters of the object pointed to by s.
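One way to read that description is as a plain byte loop. The following is only a sketch of that reading, not how any real library implements memset; the name byte_fill is hypothetical and chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for memset, written as ordinary C.
 * It mirrors the wording of 7.24.6.1: convert c to unsigned char,
 * then store that value into each of the first n characters. */
void *byte_fill(void *s, int c, size_t n)
{
    unsigned char *p = s;                          /* view the destination as bytes */
    const unsigned char value = (unsigned char)c;  /* conversion required by the spec */

    for (size_t i = 0; i < n; i++)
        p[i] = value;                              /* store through a character-type lvalue */

    return s;
}
```

If memset is taken to behave "as if" it were this loop, every store goes through an lvalue of character type, which is exactly the case that the second sentence of 6.5/6 excludes.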

Hairless answered 21/6, 2015 at 23:14 Comment(15)
Related question. Answering that question relies on this question, it seems to me.Hairless
The specification of calloc is even vaguer on this point - "The space is initialized to all bits zero. [footnote] Note that this need not be the same as the representation of floating-point zero or a null pointer constant."Schaaff
I'm ~90% sure that both memset and calloc were intended to behave as-if they access storage through an lvalue of character type; indeed, I think the wording in 6.5/6 for "If a value is stored into an object having no declared type through an lvalue having a type that is not a character type" is intended precisely to permit erasing heap blocks with memset and calloc. I am, now, less convinced that the standard actually specifies what it was meant to re these functions, than I was when I wrote the "related question".Schaaff
@Schaaff I think that sentence is talking about *(T *)p = t; , then the block has effective type THairless
@MattMcNabb It's the exception for character type that I think is meant to cover memset.Schaaff
Another awkward situation is if we change one byte of an object using memset, does that then render the whole object unreadable by what its effective type was before?Hairless
@MattMcNabb ... although if it does, then your Code 2 is still UB because the effective type remains double after the memset. :-(Schaaff
@Schaaff yeah I can't wrap my head around it reallyHairless
@MattMcNabb memset(void *s, ...) takes one type of object for the destination: a void*. It "copies the value of c ... into each of the first n characters of the object pointed to by s". Thus effectively, memset() treats the pointer as a character pointer. Certainly all this is well known to you, but it is not clear why it is insufficient to answer "What is the effective type of an object written by memset?".Sangria
@chux it's not well known to me, hence this question.Hairless
It's clear that this spec is crap.Coenzyme
@Coenzyme Heh. In their defence, if they wrote a spec that was incompatible with existing implementations then they would cop flak for that too, so they probably left it vague on purpose.Hairless
@MattMcNabb I know formalising that is very difficult because C must be able to fly just above ground (portable assembly) and also fast and high in the sky (high level, optimisable language). C++ may be worse as a low-high level language. It may be the F-35 of programming.Coenzyme
@curiousguy: I would posit that one of the big factors behind C's success was that in various tricky situations where different hardware platforms provided different contradictory behavioral guarantees, and programs had a variety of contradictory behavioral requirements, allowing implementations to guarantee their behavior as tightly or loosely as the implementers deemed practical made it possible for programs whose requirements fit well with their underlying platforms' guarantees to write code that was simpler, more readable, and faster, than would have been possible if the Standard...Pimento
...had mandated some other behavior. Thus, it could be close to portable assembly on some platforms, and a higher-level language on others. Unfortunately, hyper-modern C throws out a lot of useful loosely-defined behaviors which many compilers have traditionally supported, without offering any other reasonable way of expressing similarly-loose semantics.Pimento

My 50ct:

First, I break this into sentences for easier reference:

  1. The effective type of an object for an access to its stored value is the declared type of the object, if any.
  2. If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value.
  3. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one.
  4. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.

The footnote might help here: "87) Allocated objects have no declared type.".
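To make sentences 2 and 3 concrete, here is a minimal sketch of how the effective type of an allocated block would change under a straightforward reading of the rule. The function name effective_type_demo is hypothetical; the comments state the interpretation, not anything a compiler reports.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical demo: walks an allocated block through sentences 2 and 3. */
double effective_type_demo(void)
{
    void *block = malloc(sizeof(double));   /* allocated: no declared type (footnote 87) */
    if (block == NULL)
        return 0.0;

    *(float *)block = 1.0f;                 /* sentence 2: store via a non-character lvalue,
                                               the start of the block now has effective type float */

    const double src = 1.23;                /* src has declared type double */
    memcpy(block, &src, sizeof src);        /* sentence 3: the block takes on the effective
                                               type of the source object, i.e. double */

    double result = *(double *)block;       /* read matches the effective type */
    free(block);
    return result;
}
```

The open question of the thread is which of these sentences, if any, a call to memset falls under.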

DNA: "does not apply"

Case 1:

  • memset(...): 1: DNA (no declared type), 2: DNA (memset writes with char semantics), 3: DNA (neither memcpy nor memmove), 4: char [] for memset internally only (not permanent).
  • unsigned int u = *p: 1: DNA (no declared type), 2/3: DNA (a read, not a write), 4: type of lvalue is unsigned int.

Conclusion: no violation, but the resulting value is implementation-defined, as it depends on the object representation of unsigned int (its size and any padding bits).
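Code 1 can be wrapped in a small function to observe the value. This is a sketch under stated assumptions: the function names are hypothetical, and the expected pattern only holds if CHAR_BIT is 8 and unsigned int has no padding bits.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper around Code 1 from the question. */
unsigned int read_after_memset(void)
{
    unsigned int *p = malloc(sizeof *p);
    if (p == NULL)
        return 0;

    memset(p, 0x55, sizeof *p);   /* every byte of the object becomes 0x55 */
    unsigned int u = *p;          /* read through an unsigned int lvalue */

    free(p);
    return u;
}

/* Value with 0x55 in every byte, ASSUMING CHAR_BIT == 8 and no padding bits:
 * UINT_MAX / 0xFF has 0x01 in every byte, and multiplying by 0x55
 * spreads 0x55 across all of them. */
unsigned int expected_pattern(void)
{
    return UINT_MAX / 0xFFu * 0x55u;
}
```

On such an implementation the two functions return the same value, for example 0x55555555 when unsigned int is 32 bits wide.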

Case 2:

  • *(double *)d = 1.23;: 2: the object at d gets the effective type double for this and following reads.
  • memset(d, 0x55, 50);: same as for Case 1.
  • unsigned int u = *(unsigned int *)d: the effective type is still double: bang!
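If the goal is only to get the bytes into an unsigned int, a commonly used alternative sidesteps the question: copy the bytes into an object that has a declared type. This is not something the answer proposes; it is a sketch of a different technique, the function name read_via_memcpy is hypothetical, and it assumes sizeof(unsigned int) is no larger than the 50-byte block.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of Code 2 that avoids reading the block
 * through an unsigned int lvalue. */
unsigned int read_via_memcpy(void)
{
    void *d = malloc(50);
    if (d == NULL)
        return 0;

    *(double *)d = 1.23;          /* the start of the block gets effective type double */
    memset(d, 0x55, 50);          /* overwrite every byte with 0x55 */

    unsigned int u;               /* u has a declared type, so sentence 1 applies to it */
    memcpy(&u, d, sizeof u);      /* byte copy instead of *(unsigned int *)d */

    free(d);
    return u;
}
```

Here the only typed read is of u itself, so the result does not depend on what effective type the allocated block has after the memset.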

In any case, memset() is of little use for non-char scalars, except when filling with 0, and even that is implementation-dependent, as neither a floating-point 0.0 nor the null pointer needs to actually be "all bits zero".
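Whether all-bits-zero happens to mean 0.0 or a null pointer can at least be checked at run time. The helpers below are a hypothetical sketch; they assume that the all-bits-zero pattern is not a trap representation on the target, which the standard does not guarantee.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: does an all-bits-zero double compare equal to 0.0? */
bool zero_fill_gives_zero_double(void)
{
    double x;
    memset(&x, 0, sizeof x);   /* x has declared type double, only its bytes change */
    return x == 0.0;
}

/* Hypothetical check: is an all-bits-zero pointer a null pointer? */
bool zero_fill_gives_null_pointer(void)
{
    void *p;
    memset(&p, 0, sizeof p);
    return p == NULL;
}
```

On common platforms with IEEE 754 doubles and a zero-valued null pointer both checks are expected to succeed, but that is a property of the platform, not of the language.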

Finally:

  • Sentence 2 does not apply to memset, as internally, memset() copies by char: "...of c (converted to an unsigned char) into each of the first n characters ..." (or uses char semantics, at least; the actual implementation is irrelevant here).
  • Sentence 3 does not apply to memset() either, as it only applies to memcpy/memmove or to copying as "an array of character type". memset does neither (memcpy and memmove do copy that way, so the or-condition just makes an explicit copy loop equivalent to those functions).
  • memset() does not change the effective type of the object. That differs from memcpy and memmove. That results from sentence 4, which does not include "... for that access and for subsequent accesses ..." as 2 and 3 state and 1 implies.
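For contrast, this is the kind of explicit copy loop that the "copied as an array of character type" clause in sentence 3 appears to be aimed at. It is a sketch of that reading only; copy_bytes and copy_double_by_hand are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hand-written equivalent of memcpy: copies n bytes
 * through unsigned char lvalues. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* Copies a double into allocated storage by hand, then reads it back. */
double copy_double_by_hand(void)
{
    const double src = 1.23;            /* source has declared type double */
    double *dst = malloc(sizeof *dst);  /* destination has no declared type */
    if (dst == NULL)
        return 0.0;

    copy_bytes(dst, &src, sizeof src);  /* under the reading above, sentence 3 gives
                                           *dst the effective type double */
    double value = *dst;

    free(dst);
    return value;
}
```

Unlike memset, this loop has a source object whose effective type can carry over, which is the distinction the last bullet relies on.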
Cooper answered 22/6, 2015 at 1:4 Comment(11)
Does the unsigned char exception to the aliasing rule not allow data to be "hand-copied" a byte at a time without using memcpy/memmove? My understanding is that it does. If that is the case, I'm not sure what basis there would be for regarding the behavior of memset as being different from that of a memcpy from a zero-initialized source. As for the usefulness of memset with double or pointers, I'd say it's precisely as useful as the zero-initialization performed by calloc (IMHO, the standard should define testable flags to indicate what types have all-bits-zero as their "0").Pimento
@supercat: I think the trick is that you still copy "as an array of character type" as stated in 3. I do agree, however, there is no practical difference between copying an array of equal chars or memset in this regard. I'd call it a legal glitch. There is a general problem with - for instance - zero-filling (as defined e.g. for not explicitly initialized global variables or done by memset with 0 fill), and _null pointers_/float 0.0 not necessarily being "all bits zero". There would actually be a "null-ify function" required, which knows about the requirements of each base type. ...Cooper
@supercat: ... If the implementation uses zero-bits as "null" value, that would then be memset(..., 0). Otherwise it might be as complicated as copying templates per field. However, that is C's origin as "lower high level language".Cooper
@Olaf: One of my major peeves with C is the number of things that are "implementation-defined", but which fail to define any means by which a program can test whether a particular implementation defines a behavior a certain way. If the standard required that C17 (or whatever the next one is) implementations where an all-bits-zero pointer isn't null define __STDC_NULL_NOT_ALL_BITS_ZERO to 1, then portable code could avoid needlessly nulling things which were known to have been cleared by calloc or memset.Pimento
@supercat: I agree about the implementation defined behaviour (IDB) hassle. However, such a flag is imo not the solution, but would complicate the problem even further. No, it requires the aforementioned function which nullifies any struct by knowing its scalars. OTOH, this would require RTTI about each struct/array, and I am not sure this is a good idea for C in general. C is meant to be slim and low-level. Just read the PCS for your target platform. All this adds to the reason why I do not use C on a PC. If speed is no problem, I use Python. C is for embedded (where I actually quite like it)Cooper
@Olaf: I don't think the maintainers of C standards have a uniform vision of whether C should still be a "low-level" language, or what that should mean with regard to optimizations. There are many cases where programs are required to produce valid output when given valid input, and are allowed to produce almost any output when given invalid input, but are required to refrain from launching nuclear missiles in any case. Many platforms could provide considerable help in this regard at little or no cost, but the Standard does nothing to allow such features to be used safely.Pimento
@supercat: I do not see C as the appropriate language for this anymore. It might have been, but nowadays, there are much better languages. (Oh, and I don't care much what the standard's people think about where C should be used or not). I use C only on embedded systems, preferably with gcc extensions which actually add to safety (e.g. plan9-extensions), and I am careful. It is also vital to know your ABI and PCS well enough.Cooper
@supercat: Ehm... not sure, but: is there anything in my answer you cannot agree with or think is even wrong?Cooper
@Olaf: My point was that in many cases it's routine and accepted for code to make use of the fact that calloc zero-initializes data; while I think there should be a way by which code can compile-time assert that certain data types have a default value of all-bits-zero, I don't see memset as any worse than calloc in that regard. BTW, for what do you see C as being the appropriate language, other than embedded systems [where many causality-breaking "optimizations" may be literally deadly]?Pimento
Well, it is the reason calloc() exists to zero out the data. For the other: I just think it is beyond C's scope for the language to know the specifics. If you need such, you have to write your own initializers, anyway; preferably a factory per struct/union (hmm, the latter complicates the issue even more) type. But this is actually where C++ starts, I'd say.Cooper
Where did I say memset is worse than calloc? Heck, it is very likely used by calloc(). gcc even emits calls to some of the mem*-functions itself, e.g. to copy a struct. For the application field: I think I already answered that. However, I have some ideas about an embedded language for 32bit bare-metal. But that is nothing to discuss here and now.Cooper
