Confused about casting and order of operations
Asked Answered
V

2

5

This is a very specific problem and I haven't heard back from the author, so I thought I would throw this out and see if my sanity is intact or not.

This is in regard to this video regarding DMA on a particular ARM processor STM32F407 and casting 16 bit words to 32 bit. I used to consider myself an expert in C but something is wrong here or my thinking is wrong.

Now around 16:46, you will see this setup for the DMA:

uint16_t rxBuf[8];
uint16_t txBuf[8];
...
HAL_I2SEx_TransmitReceive_DMA(&hi2s2, txBuf, rxBuf, 4);

And around 19:05 you will see this in the callback routine:

int lsample = (int) (rxBuf[0]<<16)|rxBuf[1];
int rsample = (int) (rxBuf[2]<<16)|rxBuf[3];

Now, what's going on is that the STM32F407 can only do DMA using 16-bit words, even if the data on the serial bus (I2S) is 24-bit or 32-bit data. I am sure (or assuming) that type int is the same as int32_t. The value in rxBuf[0] contains the upper 16 bits of a sample and rxBuf[1] has the lower 16 bits of the 32-bit left-channel sample.

My problem is the parenths surrounding rxBuf[0]<<16 do not include the cast (int) to its left.

How can that possibly work? The cast to 32 bits applies to the result (rxBuf[0]<<16) but what is inside the pareths is only a 16-bit unsigned value being shifted left by 16 bits. There should be nothing but zeros left in that 16-bit value when it is cast to 32 bits.

I think the two lines of code should be

int32_t lsample = (int32_t) ( ( (uint32_t)rxBuf[0]<<16 ) | (uint32_t)rxBuf[1] );
int32_t rsample = (int32_t) ( ( (uint32_t)rxBuf[2]<<16 ) | (uint32_t)rxBuf[3] );

First you must cast the value rxBuf[0] into (uint32_t), then shift it left 16 bits. C will naturally cast the 16-bit value rxBuf[1] to 32 bits since the left operand is now 32 bits, but it's good to be explicit with the cast anyway.

But that is not the code shown and the guy's project obviously works. But I cannot see how you can get anything other than zeros in the upper 16 bits of the word.

Is it that the compiler isn't working correctly and serendipitously treating these 16-bit values as a 32-bit word? Or am I (after 4 decades of coding in C) just not understanding the correct syntax for casting in C? The C reference manual seems to support my understanding of the correct syntax.

Vicissitude answered 1/7 at 2:32 Comment(32)
The integer promotionsConnivent
Note that there is still a potential bug: if the high bit of rxBuf[0] is set, then a right shift by 16 will shift it into the sign bit, which is signed integer overflow causing undefined behavior. So what you probably want is (uint32_t)rxBuf[0] << 16 | rxBuf[1], and assign it to a variable of type uint32_t.Connivent
Don't expect amazing code from video tutorials produced by electronics folks. The majority of them only know just enough C to get by. An explicit cast before shifting into a larger type has better self-documenting semantics and is generally a better way to write code that performs bit manipulation. I second the idea of using unsigned here, and being explicit about the width of the type because int is implementation-dependent.Gintz
@NateEldredge, I agree that uint32_t should be used instead of int and then the result be cast to int32_t. It should be int32_t lsample = (int32_t)(((uint32_t)rxBuf[0]<<16 )|rxBuf[1]); if we're gonna be super strict and proper and explicit with casting. But that (int32_t) cast could be left out since it's implicit in the assignment. I disagree about undefined behavior. If there was sign extension (and that would happen if the 16-bit value was int16_t) then the 16-bit value is sign-extended in the cast and the 16 sign bits fall off the edge instead of 16 zeros.Vicissitude
@Gintz , my problem is that this code should not work. Not only that it's bad style. BTW, I agree with you about the coding practices and style of a lotta coders that are electrical engineers. I try to be clean as a whistle with my code.Vicissitude
But it's uint16_t which is not sign-extended. I don't think most compilers enforce the rule against overflow for shifts, because there's probably lots of bad code relying on it, but for instance UBSAN traps it: godbolt.org/z/nzxdxj6WqConnivent
Overflow occurs in C whenever the mathematical value of an expression can't be represented in its type. Mathematically, shifts are defined as multiplication by a power of 2. 40000 * 2^16 is 2621440000, which is greater than the maximum value of int32_t (2147483647), hence overflow and UB. You don't have to shift out of the sign bit to cause overflow; shifting in is bad enough, because you've overflowed the 31 value bits.Connivent
Yes @NateEldredge I know it's uint16_t. But that value inside the pareths is still only a 16-bit value and, if the compiler is working according to spec, (rxBuf[0]<<16) should simply be all zeros whether it's uint16_t or int16_t. It's left shifting so only zeros go in from the left and whatever bits that are shifted to the right fall off the edge. It should be all zeros.Vicissitude
@robertbristow-johnson Your original question was already answered, so I wasn't commenting on that. You are incorrect though. That code should work (mostly) because of integer promotion, and it does have undefined behavior even if you disagree. Working on the assumption that int is 32 bits, when uint32_t is promoted to int it will always be positive even if bit 15 was 1. You then shift that bit into the sign position, essentially creating a negative from an originally positive value which is undefined behavior according to the language.Gintz
@robertbristow-johnson: That part was already explained in the answers, and in the link I gave in my first comment. The integer promotions mean that rxBuf[0] << 16 is shifting an int, despite there being no explicit cast. So you really are shifting within a signed 32-bit number, not an unsigned 16-bit number. C doesn't actually have any way to do the latter.Connivent
@paddy, inside the parenths, where is the uint16_t being promoted to 32 bits? I don't see it. That word is a 16-bit word and it got all of its bits shifted out. It should only be 16 zeros remaining before it's cast to 32-bits.Vicissitude
@robertbristow-johnson: The promotion to int is implicit!!!! There's nothing to see. It happens automatically whether you ask for it or not, and can't be prevented.Connivent
You have a uint16_t value shifted by an int value. The uint16_t is promoted to int and then the shift happens on that int value. The result is also int.Gintz
OH!!! It's the constant 16 that is the int!!!! Shit! Crappy code and serendipitous luck. When it's shifting by a constant, I don't think of that constant as any more than 6 bits. It's not even an int.Vicissitude
No, it's not even that. rxBuf[0] << (uint16_t)16 would be just the same. EVERY expression of type uint16_t gets promoted to int. rxBuf[0] gets promoted no matter what context it appears in, BEFORE the shift ever takes place.Connivent
Nate, I think @Gintz had the answer. paddy's answer is why I am not insane.Vicissitude
paddy and I are saying the same thing.Connivent
No, I think I'm also wrong! :) I've been bitten by this stuff before too, and still don't know better.Gintz
Are chars promoted to int like that?Vicissitude
Yes, they are. Every type of rank lower than int, which is to say, every type whose range fits within that of int.Connivent
Where is that documented in the C reference manual?Vicissitude
@robertbristow-johnson See my answer.Uttasta
And when you do something like uint16_t x = 5000; uint16_t y = x << 7;, or even y = x << (uint16_t)7;, you may think you're shifting within a 16-bit value, but in the C abstract machine, you're shifting within a 32-bit value (or whatever size int happens to be), then truncating the result to 16 bits. A compiler can of course optimize this to a shift of a 16-bit object if the target machine has such an instruction, since the end result is the same, but it isn't what is being done conceptually.Connivent
And in the context of ARM, it even matches what the physical machine does. ldrh r0, [x] zero-extends into the 32-bit register. lsl r0, r0, #7 shifts the entire 32-bit register. strh r0, [y] stores the low 16 bits only. You actually computed the high 16 bits of the 32-bit result of the shift, but then ignored them. Unlike x86, which can do shr ax, 7 on a 16-bit partial register, ARM has no 16-bit arithmetic or logic instructions per se.Connivent
@robert bristow-johnson, Note that int/unsigned may be as narrow as 16-bits. Better to use uint32_t (or wider) here. uint32_t lsample = ((uint32_t) rxBuf[0]<<16)) | rxBuf[1];.Postobit
@chux-ReinstateMonica once the complete 32-bit word is assembled together (all using unsigned bit manipulation instructions), then we want to reinterpret the 32-bit word as signed. This is an audio DSP project and these samples are signed numbers. Eventually they will be cast into float and scaled by 2^(-31) so that their range is between -1.0 and +0.99999999 .Vicissitude
@robertbristow-johnson, Better to ask about how to directly turn the 2 uint16_t values to a float.Postobit
oh, no, @chux-ReinstateMonica i know how to do that. This was DMA happening in hardware that had only a 16-bit wide DMA buffer. The values coming in from the Analog-to-Digital converter are signed fixed-point numbers. So C will see them as signed integers at first. Then we'll cast them to float (which will be mathematical floats having the same value as the signed integer) and then we'll scale that float down with 2^(-31) so that the float values are all between -1.0000000 and +0.99999999 .Vicissitude
This is the most correct code, in my opinion: int32_t lsample = (int32_t)( ( (uint32_t)rxBuf[0]<<16 ) | (uint32_t)rxBuf[1] );Vicissitude
Is int definitely 32-bit on this platform?Desalinate
And that explains where the integer promotion will end up, @Desalinate . And that and this implicit promotion explains how the product worked for this video. But when packing a word together from smaller words, we should be explicit with the casts. Before I was confused as hell.Vicissitude
@Desalinate I presumed that int was 32 bits because the STM32F407 has a 32-bit wide data bus. I just believed that, without explicit casting, a 16-bit number stays 16 bits until it's in an operation mixed with a 32-bit or longer word. I did not know that, implicitly, a 16-bit short would be cast (or "promoted") to int. Not until an operation with an int.Vicissitude
U
8

What's happening here is a result of integer promotion. Generally speaking, any expression using a type smaller than int will promote that type to int.

The exact verbiage of this is spelled out in section 6.3.1.1p2 of the C standard:

The following may be used in an expression wherever an int or unsigned int may be used:

  • An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int.
  • A bit-field of type _Bool, int, signed int, or unsigned int.

If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions

And the bitwise shifts are one such operator where this applies, as specified in section 6.5.7p3 of the C standard:

The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.

So in this expression:

int lsample = (int) (rxBuf[0]<<16)|rxBuf[1];

The value of rxBuf[0] is first promoted to int before the left shift operator is applied. The value of rxBuf[1] also gets promoted to int before the bitwise OR operator is applied. So the cast actually has no effect in this case.

There is however a bug here. Assuming an int is 32 bits, if the high bit of rxBuf[0] happens to be set, then the shift will result in a bit value of 1 being shifted into the sign bit of the result. This will trigger undefined behavior as per section 6.5.7p4 of the C standard regarding the bitwise shift operators:

The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

The proper way to handle this is to cast the value of rxBuf[0] to uint32_t to allow the shift to work properly:

int32_t lsample = ((uint32_t)rxBuf[0]<<16)|rxBuf[1];
int32_t rsample = ((uint32_t)rxBuf[2]<<16)|rxBuf[3];
Uttasta answered 1/7 at 3:10 Comment(10)
This is the answer. Sorry dbush, I was arguing with the other guys and just didn't see your answer pop in here.Vicissitude
@robertbristow-johnson Re: your edit suggestion, I'm not sure converting the result to a signed type makes sense. Usually when bit shifts are involved the desired values are unsigned.Uttasta
It is correct. These bits being clocked into the chip on the serial bus are 24 bit signed words coming from an analog-to-digital converter. First we gotta treat all the bits as unsigned while we assemble the complete word. But once the complete word is packed together, then we want to treat the 32-bit value as signed. (Eventually it will become float.)Vicissitude
Re “There is however a bug here”: There is no bug if the value is guaranteed not to overflow (as is the case if, as OP states, the combined value fits in 24 bits) or if the C implementation defines the behavior.Telluride
There is a bug in case rxBuf[0]<<16 has a 1 in the MSB, because then it will left-shift into the sign bit of the promoted-to 32 bit int which is explicitly an undefined behavior bug.Klystron
@Uttasta I really think you should accept the edit. lsample and rsample have always meant to be signed numbers. It's just that we have to assemble the 32-bit word from 16-bit blocks of bits which, of course, should be considered unsigned. But once that word is assembled, it's a twos-complement signed number. And it must be treated as such because, in my application, it's going right away into a float. I do not want to send an unsigned to float because the floating-point value will be wrong.Vicissitude
@Klystron for the sign extension to happen, doesn't that depend on if the operand of the shift (in this case rxBuf[0]) is signed or unsigned? Or does it depend only on the integer type of the destination (in this case, lsample)? It seems to me that the only logical rule is that the sign extension depends only on the operand, not the destination. And it seems to me that the only implicit integer promotion should keep the signedness of the operand. So uint16_t should get promoted only to unsigned int if sizeof(uint16_t) is less than sizeof(int).Vicissitude
@robertbristow-johnson Sign extension refers to when you have a smaller signed & negative integer type and it gets converted to a larger one, in which case the sign is preserved. That doesn't apply here. And unfortunately the integer promotions (on a 32/64 bit system) turn your uint16_t into an int which is signed. The C language isn't very logical and this is a pretty broken part of the language. And when we have a signed int32, we may not left shift data more than 30 bits of we invoke UB. Or if the signed int32 is negative, we may not left shift data at all or we invoke UB.Klystron
That is, this is undefined behavior: uint16_t x = 0x8000; x << 16; As is uint16_t x = 0; ~x << n;. The rule of thumb is to always cast the left operand of shift into a large unsigned integer type like uint32_t.Klystron
I agree with the rule of thumb. It's just that before I have always thought that the word length stayed unchanged unless explicitly cast or implicitly cast because it was in an operation with an longer word, and then the shorter word gets properly extended to a longer word. And I thought that if the shorter word was signed, that MSB (the sign bit) would be copied into all of the left bits in the extension, independent of the signedness of the longer word. And if the shorter word was unsigned, then the left bits in the extension are all zero, independent of the signedness of longer word.Vicissitude
G
4

C implicitly promotes integer arguments of the shift operators with rank less than int to int. The (int) doesn't do anything there.

Genesa answered 1/7 at 2:46 Comment(10)
As I am learning. I did not know that C implicitly promotes integer types that are shorter than int to an int. After 4 decades of coding in C, I didn't know that.Vicissitude
It's better to promote explicitly anyway. (Though to unsigned in your case.) The reader of the code should not have to consult obscure corners of the standard to know what the code should do.Genesa
Mark, your answer is sufficiently correct, but I am giving the check mark to @Uttasta .Vicissitude
@Fredrik In all fairness it is not uncommon that even veteran programmers aren't aware of all the various implicit things in C. I've seen that plenty of times when you enforce a stricter rule set like MISRA C and then this becomes an eye-opener. Until then they've been getting along with "weird bug, but I added a cast and now everything works fine - I don't know why - shrug". And beginner classes never mention implicit promotions either: instead they are hellbent on teaching mildly useful things like recursion or printf format strings.Klystron
@Fredrik If you always do explicit casting where promotions should occur, which any good coding practice would demand, then you would neither need to know nor care about C's implicit promotions.Genesa
@Fredrik, I had always known that int and unsigned int were the natural width of the integer of the machine. Essentially the width of the data bus. Back in my day, this would be the MC68000 (16 bits), the first microprocessor that I coded C (and asm) on. long was always longer than short and int could be either of them, but not both. It this implicit promotion of integers that were shorter than int to int that I didn't know about. My self-schooling said that, in general, the word width remained constant until a mixed operation with a longer word or an explicit cast.Vicissitude
And, @Fredrik , I do always explicitly cast. It's just that I was reading someone else's code and I couldn't figure out how it could have possibly worked with the crappy casting. I was convinced that if the compiler was behaving according to spec, the 16-bit words would remain 16 bits when added or subtracted or multiplied or divided or ORed or ANDed or shifted or XORed or complemented or negated. Unless in a mixed operation with a longer word. Or explicitly cast. That was the C doctrine that I learned from the beginning.Vicissitude
@robertbristow-johnson FYI, implicit promotions from char and short to int are in the C89 standard.Genesa
Yeah, and I didn't know that @MarkAdler . Since 1982, I have always thought that the word length stayed unchanged unless explicitly cast or implicitly cast because it was in an operation with an longer word, and then the shorter word gets properly extended to a longer word. And I thought that if the shorter word was signed, that MSB (the sign bit) would be copied into all of the left bits in the extension, independent of the signedness of the longer word. And if the shorter word was unsigned, then the left bits in the extension are all zero, independent of the signedness of longer word.Vicissitude
No judgment. Just pointing out its not something they snuck in recently. Like I said, it doesn't matter at all if applying decent coding practices.Genesa

© 2022 - 2024 — McMap. All rights reserved.