C++ AVX-512 intrinsic equivalent of _mm256_broadcast_ss()?

I'm porting some code from AVX2 to AVX-512.

What can I use to broadcast a single float to a __m512 vector? With AVX2 I used _mm256_broadcast_ss(), but I can't find anything like _mm512_broadcast_ss().

Aggappe asked 17/1, 2020 at 14:22. Comments (5):
Could this be what you're after? #59129302 – Glenglencoe
@Glenglencoe Yes, thanks! I read that _mm256_broadcast_ss() is somewhat faster than set1, but performance doesn't matter here in this app anyway. – Aggappe
I'll add it as an answer to make it clearer; perhaps you can let people know it's right. – Glenglencoe
Consider also: _mm512_broadcastss_ps – Witchy
@PaulR That function takes a __m128 input, and I'm out of mental energy for the extra type conversion. – Aggappe
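The extra conversion PaulR's suggestion needs is a single SSE intrinsic: _mm_load_ss(ptr) for a pointer (or _mm_set_ss(x) for a value) puts the float in the low lane of a __m128, and _mm512_broadcastss_ps then copies that lane into all 16 lanes. This is a minimal sketch assuming GCC or Clang on x86: __attribute__((target("avx512f"))) and __builtin_cpu_supports are compiler extensions, not part of the Intel intrinsics API, and broadcast16_from_xmm is a hypothetical helper name. A scalar loop runs instead on CPUs without AVX-512F.

```cpp
#include <cstddef>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

// Compiled for AVX-512F via a GCC/Clang target attribute, so the rest of the
// program needs no -mavx512f. Only call this after a runtime CPU check.
__attribute__((target("avx512f")))
static void broadcast16_from_xmm_avx512(const float* src, float* out16) {
    __m128 low = _mm_load_ss(src);          // [*src, 0, 0, 0]
    __m512 v = _mm512_broadcastss_ps(low);  // lane 0 copied to all 16 lanes
    _mm512_storeu_ps(out16, v);             // unaligned store is fine here
}
#endif

// Hypothetical helper: fills out16[0..15] with *src.
void broadcast16_from_xmm(const float* src, float* out16) {
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx512f")) {
        broadcast16_from_xmm_avx512(src, out16);
        return;
    }
#endif
    for (std::size_t i = 0; i < 16; ++i) out16[i] = *src;  // scalar fallback
}
```

In practice _mm512_set1_ps(*ptr) expresses the same thing more directly and leaves the choice of instruction to the compiler.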

AVX-512 doesn't need a special intrinsic for the memory-source version (see footnote 1). You can simply use _mm512_set1_ps, which takes a float, not a float*; with a pointer, write _mm512_set1_ps(*ptr). The compiler should use a memory-source broadcast if that's efficient. (It may even fold the broadcast into an ALU instruction as an embedded-broadcast memory operand, such as vmulps zmm0, zmm1, [mem]{1to16}, instead of emitting a separate load; AVX-512 can do that for 512-bit vectors.)

https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm512_set1_ps&expand=5236,4980
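A minimal sketch of the drop-in replacement, assuming GCC or Clang on x86 (__attribute__((target("avx512f"))) and __builtin_cpu_supports are compiler extensions) and a hypothetical helper named scale16. Where the AVX2 code would have passed a pointer to _mm256_broadcast_ss, the AVX-512 version dereferences the pointer and passes the value to _mm512_set1_ps. A scalar loop takes over on CPUs without AVX-512F, so the sketch also runs there.

```cpp
#include <cstddef>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

// AVX-512F path: _mm512_set1_ps(*scale) replaces _mm256_broadcast_ss(scale).
// With optimization on, the compiler may fold the broadcast into the multiply
// as a {1to16} memory operand instead of emitting a separate load.
__attribute__((target("avx512f")))
static void scale16_avx512(float* data, const float* scale) {
    __m512 v = _mm512_loadu_ps(data);
    __m512 s = _mm512_set1_ps(*scale);  // takes a float value, not a float*
    _mm512_storeu_ps(data, _mm512_mul_ps(v, s));
}
#endif

// Hypothetical helper: multiplies data[0..15] by *scale.
// Assumes scale does not point into data.
void scale16(float* data, const float* scale) {
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx512f")) {
        scale16_avx512(data, scale);
        return;
    }
#endif
    for (std::size_t i = 0; i < 16; ++i) data[i] *= *scale;  // scalar fallback
}
```

Whether the set1 and multiply actually become a single vmulps with a broadcast memory operand is up to the compiler and optimization level; the source stays the same either way.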


Footnote 1: _mm256_broadcast_ss probably exists separately from _mm256_set1_ps because of the split between AVX1 vbroadcastss ymm, [mem] and AVX2 vbroadcastss ymm, xmm. Some compilers, like MSVC and ICC, let you use intrinsics without enabling the corresponding ISA extension for the compiler to use anywhere, so there needed to be an intrinsic specifically for the AVX1 memory-source version.

With AVX512, both memory and register source forms were introduced with AVX512F so there's no need to give users of those compilers a way to micro-manage which asm is allowed.
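To make the footnote concrete, here is a minimal sketch of the two 256-bit forms, again assuming GCC or Clang on x86 and using hypothetical helper names: _mm256_broadcast_ss takes a const float* and needs only AVX, while _mm256_broadcastss_ps takes a __m128 and needs AVX2. Each helper checks the CPU at run time and falls back to a scalar loop. GCC and Clang do not promise a one-to-one mapping from intrinsic to instruction, so the comments describe the usual encoding, not a guarantee.

```cpp
#include <cstddef>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

// AVX1: only the memory-source form exists (vbroadcastss ymm, [mem]).
__attribute__((target("avx")))
static void broadcast8_avx1(const float* src, float* out8) {
    _mm256_storeu_ps(out8, _mm256_broadcast_ss(src));
}

// AVX2: adds the register-source form (vbroadcastss ymm, xmm).
__attribute__((target("avx2")))
static void broadcast8_avx2(float x, float* out8) {
    __m128 low = _mm_set_ss(x);  // x in lane 0, zeros elsewhere
    _mm256_storeu_ps(out8, _mm256_broadcastss_ps(low));
}
#endif

// Hypothetical helper: fills out8[0..7] with *src, using the AVX1 form if available.
void broadcast8_mem(const float* src, float* out8) {
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx")) {
        broadcast8_avx1(src, out8);
        return;
    }
#endif
    for (std::size_t i = 0; i < 8; ++i) out8[i] = *src;  // scalar fallback
}

// Hypothetical helper: fills out8[0..7] with x, using the AVX2 form if available.
void broadcast8_reg(float x, float* out8) {
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2")) {
        broadcast8_avx2(x, out8);
        return;
    }
#endif
    for (std::size_t i = 0; i < 8; ++i) out8[i] = x;  // scalar fallback
}
```

With AVX-512F, _mm512_set1_ps covers both cases, since the memory-source and register-source zmm broadcasts arrived in the same extension.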

Glenglencoe answered 17/1, 2020 at 14:36. Comments (4):
I just noticed another difference: _mm512_set1_ps() takes a float value, while _mm256_broadcast_ss() takes a float pointer. I can definitely use _mm512_set1_ps(), although they are not quite the same. – Aggappe
Maybe you can find something more suitable at that Intel link? – Glenglencoe
No. I guess they forgot to implement that. – Aggappe
@Noob: You can use _mm256_set1_ps(*ptr) with AVX1 as well; I'm not sure why _mm256_broadcast_ss even exists. Maybe because some compilers, like MSVC, rarely optimize intrinsics and don't let you rule out AVX2 instructions with command-line options? Then you can use _mm256_broadcast_ss to make sure you get the AVX1 memory-source version, and _mm256_set1_ps to also allow the AVX2 register-source vbroadcastss ymm, xmm, whichever is convenient for the compiler. Anyway, fortunately AVX-512 introduced both the memory-source and register-source versions in the same extension. – Bencher
