Does vfmaq_f32 really have higher running accuracy?
Does vfmaq_f32 really have higher running accuracy? I guess the accuracy of vfmaq_f32 varies with the width of the extended internal precision of the floating-point unit on different architectures. On macOS arm64, the results of running the code below are all consistent. Can higher or lower precision results be obtained on other architectures? Or are there compiler options that can control the accuracy of the results?

#include <arm_neon.h>
#include <iostream>
using namespace std;

int main() {
    // Note: these double literals are rounded to float on assignment.
    float a = 12.3839467819;
    float b = 21.437678904;
    float c = 4171.42144;
    printf("%.17f\n", a);
    printf("%.17f\n", b);
    printf("%.17f\n", c);

    // Scalar expression: the compiler may or may not contract this into an fma.
    printf("%.17f\n", a + b * c);

    // Fused multiply-add intrinsic: b * c is kept exact, one rounding at the end.
    float32x4_t a_reg = vdupq_n_f32(a);
    float32x4_t b_reg = vdupq_n_f32(b);
    float32x4_t c_reg = vdupq_n_f32(c);
    float32x4_t res_reg = vfmaq_f32(a_reg, b_reg, c_reg);
    float res[4] = {0.f};
    vst1q_f32(res, res_reg);
    printf("%.17f\n", res[0]);

    // Multiply-accumulate: may lower to a fused or a separate multiply and add.
    res_reg = vmlaq_f32(a_reg, b_reg, c_reg);
    vst1q_f32(res, res_reg);
    printf("%.17f\n", res[0]);

    // Explicit multiply then add: two roundings.
    res_reg = vmulq_f32(b_reg, c_reg);
    res_reg = vaddq_f32(res_reg, a_reg);
    vst1q_f32(res, res_reg);
    printf("%.17f\n", res[0]);
    return 0;
}
Galway answered 14/9, 2023 at 7:50 Comment(4)
Please don't spam tags. With #include <iostream> and using namespace std, this is definitely not C source code (or it has a fundamental syntax error). Btw, as you are new, it couldn't hurt to take the tour and read How to Ask. – Meagre
The point of that accumulating NEON instruction is optimization: you do one operation instead of two, unless something is wrong with the implementation. It is used a lot when writing impulse filters. All NEON instructions are pretty IEEE 754-compliant. Doing one addition at a time cannot decrease accumulation losses; you need at least three values summed at the same time to see any difference. – Modena
In general, contracted floating-point operations are not IEEE 754 strict-mode compliant. I haven't checked NEON, but they certainly are not on x86. – Margiemargin
@Scheff's Cat Thank you very much for the guidance on asking questions; I will keep it in mind for future questions. – Galway

In general, fused floating-point operations can maintain a higher running precision. Fused multiply-accumulate and dot product are the two operations that most commonly show up in instruction sets. There is no guarantee that the output of these operations is consistent across CPU architectures.

When compiling plain C code (i.e. no intrinsics) for "strict" IEEE floating-point compliance, the compiler must not generate these contracted operations, as they do not conform to the specification.

Margiemargin answered 14/9, 2023 at 18:17 Comment(7)
Off-topic: in the shared pseudocode of FPRoundInt() there is an "If EXACT is TRUE, set FPSR.IXC if result is not numerically equal to op." comment. Here the EXACT should probably be INEXACT (and the boolean exact should probably be boolean inexact). Consider raising an issue with the spec team. – Grampositive
Off-topic: the Arm® Architecture Reference Manual for A-profile architecture (ARM DDI 0487J.a (ID042523)) uses both FP16 ("FEAT_FP16, Half-precision floating-point data processing") and FPHP ("Floating Point Half Precision"), which seem to refer to the same thing. Why use both FP16 and FPHP instead of a single one (e.g. FPHP)? Consider raising an issue with the spec team. – Grampositive
For the FPRoundInt() case, the spec is correct as far as I read it (i.e. in exact mode, set FPSR.IXC if the result is not exact). – Margiemargin
For the FPHP case, the register field covers more than just FEAT_FP16, so they are not directly analogous. I suspect the naming is deliberately different to avoid confusion. The FPHP field can indicate an implementation with no half-float support, an implementation with fp16 conversions, or an implementation with fp16 conversions and fp16 data processing. Only the latter requires FEAT_FP16. – Margiemargin
FYI: In my case cat /proc/cpuinfo reported fphp instead of fp16. Initially I was confused because the manual uses FEAT_FP16 and not FEAT_FPHP. Then I understood that cpuinfo is not required to use feature names from the manual. However, how does a user know the exact meaning of the fphp feature reported by cat /proc/cpuinfo? Does fphp indicate "support for fp16 conversions and fp16 data processing" OR "support for fp16 conversions" only? Any ideas / comments? – Grampositive
See docs.kernel.org/arch/arm64/elf_hwcaps.html – Margiemargin
"HWCAP_FPHP = Functionality implied by ID_AA64PFR0_EL1.FP == 0b0001." – Margiemargin
