inlining failed in call to always_inline ‘_mm_mullo_epi32’: target specific option mismatch

I am trying to compile a C program that uses SIMD intrinsics with CMake. When I compile it, I get two errors:

/usr/lib/gcc/x86_64-linux-gnu/5/include/smmintrin.h:326:1: error: inlining failed in call to always_inline ‘_mm_mullo_epi32’: target specific option mismatch
 _mm_mullo_epi32 (__m128i __X, __m128i __Y)

/usr/lib/gcc/x86_64-linux-gnu/5/include/tmmintrin.h:136:1: error: inlining failed in call to always_inline ‘_mm_shuffle_epi8’: target specific option mismatch
 _mm_shuffle_epi8 (__m128i __X, __m128i __Y)

This issue has already been solved in another Stack Overflow question by setting:

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -msse4.1")

I tried the very same flag and many other options, but my project still fails to compile:

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -msse4.1")  
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -sse4_1")  
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=nehalem")  
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -msse4.1 -msse4.2")  
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native")  
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -ssse3")  
Sidonnie answered 30/3, 2017 at 21:29 Comment(1)
Related re: how GCC and clang handle target options, vs. MSVC allowing you to use intrinsics for extensions you haven't told the compiler it can use on its own: The Effect of Architecture When Using SSE / AVX Intrinisics /Radnorshire

Since you are compiling C code, not C++, you need:

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -msse4.1")

You can remove all the other -march=XXX and -msseXXX settings.

If you're using a mix of C and C++ then you could also add:

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -msse4.1")
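
One way to check that a flag such as -msse4.1 really turns the extension on for C sources is to ask the compiler which macros it predefines with that flag. The following is a minimal sketch, not part of the original answer: it assumes gcc is on the PATH and targets x86, and the helper name isa_enabled is hypothetical.

```shell
#!/bin/bash
# isa_enabled FLAG MACRO: succeed if compiling C with FLAG predefines MACRO.
# Hypothetical helper for illustration; assumes gcc is on the PATH.
isa_enabled ()
{
    # -dM -E dumps the predefined macros; -x c forces C (not C++) mode.
    gcc "$1" -dM -E -x c /dev/null 2>/dev/null | grep -q "#define $2 "
}

if isa_enabled -msse4.1 __SSE4_1__; then
    echo "-msse4.1 enables SSE4.1 intrinsics for C"
else
    echo "SSE4.1 is not enabled by -msse4.1 with this compiler"
fi
```

If the macro shows up here but the build still fails, the flag is probably not reaching the C compiler, which is what happens when it is only appended to CMAKE_CXX_FLAGS.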
Lido answered 30/3, 2017 at 21:35 Comment(2)
I also had to add -maes, or it did not work for me: set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -msse4.1 -maes") /Omor
Or better, use -march=native if compiling for your own machine. That will enable everything your CPU has, and set tuning options. /Radnorshire

A general method to find the gcc option that enables a given intrinsic:

File intrin.sh:

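#!/bin/bash
# Print the gcc -m option needed for an intrinsic, by locating the
# "#pragma GCC target(...)" that precedes its definition in gcc's headers.

get_instruction ()
{
    [ -z "$1" ] && return
    # Require a non-identifier character after the name so that a longer
    # intrinsic sharing this prefix does not match.
    func_name="$1[^0-9a-zA-Z_]"

    header_file=$(grep --include='*intrin.h' -Rl "$func_name" /usr/lib/gcc | head -n1)
    [ -z "$header_file" ] && return
    >&2 echo "found in: $header_file"

    # Keep only pragma lines and lines mentioning the function, then take
    # the line just before the first mention of the function.
    target_directive=$(grep "#pragma GCC target(\|$func_name" "$header_file" | grep -B 1 "$func_name" | head -n1)
    # Extract the first target name from the quoted list.
    echo "$target_directive" | grep -o '"[^,]*[,"]' | sed 's/"//g' | sed 's/,//g'
}

instruction=$(get_instruction "$1")
if [ -z "$instruction" ]; then
    echo "Error: function not found: $1"
else
    echo "add this option to gcc: -m$instruction"
fi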

Usage:

./intrin.sh _mm_shuffle_epi8      # output: -mssse3
./intrin.sh _mm_cvtepu8_epi32     # output: -msse4.1
./intrin.sh _mm_loadu_ps          # output: -msse
./intrin.sh _mm_clmulepi64_si128  # output: -mpclmul
./intrin.sh _mm256_loadu_si256    # output: -mavx
./intrin.sh _mm512_and_ps         # output: -mavx512dq
./intrin.sh _mm_shl_epi8          # output: -mxop
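
As far as I can tell, the extraction step in the script keeps only the first name inside the pragma, while some headers list several targets in one pragma, separated by commas. Below is a minimal sketch of turning such a line into the full set of options. It assumes the same `#pragma GCC target("...")` format, and the function name pragma_to_flags is hypothetical.

```shell
#!/bin/bash
# pragma_to_flags LINE: print one -m option per target named in a
# '#pragma GCC target("a,b")' line. Hypothetical helper for illustration.
pragma_to_flags ()
{
    # Pull out the text between the double quotes; fail if there is none.
    targets=$(printf '%s\n' "$1" | sed -n 's/.*target *("\([^"]*\)").*/\1/p')
    [ -z "$targets" ] && return 1

    # Split on commas and prefix each target with -m.
    flags=""
    old_ifs=$IFS; IFS=','
    for t in $targets; do
        flags="$flags -m$t"
    done
    IFS=$old_ifs
    echo "${flags# }"
}

pragma_to_flags '#pragma GCC target("avx512vl,avx512bw")'   # -mavx512vl -mavx512bw
pragma_to_flags '#pragma GCC target("ssse3")'               # -mssse3
```

The function works on a single line of text, so it could replace the final echo/grep/sed pipeline in get_instruction without changing how the header is searched.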
Etheleneethelin answered 2/9, 2020 at 18:56 Comment(4)
Note that it's usually a good idea to use something like -march=haswell, not just -mavx2 -mfma. Or at least add -mtune=znver2 (Zen 2) or something onto your -m ISA options. The "generic" tuning can be pretty poor for possibly-unaligned 256-bit vectors, especially when your data is usually aligned at runtime but the compiler just doesn't know that. See Why doesn't gcc resolve _mm256_loadu_pd as single vmovupd?. Or if you want to make a binary for your own machine, -march=native. /Radnorshire
Excellent answer! /Helladic
This won't work for some functions, like _mm_shl_epi8, because the function name in the include file is directly followed by an opening parenthesis rather than a space. Possible fix: in get_instruction (), replace func_name="$1 " with func_name="$1[^[:alnum:]]". /Men
@Men Thank you for the reminder, it has been corrected. /Etheleneethelin