I'm supposed to patch an existing Instruction Set Test that doesn't test all the used instructions. So I need to look at the asm file of one level of code and find out what C code causes each instruction to be emitted, so I can use it in my patch.
Your goal is insane, and the first half of your question is backwards / only loosely related to your real problem.
There might be a way to convince your compiler to use each specific instruction you want it to, but it will be specific to your compiler version, options, and all the surrounding code including potentially constants in header files.
If you want to test all the instructions in an ISA, hoping that you can convince a C compiler to generate them somehow is totally the wrong approach. You want your test to keep testing the same thing in the future. If you need specific asm, write in asm.
This is the same question that was asked a couple of weeks ago for ARM, "How to force IAR to use desired Cortex-M0+ instructions (optimization will be disabled for this func.)", except that you say you're going to build with optimization enabled (which may make it easier to get a wider range of instructions generated: some may only be used as peephole optimizations over the simple normal code-gen).
Also, starting with asm and reversing that into equivalent C is no guarantee that the compiler will choose that instruction when compiling, so the question title is only loosely related to your real problem.
If you do still want to hand-hold a compiler into generating specific asm, to create brittle source code that may only do what you want with very specific compiler / version / options, the first step would be to think "when would this instruction be part of an optimized way of doing something?".
Usually this line of thinking is more useful for optimizing by tweaking the source to compile more efficiently. First you think about an efficient asm implementation of the function you're writing. Then you write your C or C++ source the same way, i.e. using the same temporaries you hope the compiler will use. For an example, see "What is the efficient way to count set bits at a position or lower?", where I was able to hand-hold gcc into using a more efficient sequence of instructions, like clang was doing for my first attempt.
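As a rough illustration of writing C with the temporaries you'd want in asm (this is a sketch, not the code from the linked answer), here is one way to count the set bits at position `pos` or lower. It assumes a 32-bit `unsigned` and a GCC/Clang-style compiler, since `__builtin_popcount` is a compiler builtin rather than standard C. Building the mask as `(2u << pos) - 1u` instead of `(1u << (pos + 1)) - 1u` keeps the shift count at most 31, so `pos == 31` is well-defined and no branch is needed.

```c
/* Sketch only: assumes 32-bit unsigned and GCC/Clang (__builtin_popcount
   is a compiler builtin, not standard C). The name is hypothetical. */
unsigned count_bits_upto(unsigned x, unsigned pos)   /* pos in 0..31 */
{
    /* pos == 31: 2u << 31 wraps to 0, so mask becomes all ones */
    unsigned mask = (2u << pos) - 1u;
    return (unsigned)__builtin_popcount(x & mask);
}
```

Whether this becomes a single popcount instruction or a bithack sequence still depends on the target and compiler options.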
Sometimes this can work well; for your purposes it's simple when the instruction set only has one really good way to do something. e.g. `ld.bu` looks like a byte-load with zero extension (`u` for unsigned) into a full register. `unsigned foo(unsigned char*p) {return *p;}` should compile to that, and you can use a `noinline` attribute to stop it from optimizing away.
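A minimal sketch of that `ld.bu` case, assuming a GCC-compatible compiler that accepts `__attribute__((noinline))` (other TriCore toolchains may spell this differently); the function name is hypothetical. Check the generated asm to confirm you actually got `ld.bu`.

```c
/* Hypothetical test helper: one byte load, zero-extended into a full register.
   noinline stops callers from folding the load away when *p is known. */
__attribute__((noinline))
unsigned load_byte_zext(const unsigned char *p)
{
    return *p;   /* unsigned char -> unsigned: zero extension */
}
```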
But `insert`, if that's inserting a zero-bit into a bitfield, could just as easily have been an `and` with `~1` (`0xFE` in the low byte), assuming TriCore has and-immediate. If `insert` has a non-immediate form, that is probably the most efficient option for a single-bit `bitfield = rand()` (or any value that's still not a compile-time constant after optimization with constant-propagation).
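A sketch contrasting the two cases, under loudly stated assumptions: the struct and function names are hypothetical, bitfield placement is implementation-defined (this assumes `bit0` lands in the least significant bit), and `noinline` assumes a GCC-compatible compiler. Which instruction each function actually compiles to depends on the compiler.

```c
#include <stdlib.h>

/* Hypothetical layout: assumes bit0 is the least significant bit
   (bitfield order is implementation-defined). */
struct flags {
    unsigned int bit0 : 1;
    unsigned int rest : 7;
};

/* Clearing with a constant is just word & ~1u, so an and-immediate
   is as good a choice as insert. */
unsigned int clear_bit0_manual(unsigned int word)
{
    return word & ~1u;
}

/* A run-time value isn't a compile-time constant, so a register-source
   insert is a plausible code-gen choice. */
__attribute__((noinline))
void set_bit0_runtime(struct flags *f)
{
    f->bit0 = (unsigned int)rand();   /* only the low bit is kept */
}
```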
For TriCore's packed arithmetic (SIMD) instructions, you're going to need the compiler to auto-vectorize, or use an intrinsic.
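For example, a plain loop like this is the kind of code an auto-vectorizer may turn into packed byte adds; nothing guarantees it will, and the function name is hypothetical. `restrict` (C99) promises the arrays don't overlap, which often helps vectorization. Intrinsic names are toolchain-specific, so none are shown here.

```c
#include <stddef.h>

/* Element-wise byte add; a candidate for auto-vectorization into packed adds. */
__attribute__((noinline))
void add_bytes(unsigned char *restrict dst, const unsigned char *restrict a,
               const unsigned char *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (unsigned char)(a[i] + b[i]);   /* wraps modulo 256 */
}
```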
There might well be some instructions in the ISA that your compiler will never emit. Although I think you're only trying to test the instructions that the compiler does emit in other parts of your code? You say "all the used instructions", not "all the instructions", so that at least guarantees that the task is possible.
A non-inline function with an arg is an excellent way to force code-gen for run-time variables. Those of us who look at compiler-generated asm frequently write small functions that take args and return a value (or store to a global or `volatile`) to force the compiler to generate code for something without discarding the result, and without constant-propagation turning the whole function into `return 42;`, i.e. a `mov`-immediate / `ret`. See How to remove "noise" from GCC/clang assembly output? for more about that, and also Matt Godbolt's CppCon2017 talk "What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid" for a great beginner intro to reading compiler-generated asm, and what kind of stuff modern optimizing compilers do for small functions.
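A small sketch of the difference (function names are hypothetical; `noinline` assumes a GCC-compatible compiler): with optimization enabled, the first function is expected to fold down to returning 42, while the run-time argument and the `volatile` store force real code for the multiply.

```c
/* Everything known at compile time: typically becomes mov-immediate 42 / ret. */
int constant_case(void)
{
    int x = 6;
    return x * 7;
}

/* Run-time argument: the multiply has to be emitted. */
__attribute__((noinline))
int runtime_case(int x)
{
    return x * 7;
}

/* Storing to a volatile global also keeps the result from being discarded. */
volatile int sink;

__attribute__((noinline))
void store_case(int x)
{
    sink = x * 7;
}
```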
Assigning to a `volatile` and then reading that variable would be another way to defeat constant-propagation even for a test that needs to run without external inputs, if that's easier than using `noinline` functions. (Compilers have to re-load from a `volatile` every separate time it's read in the C source, i.e. they have to assume it can be asynchronously modified.)
```c
int main(void) {
    volatile int vtmp = 123;
    int my_parameter = vtmp;
    /* ... then use my_parameter, not vtmp, so CSE and other optimizations can still work */
}
```
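Applied to an instruction-set test, that idea might look like the following sketch: the input comes from a `volatile` read, so the division can't be constant-folded, and the result is checked against a known answer. The function name and the choice of `/ 15` are illustrative only; returning 0 means pass.

```c
/* Hypothetical self-checking test case. */
int test_div_by_15(void)
{
    volatile int vin = 123;
    int x = vin;               /* one volatile read: value unknown to the optimizer */
    int q = x / 15;            /* an optimizer may use a multiply instead of a divide */
    return (q == 8) ? 0 : 1;   /* 123 / 15 == 8 (C division truncates) */
}
```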
[...] It's optimized
The compiler output you show definitely doesn't look optimized. It looks like load / set a bit / store, then load / clear a bit / store, which should have optimized down to just load / clear the bit / store. Unless those asm blocks weren't really contiguous, and you're showing code from two different blocks pasted together.
Also, `InsertStruct.SomeMember = 0x0u;` is an incomplete description: it obviously depends on the struct definition. I assume you used an `int SomeMember :1;` single-bit bitfield member? According to a TriCore ISA reference manual I found, `insert` copies a range of bits from one register to another at a specified insert position, and comes in register and immediate source forms.
Replacing a whole byte could just be a store instead of a read/modify/write. So the key here is the struct definition, not just the statement that compiled to the instruction.
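To see why the definition matters, here are two hypothetical definitions (the real struct isn't shown): the same `= 0x0u` statement can be a plain byte store for a whole-byte member, but needs a read/modify/write for a 1-bit bitfield. `noinline` again assumes a GCC-compatible compiler.

```c
/* Hypothetical definitions; the original struct is unknown. */
struct byte_member {
    unsigned char some_member;
    unsigned char other;
};

struct bit_member {
    unsigned int some_member : 1;
    unsigned int other       : 7;
};

/* Whole byte: a single byte store, nothing needs to be read first. */
__attribute__((noinline))
void clear_byte(struct byte_member *s) { s->some_member = 0x0u; }

/* One bit: read/modify/write (load, clear the bit, store it back). */
__attribute__((noinline))
void clear_bit(struct bit_member *s) { s->some_member = 0x0u; }
```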
[...] `x / 15` by multiplication, or removing a whole loop that sums values by calculating the result directly, etc. If you tried to reconstruct C source from such assembly, you would end up with completely different source (algorithm-wise). – Claudiaclaudian
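As a concrete case of the `x / 15` transformation mentioned in that comment: for a 32-bit unsigned `x`, compilers commonly replace the division with a widening multiply by `0x88888889` (that is, ceil(2^35 / 15)) and keep the product shifted right by 35. The sketch below writes that out in C, assuming `uint32_t` operands; the function name is hypothetical.

```c
#include <stdint.h>

/* Unsigned 32-bit x / 15 without a divide instruction:
   multiply by ceil(2^35 / 15) = 0x88888889, then shift the 64-bit product right by 35. */
uint32_t div15_magic(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0x88888889u) >> 35);
}
```

Reading asm like this back into C would naturally give a multiply and shift, not the `x / 15` the programmer wrote.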