One way of doing this is to instrument each instruction with a counting instruction. There are several ways of doing this:
1. You could modify the instruction emitter of an open source compiler (GCC/LLVM) to emit a counting instruction before every instruction. If you are interested, I can add the exact way of doing this in LLVM to the answer. But I believe the second method given here is easier to implement and will work with most compilers.
2. You can instrument the instructions after compilation. Most compilers have an option to generate readable assembly instead of object files. For gcc/clang the flag is -S.
For the following program
#include <stdio.h>
int main_real(int argc, char* argv[]) {
printf("hello world\n");
return 0;
}
my compiler produces the following .s file:
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 10, 14
.globl _main_real ## -- Begin function main_real
.p2align 4, 0x90
_main_real: ## @main_real
.cfi_startproc
## %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $32, %rsp
leaq L_.str(%rip), %rax
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movq %rax, %rdi
movb $0, %al
callq _printf
xorl %ecx, %ecx
movl %eax, -20(%rbp) ## 4-byte Spill
movl %ecx, %eax
addq $32, %rsp
popq %rbp
retq
.cfi_endproc
## -- End function
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "hello world\n"
.subsections_via_symbols
It is easy to see here that every line that starts with a <tab> not followed by a . is an instruction.
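To sanity-check the dynamic count on straight-line code such as main_real, the same rule can be used to count instruction lines statically. This is a minimal C sketch; the function names are hypothetical, and skipping tab-indented "#" comment lines is an assumption that goes slightly beyond the rule above:

```c
#include <stddef.h>
#include <string.h>

/* A line is an instruction if it starts with a tab that is not followed by
   '.' (a directive). Assumption: tab-indented '#' comment lines are skipped. */
static int is_instruction_line(const char *line, size_t len) {
    return len >= 2 && line[0] == '\t' && line[1] != '.' && line[1] != '#';
}

/* Counts instruction lines in a NUL-terminated assembly listing. */
size_t count_instruction_lines(const char *text) {
    size_t count = 0;
    while (*text != '\0') {
        const char *nl = strchr(text, '\n');
        size_t len = nl ? (size_t)(nl - text) : strlen(text);
        if (is_instruction_line(text, len))
            count++;
        text += len;          /* move to the '\n' (or the final NUL) */
        if (*text == '\n')
            text++;           /* step past the newline */
    }
    return count;
}
```

This only equals the dynamic count when every instruction runs exactly once, as in main_real; loops and branches make the two differ.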
Now we need a simple program that finds all such instructions and instruments them. This is easy to do with perl.
But before we actually instrument the code, we have to choose an appropriate instrumentation sequence. This depends heavily on the architecture and the target operating system, so I will give an example for x86-64.
Note that we need to instrument BEFORE each instruction rather than AFTER it so that branching instructions are counted too: code placed after a jmp or retq would never execute.
Assuming global variables __r13_save and __instruction_counter, both initialized to zero, we can insert the following sequence before every instruction:
movq %r13, __r13_save(%rip)
movq __instruction_counter(%rip), %r13
leaq 1(%r13), %r13
movq %r13, __instruction_counter(%rip)
movq __r13_save(%rip), %r13
As you can see, we have used RIP-relative addressing, which should be fine for most programs a beginner writes. Since the displacement is a signed 32-bit value, very large programs might have issues.
We use leaq here instead of incq to avoid clobbering the flags that the program uses for control flow (as suggested by @PeterCordes in the comments).
This instrumentation only works correctly for single-threaded programs, since we use a global counter and a global slot to stash away the %r13 register. To extend it to multithreaded programs, one would have to use thread-local storage and instrument the thread-creation functions too.
Also, the variables __r13_save and __instruction_counter are accessed so frequently that they should stay in the L1 cache, which keeps this instrumentation from being too costly.
Now, to instrument the instructions, we use perl:
perl -pe 's/^(\t[^.])/\tmovq %r13, __r13_save(%rip)\n\tmovq __instruction_counter(%rip), %r13\n\tleaq 1(%r13), %r13\n\tmovq %r13, __instruction_counter(%rip)\n\tmovq __r13_save(%rip), %r13\n$1/' input.s > output.s
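If you would rather keep the tooling in C, the same filter can be sketched there. This version is an illustration, not part of the original method; the function names and the prefix parameter are hypothetical. The parameter is there because some ABIs put an extra _ in front of C symbol names; pass "_" on such targets (for example macOS, as in the listing above) and "" otherwise:

```c
#include <stdio.h>
#include <string.h>

/* Emits the save/increment/restore sequence; `prefix` is prepended to the
   symbol names as the assembler expects them. */
static void emit_counter_block(FILE *out, const char *prefix) {
    fprintf(out, "\tmovq %%r13, %s__r13_save(%%rip)\n", prefix);
    fprintf(out, "\tmovq %s__instruction_counter(%%rip), %%r13\n", prefix);
    fputs("\tleaq 1(%r13), %r13\n", out);
    fprintf(out, "\tmovq %%r13, %s__instruction_counter(%%rip)\n", prefix);
    fprintf(out, "\tmovq %s__r13_save(%%rip), %%r13\n", prefix);
}

/* Copies `in` to `out`, emitting the counting block before every line that
   starts with a tab not followed by '.', like the perl one-liner.
   Sketch limitation: lines longer than the buffer get split. */
int instrument_asm(FILE *in, FILE *out, const char *prefix) {
    char line[4096];
    while (fgets(line, sizeof line, in) != NULL) {
        if (line[0] == '\t' && line[1] != '.' && line[1] != '\0')
            emit_counter_block(out, prefix);
        if (fputs(line, out) == EOF)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```

A driver only needs to call instrument_asm(stdin, stdout, "_") from main and can then be used in place of the perl command.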
For the above sample program this generates
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 10, 14
.globl _main_real ## -- Begin function main_real
.p2align 4, 0x90
_main_real: ## @main_real
.cfi_startproc
## %bb.0:
movq %r13, __r13_save(%rip)
movq __instruction_counter(%rip), %r13
leaq 1(%r13), %r13
movq %r13, __instruction_counter(%rip)
movq __r13_save(%rip), %r13
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %r13, __r13_save(%rip)
movq __instruction_counter(%rip), %r13
leaq 1(%r13), %r13
movq %r13, __instruction_counter(%rip)
movq __r13_save(%rip), %r13
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
movq %r13, __r13_save(%rip)
movq __instruction_counter(%rip), %r13
leaq 1(%r13), %r13
movq %r13, __instruction_counter(%rip)
movq __r13_save(%rip), %r13
subq $32, %rsp
movq %r13, __r13_save(%rip)
movq __instruction_counter(%rip), %r13
leaq 1(%r13), %r13
movq %r13, __instruction_counter(%rip)
movq __r13_save(%rip), %r13
leaq L_.str(%rip), %rax
movq %r13, __r13_save(%rip)
movq __instruction_counter(%rip), %r13
leaq 1(%r13), %r13
movq %r13, __instruction_counter(%rip)
movq __r13_save(%rip), %r13
movl %edi, -4(%rbp)
movq %r13, __r13_save(%rip)
movq __instruction_counter(%rip), %r13
leaq 1(%r13), %r13
movq %r13, __instruction_counter(%rip)
movq __r13_save(%rip), %r13
movq %rsi, -16(%rbp)
movq %r13, __r13_save(%rip)
movq __instruction_counter(%rip), %r13
leaq 1(%r13), %r13
movq %r13, __instruction_counter(%rip)
movq __r13_save(%rip), %r13
movq %rax, %rdi
movq %r13, __r13_save(%rip)
movq __instruction_counter(%rip), %r13
leaq 1(%r13), %r13
movq %r13, __instruction_counter(%rip)
movq __r13_save(%rip), %r13
movb $0, %al
movq %r13, __r13_save(%rip)
movq __instruction_counter(%rip), %r13
leaq 1(%r13), %r13
movq %r13, __instruction_counter(%rip)
movq __r13_save(%rip), %r13
callq _printf
movq %r13, __r13_save(%rip)
movq __instruction_counter(%rip), %r13
leaq 1(%r13), %r13
movq %r13, __instruction_counter(%rip)
movq __r13_save(%rip), %r13
xorl %ecx, %ecx
movq %r13, __r13_save(%rip)
movq __instruction_counter(%rip), %r13
leaq 1(%r13), %r13
movq %r13, __instruction_counter(%rip)
movq __r13_save(%rip), %r13
movl %eax, -20(%rbp) ## 4-byte Spill
movq %r13, __r13_save(%rip)
movq __instruction_counter(%rip), %r13
leaq 1(%r13), %r13
movq %r13, __instruction_counter(%rip)
movq __r13_save(%rip), %r13
movl %ecx, %eax
movq %r13, __r13_save(%rip)
movq __instruction_counter(%rip), %r13
leaq 1(%r13), %r13
movq %r13, __instruction_counter(%rip)
movq __r13_save(%rip), %r13
addq $32, %rsp
movq %r13, __r13_save(%rip)
movq __instruction_counter(%rip), %r13
leaq 1(%r13), %r13
movq %r13, __instruction_counter(%rip)
movq __r13_save(%rip), %r13
popq %rbp
movq %r13, __r13_save(%rip)
movq __instruction_counter(%rip), %r13
leaq 1(%r13), %r13
movq %r13, __instruction_counter(%rip)
movq __r13_save(%rip), %r13
retq
.cfi_endproc
## -- End function
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "hello world\n"
.subsections_via_symbols
Now we also need to define these variables somewhere. This can be done in a simple C wrapper, wrapper.c:
#include <stdio.h>
long long int __instruction_counter;
long long int __r13_save;
int main_real(int, char* []);
int main(int argc, char* argv[]) {
int ret = main_real(argc, argv);
printf("Total instructions = %lld\n", __instruction_counter);
return ret;
}
Notice that the wrapper calls main_real, so in your actual program you have to name your entry function main_real instead of main.
Finally, link everything together:
clang output.s wrapper.c -o a.out
and run your program. It should behave normally and print the instruction count before it exits.
You might have to take care of name mangling of the __instruction_counter and __r13_save variables. Some ABIs prefix C symbol names with an extra _ (macOS does, which is why main_real and printf appear as _main_real and _printf in the listing above). In that case you have to add the extra _ to the symbol names in the perl command. You can check the exact names by also generating the assembly for the wrapper.
Running the above example, I get:
hello world
Total instructions = 15
This matches the exact number of instructions in our function.
Note that this only counts the instructions in the code you have written and compiled, not those in printf, for instance. That is generally a difficult problem to tackle with static instrumentation.
One caveat is that your program has to exit "normally", i.e. by returning from main. If it calls exit or abort, you will not see the instruction count. You can provide instrumented versions of exit and abort to solve that problem.
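A different technique from instrumenting exit, swapped in here as a sketch, is to register the report with the standard atexit function. Handlers registered this way run both on return from main and on an explicit exit call, but abort and _exit still skip them. The function names are hypothetical; the variables are the ones the instrumented assembly references:

```c
#include <stdio.h>
#include <stdlib.h>

/* Same symbols as in wrapper.c, referenced by the instrumented assembly. */
long long int __instruction_counter;
long long int __r13_save;

/* Called by exit(), including the implicit exit after main returns;
   stdout is flushed after atexit handlers run, so printf output appears. */
static void report_instruction_count(void) {
    printf("Total instructions = %lld\n", __instruction_counter);
}

/* Call once at the start of main, before main_real. Returns 0 on success. */
int install_instruction_report(void) {
    return atexit(report_instruction_count);
}
```

With this, the wrapper's main reduces to calling install_instruction_report() and then returning main_real(argc, argv).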
With a compiler-based approach, this can be made more efficient by adding a single addq instruction to each basic block, with the immediate being the number of instructions in that block: once control enters a basic block, it is bound to execute all of it.
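A compiler would use its own control-flow graph for this, but the idea can be sketched on the .s text: split a function at labels and after jumps and returns, then count the instructions in each block. Each block would then get one addq $N, __instruction_counter(%rip) at its start. The helper names are hypothetical, and the mnemonic rules are simplifying assumptions for AT&T-syntax x86-64:

```c
#include <stddef.h>
#include <string.h>

/* Assumption: AT&T x86-64 mnemonics, where every jump starts with 'j' and
   returns start with "ret". Calls fall through to the next instruction,
   so they do not end a block. */
static int ends_block(const char *insn) {
    return insn[0] == 'j' || strncmp(insn, "ret", 3) == 0;
}

/* Records the current block if it holds any instructions. */
static void close_block(size_t *cur, size_t *nblocks, size_t sizes[], size_t max) {
    if (*cur == 0)
        return;
    if (*nblocks < max)
        sizes[*nblocks] = *cur;
    (*nblocks)++;
    *cur = 0;
}

/* Splits one function's assembly lines (without trailing newlines) into basic
   blocks: a label starts a new block, a jump or return ends one. Stores each
   block's instruction count in sizes[] (at most max entries) and returns the
   total number of blocks found. */
size_t basic_block_sizes(const char *const lines[], size_t n,
                         size_t sizes[], size_t max) {
    size_t nblocks = 0, cur = 0;
    for (size_t i = 0; i < n; i++) {
        const char *l = lines[i];
        size_t len = strlen(l);
        int is_label = len > 0 && l[0] != '\t' && l[0] != '#' && l[len - 1] == ':';
        int is_insn = len > 1 && l[0] == '\t' && l[1] != '.' && l[1] != '#';
        if (is_label) {
            close_block(&cur, &nblocks, sizes, max); /* labels may be jump targets */
        } else if (is_insn) {
            cur++;
            if (ends_block(l + 1))
                close_block(&cur, &nblocks, sizes, max);
        }
    }
    close_block(&cur, &nblocks, sizes, max);
    return nblocks;
}
```

For main_real above this yields a single block (the call to printf does not end it), so one add would replace all of the per-instruction counting sequences. Like incq, addq modifies the flags, so a real implementation would have to place it where the flags are not live.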
gdb in single step mode. You can divert the output to a file and feed it si [step single ISA instruction] commands until it exits. From the output file, you can see what happened. To feed it many si commands, you could write a control program that runs gdb under two pipes. The control program sends si to gdb and grabs the output of gdb, counting the instructions. To get things started, you'll probably need to start with b _start, then run, then si... – Sphalerite
10000si and then, if the program completes, check the counter to see how many remain. But executing in single-step mode is generally very slow. – Mathias