Find which assembly instruction caused an Illegal Instruction error without debugging
P

7

43

While running a program I've written in assembly, I get an Illegal instruction error. Is there a way to find out which instruction is causing the error without debugging? The machine I'm running on has no debugger or any development system. In other words, I compile on one machine and run on another. I cannot test my program on the machine I'm compiling on because it doesn't support SSE4.2. The machine I'm running the program on does support SSE4.2 instructions, though.

I think it may be because I need to tell the assembler (YASM) to recognize the SSE4.2 instructions, just as we do with gcc by passing it the -msse4.2 flag. Or do you think that's not the reason? Any idea how to tell YASM to recognize SSE4.2 instructions?

Maybe I should trap the SIGILL signal with a handler installed using the SA_SIGINFO flag, then decode the siginfo_t it receives to see what kind of illegal operation the program performs.

Perspicacious answered 27/4, 2012 at 16:11 Comment(1)
YASM does recognize SSE4.2 instructions, so this is not the problem. Are you sure your machine supports SSE4.2? What hardware is it exactly? You could run the program in an emulator; valgrind (which supports the subset of SSE4.2 that is used in glibc and gcc) would probably work. - Bloodandthunder
H
37

Often you get an illegal instruction error not because your program contains an illegal opcode, but because a bug in your program (e.g., a buffer overflow) makes it jump to a random address: either into plain data, or into code but not at the start of an instruction.

Hie answered 27/4, 2012 at 16:15 Comment(1)
Like a missing return. - Quirinus
H
57

Recently I experienced a crash due to a 132 exit status code (128 + 4: program interrupted by a signal + illegal instruction signal). Here's how I figured out what instruction was causing the crash.
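As an aside, the 128 + 4 arithmetic can be checked directly in a shell. This is a minimal sketch, assuming a POSIX shell; describe_exit_status is a hypothetical helper name, and "a status above 128 means killed by a signal" is a shell convention rather than a guarantee, since a program can also call exit with such a value itself.

```shell
# Hypothetical helper: interpret a shell exit status.
# Assumption: statuses above 128 follow the convention 128 + signal number.
describe_exit_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l maps a signal number to its name without the SIG prefix
        echo "killed by signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exited with status $status"
    fi
}

describe_exit_status 132
```

On Linux, where SIGILL is signal number 4, the last line prints "killed by signal 4 (SIGILL)".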

First, I enabled core dumps:

$ ulimit -c unlimited

Interestingly, the folder I was running the binary from already contained a folder named core, so a core file with that name could not be written there. I had to tell Linux to append the PID to the core file name:

$ sudo sysctl -w kernel.core_uses_pid=1

Then I ran my program and got a core file named core.23650. I loaded the binary and the core with gdb:

$ gdb program core.23650

Once in gdb, it showed the following information:

Program terminated with signal SIGILL, Illegal instruction.
#0  0x00007f58e9efd019 in ?? ()

That means my program crashed due to an illegal instruction at memory address 0x00007f58e9efd019. Then I switched to the asm layout to see the faulting instruction:

(gdb) layout asm
>|0x7f58e9efd019  vpmaskmovd (%r8),%ymm15,%ymm0
 |0x7f58e9efd01e  vpmaskmovd %ymm0,%ymm15,(%rdi)
 |0x7f58e9efd023  add    $0x4,%rdi
 |0x7f58e9efd027  add    $0x0,%rdi

The vpmaskmovd instruction caused the error. Apparently, I was trying to run a program built for AVX2 on a system that lacks the AVX2 instruction set. To check, I searched the CPU flags:

$ cat /proc/cpuinfo | grep avx2

Lastly, I confirmed that vpmaskmovd is an AVX2-only instruction.
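The /proc/cpuinfo check can be wrapped in a small function that matches whole flag names. This is only a sketch, assuming x86 Linux, where /proc/cpuinfo contains a line starting with "flags"; cpu_has_flag is a hypothetical helper, and its optional second argument (a file to read instead of /proc/cpuinfo) exists purely to make it easy to try on saved output.

```shell
# Hypothetical helper: succeed if the CPU flags list contains the given flag.
# Assumption: x86 Linux, where /proc/cpuinfo has a line starting with "flags".
cpu_has_flag() {
    flag=$1
    cpuinfo=${2:-/proc/cpuinfo}
    # Take the first "flags" line, put one word per line, match the whole word
    grep '^flags' "$cpuinfo" | head -n 1 | tr -s ' \t' '\n\n' | grep -qxF -- "$flag"
}

for f in sse4_2 avx avx2; do
    if cpu_has_flag "$f"; then
        echo "$f: supported"
    else
        echo "$f: not supported"
    fi
done
```

Matching whole words matters for some flags: a plain grep for avx would also match avx2, while the function above only succeeds on an exact flag name. Note that the kernel spells the SSE4.2 flag as sse4_2.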

Hartzel answered 24/10, 2016 at 17:1 Comment(4)
I realized my answer doesn't fully meet one of the question's requirements: "determining the faulting instruction without using debugging tools" :/ Still, I think the answer can be useful for other users, so I prefer to leave it. In addition, as Michael Burr commented, it may be possible to copy the core dump from the target machine to the build machine and debug it there (where debugging tools are available) by setting a different target architecture with "(gdb) set architecture <target-arch>". - Hartzel
I'd add, if it's your program that crashes, build with '-ggdb' before following the steps above. - Henn
Had a similar problem with a program that tried to use PMADDUBSW on my SSE2-only VM, thanks :) - Nimiety
Thanks, this helped me find a SIMD instruction on a non-SIMD armv7 architecture! - Replenish
S
11

If you can enable core dumps on that system, just run the program, let it crash, then pull the core dump off the target machine onto your development machine and load it into a GDB built to debug the target architecture - that should tell you exactly where the crash occurred. Just use GDB's core command to load the core file into the debugger.

  • To enable core dumps on the target:

    ulimit -c unlimited
    
  • pseudo-files that control how the core file will be named (cat these to see the current configuration, write to them to change the configuration):

    /proc/sys/kernel/core_pattern
    /proc/sys/kernel/core_uses_pid
    

On my system, once core dumps are enabled, a crashing program will write a file simply named "core" in the working directory. That's probably good enough for your purposes, but changing how the core dump file is named lets you keep a history of core dumps if that's necessary (maybe for a more intermittent problem).
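How the two pseudo-files interact can be sketched as a small function. This is a simplified illustration, not the kernel's full logic: predict_core_name is a hypothetical helper that only expands %p and leaves any other % specifier in the pattern untouched. It relies on the documented rule that a nonzero core_uses_pid appends .PID only when the pattern itself contains no %p.

```shell
# Hypothetical helper: predict the core file name from the two settings.
# Only %p is expanded here; other % specifiers in the pattern are left as-is.
predict_core_name() {
    pattern=$1
    uses_pid=$2
    pid=$3
    case $pattern in
        '|'*)
            # A leading pipe means the kernel hands the dump to a helper program
            echo "(piped to a helper program)"
            ;;
        *%p*)
            printf '%s\n' "$pattern" | sed "s/%p/$pid/g"
            ;;
        *)
            # Without %p in the pattern, a nonzero core_uses_pid appends .PID
            if [ "$uses_pid" != 0 ]; then
                echo "$pattern.$pid"
            else
                echo "$pattern"
            fi
            ;;
    esac
}

predict_core_name core 0 23650   # core
predict_core_name core 1 23650   # core.23650

# Apply it to the live settings when the pseudo-files are readable
if [ -r /proc/sys/kernel/core_pattern ] && [ -r /proc/sys/kernel/core_uses_pid ]; then
    predict_core_name "$(cat /proc/sys/kernel/core_pattern)" \
                      "$(cat /proc/sys/kernel/core_uses_pid)" "$$"
fi
```

If the last call reports a pipe, the distribution routes core dumps to a crash handler, and no core file will appear in the working directory until core_pattern is changed.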

Sextan answered 27/4, 2012 at 18:8 Comment(2)
Linux boxes I've used all drop cores called core.$PID. - Loria
@Warren: my Ubuntu box (and the embedded build we have) default to a file simply named core for some reason. - Sextan
C
4

Well ... You can of course insert trace printouts, so you can quickly rule out large areas of the code. Once you've done that, run e.g.

$ objdump --disassemble my-crashing-program | less

Then jump to e.g. the function you know is causing the error, and read the code, looking for anything that looks odd.

I'm not totally sure how objdump displays illegal instructions, but they should stand out.
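One way to make suspicious instructions stand out is to filter the disassembly instead of reading all of it. This is a rough sketch, not a definitive tool: filter_suspects is a hypothetical helper, and the mnemonics passed to it are whatever instructions you suspect the target CPU lacks. objdump prints (bad) for bytes it cannot decode, so the filter always looks for that too. Keep in mind that an instruction objdump decodes without complaint can still be illegal on a CPU that lacks the corresponding extension.

```shell
# Hypothetical helper: read disassembly on stdin and keep only lines whose
# mnemonic is one of the given names, or that objdump marked as (bad).
filter_suspects() {
    pattern='\(bad\)'
    for m in "$@"; do
        pattern="$pattern|$m"
    done
    # Require whitespace around the word so it is matched as a whole mnemonic
    grep -E "[[:space:]]($pattern)([[:space:]]|\$)"
}

# Usage, with two SSE4.2 mnemonics as an example:
#   objdump --disassemble my-crashing-program | filter_suspects pcmpistri pcmpgtq
```

Each line of the output starts with the instruction's address, which can then be looked up in the full disassembly to see which function it belongs to.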

Compressive answered 27/4, 2012 at 16:13 Comment(0)
G
4

For handwritten assembly I would suspect a stack management problem resulting in a return-to-nowhere. Write a debugging printout routine that saves every register and insert a call to it at the top of every function.

Then you will see how far you get...

(BTW, a good editor and a good understanding of the assembler's macro syntax are lifesavers when writing machine code.)

Geostatic answered 27/4, 2012 at 16:22 Comment(3)
I suspect the assembler requires me to explicitly specify that I'm using SSE4.2 instructions, just like gcc requires passing the -msse4.2 flag. - Perspicacious
But enabling instructions in the assembler just changes the allowed syntax. It wouldn't be the difference between trapping and not trapping, I imagine. - Geostatic
@user1018562 No. If the assembler encounters an instruction not allowed for the target architecture, it errors out, but only at compile time. If an error occurs at runtime but not at compile time, the opposite is true: the assembler emitted instructions that the target architecture doesn't understand. So if anything, you need to tell the compiler not to emit SSE instructions. - Bloodandthunder
Q
4

Missing a return statement at the end of a function can cause this.

Quirinus answered 23/7, 2018 at 15:7 Comment(1)
This program is written in assembly, not C. Instead of return X; you'd write mov eax, X followed by ret. - Pernicious
E
0

To investigate an illegal instruction crash, follow these steps:

# Start your application via GDB
gdb --args your_app your_app_arg1 your_app_arg2 ... your_app_argN

# inside GDB run program to reproduce "Illegal instruction"
(gdb) run
Starting program: /path/to/your_app your_app_arg1 your_app_arg2 ... your_app_argN
...
Program received signal SIGILL, Illegal instruction.
0x00007ffff7fe6298 in _dl_start () from /lib64/ld-linux-x86-64.so.2

# finally, switch to the assembly layout to see the faulting instruction
(gdb) layout asm
>0x7ffff7fe6298 <_dl_start+376> vmovq %rsi,%xmm2

At this point we have found the instruction that triggers the "Illegal instruction" error: vmovq.
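The same steps can be run non-interactively, which is handy over a plain SSH session. This is a minimal sketch, assuming gdb is installed on the machine where the program runs; show_faulting_insn is a hypothetical wrapper name, while -batch, -ex and --args are standard gdb command-line options.

```shell
# Hypothetical wrapper: run a program under gdb in batch mode and print the
# instruction at the program counter when the program stops.
show_faulting_insn() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "usage: show_faulting_insn PROGRAM [ARGS...]" >&2
        return 2
    fi
    # -batch exits after running the commands; x/i $pc disassembles the
    # single instruction at the current program counter
    gdb -batch -ex run -ex 'x/i $pc' --args "$@"
}

# Usage:
#   show_faulting_insn ./your_app your_app_arg1 your_app_arg2
```

If the program receives SIGILL, gdb stops at the faulting instruction and the x/i command prints it. If the program exits normally instead, there is no program counter left to inspect and gdb reports an error for that command.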

Ensanguine answered 20/5 at 6:58 Comment(0)
