printf gets stuck in an infinite loop with AL = 10 on x86-64 Linux with older gcc

This is very simple introductory assembly code.
It seems to compile fine with gcc -o prog1 prog1.s, but ./prog1 just skips a line and shows nothing, as if waiting for input the code never asks for. What's wrong?
I'm using gcc (Debian 4.7.2-5) 4.7.2 on 64-bit gNewSense running in VMware. Code:

/*
int nums[] = {10, -21, -30, 45};
int main() {
  int i, *p;
  for (i = 0, p = nums; i != 4; i++, p++)
    printf("%d\n", *p);
  return 0;
}
*/

.data
nums:  .int  10, -21, -30, 45
Sf:  .string "%d\n"    # format string for printf

.text
.globl  main
main:

/********************************************************/
/* keep this section here and don't touch it - prologue!!! */
  pushq   %rbp
  movq    %rsp, %rbp
  subq    $16, %rsp
  movq    %rbx, -8(%rbp)
  movq    %r12, -16(%rbp)
/********************************************************/

  movl  $0, %ebx  /* ebx = 0; */
  movq  $nums, %r12  /* r12 = &nums */

L1:
  cmpl  $4, %ebx  /* if (ebx == 4) ? */
  je  L2          /* goto L2 */

  movl  (%r12), %eax    /* eax = *r12 */

/*************************************************************/
/* this section prints the value of %eax (clobbers %eax) */
  movq    $Sf, %rdi    /* first argument (pointer) */
  movl    %eax, %esi   /* second argument (integer) */
  call  printf       /* call the library function */
/*************************************************************/

  addl  $1, %ebx  /* ebx += 1; */
  addq  $4, %r12  /* r12 += 4; */
  jmp  L1         /* goto L1; */

L2:  
/***************************************************************/
/* keep this section here and don't touch it - epilogue!!!! */
  movq  $0, %rax  /* rax = 0  (return value) */
  movq  -8(%rbp), %rbx
  movq  -16(%rbp), %r12
  leave
  ret      
/***************************************************************/
Drier asked 5/5, 2020 at 20:26. Comments (19):
It would make things a great deal easier if you could translate the comments to English and explain what sort of output you expect (I suppose the same output as the C program you listed above). - Tobi
For me, it works like the C code in your comment does. Are you sure you're compiling and running what you think you are? - Freeway
@Tobi Your edit is right. The Portuguese comments are just basic explanations and "don't change this" notes. - Drier
@JosephSible-ReinstateMonica Yes, I am, as the commands show. - Drier
@Drier You should double-check that. As it stands, your problem isn't reproducible. - Freeway
@JosephSible-ReinstateMonica I've been cross-checking that for hours. That's why I finally gave up and came here. - Drier
If you compile and run your C code, does it work the way you expect? If not, then that points to some problem with your system. - Freeway
Yes, I've been running C on it all month; this problem only appeared now, with assembly. - Drier
You should zero %al before call printf, as you don't use any SSE registers for arguments. Still, that is unlikely to cause this problem. You could try running the program through strace or, of course, use a debugger. - Tobitobiah
@Tobitobiah After gcc -Wall -g prog1.s, gdb a.out, layout next, then run and ^C, the highlighted line is 0x00007ffff7a9e1d0 <printf+64> jmpq *%rax. In the regular terminal: Program received signal SIGINT, Interrupt. 0x00007ffff7a9e1d0 in printf () from /lib/x86_64-linux-gnu/libc.so.6 Now what? - Drier
That is very interesting. What is p/a $rax? If that points back to itself for whatever reason, then it would be an endless loop. - Tobitobiah
An infinite loop is precisely what I suspect. Sorry, I don't know what you mean by p/a, but %rax is where the 0 return value of main is stored. If $rax refers to the memory address associated with it, I SUPPOSE it's the one mentioned above. By the way, I ran some slightly different assembly code and it all works with the new one. - Drier
I meant: in gdb, while you are stopped at the jmpq, do p/a $rax to see the value. - Tobitobiah
Program received signal SIGINT, Interrupt. 0x00007ffff7a9e1d0 in printf () from /lib/x86_64-linux-gnu/libc.so.6 (gdb) p/a $rax $1 = 0x7ffff7a9e1ca <printf+58> - Drier
Aha, yeah, that's pointing to just before the jmp, so it's an endless loop. Very strange. - Tobitobiah
Yeah... and it just ran smoothly in onlineGDB right now. I guess we have some OS or VM oddity here. Not my thing at the moment, but thank you very much for the input anyhow. I learned some things indirectly. - Drier
Wait, I just tried it in a gNewSense 4 VM, and I can reproduce the problem there. I may be able to figure this out after all. - Freeway
@joseph I was about to redirect the answer, but yeah, great. - Drier
@Tobitobiah was right about needing to zero %al. Do that and it works. Full answer and explanation coming shortly. - Freeway

tl;dr: do xorl %eax, %eax before call printf.

printf is a varargs function. Here's what the System V AMD64 ABI has to say about varargs functions:

For calls that may call functions that use varargs or stdargs (prototype-less calls or calls to functions containing ellipsis (...) in the declaration) %al is used as hidden argument to specify the number of vector registers used. The contents of %al do not need to match exactly the number of registers, but must be an upper bound on the number of vector registers used and is in the range 0–8 inclusive.

You broke that rule. You'll see that the first time your code calls printf, %al is 10 (the low byte of %eax, which still holds nums[0]), which is more than the upper bound of 8. On your gNewSense system, here's a disassembly of the beginning of printf:

printf:
   sub    $0xd8,%rsp
   movzbl %al,%eax                # rax = al;
   mov    %rdx,0x30(%rsp)
   lea    0x0(,%rax,4),%rdx       # rdx = rax * 4;
   lea    after_movaps(%rip),%rax # rax = &&after_movaps;
   mov    %rsi,0x28(%rsp)
   mov    %rcx,0x38(%rsp)
   mov    %rdi,%rsi
   sub    %rdx,%rax               # rax -= rdx;
   lea    0xcf(%rsp),%rdx
   mov    %r8,0x40(%rsp)
   mov    %r9,0x48(%rsp)
   jmpq   *%rax                   # goto *rax;
   movaps %xmm7,-0xf(%rdx)
   movaps %xmm6,-0x1f(%rdx)
   movaps %xmm5,-0x2f(%rdx)
   movaps %xmm4,-0x3f(%rdx)
   movaps %xmm3,-0x4f(%rdx)
   movaps %xmm2,-0x5f(%rdx)
   movaps %xmm1,-0x6f(%rdx)
   movaps %xmm0,-0x7f(%rdx)
after_movaps:
   # nothing past here is relevant for your problem

A quasi-C translation of the important bits is goto *(&&after_movaps - al * 4); (see Labels as Values). For efficiency, gcc and/or glibc didn't want to save more vector registers than you used, and also didn't want to do a bunch of conditional branches. Each instruction that saves a vector register is 4 bytes, so the code takes the address of the end of the register-saving instructions, subtracts al * 4 bytes, and jumps there. This results in exactly enough of the instructions executing. Since your %al was more than 8, it jumped too far back and landed before the jump instruction it had just taken, creating an infinite loop.

As for why it's not reproducible on modern systems, here's a disassembly of the beginning of their printf:

printf:
   sub    $0xd8,%rsp
   mov    %rdi,%r10
   mov    %rsi,0x28(%rsp)
   mov    %rdx,0x30(%rsp)
   mov    %rcx,0x38(%rsp)
   mov    %r8,0x40(%rsp)
   mov    %r9,0x48(%rsp)
   test   %al,%al          # if(!al)
   je     after_movaps     # goto after_movaps;
   movaps %xmm0,0x50(%rsp)
   movaps %xmm1,0x60(%rsp)
   movaps %xmm2,0x70(%rsp)
   movaps %xmm3,0x80(%rsp)
   movaps %xmm4,0x90(%rsp)
   movaps %xmm5,0xa0(%rsp)
   movaps %xmm6,0xb0(%rsp)
   movaps %xmm7,0xc0(%rsp)
after_movaps:
   # nothing past here is relevant for your problem

A quasi-C translation of the important bits is if(!al) goto after_movaps;. Why did this change? My guess is Spectre. The mitigations for Spectre make indirect jumps really slow, so it's no longer worth doing that trick. Or not; see comments. Instead, the new code does a much simpler check: if there are any vector registers, save them all. With this code, your bad value of al isn't a disaster, since it just means the vector registers will be unnecessarily copied.

Freeway answered 6/5, 2020 at 0:33. Comments (11):
"The mitigations for Spectre make indirect jumps really slow": they're only slow if you armor them with lfence or something, which GCC doesn't do in general by default. I think this change predated Spectre; probably just because indirect branches are harder to predict, and FP printf is rare enough that dumping extra registers when you have one FP arg doesn't cost much. (Especially on modern CPUs with good OoO exec and large store buffers.) Interesting discovery; I didn't know gcc variadic code-gen ever did anything other than check AL!=0. - Sanalda
Another effect of this is that a bogus AL can't crash by jumping too far, so it's more robust against buggy hand-written code. IDK if that was any motivation at all. It also saves instructions in the no-FP fast path: just test %al,%al / jz instead of multiple ALU instructions to calculate a jump target. Seems like a good change to me regardless of Spectre. - Sanalda
The TL;DR line did work. An interesting follow-up: with the slightly different program onlinegdb.com/r1Yd5py9I, when the value to be printed is greater than 8 (by adding 5 to any of the summed values), it fails with an invalid operation instead of an infinite loop. I wonder why. - Drier
@Drier Since the problem is that it's jumping wildly, with values other than 10 it's probably landing in the middle of some instruction, where the bytes don't happen to form another valid instruction, and is thus getting SIGILL (Illegal Instruction). - Freeway
So, to wrap it up, we have a gNewSense issue here? Because in onlineGDB and on my colleagues' and teacher's Fedora it works just fine. - Drier
@Drier No, it's not an issue with gNewSense. It was an issue with your code. Your code broke one of the rules of the ABI, and it just so happens that newer systems are more lenient about the rule you broke than older ones are (i.e., on newer systems it's just slightly slower instead of completely broken). - Freeway
@Ajna: It's not rare for buggy asm code to happen to work by accident. Other ABI violations, like modifying a call-preserved register, also often don't cause a problem with simple callers, but will break other code. Throwing code at the wall and seeing what sticks works even less well in asm than in other languages. Don't depend on trial and error. (Although it can find things that definitely don't work, like here, where it breaks on one test system.) - Sanalda
@PeterCordes I don't believe code from a top-3 national and top-1 private computer science college would be throwing code at the wall or depending on trial and error, but OK, noted. - Drier
@Ajna: Is that where the ABI-violating code in the question was from? You didn't say that until now, but I guess that explains why you kept thinking it must be a bug in gNewSense even after the bug in that code was explained. Bugs do happen by accident even when you know what you're doing and just forget something. For the same reasons intentional trial and error is unsafe, it's easy to miss such bugs when testing on systems where it happens to work. It's often a good idea to start with or compare against C compiler output; compilers don't make mistakes in following the calling convention. - Sanalda
@PeterCordes Fun fact: the code in the exercise following this one has a new 'movl $0, %eax' right before call printf :P - Drier
@JosephSible-ReinstateMonica: Re: efficiency advantages of the test/jz way over the computed-jump way: I wrote a big footnote about that in an answer to Why does printf still work with RAX lower than the number of FP args in XMM registers?. Not exactly a duplicate, but the answer basically has to explain the same details. - Sanalda

© 2022 - 2024 — McMap. All rights reserved.