What's going on in Apple LLVM-gcc x86 assembly?

I'm interested in learning more x86/x86_64 assembly. Alas, I am on a Mac. No problem, right?

$ gcc --version
i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 
5658) (LLVM build 2336.11.00)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO 
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I wrote a simple "Hello World" in C to get a base-line on what sort of code I'll have to write. I did a little x86 back in college, and have looked up numerous tutorials, but none of them look like the freakish output I'm seeing here:

.section    __TEXT,__text,regular,pure_instructions
.globl  _main
.align  4, 0x90
_main:
Leh_func_begin1:
pushq   %rbp
Ltmp0:
movq    %rsp, %rbp
Ltmp1:
subq    $32, %rsp
Ltmp2:
movl    %edi, %eax
movl    %eax, -4(%rbp)
movq    %rsi, -16(%rbp)
leaq    L_.str(%rip), %rax
movq    %rax, %rdi
callq   _puts
movl    $0, -24(%rbp)
movl    -24(%rbp), %eax
movl    %eax, -20(%rbp)
movl    -20(%rbp), %eax
addq    $32, %rsp
popq    %rbp
ret
Leh_func_end1:

.section    __TEXT,__cstring,cstring_literals
L_.str:
.asciz   "Hello, World!"

.section    __TEXT,__eh_frame,coalesced,no_toc+strip_static_syms+live_support
EH_frame0:
Lsection_eh_frame:
Leh_frame_common:
Lset0 = Leh_frame_common_end-Leh_frame_common_begin
.long   Lset0
Leh_frame_common_begin:
.long   0
.byte   1
.asciz   "zR"
.byte   1
.byte   120
.byte   16
.byte   1
.byte   16
.byte   12
.byte   7
.byte   8
.byte   144
.byte   1
.align  3
Leh_frame_common_end:
.globl  _main.eh
_main.eh:
Lset1 = Leh_frame_end1-Leh_frame_begin1
.long   Lset1
Leh_frame_begin1:
Lset2 = Leh_frame_begin1-Leh_frame_common
.long   Lset2
Ltmp3:
.quad   Leh_func_begin1-Ltmp3
Lset3 = Leh_func_end1-Leh_func_begin1
.quad   Lset3
.byte   0
.byte   4
Lset4 = Ltmp0-Leh_func_begin1
.long   Lset4
.byte   14
.byte   16
.byte   134
.byte   2
.byte   4
Lset5 = Ltmp1-Ltmp0
.long   Lset5
.byte   13
.byte   6
.align  3
Leh_frame_end1:


.subsections_via_symbols

Now...maybe things have changed a bit, but this isn't exactly friendly, even for assembly code. I'm having a hard time wrapping my head around this...Would someone help break down what is going on in this code and why it is all needed?

Many, many thanks in advance.

Ebersole answered 8/3, 2013 at 11:56 Comment(6)
It isn't freakish. And if you enable code optimization (e.g. with -O2), it will probably make more sense. - Powys
Welcome to x86_64 assembly. Try compiling with the -m32 option. It could give you more familiar output. - Milly
@AlexeyFrunze I've just compiled the sample C Hello world I wrote with -O2, just in case it wasn't defaulting to it already (I've been led to believe gcc uses -O2 by default, somehow...might've been during my Gentoo days). The assembly source code doesn't look much different than the code above: - Ebersole
How about unnecessary stores to the memory followed by reads from there? They should've gone with -O2. Also, if you use debugging options (was it -g?), you should drop them as they affect optimization. - Powys
@AlexeyFrunze Sorry, hadn't completed my comment above: ...I should've been more specific with what I meant by 'freakish', which I only meant jokingly. The labeling system is unfamiliar to me now, as opposed to during my brief period with x86 in college. I understand the concept of optimizing instructions and doing arithmetic in strange ways in assembly via the compiler, but I'm more concerned with the labeling of sections and what each section is doing, not instruction-for-instruction but overall. - Ebersole
Aha! But that should've been clarified in the question! :) - Powys

Since the question is really about those odd labels and data and not really about the code itself, I'm only going to shed some light on them.

If an instruction of the program causes an execution error (such as division by 0, access to an inaccessible memory region, or an attempt to execute a privileged instruction), it raises an exception (not the C++ kind, but the CPU-interrupt kind) and forces the CPU to execute the appropriate exception handler in the OS kernel. If we were to disallow these exceptions entirely, the story would be very short: the OS would simply terminate the program.

However, there are advantages to letting programs handle their own exceptions, so the primary exception handler in the OS reflects some exceptions back into the program for handling. For example, a program could attempt to recover from the exception, or it could save a meaningful crash report before terminating.

In either case, it is useful to know the following:

  • the function, where the exception has occurred, not just the offending instruction in it
  • the function that called that function, the function that called that one and so on

and possibly (mainly for debugging):

  • the line of the source code file, from which this instruction was generated
  • the lines where these function calls were made
  • the function parameters

Why do we need to know the call tree?

Well, if the program registers its own exception handlers, it usually does so with something like the C++ try and catch blocks:

#include <exception>

void fxn()
{
  try
  {
    // do something potentially harmful
  }
  catch (const std::exception&)
  {
    // catch and handle one kind of attempt to do something harmful
  }
  catch (...)
  {
    // catch and handle any other attempt to do something harmful
  }
}

If neither of those catch blocks matches, the exception propagates to the caller of fxn, and potentially to the caller of the caller of fxn, until some catch handles it or until it reaches the default exception handler, which simply terminates the program.
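
To make the propagation concrete, here is a minimal sketch in ordinary C++ (software exceptions rather than CPU ones, and the names fxn and caller are only illustrative). The handler inside fxn does not match the thrown type, so the exception bubbles up to the caller:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical callee: its only handler is for std::logic_error,
// so a std::runtime_error thrown inside passes straight through it.
void fxn()
{
    try {
        throw std::runtime_error("something harmful");
    } catch (const std::logic_error&) {
        // Not reached: the exception type does not match.
    }
}

// Hypothetical caller: reports which handler finally caught the exception.
std::string caller()
{
    try {
        fxn();
        return "no exception";
    } catch (const std::runtime_error& e) {
        return std::string("caught in caller: ") + e.what();
    }
}
```

Calling caller() should return the "caught in caller" message, because the runtime has to leave fxn's frame before it finds a matching handler.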

So, you need to know the code regions that each try covers and you need to know how to get to the next closest try (in the caller of fxn, for example) if the immediate try/catch doesn't catch the exception and it has to bubble up.

The ranges for try and locations of catch blocks are easy to encode in a special section of the executable and they are easy to work with (just do a binary search for the offending instruction addresses in those ranges). But figuring out the next outer try block is harder because you may need to find out the return address from the function, where the exception occurred.
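
The binary search mentioned above could look roughly like this. It is only a sketch: the Range layout, the find_range name and the addresses are made up for illustration and are not the format any real runtime uses.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical table entry: one protected code range and its handler.
struct Range {
    std::uint64_t begin;    // first address covered
    std::uint64_t end;      // one past the last address covered
    std::uint64_t handler;  // address of the matching catch code
};

// Returns the entry whose [begin, end) contains pc, or nullptr.
// Assumes the table is sorted by begin and the ranges do not overlap.
const Range* find_range(const std::vector<Range>& table, std::uint64_t pc)
{
    auto it = std::upper_bound(
        table.begin(), table.end(), pc,
        [](std::uint64_t addr, const Range& r) { return addr < r.begin; });
    if (it == table.begin()) return nullptr;  // pc is below every range
    --it;                                     // last range starting at or before pc
    return pc < it->end ? &*it : nullptr;
}

// Made-up addresses for illustration only.
const std::vector<Range> kTable = {
    {0x1000, 0x1040, 0x2000},
    {0x1080, 0x10c0, 0x2100},
};
```

A faulting address of 0x1010 would map to the first entry, while 0x1050 falls between the two ranges and yields no handler in this function.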

And you can't always rely on rbp+8 pointing to the return address on the stack, because the compiler may optimize the code in such a way that rbp is no longer involved in accessing function parameters and local variables. You can access them through rsp+something as well, saving a register and a few instructions. But different functions allocate different numbers of bytes on the stack for their locals and for the parameters passed to other functions, and they adjust rsp differently, so the value of rsp alone isn't enough to find the return address and the calling function. rsp can be an arbitrary number of bytes away from where the return address is on the stack.
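
For contrast, this is roughly what the easy case looks like when every function does keep rbp as a frame pointer. The sketch walks a simulated stack (a std::map standing in for memory, with made-up addresses), not a real process; walk_frames and kStack are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Simulated stack memory: address -> 8-byte value (not real process memory).
using Memory = std::map<std::uint64_t, std::uint64_t>;

// Follows the saved-rbp chain and collects return addresses.
// Assumes every function ran "pushq %rbp; movq %rsp, %rbp", so that
// [rbp] holds the caller's rbp and [rbp+8] holds the return address.
// A saved rbp of 0 marks the outermost frame in this simulation.
std::vector<std::uint64_t> walk_frames(const Memory& mem, std::uint64_t rbp)
{
    std::vector<std::uint64_t> return_addresses;
    while (rbp != 0) {
        return_addresses.push_back(mem.at(rbp + 8));
        rbp = mem.at(rbp);
    }
    return return_addresses;
}

// Made-up stack image with two frames, for illustration only.
const Memory kStack = {
    {0x7000, 0x7020}, {0x7008, 0x401234},  // inner frame
    {0x7020, 0x0},    {0x7028, 0x400abc},  // outer frame
};
```

Once a function stops maintaining rbp, this chain is broken, which is exactly why the extra unwind data is needed.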

For such scenarios the compiler includes additional information about functions and their stack usage in a dedicated section of the executable. The exception-handling code examines this information and properly unwinds the stack when exceptions have to propagate to the calling functions and their try/catch blocks.

So, the data following _main.eh contains that additional information. Note that it explicitly encodes the beginning and the size of main() by referring to Leh_func_begin1 and Leh_func_end1-Leh_func_begin1. This piece of info allows the exception-handling code to identify main()'s instructions as main()'s.

It also appears that main() isn't unique in this respect: some of its stack/exception info is the same as in other functions, so it makes sense to share it between them. Hence the reference to Leh_frame_common.

I can't comment further on the structure of _main.eh and the exact meaning of constants like 144 and 13, as I don't know the format of this data. But generally one doesn't need to know these details unless one is a compiler or debugger developer.
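
For anyone who does want to peek at those constants: they look like DWARF call-frame information, the format used for eh_frame sections. Under that assumption (an editorial reading, not something stated above), 120 in the common block is the signed LEB128 encoding of -8, the data alignment factor, and bytes such as 144 and 134 are DW_CFA_offset instructions whose low six bits name a register (16 is the return-address column and 6 is rbp in the x86-64 DWARF numbering). A small sketch of those two decoding steps, with hypothetical helper names:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes one signed LEB128 value starting at bytes[0].
// Assumes well-formed input whose value fits in 64 bits.
std::int64_t decode_sleb128(const std::vector<std::uint8_t>& bytes)
{
    std::uint64_t result = 0;
    unsigned shift = 0;
    std::uint8_t byte = 0;
    std::size_t i = 0;
    do {
        byte = bytes.at(i++);
        result |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);  // high bit set means more bytes follow
    // Sign-extend if bit 6 of the last byte is set.
    if (shift < 64 && (byte & 0x40))
        result |= ~static_cast<std::uint64_t>(0) << shift;
    return static_cast<std::int64_t>(result);
}

// A CFA instruction byte split into its top two and low six bits.
// For DW_CFA_offset the top bits are 0b10 and the low bits are a register.
struct CfaByte {
    unsigned high2;
    unsigned low6;
};

CfaByte split_cfa_byte(std::uint8_t b)
{
    return {static_cast<unsigned>(b >> 6), static_cast<unsigned>(b & 0x3f)};
}
```

With the same reading, the 13 followed by 6 near the end of _main.eh would be DW_CFA_def_cfa_register for rbp, which lines up with the movq %rsp, %rbp just before Ltmp1.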

I hope this gives you an idea of what those labels and constants are for.

Powys answered 9/3, 2013 at 6:9 Comment(1)
Excellent response. Thank you. This gives me a good enough idea of what's going on behind the scenes without getting much dirtier. The curiosity comes from wanting to handwrite modern x86_64 (God knows why, right?). - Ebersole

OK, let's give it a try.

// First section of code, declaring the main function, which has to be aligned on a 16-byte boundary (on this target, .align 4 means 2^4 bytes).

UPDATE: How .align interprets its argument varies between targets. See the gas documentation below.

.section    __TEXT,__text,regular,pure_instructions
.globl  _main
.align  4, 0x90
_main:

Store the previous base pointer and allocate stack space for local variables.

Leh_func_begin1:
pushq   %rbp
Ltmp0:
movq    %rsp, %rbp
Ltmp1:
subq    $32, %rsp
Ltmp2:

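Save main's own arguments (argc in edi, argv in rsi) into local stack slots, load the address of the string into rdi (the first-argument register) and call puts().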

movl    %edi, %eax
movl    %eax, -4(%rbp)
movq    %rsi, -16(%rbp)
leaq    L_.str(%rip), %rax
movq    %rax, %rdi
callq   _puts

Put the return value (0) on the stack, load it back into eax, free local memory, restore the base pointer and return.

movl    $0, -24(%rbp)
movl    -24(%rbp), %eax
movl    %eax, -20(%rbp)
movl    -20(%rbp), %eax
addq    $32, %rsp
popq    %rbp
ret
Leh_func_end1:

Next section, also in the __TEXT segment but holding data rather than code: the string to print.

.section    __TEXT,__cstring,cstring_literals
L_.str:
.asciz   "Hello, World!"

The rest is unknown to me; it could be data used by the C startup code and/or debugging info.

.section    __TEXT,__eh_frame,coalesced,no_toc+strip_static_syms+live_support
...

UPDATE: Documentation on the .align directive from: http://sourceware.org/binutils/docs-2.23.1/as/Align.html#Align

"The way the required alignment is specified varies from system to system. For the arc, hppa, i386 using ELF, i860, iq2000, m68k, or32, s390, sparc, tic4x, tic80 and xtensa, the first expression is the alignment request in bytes. For example `.align 8' advances the location counter until it is a multiple of 8. If the location counter is already a multiple of 8, no change is needed. For the tic54x, the first expression is the alignment request in words.

For other systems, including ppc, i386 using a.out format, arm and strongarm, it is the number of low-order zero bits the location counter must have after advancement. For example `.align 3' advances the location counter until it a multiple of 8. If the location counter is already a multiple of 8, no change is needed.

This inconsistency is due to the different behaviors of the various native assemblers for these systems which GAS must emulate. GAS also provides .balign and .p2align directives, described later, which have a consistent behavior across all architectures (but are specific to GAS)."

//jk

Calendre answered 8/3, 2013 at 12:16 Comment(0)

You can find the answers for pretty much any questions you've got related to the directives here and here.

For example:

.section    __TEXT,__text,regular,pure_instructions

Declares a section named __text in the __TEXT segment with the regular (default) section type, and specifies that this section will contain only machine code (i.e. no data).


.globl _main
Makes the _main label (symbol) global, so that it will be visible to the linker.


.align 4, 0x90
Aligns the location counter to the next 2^4 (==16) byte boundary. The space in between will be filled with the value 0x90 (==NOP).
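
As a rough illustration of that arithmetic (a sketch with hypothetical function names, not how any assembler is actually implemented), rounding a location counter up to a 2^n boundary and counting the fill bytes can be written as:

```cpp
#include <cstdint>

// Rounds addr up to the next multiple of 2^power (the power-of-two reading
// of ".align power"). Returns addr unchanged if it is already aligned.
std::uint64_t align_up_pow2(std::uint64_t addr, unsigned power)
{
    const std::uint64_t mask = (std::uint64_t{1} << power) - 1;
    return (addr + mask) & ~mask;
}

// Number of fill bytes (0x90 in the listing) needed to reach that boundary.
std::uint64_t padding_bytes(std::uint64_t addr, unsigned power)
{
    return align_up_pow2(addr, power) - addr;
}
```

For example, a location counter of 0x1003 with power 4 advances to 0x1010, which takes 13 fill bytes.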

As for the code itself, it's obviously doing a lot of redundant intermediate loads and stores. Try compiling with optimizations enabled, as one of the commenters suggested, and the resulting code should make more sense.

Ruhl answered 8/3, 2013 at 12:12 Comment(0)
