What assembly language does gcc produce on my system?

I'm trying to learn a bit about assembly. I decided to start by looking at the assembly files generated from simple source code. Of course, I get bombarded by instructions that I don't understand, and I start searching for their meaning on the internet. While searching, I realized that I have no idea which assembly language I'm actually looking at.

Is there a way to know which assembly language gcc generates? Does this question even make sense? I am mainly interested in the assembly that my system accepts (or however I should phrase that). See below for the code generated using gcc.

If you can tell which knowledge gaps I have, please link the relevant documents to read/study.

System:

OS: Windows 10 Pro

Processor: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz 2.20 GHz

Type: 64-bit operating system, x64-based processor

//test.c
int main(){
    int x = 2;
    return 0;
}

 //test.s
.file   "test.c"
    .text
    .def    __main; .scl    2;  .type   32; .endef
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    pushq   %rbp
    .seh_pushreg    %rbp
    movq    %rsp, %rbp
    .seh_setframe   %rbp, 0
    subq    $48, %rsp
    .seh_stackalloc  48
    .seh_endprologue
    call    __main
    movl    $2, -4(%rbp)
    movl    $0, %eax
    addq    $48, %rsp
    popq    %rbp
    ret
    .seh_endproc
    .ident   "GCC: (Rev10, Built by MSYS2 project) 10.2.0"
Folklore answered 27/4, 2021 at 9:50 Comment(2)
pushq, movl, ret: these are instructions; the rest is other stuff related to the language. The language is specific to a tool (GNU assembler or gcc) and not to the target (x86), so the Intel documentation, which fully covers how these instructions work, might not use the same syntax (I am not remotely talking about AT&T vs. Intel). So you may have to look at the GNU assembler documentation (part of binutils), or infer from the disassembly and machine code vs. the Intel documentation.All
Choose a better instruction set to start with.All

GCC always produces asm output that the GNU assembler can assemble, on any platform. (GAS / GNU as is part of GNU Binutils, along with tools like ld, a linker.)

In your case, the target is x86-64 Windows (prob. from x86_64-w64-mingw32-gcc; running gcc -dumpmachine or gcc -v will show the exact target your GCC was configured for), and the instruction syntax is AT&T syntax (the GCC and GAS default for x86, including x86-64).

The comment character is # in GAS for x86 (including x86-64).
Anything starting with a . is a directive; some, like .globl main (which exports the symbol main so it's visible in the .o for linking), are universal to GAS in general; check the GAS manual.

SEH directives like .seh_setframe %rbp, 0 are Windows-specific stack-unwind metadata for Structured Exception Handling, specific to Windows object-file formats. (Which you can 100% ignore, until/unless you want to learn how backtraces and exception handling work under the hood, without relying on a chain of legacy frame pointers. AFAIK, it's basically equivalent to ELF/Linux .eh_frame metadata from .cfi directives.)

In fact you can ignore almost all the directives; the only really important ones are section directives like .text vs. .data, plus .globl, which matters for making linking work. That's why https://godbolt.org/ filters directives by default.
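
For example, filtered down to just those directives plus the instructions, the body of the OP's test.s is roughly:

    .text
    .globl  main
main:
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $48, %rsp
    call    __main
    movl    $2, -4(%rbp)    # int x = 2;  (store 2 into the local's stack slot)
    movl    $0, %eax        # return 0;   (eax holds the return value)
    addq    $48, %rsp
    popq    %rbp
    ret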


You can use gcc -masm=intel if you want Intel syntax / mnemonics which you can look up in Intel's manuals. (https://software.intel.com/content/www/us/en/develop/articles/intel-sdm.html / https://www.felixcloutier.com/x86/). See also How to remove "noise" from GCC/clang assembly output?. (gcc -O1 -fverbose-asm might be interesting.)
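
With gcc -S -masm=intel, the same main should come out something like this (directives omitted; exact formatting of the memory operand, e.g. DWORD PTR -4[rbp] vs. DWORD PTR [rbp-4], can vary between GCC versions):

main:
    push    rbp
    mov     rbp, rsp
    sub     rsp, 48
    call    __main
    mov     DWORD PTR [rbp-4], 2   # no $ or % decorations; operand size comes from DWORD PTR
    mov     eax, 0
    add     rsp, 48
    pop     rbp
    ret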

If you want to learn AT&T syntax, see https://stackoverflow.com/tags/att/info. The GAS manual also has a page about AT&T vs. Intel syntax, but it's not written as a tutorial, i.e. it assumes you know how x86 instructions work, and are looking for details on the syntax GAS uses to describe them: https://sourceware.org/binutils/docs/as/i386_002dVariations.html
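
As a quick illustration of the decorations, here is the store of x from the OP's output next to how it would typically be written in Intel syntax (the Intel line is illustrative, not taken from the OP's file):

    movl    $2, -4(%rbp)           # AT&T: l suffix = 32-bit operand, $ marks immediates, % marks registers, source comes first
    mov     DWORD PTR [rbp-4], 2   # Intel: size from DWORD PTR, no sigils, destination comes first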

(Keep in mind that the CPU actually runs machine code, and it doesn't matter how the bytes get into memory, just that they do. So different assemblers (like NASM vs. GAS) and different syntaxes (like .intel_syntax noprefix) ultimately have the same limitations on what the machine can do or not in one instruction. All mainstream assemblers can let you express pretty much everything every instruction can do, it's just a matter of knowing the syntax for immediates, addressing modes, and so on. Intel and AMD's manuals document exactly what the CPU can do, using Intel syntax but not nailing down the details of syntax or directives.)
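
For instance, the same store would be written like this in NASM; all three spellings assemble to the same machine-code bytes:

    mov     dword [rbp-4], 2       ; NASM: size keyword without PTR, and ; starts a comment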


Resources (including some linked above):

Cherie answered 27/4, 2021 at 10:0 Comment(4)
Tons of resources can also be found below the x86 tag wiki, mainly thanks to our resident x86 guru @Peter Cordes :)Sandarac
@Lundin: Oh yeah, thanks. I haven't been editing as much recently to keep up with new canonicals. It kind of got bloated and is in need of some cleanup, but there's lots of good stuff.Cherie
This is GNU assembler (gas) syntax, not AT&T. The language is specific to the tool, not the target, nor where the destination is in the mnemonics...All
@old_timer: The instructions are using AT&T syntax, the directives are GAS directives. I don't think it does anyone any good to say it's "not AT&T". Are you trying to point out that I should have talked about directives more in paragraphs about it being GAS, rather than in paragraphs about it being AT&T syntax, which is GAS's default for x86 targets? Do note that some directives are target-specific (to the object-file format), e.g. .seh_*Cherie

Is there a way to know which assembly language gcc generates?

Yes: the one for your target port, which appears to be x86. This assembler language in turn comes in various flavours and dialects, with tons of history: https://en.wikipedia.org/wiki/X86_assembly_language

Of course, I get bombarded by instructions that I have no idea what they mean

Reading C compiler-generated assembler is much harder than reading hand-coded assembler. I'd recommend starting with some assembler tutorials with code examples written by humans instead.

x86 is also perhaps the hardest one of them all because of all the flavours, and because of the complexity of the core. It's generally recommended to learn some simple assembler first to get the hang of it.

8-bit microcontrollers are a good place to start.

Sandarac answered 27/4, 2021 at 10:1 Comment(29)
If you're already familiar with C, a clean machine that's a simple compiler target might be a good place to start, like RISC-V or possibly ARM64. If you compile with the right options, and craft your examples nicely, compiler-generated asm can be a good way to learn. Not a great way to start if you don't know the super-basics of the stack and calling convention and what registers are, though, so yeah start with that, but after that, any time you wonder "what would be a good way to do X in asm", a good starting point is writing a function in C and compiling it with optimization.Cherie
e.g. Matt Godbolt's CppCon2017 talk “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid” has some good examples of writing C functions that don't optimize away but instead compile into something interesting to look at. See also How to remove "noise" from GCC/clang assembly output?.Cherie
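As a tiny illustration of that approach (a hypothetical example, not from this thread): a function like int sum(int a, int b) { return a + b; } built with gcc -O2 -S on the asker's x86-64 Windows target should come out roughly as (directives omitted):
sum:
    leal    (%rcx,%rdx), %eax    # Windows x64 convention: a in ecx, b in edx; 32-bit result in eax
    ret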
I wouldn't recommend an 8-bit micro if you already know C and want to learn what it's like under the hood. Once you already know asm for a machine that makes a simple compiler target (int and pointer both fit in registers), then you can look at how 8-bit micros have to deal with pointers (e.g. AVR pair of regs, or not deal with pointers if they mostly only have support for 8-bit offsets). Also, if you don't have a stack, then the way "local variables" work will be different from what you're used to in thinking about C.Cherie
@PeterCordes Schools still like to use the ancient Motorola 68k for teaching assembler. Which was old even by the time I went to uni 20 years ago :) Personally I find Motorola-based assembler much easier to read but it might just be personal preference. HC08, AVR, STM8, R8C etc 8 bit micros all use some flavour of it and these families are still mostly in production.Sandarac
m68k was the first assembly I learned, and yeah it's widely recognized as a pretty clean CISC design without a lot of weird stuff, good for learning. (I had my dad's Atari Mega4 STe desktop (16MHz 68000) at home at the time when I was learning C in first year of university, with GCC and a unix-like shell my dad had installed. I found a fixed-point Mandelbrot program in a magazine or book and typed it in, and was optimizing it when I got tired of the slow compile times and started editing the asm. My dad had some m68k asm books, but these days the internet would have been fine.)Cherie
But yeah, seeing how C optimized into asm seemed pretty straightforward for me, for the inner-most loop of something I already understood, and I stand by that as a good way to learn. (Along with a book or tutorial for the absolute basics; compiler output is great for learning a 2nd or 3rd ISA when you already know one.) But I think you're talking about m68k's syntax style. Yeah IDK, these days I generally prefer Intel syntax (destination on the left, like an = or += C statement), but m68k's loose addressing-mode syntax, without fixed places where the different parts had to go, used to grate on me.Cherie
@Lundin: I would probably not recommend that a friend buy any 8 bit CPU-based SBC myself, at a time when you can buy a 64 bit Aarch64 SBC for around USD 25 that is suitable for bare-metal, FreeRTOS or Linux programming. Regarding the m68k, I still remember the shock I experienced when I had to switch from m68k assembler to 8088 assembler.Tzong
@Tzong Your definition of bare metal is different than mine then. Also, all these "PC in disguise" solutions programmed by "PC programmers in disguise" are becoming an increasing quality problem to the electronics industry, which already suffered from a high ratio of quack programmers even before this whole single board hype started. (And then the "IoT" hype...) Sure there are very few reasons to do professional development with 8 bitters today, I personally loathe them and will never touch one again. But for the sole purpose of learning assembler, they are great.Sandarac
@Lundin: I was more speaking from a hobbyist point of view buying its first board, because I don't see any major differences in learning assembly on an AVR than on an Aarch32-based CPU, and because you can ultimately learn everything on such a board from bare-metal (even your definition of it) to FreeRTOS, Linux, PostgreSQL, Docker, everything. And of course, you can start learning assembly by writing assembly programs on a Linux system, then switch to bare-metal assembly programming, using the same board.Tzong
@Lundin: Our definitions of bare-metal are probably closer than you may think however: for me, loading a minimalist aarch32 or aarch64 program written in assembly into a $25 Allwinner H5 64 bit SBC static OCRAM using sunxi-fel, running it and examining the result by dumping the memory after it stopped is bare-metal. Debugging the same program using a $3 Altera USB Blaster clone step by step using openocd and GDB would be bare-metal programming.Tzong
@Lundin: Loading a small program configuring a timer and triggering an interrupt into the same board's SRAM using u-boot load and go commands, the same way this could be done a long time ago on a 68000 board using a ROM monitor, would be bare-metal as well. I would tend to agree with the rest of your considerations though. For me, running a Python program written in MicroPython would definitely not be bare-metal programming.Tzong
@Lundin: Learning what, though? Bare metal embedded / OS development? Sure, keep it simple with an 8-bit micro. Learning how C compiles to asm for "normal" targets for performance tuning? Use a normal target, especially one where you can use performance counters, especially if you have some OS background to understand context-switch / page-fault overhead. (Depending on how much CPU architecture you understand, if any, learning a bit about tuning for an in-order CPU might be good (and possibly useful; big.LITTLE seems to be here to stay, with Intel planning that for the future, and Apple with M1).)Cherie
Although to be fair, I'm glad I got some experience playing around on hardware (Atari ST, and a joystick-port I/O from TTL logic hack on a TI99/4a from a yardsale) that had direct HW access from normal programs, without having to ask the OS nicely. IDK how well or quickly I'd have grokked the concept under Linux if I hadn't written a program to scribble funny art characters into video RAM on the ST. So I certainly see some appeal of messing around with a simpler system at some point if you want to understand what kernels do. But I never messed with interrupt handlers, just simple stuff.Cherie
@Peter Cordes: One strong argument in favor of at least, say, a cortex-m3 low cost board as a learning platform would be IMHO the availability of on-board SWD/JTAG debuggers with GDB support. I don't see any 8 bit platform available with such a feature at a reasonable cost. Would you know of any?Tzong
@Tzong High-end CPUs come with a lot of complexity that beginners can do without: branch prediction, instruction cache, data cache, advanced pipelining, MMU setup, multi-core and so on. I personally find the more high end assemblers more hard to read too: x86, ARM and Power PC. Especially since all of these come with various dialects and often multiple instruction sets on the same core.Sandarac
@PeterCordes The main purpose would be to learn how computers work. You can't really become a good C (or C++, or higher level) programmer if you don't have a basic idea of how computers work. It's very hard to understand (or teach) higher level concepts like calling conventions, function inlining, re-entrancy, stack overflows etc. to someone who doesn't have a clue about the underlying machine code.Sandarac
And then if you end up programming embedded systems, there's often a few cases here and there when you have to resort to manual inline asm for some particularly sensitive part of the code, like clearing interrupt flags or setting up memory.Sandarac
@Lundin: Your point is valid, even though a beginner would probably not have to consider those things at first. But I would again definitely require the platform to have SWD/JTAG debugging hardware/software available at low cost. I don't know of any 8 bit platform that can offer that. Would you know of any?Tzong
@Tzong Yeah SWD is very neat. But most semi-modern 8-bitters have some similar although proprietary single-wire interface, they are low cost but MCU-specific. The older ones from the 90s had pure evil interfaces, where you have to pull ten different pins to different values, provide a programming voltage of 8.999V and enable it at midnight while walking backwards around your local church :) Those interfaces I wouldn't recommend to my worst enemies.Sandarac
@Lundin: Just curious, what would be the minimal overall cost of using the combination of software/hardware required for those semi-modern 8 bit platforms?Tzong
Right, agreed on learning registers, calling conventions, the asm stack, etc. before moving on to performance and/or osdev concepts. You can do that in user-space under a mainstream OS without needing any real hardware, using an emulator. For example MARS is actually pretty decent, although its toy syscall interface is a weird mix of libc and system calls, and makes a bunch of "normal" stuff just plain impossible (e.g. cursor movement). Or RARS for RISC-V. x86 asm used to be ok to learn, but changes like PIE by default have introduced potholes in a lot of tutorials.Cherie
@Tzong Just google "x starter kit". The cheapest one (some PIC16) was priced at 2.5€ but that's likely based on some icky bootloader. Other such kits (I tried HCS08, R8C, EFM8) range between 20€ up to around 70-80€ but then you get some manner of in-circuit debugger with the deal.Sandarac
@Lundin: You can have an STM32F0308-DISCO for CAD 12, i.e. 8€, and use STMCube, or the Segger Ozone GUI debugger for free once you have replaced the STLinkv2 firmware with a Segger-provided one. And of course GDB or Eclipse CDT/Eclipse Standalone Debugger or Apache NetBeans. I would assume that there is way more support available in various communities for those than for the more expensive 8 bit solutions you are mentioning (my two cents), but I may of course be wrong.Tzong
@Tzong I personally pretty much only program Cortex M nowadays. Crossworks+Segger is my current favoured tool chain for professional development. But it's not something I'd recommend for learning assembler. Also I wouldn't recommend Eclipse to anyone, for any purpose.Sandarac
@Frant: I don't disagree, I was just considering the debugging aspect for people who are averse to using GDB in command-line mode. I would just recommend any decent editor such as notepad++ or vscode, and GNU as/GNU ld for the development per se for beginners.Tzong
If someone wanted to learn more ISAs, coming from a fairly strong background in x86_64, what would you suggest?Rossierossing
@Rossierossing ARM 32/64 is the obvious choice then, since it's the other mainstream architecture. Seems PowerPC has lost ground ever since the ARM hype started. As for proprietary microcontrollers, I wouldn't bother unless you plan to use a particular one extensively.Sandarac
@Sandarac really? Had thought PowerPC was up there. At least what I see on glibc-alpha makes me think it's pretty high priority. Is there a lot of micro-optimization work for ARM the same way there is for x86_64 and apparently powerpc?Rossierossing
@Rossierossing I can't say how "mature" the various optimizers are, but PowerPC has been around longer than ARM so I would imagine it got more advanced optimizers. Also, which of these that are most used might depend on what branch you work in. I suppose PowerPC might be more frequently used in network & telecom embedded systems.Sandarac
