How to compile arm assembly on an m1 macbook
D

3

21

I am completely new to assembly and would like to learn ARM assembly (simply because I own an M1 Mac). I can't really find many resources online, which is why I'm here. My code is as follows (the file is called first.asm):

global _main
section .text

_main: 
    mov r7, 0x4
    mov r0, 1
    ldr r1, message
    mov r2, 13
    swi 0

.section .data
    message: .ascii "Hello, World\n"

When I use

as first.asm -o hello.o 

I get the following errors:

first.asm:2:1: error: invalid instruction mnemonic 'global'
global _main
^~~~~~
first.asm:3:1: error: invalid instruction mnemonic 'section'
section .text
^~~~~~~
first.asm:6:2: error: unknown use of instruction mnemonic without a size suffix
        mov r7, 0x4
        ^
first.asm:7:2: error: unknown use of instruction mnemonic without a size suffix
        mov r0, 1
        ^
first.asm:8:2: error: invalid instruction mnemonic 'ldr'
        ldr r1, message
        ^~~
first.asm:9:2: error: unknown use of instruction mnemonic without a size suffix
        mov r2, 13
        ^
first.asm:10:2: error: invalid instruction mnemonic 'swi'
        swi 0
        ^~~
first.asm:12:15: error: unexpected token in '.section' directive
.section .data

I have a couple of questions:

  1. Is "as" a built-in mac compiler for assembly code?
  2. Do I need a different compiler?
  3. Does it make sense for me to learn arm assembly since I'm on an m1 mac or could I write x86 assembly without issues?
Diffluent answered 15/11, 2021 at 12:26 Comment(9)
The Apple M1 only supports ARM64 (also known as aarch64) assembly which is quite different from 32 bit ARM assembly. While you might be able to assemble 32 bit ARM programs with a suitable toolchain, you will not be able to run them.Chromatophore
You need to check what is the default assembler and write assembly code accordingly. Each assembler has its own syntax for writing assembly code.Parson
Also note that assembly is assembled, not compiled. To translate assembly into machine code, you need an assembler, not a compiler. As for your last question, learning ARM or ARM64 assembly is not a bad idea. You can also run emulated x86 code using Rosetta.Chromatophore
Thank you both for your answers. You cleared up a lot of my confusion. fuz One question out of curiosity: How did you know this was 32 bit? By the r7, r0 and so forth? kiner_shah: How would I figure out my default assembler? ThanksDiffluent
@cachedcashew: Run as --version to find out what assembler is being used.Donalt
Note that assuming you are using some version of the GNU assembler, or something compatible with it, then you have typos: directives start with . and the first two lines should have .global and .section. That is of course unrelated to the 32 vs 64 bit issue.Donalt
@NateEldredge Thanks for the help! as --version gives me: Apple clang version 12.0.5 (clang-1205.0.22.11) Target: x86_64-apple-darwin20.3.0 Does this mean my assembler is set to x86 and I need to change it to arm64? Sorry for all the noob questions.Diffluent
@Diffluent Yes, arm64 has different register names and slightly different instructions. It also does not have an swi instruction (it's called svc there). You will need to obtain an arm64 tutorial. Do not try to port your ARM tutorial to ARM64 while following it. As for the toolchain, you somehow managed to install an x86 toolchain. Try to obtain an ARM toolchain.Chromatophore
I see that all the comments above are summarized in my answer. Yet it is voted down consistently. There is something fundamentally wrong with stackoverflow. And may I stress that comments are not intended to give answers. If you have a partial answer, post it as an answer so that it can be improved by others.Oomph
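The Target line in the `as --version` output quoted in the comments above is what shows which architecture the assembler defaults to. As a minimal sketch, the architecture can be pulled out in the shell as shown below. The function name target_arch is made up for illustration, and the sketch assumes the output contains a line of the form "Target: arch-vendor-os" on its own line.

```shell
# Hypothetical helper: print the architecture part of the "Target:" line
# that `as --version` reports (assumed format: "Target: arch-vendor-os").
target_arch() {
    sed -n 's/^Target: *\([^-]*\)-.*/\1/p'
}

# On a real machine this would be fed from the tool itself:
#   as --version 2>&1 | target_arch
# Here the Target line quoted in the comments is used as sample input.
printf 'Target: x86_64-apple-darwin20.3.0\n' | target_arch
```

If this prints x86_64, the assembler is defaulting to x86 (for example because the shell runs under Rosetta or an x86 toolchain comes first on the PATH). A native arm64 toolchain would be expected to report arm64 instead.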
E
13

The following tutorial from Stephen Smith details some of the differences between Linux ARM64 assembly language and MacOS ARM64 assembly language: https://smist08.wordpress.com/2021/01/08/apple-m1-assembly-language-hello-world/

Here is a sample HelloWorld.s source assembler program for MacOS ARM64:

// Assembler program to print "Hello World!"
// to stdout.
//
// X0-X2 - parameters to macOS system calls
// X16 - macOS system call number
//
.global _start             // Provide program starting address to linker
.p2align 3                 // Align to 2^3 = 8 bytes (feedback from Peter)

// Set up the parameters to print hello world
// and then call macOS to do it.

_start: mov X0, #1          // 1 = StdOut
        adr X1, helloworld  // string to print
        mov X2, #13         // length of our string
        mov X16, #4         // macOS write system call
        svc 0               // Call macOS to output the string

// Set up the parameters to exit the program
// and then call macOS to do it.

        mov     X0, #0      // Use 0 return code
        mov     X16, #1     // Service command code 1 terminates this program
        svc     0           // Call macOS to terminate the program

helloworld:      .ascii  "Hello World!\n"

The commands to assemble and link it are the following:

as -o HelloWorld.o HelloWorld.s
ld -macosx_version_min 13.0.0 -o HelloWorld HelloWorld.o -lSystem -syslibroot `xcrun -sdk macosx --show-sdk-path` -e _start -arch arm64
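
Before running the two commands above, it can help to confirm that the shell is really running natively on Apple silicon, since the `as` in the question turned out to target x86_64. A minimal sketch, assuming `uname -s` reports Darwin and `uname -m` reports arm64 in a native M1 shell (the function name is_native_arm64_mac is made up for illustration):

```shell
# Hypothetical pre-flight check: succeed only for macOS on arm64.
# Both arguments default to what uname reports, so the function can also
# be exercised with explicit values.
is_native_arm64_mac() {
    os=${1:-$(uname -s)}
    arch=${2:-$(uname -m)}
    [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]
}

if is_native_arm64_mac; then
    echo "native arm64 macOS shell: the as/ld commands should target arm64"
else
    echo "not a native arm64 macOS shell: check the toolchain before assembling"
fi
```

A terminal running under Rosetta reports x86_64 from `uname -m`, which is one way to end up with an x86 target without noticing.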

IMO, the ideal way to approach the challenge of learning ARM64 assembler on macOS ARM64 is to use Xcode with its built-in source code management and debugger.

The Stephen Smith book and his source code examples are for Linux ARM64 and, I believe, iOS ARM64. The HelloSilicon Git repository referenced by Albert is Stephen's source code updated to support native macOS ARM64. Stephen's Chapter 3 source code includes an Xcode project to run ARM64 on an iOS device with all the view controller junk. The HelloSilicon Chapter 03 includes a much simpler Xcode project for native macOS. It is invaluable to you (and me), since you will also develop skills around source code management in Xcode together with using the Xcode debugger.

Assembler languages are very different between CPU architectures. Whatever one you decide to learn, stick with it.

Easterner answered 24/8, 2023 at 18:58 Comment(5)
That tutorial discusses differences between Linux AArch64 and macOS AArch64. Not "standard" vs. non-standard, and not 32-bit ARM code like in the question. It does show a working Hello World for AArch64 macOS, but some of the explanation doesn't match the code. e.g. they write "In MacOS the program must start on a 64-bit boundary, hence the listing has an “.align 2” directive near top." But .align 2 aligns to 2^2 = 4, not 8 bytes. They should be using .p2align 3.Feud
.align on some GAS targets takes an exponent, on others it takes a power-of-2 value directly. Use .p2align or .balign so readers can be sure whether it's 2 bytes or 2^2 = 4 bytes without having to memorize the assembler differences between platforms.Feud
Anyway, probably the most important point here is that this is AArch64, not "ARM" assembly. It's a whole different architecture; as discussed in comments under the question macOS doesn't support 32-bit ARM mode at all, nor do Apple M1 CPUs.Feud
I don't use macOS myself, sorry. I just know how it works from Stack Overflow. But you can easily look at compiler-generated asm for a simple C function, using macOS clang -O2 -S. See How to remove "noise" from GCC/clang assembly output? (but don't use godbolt.org unless you want to dig up the right -target aarch64-darwin-whatever string. Use a local compiler on macOS and look at the asm from a C function, preferably not main, just one that takes an arg and returns a value)Feud
@PeterCordes Just an update... I own the ARM64 book referenced by Albert and had the GitHub download of his source. It turns out the HelloSilicon Github repository referenced by Albert is Stephen's source updated to support macOS ARM64.Easterner
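
The alignment arithmetic from the comments above can be checked quickly in the shell. `.p2align n` asks for a boundary of 2^n bytes, so an exponent of 3 gives 8 bytes (64 bits), while an exponent of 2 gives only 4. A small sketch using POSIX shell arithmetic (the function name p2align_bytes is made up for illustration):

```shell
# Hypothetical helper: byte boundary requested by ".p2align n", i.e. 2^n.
p2align_bytes() {
    echo $((1 << $1))
}

p2align_bytes 2   # exponent 2: prints 4
p2align_bytes 3   # exponent 3, as in the ".p2align 3" of the listing: prints 8
```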
M
0

Going error by error, for your first error:

first.asm:2:1: error: invalid instruction mnemonic 'global': global _main

To solve this put .global _main

first.asm:3:1: error: invalid instruction mnemonic 'section': section .text

This time just replace your section .text with _start: to indicate to the assembler (compiler but for assembly) that the code following is what you want the computer to execute

first.asm:6:2: error: unknown use of instruction mnemonic without a size suffix: mov r7, 0x4

Make sure to put mov r7, #0x4, because "#" marks an immediate constant.

first.asm:7:2: error: unknown use of instruction mnemonic without a size suffix: mov r0, 1

Once more use mov r0, #1

first.asm:8:2: error: invalid instruction mnemonic 'ldr': ldr r1, message

For this error you want to write ldr r1, =message. Once more it's a syntax error: the "=" tells the assembler to load the address of the label message into the register.

first.asm:9:2: error: unknown use of instruction mnemonic without a size suffix: mov r2, 13

Once more put a "#" mov r2, #13

first.asm:10:2: error: invalid instruction mnemonic 'swi': swi 0

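Just put swi #0. The 0 is the immediate operand of the instruction, not a return value. Note that arm64 has no swi instruction; as mentioned in the comments under the question, it is called svc there.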

first.asm:12:15: error: unexpected token in '.section' directive: .section .data

Simply remove the .section tag leaving .data

Compilation process:

as HelloWorld.s -o HelloWorld.o assembles your source file into an object file

ld -macos_version_min 15.0.0 -o HelloWorld HelloWorld.o -lSystem -syslibroot `xcrun -sdk macosx --show-sdk-path` -e _start -arch arm64 links your object file into an executable (note: the xcrun command must be in backticks, not single quotes, so that the shell substitutes the SDK path, and you might have to change the -macos_version_min value to match your macOS version)

./HelloWorld should work for running your executable

Michaud answered 14/5, 2024 at 19:7 Comment(0)
O
-6

You are naive with regards to assemblers, apparently, by your own admission. Assemblers are different from compilers, which go to great lengths to accommodate a standard language.

If you have a standard ISO C program, chances are that it will run as is on different systems. Adding more utility to a program, you will discover that you need a standard library that will, e.g., read files, and these are vastly different between MS-Windows and Linux. Even so, if you are presented with a standard program that runs on Linux, you will be at a loss using it on MS-Windows. You will have to learn how to use a compiler on MS-Windows. This could take anywhere from one hour, for very experienced people, to one week, or even a year.

The situation with assemblers is worse. Suppose you have understood the machine instructions well enough to craft a program. You will encounter the problem that although Intel has specified the instructions, few assemblers or only the Intel supplied assembler can handle your source. In fact every single assembler program creates its own dialect for Intel instructions (and there is a separate universe for ARM instructions.)

Notorious is that GNU insisted on calling the AX register %ax. Then there are the assembler directives. You pointed out "global" and "section". They are invented from whole cloth by the inventor of the assembler. As luck may have it there are certain conventions, but they are not much use to you, being inexperienced. In my experience you can port a program in assembler and find 6000 errors in 13000 lines. This can be caused e.g. by different comment conventions, renamed registers, insisting that instructions are lower case, or the order of operands. So you are surprised by finding that "global" is not understood? The first thing to do is get a manual for the as you are using and discover that "global" is not in there.

That is on top of the problem of running the assembler and using it to make a program, probably with a separate link step. This is the same difficulty you know from running the C compiler with a standard library, but on steroids.

If you plan to go on with learning assembly in this way you are advised to go to https://github.com/below/HelloSilicon where there is a whole course, specifically targeting the Apple M1. I cannot stress this enough: assembly language is specific, general advice doesn't cut it. Buy the $50 book, study it, and go step by step.

An alternative is to use an embedded assembler. The idea is that you code a small snippet of code in an interpreter with a high level language. You still have to be aware of register usage within that high level language. This is done in C but the simplest is Forth.

I have written a dialect of Forth. https://github.com/albertvanderhorst/ciforth Within the AMD_64 release you can find the source code of an assembler in the library and an example how to use that to add e.g. floating point instructions to the high level interpreter.

CODE F+ FADDP, ST1|     NEXT, END-CODE

You see here that the single high-level instruction F+ is defined, the code to add two floating-point numbers. CODE invokes the assembler and END-CODE ends this. You are talking to the interpreter of the high-level language before and afterwards. The assembler itself understands F+ FADDP, ST1| . NEXT, is a shorthand for instructions that are always the same and instructs the interpreter to go on with the next instruction. That only illustrates the simplest possible use of assembly language for AMD_64. Unfortunately the example is not available for the ARM language, although ciforth has a 64-bit ARM compiler for Linux.

Oomph answered 10/8, 2023 at 9:5 Comment(3)
I thought it was a brilliant answer, posed by someone who has 50+ years of experience with assemblers. I find it interesting that it is voted down multiple times and nobody cares to comment.Oomph
I didn't downvote, but your first sentence was a bit blunt. This can cause others (newbies especially) to feel discouraged and not want to read the rest of your answer. I thought it was a good answer though!Amphibious
I softened the first sentence, without changing its meaning.Oomph
