Intel x86 to ARM assembly conversion

I am currently learning ARM assembly language.

To do so, I am trying to convert some x86 code (AT&T syntax) to ARM assembly (Intel syntax).

__asm__("movl $0x0804c000, %eax;");

__asm__("mov R0,#0x0804c000");

From this document, I learned that on x86, chunk 1 of the heap structure starts at 0x0804c000. But when I try to do the same on ARM, I get the following error:

/tmp/ccfNZp9F.s:174: Error: invalid constant (804c000) after fixup

I assume the problem is that ARM can only load 32-bit instructions.

Question 1: Any idea what the first chunk would be on ARM processors?


Question 2:

From my previous question, I know how memory indirect addressing works.

Are the snippets below doing the same job?

movl (%eax), %ebx

LDR R0,[R1]

I am using ARMv7 Processor rev 4 (v7l)

Rora asked 25/6, 2013 at 12:44. Comments:
See: label and label... and all the duplicates I marked there. ARM only supports 8-bit constants rotated by a multiple of two. To support a constant like yours, the syntax ldr r0,=0x804c000 is used: the assembler maintains a literal pool and places the constant there, and a PC-relative load reads it. Use the .ltorg directive to dump the pool in your assembler. (Ludmilla)
Because ARM is RISC and x86 is CISC, and because they are simply different instruction sets, only a small percentage of x86 code will "port" directly one to one. It may take multiple ARM instructions per x86 instruction (and at times vice versa, one ARM instruction for a group of x86 instructions). All processors have similar things like register indirect addressing, and yes, those two are functionally the same. (Lindbergh)

Answer to Question 1

The MOV instruction on ARM only has 12 bits available for an immediate value, and those bits are used this way: 8 bits for value, and 4 bits to specify the number of rotations to the right (the number of rotations is multiplied by 2, to increase the range).

This means that only a limited number of values can be used with that instruction. They are:

  • 0-255
  • 256, 260, 264,..., 1020
  • 1024, 1040, 1056, ..., 4080
  • etc

You are getting that error because your constant can't be created from 8 bits plus a rotation. You can load that value into the register with the following instruction:

LDR r0, =0x0804c000

Notice that this is a pseudo-instruction, though. The assembler puts the constant somewhere in your code (a literal pool) and loads it from memory at an offset relative to the PC (program counter).

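Answer to Question 2

Yes, those instructions are equivalent: each loads the 32-bit word stored at the address held in one register (%eax / R1) into another register (%ebx / R0).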

Anklebone answered 25/6, 2013 at 13:51. Comments:
Thanks. When I try LDR r0, =0x0804c000, I get this assembler message: offset out of range (Rora)
Put a .ltorg somewhere close to where you are loading that value. This will put the literal pool there, which should solve the offset problem. (Anklebone)
.ltorg is exactly what I described. It is a literal pool: constant data the assembler places to satisfy the ldr r0,=0x804c000 request. Maybe relocation in ARM assembler helps? Or Arm op-codes and .ltorg in gnu-assembler manual. Just add the text .ltorg after a subroutine return on occasion, and that is all you have to know. (Ludmilla)
@artlessnoise: Thanks. Just one question: if I want to load the data from r0 into another register, I would do LDR R1,[R0], the value of R0 being 0x0804c000. Along the same lines as described previously, I would try to do LDR =R1,[R0] (Rora)
Sorry, I meant to put my comment under the main question. Anyway, use ADD, not LDR. The ARM is a RISC CPU and has specific load and store instructions; you cannot do memory-to-register type operations. I guess 0x0804c000 is an address? If it were a structure address, you could do ldr r1, [R0, #offset], which adds the offset to R0 and loads the word at that address. If 0x0804c000 is some register data, then add r1, r0, #const could work. Here #const has the same restrictions as the mov: an eight-bit value rotated right by a multiple of two; all ARM codes are 32bit, no room (Ludmilla)
Got it, it was my mistake. I could do MOV R1,R0. I will now check the disassembler and register info. (Rora)

Trying to learn ARM by looking at x86 is not a good idea: one is CISC and quite ugly, the other is RISC and much cleaner. Just learn ARM from the instruction set reference in the ARM Architecture Reference Manual. Look up the mov instruction, the add instruction, etc.

ARM doesn't use Intel syntax; it uses ARM syntax.

Don't learn by using inline assembly; write real assembly. Use an instruction set simulator first, not hardware.

ARM, MIPS and others aim for a fixed instruction word length. So how would you, for example, fit an instruction that says "move an immediate into a register", specify the register, and fit a 32-bit immediate, all in 32 bits? It's not possible. So with fixed-length instruction sets you cannot simply load any immediate you want into any register; you must read up on the rules for that instruction set. MIPS allows 16-bit immediates; ARM allows 8 bits, plus or minus, depending on the flavor of ARM instruction set and the instruction. MIPS limits where you can put those 16 bits (either high or low), while ARM lets you put those 8 bits anywhere in the 32-bit register, depending on the flavor of ARM instruction set (ARM, Thumb, Thumb-2 extensions).

As with most assembly languages, you can solve this problem by doing something like this:

ldr r0,my_value
...
my_value: .word 0x12345678

With CISC, that immediate is simply tacked onto the instruction, so whether the constant is 0 bytes away or 20 bytes away, it is still there in memory with either approach.

ARM assemblers also generally allow you this shortcut:

ldr r0,=something
...
something:

which says: load r0 with the ADDRESS of something, not the contents at that location (like an x86 lea).

That, in turn, lends itself to this immediate shortcut:

ldr r0,=0x12345678

which, if supported by the assembler, will allocate a memory location to hold the value and generate an ldr r0,[pc,#offset] instruction to read it. If the immediate fits the rules for a mov, the assembler may optimize it into a mov rd,#immediate.

Lindbergh answered 25/6, 2013 at 13:52. Comments:
What exactly is the ARM syntax? (Rora)
Intel syntax is the syntax defined in the Intel manuals for the Intel processor. AT&T syntax is a deviation from that, made by AT&T for their assembler, I assume. Neither of these has anything whatsoever to do with any processor other than x86. Applying those terms to ARM or MIPS or AVR or 6502 or PDP-11 or any other processor makes no sense. The syntax defined in the original vendor's instruction set reference is that vendor's syntax; that vendor often makes or has made an assembler that uses a related syntax, and you go with that. (Lindbergh)
Anyone who writes an assembler, though, can change the syntax as they see fit, since assembly language really doesn't have standards: the machine code is the standard, and however you get there is however you get there. Just look at what the GNU assembler has done to assembly languages as an example. (Lindbergh)