How to reconcile short conditional jumps with branch target alignments in Delphi assembler?
I'm using Delphi 10.2 Tokyo, targeting both 32-bit and 64-bit, to write some functions entirely in assembly.
If I don't use .align, the compiler correctly encodes short conditional jump instructions (a 2-byte instruction consisting of a 1-byte opcode, 074h for je, and a 1-byte signed relative offset, from -80h to +7Fh). But as soon as I add even a single .align, even one as small as .align 4, every conditional jump that is located before the .align and whose destination is located after the .align becomes a 6-byte instruction instead of the 2-byte instruction it should be. Only the jumps located after the .align are still correctly encoded as 2-byte short jumps.
The Delphi assembler doesn't accept the 'short' prefix.
How can I reconcile short conditional jumps with branch target alignment via .align in the Delphi assembler?
Here is a sample procedure; please note that there is an .align in the middle.
procedure Test; assembler;
label
  label1, label2, label3;
asm
  mov al, 1
  cmp al, 2
  je label1
  je label2
  je label3
label1:
  mov al, 3
  cmp al, 4
  je label1
  je label2
  je label3
  mov al, 5
  .align 4
label2:
  cmp al, 6
  je label1
  je label2
  je label3
  mov al, 7
  cmp al, 8
  je label1
  je label2
  je label3
label3:
end;
Here is how it is encoded: the conditional jumps located before the .align that point to label2 and label3 (after the .align) are encoded as 6-byte instructions (this is a 64-bit CPU target):
0041C354 B001 mov al,$01 // mov al, 1
0041C356 3C02 cmp al,$02 // cmp al, 2
0041C358 740C jz $0041c366 // je label1
0041C35A 0F841C000000 jz $0041c37c // je label2
0041C360 0F8426000000 jz $0041c38c // je label3
0041C366 B003 mov al,$03 //label1: mov al, 3
0041C368 3C04 cmp al,$04 // cmp al, 4
0041C36A 74FA jz $0041c366 // je label1
0041C36C 0F840A000000 jz $0041c37c // je label2
0041C372 0F8414000000 jz $0041c38c // je label3
0041C378 B005 mov al,$05 // mov al, 5
0041C37A 8BC0 mov eax,eax // <-- a 2-byte dummy instruction, inserted by ".align 4" (almost a 2-byte NOP)
0041C37C 3C06 cmp al,$06 //label2: cmp al, 6
0041C37E 74E6 jz $0041c366 // je label1
0041C380 74FA jz $0041c37c // je label2
0041C382 7408 jz $0041c38c // je label3
0041C384 B007 mov al,$07 // mov al, 7
0041C386 3C08 cmp al,$08 // cmp al, 8
0041C388 74DC jz $0041c366 // je label1
0041C38A 74F0 jz $0041c37c // je label2
0041C38C C3 ret // label3:
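The displacement bytes in this listing can be cross-checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the Delphi toolchain: it assumes the standard x86 encodings for jz/je (74 cb for the 2-byte rel8 form, 0F 84 cd for the 6-byte rel32 form, with the displacement measured from the end of the instruction), and the names encode_jz, LISTING and check_listing are made up for this illustration. It recomputes the bytes of the first six jumps and tests whether each 6-byte jump would have fit in rel8 at the same address.

```python
# Sketch: recompute the jz encodings from the listing above.
# Assumed encodings: 74 cb (rel8) and 0F 84 cd (rel32); the displacement
# is relative to the address of the NEXT instruction.

def encode_jz(addr: int, target: int, force_rel32: bool = False) -> bytes:
    """Encode `jz target` placed at `addr`, preferring the 2-byte form."""
    rel8 = target - (addr + 2)            # 2 = length of the short form
    if not force_rel32 and -128 <= rel8 <= 127:
        return bytes([0x74]) + rel8.to_bytes(1, "little", signed=True)
    rel32 = target - (addr + 6)           # 6 = length of the rel32 form
    return bytes([0x0F, 0x84]) + rel32.to_bytes(4, "little", signed=True)

# Label addresses as shown in the listing with .align 4.
LABEL1, LABEL2, LABEL3 = 0x0041C366, 0x0041C37C, 0x0041C38C

# (instruction address, target, bytes shown in the listing)
LISTING = [
    (0x0041C358, LABEL1, "740C"),
    (0x0041C35A, LABEL2, "0F841C000000"),
    (0x0041C360, LABEL3, "0F8426000000"),
    (0x0041C36A, LABEL1, "74FA"),
    (0x0041C36C, LABEL2, "0F840A000000"),
    (0x0041C372, LABEL3, "0F8414000000"),
]

def check_listing():
    """Return (addr, matches_listing, would_fit_rel8) for every jump."""
    results = []
    for addr, target, shown in LISTING:
        is_long = len(shown) == 12        # 6 bytes = 12 hex digits
        encoded = encode_jz(addr, target, force_rel32=is_long)
        matches = encoded.hex().upper() == shown
        fits = -128 <= target - (addr + 2) <= 127
        results.append((addr, matches, fits))
    return results

if __name__ == "__main__":
    for addr, matches, fits in check_listing():
        print(f"{addr:08X}  matches listing: {matches}  fits rel8: {fits}")
```

Under these assumptions every recomputed encoding matches the listing, and every forward jump across the .align would also fit in the rel8 form. Shrinking those jumps would only move the targets closer, so the range is not what forces the 6-byte encoding.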
But if I remove the .align, all the instructions have the correct size, just 2 bytes as before:
0041C354 B001 mov al,$01 // mov al, 1
0041C356 3C02 cmp al,$02 // cmp al, 2
0041C358 7404 jz $0041c35e // je label1
0041C35A 740E jz $0041c36a // je label2
0041C35C 741C jz $0041c37a // je label3
0041C35E B003 mov al,$03 //label1: mov al, 3
0041C360 3C04 cmp al,$04 // cmp al, 4
0041C362 74FA jz $0041c35e // je label1
0041C364 7404 jz $0041c36a // je label2
0041C366 7412 jz $0041c37a // je label3
0041C368 B005 mov al,$05 // mov al, 5
0041C36A 3C06 cmp al,$06 //.align 4 label2:cmp al, 6
0041C36C 74F0 jz $0041c35e // je label1
0041C36E 74FA jz $0041c36a // je label2
0041C370 7408 jz $0041c37a // je label3
0041C372 B007 mov al,$07 // mov al, 7
0041C374 3C08 cmp al,$08 // cmp al, 8
0041C376 74E6 jz $0041c35e // je label1
0041C378 74F0 jz $0041c36a // je label2
0041C37A C3 ret // je label3
// label3:
Back to the conditional jump instructions: how can I reconcile short conditional jumps with branch target alignment via .align in the Delphi assembler?
I acknowledge that the benefit of aligning branch targets on processors like Skylake and later is slim, and I understand that I can simply refrain from using .align, which would also reduce code size. But I want to know how to make the Delphi assembler generate short jumps together with .align. The problem occurs with the 32-bit target as well, not only the 64-bit one.
[...] .align at all with that assembler. – Railing
[...] jcc rel8 a "short" jump, and jcc rel32 a "near" jump. Both of them are near jumps, as opposed to a far jump to a different code segment. So "short" means "near with compact encoding". The online HTML versions get messy after the first page of the table :( – Beefburger
[...] short, not near (since near was only relevant for 16-bit code, not for 32-bit or 64-bit code). – Railing
[...] .align directives. Maybe it's a 1-pass assembler that can't go back and prove that the branch distances are all short? The labels are local, right, so it can't be worried about the linker needing to fill in a different address. Otherwise that would be a problem all the time, not just with .align. – Beefburger
[...] .align with that assembler. That you get a few long forward branches shouldn't matter a lot. Most branches that matter (e.g. in loops) are backward anyway, and there it works. – Rolan
[...] sysenter ABI like Linux does.) I'd guess that far jumps aren't predicted, but it's also possible that the CPU optimistically assumes that there's no call-gate or whatever. – Beefburger
[...] .align. 2) Check if it chooses to use rel32 for any other cases when it doesn't have to, like backward, or not across a .align. I think you'll find that it's just forward branches across .align that get rel32 when they don't need it, which definitely doesn't sound intentional the way GCC's similar rep ret tuning for AMD K10 was. – Beefburger