Understanding the SBCL entry/exit assembly boilerplate code

BACKGROUND

When I compile a trivial identity function with 64-bit Steel Bank Common Lisp on Windows:

(defun a (x)
  (declare (fixnum x))
  (declare (optimize (speed 3) (safety 0)))
  (the fixnum x))

I find the disassembly is given as:

* (disassemble 'a)

; disassembly for A
; Size: 13 bytes
; 02D7DFA6:       84042500000F20   TEST AL, [#x200F0000]      ; safepoint
                                                              ; no-arg-parsing entry point
;       AD:       488BE5           MOV RSP, RBP
;       B0:       F8               CLC
;       B1:       5D               POP RBP
;       B2:       C3               RET

I understand that the lines:

mov rsp, rbp
pop rbp
ret  

perform standard return from function operations, but I don't understand why there are the lines:

TEST AL, [#x200F0000]  // My understanding is that this sets flags based on bitwise and of AL and contents of memory 0x200F0000

and

CLC // My understanding is that this clears the carry flag.

QUESTIONS

  1. Why does SBCL generate a test instruction, but never use the flags?
  2. Why does SBCL clear the carry flag before returning from a function?
Regenerative answered 18/2, 2014 at 16:11 Comment(9)
Calling convention maybe? Is this with optimizations enabled? - Algor
My understanding is that my optimize declaration specifier is telling it to compile for speed, not safety. (Without these the code is much longer.) - Regenerative
What version of SBCL? I do not get the TEST AL in either SBCL or Allegro Lisp. - Hord
@AndrewMyers 64-bit for Windows - Regenerative
@PeterdeRivaz Ah, I'm using 64-bit Linux, perhaps that's important. - Hord
@AndrewMyers: Confirmed. On 64-bit Linux, there's no test instruction. Probably something related to the Windows ABI. - Encroach
On at least one of the platforms SBCL uses the status flags to communicate whether it is a single value return or not. - Cassino
Since I may only edit comments for five minutes, here's a link to the relevant code (with comments): compiler code - Cassino
@PhilippMatthiasSchäfer Thanks a lot, I think I will learn a lot from trying to understand that code! I was pleasantly surprised to see that the SBCL Lisp compiler is written in Lisp :) - Regenerative

As the disassembler hints, the TEST instruction is a safepoint. It's used for synchronizing threads for the garbage collector. Safepoints are inserted in places where the compiler knows the thread is in a safe state for garbage collection to occur.

The form of the safepoint is defined in compiler/x86-64/macros.lisp:

#!+sb-safepoint
(defun emit-safepoint ()
  (inst test al-tn (make-ea :byte :disp sb!vm::gc-safepoint-page-addr)))

You are of course correct that the result of the operation is not used. In this case, SBCL is interested in a side effect of the operation: if the page containing the address happens to be protected, the instruction generates a page fault. If the page is accessible, the instruction just wastes a very small amount of time. This is probably much faster than explicitly checking a global variable at every safepoint.

On Windows, the C functions map_gc_page and unmap_gc_page in runtime/win32-os.c are used to map and unmap the page:

void map_gc_page()
{
    DWORD oldProt;
    AVER(VirtualProtect((void*) GC_SAFEPOINT_PAGE_ADDR, sizeof(lispobj),
                        PAGE_READWRITE, &oldProt));
}

void unmap_gc_page()
{
    DWORD oldProt;
    AVER(VirtualProtect((void*) GC_SAFEPOINT_PAGE_ADDR, sizeof(lispobj),
                        PAGE_NOACCESS, &oldProt));
}

Unfortunately, I haven't been able to track down the page fault handler, but the general idea seems to be this: when a collection is needed, unmap_gc_page is called. Each thread keeps running until it reaches one of these safepoints, at which point the TEST instruction page-faults. Presumably the fault handler then pauses that thread. Once all threads have been paused, garbage collection runs, map_gc_page is called again, and the threads are allowed to resume.
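
To make the idea concrete, here is a minimal sketch of page-based safepoint polling. It is not SBCL's code: it assumes a POSIX system (Linux or macOS) and uses mmap/mprotect and a signal handler in place of VirtualProtect and the Windows exception handler. All names (safepoint_page, safepoint_poll, run_safepoint_demo, and so on) are hypothetical. A real runtime would park the thread inside the handler until the collector finishes; this sketch only counts the fault and reopens the page. It also relies on mprotect being callable from a signal handler, which works in practice on these systems but is not guaranteed by POSIX.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for SBCL's GC_SAFEPOINT_PAGE_ADDR. */
static unsigned char *safepoint_page;
static size_t safepoint_page_size;
static volatile sig_atomic_t safepoint_hits;

/* Change the protection of the polled page; returns 0 on success. */
static int set_page_protection(int prot) {
    return mprotect(safepoint_page, safepoint_page_size, prot);
}

/* Rough equivalent of TEST AL, [addr]: a load whose result is discarded. */
static void safepoint_poll(void) {
    volatile unsigned char sink = *(volatile unsigned char *)safepoint_page;
    (void)sink;
}

/* Fault handler: only faults on the safepoint page count as safepoints. */
static void on_fault(int sig, siginfo_t *info, void *ctx) {
    unsigned char *addr = (unsigned char *)info->si_addr;
    (void)sig;
    (void)ctx;
    if (addr < safepoint_page || addr >= safepoint_page + safepoint_page_size)
        _exit(2); /* a genuine crash, not a safepoint */
    safepoint_hits++;
    /* A real runtime would block here until garbage collection is done. */
    if (set_page_protection(PROT_READ | PROT_WRITE) != 0) /* like map_gc_page */
        _exit(3);
    /* On return, the faulting load is retried and now succeeds. */
}

/* Returns the number of safepoint faults observed, or -1 on setup failure. */
static int run_safepoint_demo(void) {
    struct sigaction sa;
    long ps = sysconf(_SC_PAGESIZE);
    void *p;
    int rc;

    if (ps <= 0)
        return -1;
    safepoint_page_size = (size_t)ps;
    p = mmap(NULL, safepoint_page_size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    safepoint_page = p;
    safepoint_hits = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;

    safepoint_poll();                    /* page open: costs one load */
    rc = set_page_protection(PROT_NONE); /* like unmap_gc_page: request a stop */
    assert(rc == 0);
    (void)rc;
    safepoint_poll();                    /* faults once, handler reopens page */
    safepoint_poll();                    /* page open again: no fault */
    return (int)safepoint_hits;
}
```

Calling run_safepoint_demo() from main should report exactly one fault: the first and third polls are plain loads, and only the poll made while the page is protected enters the handler. The point of the design is that the fast path contains no compare or branch, so the cost of the check is paid only when a collection has actually been requested.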

The credits file credits Anton Kovalenko with introducing this mechanism.

On Linux and Mac OS X, a different synchronization mechanism is used by default, which is why the instruction isn't generated on default builds for those platforms. (I'm not sure if the PowerPC ports use safepoints by default, but obviously they don't use x86 instructions).

On the other hand, I have no idea about the CLC instruction.

Refuse answered 13/5, 2014 at 23:7 Comment(2)
Wow! That is fascinating! With out-of-order processors I wonder if there is a risk that the function might start doing some unsafe operations, but by adding the CLC instruction it forces the safepoint check to have completed? - Regenerative
@PeterdeRivaz, the CLC instruction shows up on non-safepoint builds, so I suspect it's not related to safepoints. - Refuse

I know nothing about TEST AL, [#x200F0000], but I believe that CLC is for functions that return one value. SBCL Internals Manual, "Unknown-Values Returns", suggests that functions set the carry flag if they return multiple values, or clear the carry flag if they return one value.

I am running SBCL 1.1.14 on OpenBSD and x86-64. I can see CLC and STC if I disassemble a function that returns one value and a function that returns multiple values:

CL-USER> (disassemble (lambda () 100))
; disassembly for (LAMBDA ())
; Size: 16 bytes
; 04B36F64:       BAC8000000       MOV EDX, 200               ; no-arg-parsing entry point
;       69:       488BE5           MOV RSP, RBP
;       6C:       F8               CLC
;       6D:       5D               POP RBP
;       6E:       C3               RET
;       6F:       CC0A             BREAK 10                   ; error trap
;       71:       02               BYTE #X02
;       72:       19               BYTE #X19                  ; INVALID-ARG-COUNT-ERROR
;       73:       9A               BYTE #X9A                  ; RCX
NIL

This one has CLC (clear carry) because it returns one value.

CL-USER> (disassemble (lambda () (values 100 200)))
; disassembly for (LAMBDA ())
; Size: 35 bytes
; 04B82BD4:       BAC8000000       MOV EDX, 200               ; no-arg-parsing entry point
;       D9:       BF90010000       MOV EDI, 400
;       DE:       488D5D10         LEA RBX, [RBP+16]
;       E2:       B904000000       MOV ECX, 4
;       E7:       BE17001020       MOV ESI, 537919511
;       EC:       F9               STC
;       ED:       488BE5           MOV RSP, RBP
;       F0:       5D               POP RBP
;       F1:       C3               RET
;       F2:       CC0A             BREAK 10                   ; error trap
;       F4:       02               BYTE #X02
;       F5:       19               BYTE #X19                  ; INVALID-ARG-COUNT-ERROR
;       F6:       9A               BYTE #X9A                  ; RCX
NIL

This one has STC (set carry) because it returns two values.

Latticework answered 11/6, 2014 at 21:17 Comment(0)
