Compilation of IORef and STRef
Asked Answered
L

1

5

To measure the performance of those Refs, I dumped the assembly produced by GHC on the following code :

import Data.IORef

main = do
  r <- newIORef 18
  v <- readIORef r
  print v

I expected the IORef to be completely optimized away, leaving only a syscall to write stdout with string "18". Instead I get 250 lines of assembly. Do you know how many will actually be executed ? Here is what I think is the heart of the program :

.globl Main.main1_info
Main.main1_info:
_c1Zi:
    leaq -8(%rbp),%rax
    cmpq %r15,%rax
    jb _c1Zj
_c1Zk:
    movq $block_c1Z9_info,-8(%rbp)
    movl $Main.main2_closure+1,%ebx
    addq $-8,%rbp
    jmp stg_newMutVar#
_c1Zn:
    movq $24,904(%r13)
    jmp stg_gc_unpt_r1
.align 8
    .long   S1Zo_srt-(block_c1Z9_info)+0
    .long   0
    .quad   0
    .quad   30064771104
block_c1Z9_info:
_c1Z9:
    addq $24,%r12
    cmpq 856(%r13),%r12
    ja _c1Zn
_c1Zm:
    movq 8(%rbx),%rax
    movq $sat_s1Z2_info,-16(%r12)
    movq %rax,(%r12)
    movl $GHC.Types.True_closure+2,%edi
    leaq -16(%r12),%rsi
    movl $GHC.IO.Handle.FD.stdout_closure,%r14d
    addq $8,%rbp
    jmp GHC.IO.Handle.Text.hPutStr2_info
_c1Zj:
    movl $Main.main1_closure,%ebx
    jmp *-8(%r13)

I am concerned about this jmp stg_newMutVar#. It is nowhere else in the assembly, so maybe GHC resolves it at a later linking stage. But why is it even here and what does it do ? Can I dump the final assembly without any unresolved haskell symbols ?

Lunar answered 29/8, 2016 at 9:25 Comment(1)
I doubt GHC can perform any relevant optimization on the IO refs. GCC does that on the equivalent malloc snippet, but that is far more idiomatic code in C than in Haskell. The stg_ stuff is the GHC runtime, it should be implemented in C somewhere in the GHC sources. Not for the faint of heart.Thuythuya
B
11

Starting with a couple of links:

The cmm and C sources aren't particularly readable if you're not already familiar with the macros and primops. Unfortunately, I don't know of a good way to view the assembly generated for cmm primops, short of looking into an executable with objdump or some other disassembler.

Still, I can summarize the runtime semantics of IORef.

IORef is a wrapper around MutVar# from GHC.Prim. As the doc says, MutVar# is like a single-element mutable array. It takes up two machine words, the first is the header, the second is the stored value (which is a pointer to a GHC object). A value of MutVar# is itself a pointer to this two-word object.

MutVar-s differ from normal immutable objects most notably by participating in a write barrier mechanism. GHC has generational garbage collection, so any MutVar that lives in an older generation must be also a GC root when collecting the younger generations, since mutating a MutVar may cause younger objects to become reachable. Therefore, whenever a MutVar is promoted from generation 0 (the youngest), it is added to a so-called "mutable list" that contains references to all such mutable objects. The mutable list gets rebuilt during GC of old generations. In short, MutVar-s in old generations are always present on the mutable list.

This is a rather simplistic way of dealing with mutable variables, and if we have large numbers of them in old generations, minor garbage collection slows down because of the bloated mutable list, and as a result the entire program slows down.

Since mutable variables aren't used prominently in production code, there hasn't been much demand or pressure for optimizing the RTS for their heavy usage.

If you need a large number of mutable variables, you should instead use a single mutable boxed array, because that's only a single reference on the mutable list and also has a bitmap-based optimization for GC traversal of elements that might have been mutated.

Also, as you see newMutVar# is only statically linked but not inlined, although it's a rather small chunk of code. As a result, it's also not optimized away. This is again broadly because of the lack of effort and attention for optimizing mutating code. By contrast, allocating and copying small known-sized primitive arrays is currently inlined and greatly optimized, because Johan Tibell who did large amount of work implementing the unordered-containers library made it so (in order to make unordered-containers faster).

Bouse answered 29/8, 2016 at 10:23 Comment(3)
Thanks, this is clear. I am mostly interested in vector algorithms, so it is good to know that GHC only uses one IORef for the whole vector. For an unboxed vector of Ints, is reading or writing a cell more expensive than the same operation on an int[] in C ?Lunar
@V.Semeria unboxed arrays (mutable or immutable) don't have any GC cost, since GC only needs to follow pointers to heap objects. Unboxed array operations have the same cost as in C.Galluses
But note that unsafe operations on boxed arrays have too the same cost as in C, only with mutable boxed arrays do we pay extra GC cost (immutable arrays don't have the mentioned GC overhead).Galluses

© 2022 - 2024 — McMap. All rights reserved.