Replacing ld with gold - any experience?

Asked 13/8, 2010 at 10:46 Answered 25/12, 2018 at 9:58

Solved c++ c linker migration gold-linker

Has anyone tried to use gold instead of ld?

gold promises to be much faster than ld, so it may help speeding up test cycles for large C++ applications, but can it be used as drop-in replacement for ld?

Can gcc/g++ directly call gold?

Are there any known bugs or problems?

Although gold has been part of GNU binutils for a while, I have found almost no "success stories" or even "Howtos" on the Web.

(Update: added links to gold and blog entry explaining it)

Reine answered 13/8, 2010 at 10:46 Comment(0)

At the moment I am using it to compile bigger projects on Ubuntu 10.04. There you can install and integrate it easily with the binutils-gold package (if you remove that package, you get your old ld back). Gcc will then use gold automatically.

Some experiences:

- gold doesn't search in /usr/local/lib
- gold doesn't assume libs like pthread or rt; I had to add them by hand
- it is faster and needs less memory (the latter is important on big C++ projects with a lot of boost etc.)

What does not work: it cannot compile kernel code, and therefore no kernel modules. Ubuntu builds modules automatically via DKMS when it updates proprietary drivers like fglrx, and this fails with ld-gold (you have to remove gold, restart DKMS, then reinstall ld-gold).

Tushy answered 13/8, 2010 at 12:1 Comment(5)

Thanks, I think I'll give it a try - the restrictions you mention seem to be no problem in my case. – Reine 13/8, 2010 at 15:10

+1: thanks for sharing experience. What about performance? – Heated 13/6, 2011 at 10:12

It is significantly faster, especially when linking huge static libraries together into one binary, but we did not make any measurements, though. – Tushy 16/6, 2011 at 19:36

@Heated My measurements were for linking many objects and .a files into a set of ~30 .so files (one largish, the rest small) and 1 executable for a significant commercial application. Measuring only link time and running make in serial, I got a total time of 22.48 sec with ld vs 16.24 sec with gold, for an improvement of 6.24 sec per build. However, if I run make in parallel with 8 processors, the total difference is only 1.42 sec per build. The overall memory usage was a 42% improvement, regardless of make parallelization. YMMV. – Impale 5/1, 2018 at 16:16

@metal: thanks a lot for the figures. The memory usage improvement looks great, ld is so greedy about it. – Heated 8/1, 2018 at 9:0

As it took me a little while to find out how to selectively use gold (i.e. not system-wide using a symlink), I'll post the solution here. It's based on http://code.google.com/p/chromium/wiki/LinuxFasterBuilds#Linking_using_gold .

Make a directory where you can put a gold glue script. I am using ~/bin/gold/.
Put the following glue script there and name it ~/bin/gold/ld:
```
#!/bin/bash
gold "$@"
```
Make it executable: chmod a+x ~/bin/gold/ld.
Change your calls to gcc to gcc -B$HOME/bin/gold, which makes gcc look in the given directory for helper programs like ld, so it uses the glue script instead of the system-default ld.
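
The steps above can be sketched as one small shell function. This is only an illustration: the function name make_gold_wrapper and its optional directory argument are assumptions made here, not part of the original answer, and the sketch assumes gold is on the PATH when the wrapper is later invoked.

```shell
#!/usr/bin/env bash
# Sketch: create a directory containing an `ld` glue script that forwards to gold.
# make_gold_wrapper is a hypothetical helper name used only for illustration.
make_gold_wrapper() {
  local dir="${1:-$HOME/bin/gold}"
  mkdir -p "$dir"
  # The glue script looks up `gold` on PATH and passes all arguments through.
  printf '%s\n' '#!/bin/bash' 'gold "$@"' > "$dir/ld"
  chmod a+x "$dir/ld"
  # Print the directory so it can be handed to gcc -B.
  printf '%s\n' "$dir"
}
```

A call such as gcc -B"$(make_gold_wrapper)" -o main main.c would then pick up the wrapper for that one invocation only, leaving the system-wide ld untouched.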

Thyme answered 21/12, 2011 at 16:44 Comment(3)

That is needed for what operating system? As nob said in his answer, for Ubuntu just install the gold binutils package and the compiler will use it right away. Same for openSUSE. – Hypoacidity 4/2, 2014 at 7:56

Yes, it is quite easy to replace ld system-wide. My answer was particularly geared towards how to use gold selectively. And in that case, I think, it is necessary for any OS. – Thyme 4/2, 2014 at 11:57

@vidstige Yes, the advantage of the script is that it looks for gold on the PATH. For a symlink, you'd need to point to the full path. – Thyme 5/7, 2016 at 16:9

Can gcc/g++ directly call gold?

Just to complement the answers: there is a gcc option, -fuse-ld=gold (see the gcc doc). However, AFAIK, gcc can be configured at build time in such a way that the option has no effect.
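
One way to check whether the option actually took effect is to ask the linker for its version through gcc and look at the first line of output. The sketch below is an assumption-laden illustration: classify_linker is a hypothetical helper, and the version strings it matches ("GNU gold", "LLD", "GNU ld") are what these linkers typically print, which may differ between releases.

```shell
#!/usr/bin/env bash
# Sketch: classify the linker from the first line of its `--version` output.
# classify_linker is a hypothetical helper; the matched strings are assumptions
# based on typical binutils / LLD version banners.
classify_linker() {
  local first=""
  IFS= read -r first || true
  case "$first" in
    *"GNU gold"*) echo gold ;;
    *LLD*)        echo lld ;;
    *"GNU ld"*)   echo bfd ;;
    *)            echo unknown ;;
  esac
}

# Usage (not run here):
#   gcc -fuse-ld=gold -Wl,--version 2>/dev/null | classify_linker
```

If this reports bfd even though -fuse-ld=gold was passed, the installed gcc is probably ignoring the option or gold is not installed.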

Knothole answered 14/12, 2016 at 14:11 Comment(2)

-fuse-ld=gold is not complete. You have to use -Wl,-fuse-ld=gold, as it's used at link time. – Natatory 16/11, 2017 at 12:43

@Nawaz No, -Wl, is used to pass an option directly to ld; to use another linker you need to tell that to gcc. Please refer to the doc. – Raceway 25/9, 2019 at 10:15

Minimal synthetic benchmark: LD vs gold vs LLVM LLD

Outcome:

gold was about 2.3x to 3.4x faster than LD in wall-clock time for all values I've tried, when using -Wl,--threads -Wl,--thread-count=$(nproc) to enable multithreading
LLD was about 2x faster than gold!

Tested on:

Ubuntu 20.04, GCC 9.3.0, binutils 2.34, LLD 10 (installed with sudo apt install lld)
Lenovo ThinkPad P51 laptop, Intel Core i7-7820HQ CPU (4 cores / 8 threads), 2x Samsung M471A2K43BB1-CRC RAM (2x 16GiB), Samsung MZVLB512HAJQ-000L7 SSD (3,000 MB/s).

Simplified description of the benchmark parameters:

1: number of object files providing symbols
2: number of symbols per symbol provider object file
3: number of object files using all provided symbols

Results for different benchmark parameters:

10000 10 10
nogold:  wall=4.35s user=3.45s system=0.88s 876820kB
gold:    wall=1.35s user=1.72s system=0.46s 739760kB
lld:     wall=0.73s user=1.20s system=0.24s 625208kB

1000 100 10
nogold:  wall=5.08s user=4.17s system=0.89s 924040kB
gold:    wall=1.57s user=2.18s system=0.54s 922712kB
lld:     wall=0.75s user=1.28s system=0.27s 664804kB

100 1000 10
nogold:  wall=5.53s user=4.53s system=0.95s 962440kB
gold:    wall=1.65s user=2.39s system=0.61s 987148kB
lld:     wall=0.75s user=1.30s system=0.25s 704820kB

10000 10 100
nogold:  wall=11.45s user=10.14s system=1.28s 1735224kB
gold:    wall=4.88s user=8.21s system=0.95s 2180432kB
lld:     wall=2.41s user=5.58s system=0.74s 2308672kB

1000 100 100
nogold:  wall=13.58s user=12.01s system=1.54s 1767832kB
gold:    wall=5.17s user=8.55s system=1.05s 2333432kB
lld:     wall=2.79s user=6.01s system=0.85s 2347664kB

100 1000 100
nogold:  wall=13.31s user=11.64s system=1.62s 1799664kB
gold:    wall=5.22s user=8.62s system=1.03s 2393516kB
lld:     wall=3.11s user=6.26s system=0.66s 2386392kB
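
To turn the wall times above into speedup factors, a small helper is enough. This is a sketch: speedup is a hypothetical helper name introduced here, and the inputs in the usage note are simply wall times copied from the first result group.

```shell
#!/usr/bin/env bash
# Sketch: print how many times faster a new wall time is than a baseline.
# speedup BASE NEW  ->  BASE / NEW with two decimals.
speedup() {
  LC_ALL=C awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", base / new }'
}
```

For the 10000 10 10 run, speedup 4.35 1.35 prints 3.22 (LD vs gold) and speedup 1.35 0.73 prints 1.85 (gold vs LLD).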

This is the script that generates all the objects for the link tests:

generate-objects

```
#!/usr/bin/env bash
set -eu

# CLI args.

# Number of C files that define ints.
# Each of those files contains n_ints_per_file ints.
n_int_files="${1:-10}"
n_ints_per_file="${2:-10}"

# Number of functions. Each function adds all ints from all files.
# This leads to n_int_files x n_ints_per_file x n_funcs relocations.
n_funcs="${3:-10}"

# Do a debug build, since it is for debug builds that link time matters the most,
# as the user will be recompiling often.
cflags='-ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic'

# Clean up previously generated files (./clean is a separate helper script not shown here).
./clean

# Generate 0.c, 1.c, ..., ints.h and int_sum.h
rm -f ints.h
echo 'return' > int_sum.h
int_file_i=0
while [ "$int_file_i" -lt "$n_int_files" ]; do
  int_i=0
  int_file="${int_file_i}.c"
  rm -f "$int_file"
  while [ "$int_i" -lt "$n_ints_per_file" ]; do
    # Progress output.
    echo "${int_file_i} ${int_i}"
    int_sym="i_${int_file_i}_${int_i}"
    echo "unsigned int ${int_sym} = ${int_file_i};" >> "$int_file"
    echo "extern unsigned int ${int_sym};" >> ints.h
    echo "${int_sym} +" >> int_sum.h
    int_i=$((int_i + 1))
  done
  int_file_i=$((int_file_i + 1))
done
echo '1;' >> int_sum.h

# Generate funcs.h, f_*.c and main.c.
rm -f funcs.h
cat <<EOF >main.c
#include "funcs.h"

int main(void) {
return
EOF
i=0
while [ "$i" -lt "$n_funcs" ]; do
  func_sym="f_${i}"
  echo "${func_sym}() +" >> main.c
  echo "int ${func_sym}(void);" >> funcs.h
  cat <<EOF >"${func_sym}.c"
#include "ints.h"

int ${func_sym}(void) {
#include "int_sum.h"
}
EOF
  i=$((i + 1))
done
cat <<EOF >>main.c
1;
}
EOF

# Generate *.o
ls | grep -E '\.c$' | parallel --halt now,fail=1 -t --will-cite "gcc $cflags -c -o '{.}.o' '{}'"
```

GitHub upstream.

Note that the object file generation can be quite slow, since each C file can be quite large.

Given an input of type:

./generate-objects [n_int_files [n_ints_per_file [n_funcs]]]

it generates:

main.c

```
#include "funcs.h"

int main(void) {
    return f_0() + f_1() + ... + f_<n_funcs>();
}
```

f_0.c, f_1.c, ..., f_<n_funcs>.c

```
extern unsigned int i_0_0;
extern unsigned int i_0_1;
...
extern unsigned int i_1_0;
extern unsigned int i_1_1;
...
extern unsigned int i_<n_int_files>_<n_ints_per_file>;

int f_0(void) {
    return
    i_0_0 +
    i_0_1 +
    ...
    i_1_0 +
    i_1_1 +
    ...
    i_<n_int_files>_<n_ints_per_file>
}
```

0.c, 1.c, ..., <n_int_files>.c

```
unsigned int i_0_0 = 0;
unsigned int i_0_1 = 0;
...
unsigned int i_0_<n_ints_per_file> = 0;
```

which leads to:

n_int_files x n_ints_per_file x n_funcs

relocations on the link.
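
As a quick sanity check of that product, the relocation count for a given parameter triple can be computed with shell arithmetic. The helper name relocations is an illustrative assumption, not part of the benchmark scripts.

```shell
#!/usr/bin/env bash
# Sketch: relocation count = n_int_files x n_ints_per_file x n_funcs.
# relocations is a hypothetical helper used only to illustrate the formula.
relocations() {
  echo $(( $1 * $2 * $3 ))
}
```

For example, relocations 10000 10 10 prints 1000000 and relocations 100 1000 100 prints 10000000, so the three parameter sets with n_funcs=10 all produce one million relocations and the three with n_funcs=100 all produce ten million.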

Then I compared:

```
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic               -o main *.o
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -fuse-ld=gold -Wl,--threads -Wl,--thread-count=`nproc` -o main *.o
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -fuse-ld=lld  -o main *.o
```
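
A minimal sketch of how those three links could be timed in a loop follows. It is not the harness used for the numbers above: link_flags and bench_links are hypothetical helper names, and the sketch assumes GNU time is installed at /usr/bin/time, with a format string chosen here to resemble the result lines.

```shell
#!/usr/bin/env bash
# Sketch: map a linker name to the extra gcc flags used in the comparison.
# link_flags and bench_links are hypothetical helpers for illustration.
link_flags() {
  case "$1" in
    bfd)  echo "" ;;
    gold) echo "-fuse-ld=gold -Wl,--threads -Wl,--thread-count=$(nproc)" ;;
    lld)  echo "-fuse-ld=lld" ;;
    *)    echo "unknown linker: $1" >&2; return 1 ;;
  esac
}

# Link main from all objects in the current directory once per linker.
# Assumes GNU time: %e wall seconds, %U user, %S system, %M max RSS in kB.
# GNU time reports on stderr, the linker label goes to stdout.
bench_links() {
  local cflags='-ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic'
  local linker flags
  for linker in bfd gold lld; do
    flags="$(link_flags "$linker")"
    printf '%-6s ' "${linker}:"
    # $cflags and $flags are left unquoted on purpose so they split into words.
    /usr/bin/time -f 'wall=%es user=%Us system=%Ss %MkB' \
      gcc $cflags $flags -o main ./*.o
  done
}
```

Running bench_links in the directory that holds the generated *.o files would print one timing line per linker.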

Some limits I've been trying to mitigate when selecting the test parameters:

at 100k C files, both methods get failed mallocs occasionally
GCC cannot compile a function with 1M additions

I have also observed a 2x speedup in the debug build of gem5: https://gem5.googlesource.com/public/gem5/+/fafe4e80b76e93e3d0d05797904c19928587f5b5

Phoronix benchmarks

Phoronix did some benchmarking in 2017 for some real world projects, but for the projects they examined, the gold gains were not so significant: https://www.phoronix.com/scan.php?page=article&item=lld4-linux-tests&num=2 (archive).

Known incompatibilities

gold
- https://sourceware.org/bugzilla/show_bug.cgi?id=23869 gold failed if I do a partial link with LD and then try the final link with gold. lld worked on the same test case.
- https://github.com/cirosantilli/linux-kernel-module-cheat/issues/109 my debug symbols appeared broken in some places

LLD benchmarks

At https://lld.llvm.org/ they give build times for a few well-known projects, with similar results to my synthetic benchmarks. Unfortunately, project/linker versions are not given. In their results:

gold was about 3x/4x faster than LD
LLD was 3x/4x faster than gold, so a greater speedup than in my synthetic benchmark

They comment:

This is a link time comparison on a 2-socket 20-core 40-thread Xeon E5-2680 2.80 GHz machine with an SSD drive. We ran gold and lld with or without multi-threading support. To disable multi-threading, we added -no-threads to the command lines.

and results look like:

Program      | Size     | GNU ld  | gold -j1 | gold    | lld -j1 |    lld
-------------|----------|---------|----------|---------|---------|-------
  ffmpeg dbg |   92 MiB |   1.72s |   1.16s  |   1.01s |   0.60s |  0.35s
  mysqld dbg |  154 MiB |   8.50s |   2.96s  |   2.68s |   1.06s |  0.68s
   clang dbg | 1.67 GiB | 104.03s |  34.18s  |  23.49s |  14.82s |  5.28s
chromium dbg | 1.14 GiB | 209.05s |  64.70s  |  60.82s |  27.60s | 16.70s

Joselow answered 25/12, 2018 at 9:58 Comment(3)

I can confirm your findings, I see similar speedup for linking my projects. See also benchmarks here lld.llvm.org – Halvaard 26/11, 2020 at 22:55

I think one more important datapoint is memory usage. For example, ld simply OoM on my multicore machine because each spawned (per core) ld process ate up to 2GB. gold was about x2-3 better and lld better still. – Lavalley 6/6, 2022 at 16:45

@DanM. yes, good to know. I had never hit memory issues before, so I didn't think of benchmarking. I would likely instaquit based on time of that link though! – Joselow 6/6, 2022 at 16:58

As a Samba developer, I have been using the gold linker almost exclusively on Ubuntu, Debian, and Fedora for several years now. My assessment:

- gold is many times (felt: 5-10 times) faster than the classical linker.
- Initially, there were a few problems, but they have been gone since around Ubuntu 12.04.
- The gold linker even found some dependency problems in our code, since it seems to be more correct than the classical one with respect to some details. See, e.g. this Samba commit.

I have not used gold selectively, but have been using symlinks or the alternatives mechanism if the distribution provides it.

Ankylostomiasis answered 12/6, 2015 at 11:17 Comment(0)

You could link ld to gold (in a local binary directory if you have ld installed to avoid overwriting):

```
ln -s `which gold` ~/bin/ld
```

or

```
ln -s `which gold` /usr/local/bin/ld
```

Dickman answered 13/8, 2010 at 11:0 Comment(2)

This is also possible with LLD: ln -s $(which lld) /usr/bin/ld, just make sure to remove /usr/bin/ld first. – Gotcher 8/5, 2022 at 5:56

Is this a safe thing to do? Does gold accept the same parameters as ld? – Bergen 13/2, 2023 at 7:59

Some projects seem to be incompatible with gold because of differences in behavior between ld and gold. Example: OpenFOAM, see http://www.openfoam.org/mantisbt/view.php?id=685 .

Underhand answered 23/8, 2015 at 14:9 Comment(0)

DragonFlyBSD switched over to gold as their default linker. So it seems to be ready for a variety of tools.
More details: http://phoronix.com/scan.php?page=news_item&px=DragonFlyBSD-Gold-Linker

Hypoacidity answered 25/11, 2015 at 6:16 Comment(0)
