xvfb-run unreliable when multiple instances invoked in parallel

Why do I sometimes (roughly half the time) get the following error?

webkit_server.NoX11Error: Cannot connect to X. You can try running with xvfb-run.

It happens when I start several copies of the script in parallel, each invoked as:

xvfb-run -a python script.py 

You can reproduce this yourself like so:

for ((i=0; i<10; i++)); do
  xvfb-run -a xterm &
done

Of the 10 instances of xterm this starts, 9 of them will typically fail, exiting with the message Xvfb failed to start.

Extant answered 19/5, 2015 at 17:20 Comment(6)
What in particular have you tried? We wouldn't want to suggest what you know won't work.Pressey
that said, you say "parallely" -- meaning you're starting a number of instances at the same time? I could see there being a race condition in port allocation.Infinitude
Try passing a unique --server-num argument to each instance of xvfb-run, rather than relying on -a to handle potential races correctly.Infinitude
@CharlesDuffy My mistake, I am sorry. Thank you for your attention.Mendelevium
Could you be explicit about which version of xvfb-run is installed? The code given in github.com/revnode/xvfb-run/blob/master/xvfb-run -- three years old, which supposedly corresponds to the 1.0 release -- is definitely prone to race conditions, and unsafe when multiple copies are started at precisely the same time.Infinitude
I've tried to improve the question -- it now shows how folks can reproduce the problem themselves, and no longer requires a script.py which isn't shown.Infinitude

Looking at xvfb-run 1.0, it operates as follows:

# Find a free server number by looking at .X*-lock files in /tmp.
find_free_servernum() {
    # Sadly, the "local" keyword is not POSIX.  Leave the next line commented in
    # the hope Debian Policy eventually changes to allow it in /bin/sh scripts
    # anyway.
    #local i

    i=$SERVERNUM
    while [ -f /tmp/.X$i-lock ]; do
        i=$(($i + 1))
    done
    echo $i
}

This is very bad practice: If two copies of find_free_servernum run at the same time, neither will be aware of the other, so they both can decide that the same number is available, even though only one of them will be able to use it.
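
A minimal sketch of that race, assuming a scratch directory stands in for /tmp (the lockdir variable and the back-to-back calls are illustrative and not part of xvfb-run). Both callers run the check before either has created a lock file, so both are handed :99; only after a lock file exists does a later caller move on to :100.

```shell
#!/bin/sh
# Illustrative only: same check-then-use logic as find_free_servernum,
# but pointed at a scratch directory instead of /tmp.
lockdir=$(mktemp -d)
SERVERNUM=99

find_free_servernum() {
    i=$SERVERNUM
    while [ -f "$lockdir/.X$i-lock" ]; do
        i=$((i + 1))
    done
    echo "$i"
}

# Two callers both run the check before either has started a server,
# so neither has created its lock file yet.
first=$(find_free_servernum)
second=$(find_free_servernum)
echo "caller 1 picked :$first, caller 2 picked :$second"

# Only once a lock file exists does the next caller see the number as taken.
touch "$lockdir/.X$first-lock"
third=$(find_free_servernum)
echo "a later caller picked :$third"

rm -rf "$lockdir"
```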

So, to fix this, let's write our own code to find a free display number, instead of assuming that xvfb-run -a will work reliably:

#!/bin/bash

# allow settings to be updated via environment
: "${xvfb_lockdir:=$HOME/.xvfb-locks}"
: "${xvfb_display_min:=99}"
: "${xvfb_display_max:=599}"

# assuming only one user will use this, let's put the locks in our own home directory
# avoids vulnerability to symlink attacks.
mkdir -p -- "$xvfb_lockdir" || exit

i=$xvfb_display_min     # minimum display number
while (( i < xvfb_display_max )); do
  if [ -f "/tmp/.X$i-lock" ]; then                # still avoid an obvious open display
    (( ++i )); continue
  fi
  exec 5>"$xvfb_lockdir/$i" || { (( ++i )); continue; }  # open a lockfile; on failure, move on to the next number
  if flock -x -n 5; then                          # try to lock it
    exec xvfb-run --server-num="$i" "$@" || exit  # if locked, run xvfb-run
  fi
  (( i++ ))
done

If you save this script as xvfb-run-safe, you can then invoke:

xvfb-run-safe python script.py 

...and not worry about race conditions so long as no other users on your system are also running xvfb.


This can be tested like so:

for ((i=0; i<10; i++)); do xvfb-run-safe xchat & done

...in which case all 10 instances correctly start up and run in the background, as opposed to:

for ((i=0; i<10; i++)); do xvfb-run -a xchat & done

...where, depending on your system's timing, nine out of ten will (typically) fail.
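
The locking that the wrapper relies on can be seen in isolation. This is only a sketch, assuming flock(1) from util-linux is installed; the temporary lock file and the first/second/third variables are illustrative names. A non-blocking exclusive lock succeeds for the first holder, fails for a second independent open of the same file, and becomes available again once the holder's file descriptor is closed.

```shell
#!/bin/bash
# Illustrative only: shows the non-blocking exclusive lock the wrapper uses.
# Assumes flock(1) from util-linux is available.
lockfile=$(mktemp)

exec 5>"$lockfile"                      # first holder opens the lock file on fd 5
if flock -x -n 5; then first=locked; else first=busy; fi

# A second, independent open of the same file cannot take the lock
# while fd 5 still holds it.
if ( exec 6>"$lockfile"; flock -x -n 6 ); then
    second=locked
else
    second=busy
fi

exec 5>&-                               # closing the fd releases the lock
if ( exec 6>"$lockfile"; flock -x -n 6 ); then
    third=locked
else
    third=busy
fi

echo "first=$first second=$second third=$third"
rm -f -- "$lockfile"
```

Because the wrapper starts xvfb-run with exec, the open descriptor 5 is inherited by xvfb-run, so the lock stays held for as long as that process runs and is released automatically when it exits.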

Infinitude answered 19/5, 2015 at 21:23 Comment(6)
Nice. I just needed something like this for parallel runs of script xvfb-run -- and this works like a charm. So thanks!Shemikashemite
I wish this change was incorporated in xvfb-run.Hest
This answer still applies as of writing this, using xvfb 1.19.6-1ubuntu4 on Ubuntu 18.04.1 LTSPaddle
This is extremely helpful for usage in CI/CD environments with parallel builds, thank you!Conney
@Jan, ...mind, I generally advise having your CI/CD builds happen in distinct filesystem namespaces -- if they don't share a /tmp, you can't get collisions regardless.Infinitude
This helped me run two parallel xvfb-run instances for prod/dev purposes on the same ec2 machine. Thanks a lot!Bootleg

This question was asked in 2015.

In my version of xvfb (2:1.20.13-1ubuntu1~20.04.2), this problem has been fixed.

It looks at /tmp/.X*-lock to find an available display number, and then runs Xvfb. If Xvfb fails to start, it picks a new number and retries, up to 10 times.
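
A rough sketch of that pick-then-retry behaviour, not the actual xvfb-run source: start_server is a hypothetical stand-in for launching Xvfb (here it pretends :99 and :100 are lost to a competing process), and lockdir is a scratch directory standing in for /tmp.

```shell
#!/bin/sh
# Illustrative sketch of "pick a number, try to start, retry on failure".
# Nothing here is taken from the real xvfb-run script.
lockdir=$(mktemp -d)      # stands in for /tmp
max_tries=10

find_free_servernum() {
    i=$1
    while [ -f "$lockdir/.X$i-lock" ]; do
        i=$((i + 1))
    done
    echo "$i"
}

# Hypothetical stand-in for starting Xvfb on display :$1.
start_server() {
    touch "$lockdir/.X$1-lock"
    # Simulate losing the race on :99 and :100: a competitor's lock
    # file is now in place and our own start attempt fails.
    [ "$1" -ge 101 ]
}

servernum=99
tries=0
started=no
while [ "$tries" -lt "$max_tries" ]; do
    tries=$((tries + 1))
    servernum=$(find_free_servernum "$servernum")
    if start_server "$servernum"; then
        started=yes
        break
    fi
done

echo "started=$started display=:$servernum after $tries tries"
rm -rf "$lockdir"
```

Retrying does not remove the race; it only makes a collision recoverable, since a failed start leads to another scan instead of an immediate error.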

Beatriz answered 6/2, 2022 at 5:13 Comment(0)
