I'm planning to run ROS Rviz in a Docker container on a remote server and have the Rviz GUI display on my local computer, but I cannot get it to work. Any help would be appreciated.
My ROS Docker image on the remote server is based on the ros-melodic-desktop-full image (according to ROS's "Using Hardware Acceleration with Docker", ros-melodic-desktop-full already contains nvidia-docker2). My Dockerfile is listed below:
FROM osrf/ros:melodic-desktop-full
# strace, xterm, and mesa-utils are all for debugging the X display. In particular, mesa-utils provides glxinfo and glxgears.
RUN apt-get update && apt-get install -y xauth strace xterm mesa-utils
# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES \
${NVIDIA_VISIBLE_DEVICES:-all}
ENV NVIDIA_DRIVER_CAPABILITIES \
${NVIDIA_DRIVER_CAPABILITIES:+$NVIDIA_DRIVER_CAPABILITIES,}graphics
# QT_X11_NO_MITSHM is for running X server and X client on different machines.
ENV QT_X11_NO_MITSHM 1
ENTRYPOINT ["/bin/bash"]
My workflow comes from this blog post: "Running a graphical app in a Docker container, on a remote server". Basically, I use socat as a pipe to connect a Unix domain socket to TCP port 60xx (where xx is the current $DISPLAY number). I first log in to the remote server using ssh -X user@address. Then, on the server, I execute these commands (the Docker image name is ros-nvidia-gui:1.0):
DISPLAY_NUMBER=$(echo $DISPLAY | cut -d. -f1 | cut -d: -f2)
socat TCP4:localhost:60${DISPLAY_NUMBER} UNIX-LISTEN:/tmp/.X11-unix/X${DISPLAY_NUMBER} &
export DISPLAY=:$(echo $DISPLAY | cut -d. -f1 | cut -d: -f2)
docker run -it --rm \
    -e DISPLAY=${DISPLAY} \
    -v /tmp/.X11-unix:/tmp/.X11-unix \
    -v /home/deq/.Xauthority:/root/.Xauthority \
    --hostname $(hostname) \
    -e QT_X11_NO_MITSHM=1 \
    -e QT_QPA_PLATFORM='offscreen' \
    --runtime=nvidia \
    --gpus all \
    ros-nvidia-gui:1.0
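For reference, here is a minimal sketch of how the display number and the matching TCP port can be derived from a DISPLAY value such as localhost:11.0. The helper names display_number and x11_tcp_port are hypothetical and not part of the blog's workflow. The string concatenation 60${DISPLAY_NUMBER} used above only equals the conventional X11 port (6000 plus the display number) when the display number has exactly two digits, which is the usual case with ssh -X, so the sketch uses arithmetic instead.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the original workflow).

# Print the display number from a DISPLAY value, e.g. "localhost:11.0" -> "11".
display_number() {
    _d="${1##*:}"              # drop everything up to the last ':'  -> "11.0"
    printf '%s\n' "${_d%%.*}"  # drop the optional ".screen" suffix  -> "11"
}

# Print the conventional X11 TCP port for a DISPLAY value: 6000 + display number.
x11_tcp_port() {
    _n="$(display_number "$1")"
    printf '%s\n' "$((6000 + $_n))"
}

# Example with a literal value (ssh -X typically assigns displays from 10 upward).
echo "display $(display_number "localhost:11.0") -> port $(x11_tcp_port "localhost:11.0")"
```

With a value like localhost:11.0 both approaches give port 6011; they only differ for single-digit or three-digit display numbers.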
Then I get into the Docker container. When I run roscore & rviz in the container, the following exception is thrown:
root@node3:/# rviz
[ INFO] [1587175060.603895335]: rviz version 1.13.7
[ INFO] [1587175060.603985593]: compiled against Qt version 5.9.5
[ INFO] [1587175060.604014712]: compiled against OGRE version 1.9.0 (Ghadamon)
[ INFO] [1587175060.620394536]: Forcing OpenGl version 0.
[ WARN] [1587175068.907551767]: OGRE EXCEPTION(3:RenderingAPIException): Couldn`t open X display :11 in GLXGLSupport::getXDisplay at /build/ogre-1.9-B6QkmW/ogre-1.9-1.9.0+dfsg1/RenderSystems/GL/src/GLX/OgreGLXGLSupport.cpp (line 832)
terminate called after throwing an instance of 'Ogre::RenderingAPIException'
what(): OGRE EXCEPTION(3:RenderingAPIException): Couldn`t open X display :11 in GLXGLSupport::getXDisplay at /build/ogre-1.9-B6QkmW/ogre-1.9-1.9.0+dfsg1/RenderSystems/GL/src/GLX/OgreGLXGLSupport.cpp (line 832)
Aborted (core dumped)
It seems something is wrong with the OpenGL library.
So I checked the OpenGL library in the container. When I run glxgears, the three gears pop up on my local computer. This suggests that the whole X11-forwarding-in-Docker-on-a-remote-server setup works fine, and that OpenGL in the container is also working.
Then I checked glxinfo, and it outputs the following (I only list the lines related to rendering, OpenGL, and Mesa, and omit the others):
name of display: :11
display: :11 screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.4
client glx vendor string: Mesa Project and SGI
client glx version string: 1.4
GLX version: 1.4
Extended renderer info (GLX_MESA_query_renderer):
Vendor: VMware, Inc. (0xffffffff)
Device: llvmpipe (LLVM 9.0, 256 bits) (0xffffffff)
Version: 19.2.8
Accelerated: no
Video memory: 257669MB
Unified memory: no
Preferred profile: core (0x1)
Max core profile version: 3.3
Max compat profile version: 3.1
Max GLES1 profile version: 1.1
Max GLES[23] profile version: 3.0
OpenGL vendor string: VMware, Inc.
OpenGL renderer string: llvmpipe (LLVM 9.0, 256 bits)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 19.2.8
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL version string: 3.1 Mesa 19.2.8
OpenGL shading language version string: 1.40
OpenGL context flags: (none)
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 19.2.8
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
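Note that the renderer reported above is llvmpipe with "Accelerated: no", i.e. Mesa's software rasterizer rather than the GPU. A small sketch for spotting this automatically is shown below; renderer_kind is a hypothetical helper name, and the list of software renderer names it matches (llvmpipe, softpipe, swrast) is an assumption that may not be exhaustive.

```shell
#!/bin/sh
# Hypothetical helper: reads glxinfo output on stdin and classifies the renderer.
renderer_kind() {
    # Extract the text after "OpenGL renderer string:" (first match only).
    _r="$(sed -n 's/^OpenGL renderer string: *//p' | head -n 1)"
    case "$_r" in
        "")                             echo "unknown" ;;
        *llvmpipe*|*softpipe*|*swrast*) echo "software: $_r" ;;
        *)                              echo "other: $_r" ;;
    esac
}

# Example using the renderer line from the output above.
printf '%s\n' 'OpenGL renderer string: llvmpipe (LLVM 9.0, 256 bits)' | renderer_kind
# In the container this would be run as: glxinfo | renderer_kind
```

An "other" result only means the renderer is not one of the listed software names; it does not by itself prove that hardware acceleration is in use.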
In short, the whole workflow seems to be fine, except for Rviz. The most confusing thing is why Rviz is "forcing OpenGl version 0". I understand that Mesa's OpenGL conflicts with NVIDIA's OpenGL, so I also tried to purge Mesa in the Docker container, but Rviz was automatically removed along with it. Does this suggest that Rviz only uses Mesa's OpenGL? I also tried running the container with only Mesa, i.e. removing --runtime=nvidia and --gpus all from the docker run command, but the same exception is thrown.
Any help would be greatly appreciated! My final goal is to run Rviz in Docker on a remote server and display the GUI on my local computer. I need to use the GPU on the remote server to accelerate Rviz; either Mesa OpenGL or NVIDIA OpenGL is fine. Thank you!
Edit:
I narrowed the problem down to Ogre. I traced Rviz's exception to the source code of GLXGLSupport::getXDisplay. It seems the call mXDisplay = XOpenDisplay(displayString); is what fails. I then checked the XOpenDisplay manual, and the ":11" display value looks valid. Now I'm really confused.
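Since a DISPLAY value of the form :N is normally resolved to the Unix socket /tmp/.X11-unix/XN, one quick sanity check is whether that socket is actually present inside the container at the moment Rviz starts. The sketch below uses hypothetical helper names (x_socket_path, check_x_socket). It is worth checking because socat, when invoked without its fork option as above, serves a single connection and then exits, so the listener may no longer be running after an earlier client such as glxgears has closed; whether this is the cause here is only an assumption.

```shell
#!/bin/sh
# Hypothetical helpers for checking the Unix socket behind a ":N" display.

# Print the conventional socket path for a DISPLAY value, e.g. ":11" -> /tmp/.X11-unix/X11.
x_socket_path() {
    _n="${1##*:}"   # text after the last ':'      -> "11" or "11.0"
    _n="${_n%%.*}"  # strip optional ".screen"     -> "11"
    printf '%s\n' "/tmp/.X11-unix/X$_n"
}

# Report whether a socket file exists at that path.
# A present file does not guarantee that a process is still listening on it.
check_x_socket() {
    _p="$(x_socket_path "$1")"
    if [ -S "$_p" ]; then
        echo "present: $_p"
    else
        echo "missing: $_p"
    fi
}

# Example: check the display used by Rviz in the log above.
check_x_socket ":11"
```

Running this in the container right before starting Rviz, and again after glxgears has exited, would show whether the socket disappears between the two runs.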
"forcing OpenGl version 0." There is nothing weird about this; it is just rviz's default, and it will lead to requesting a legacy GL context. But your error isn't an OpenGL issue; it is an issue of not reaching the X11 server at all. – Envelopment

forcing OpenGL version 0 answer! But regarding reaching the X11 server, I think my setting is valid at least for glxgears, which can successfully display on my local computer. Do you mean the setting is not valid for Rviz? Maybe I should dig further into Rviz's display mechanism... – Geld

glxinfo example uses display :12, while rviz used :11. – Envelopment

glxinfo was from another ssh session, so the display is different. It still does not work even when the displays from glxinfo and rviz are the same. – Geld