Run Rviz from remote docker using X11

I'm planning to run ROS Rviz in a Docker container on a remote server and have the Rviz GUI display on my local computer, but I cannot get it to work. Any help would be appreciated.

My ROS Docker image on the remote server is based on the ros-melodic-desktop-full image (according to ROS Using Hardware Acceleration with Docker, ros-melodic-desktop-full already contains nvidia-docker2). My Dockerfile is listed below:

FROM osrf/ros:melodic-desktop-full

# strace, xterm, and mesa-utils are all for debugging the X display. In particular, mesa-utils provides glxinfo and glxgears.
RUN apt-get update && apt-get install -y xauth strace xterm mesa-utils

# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES \
    ${NVIDIA_VISIBLE_DEVICES:-all}
ENV NVIDIA_DRIVER_CAPABILITIES \
    ${NVIDIA_DRIVER_CAPABILITIES:+$NVIDIA_DRIVER_CAPABILITIES,}graphics

# QT_X11_NO_MITSHM is for running X server and X client on different machines.
ENV QT_X11_NO_MITSHM 1

ENTRYPOINT ["/bin/bash"]
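
To make the two ENV lines above easier to read, here is a minimal sketch that reproduces the same `${VAR:-word}` and `${VAR:+word}` expansions in plain POSIX shell. The helper names `visible_devices` and `driver_capabilities` are hypothetical and exist only for illustration; the sketch assumes Docker applies these expansions the same way the shell does.

```shell
#!/bin/sh
# Hypothetical helpers that mimic the Dockerfile's ENV expansions.

visible_devices() {
    # ${VAR:-all}: fall back to "all" when VAR is unset or empty.
    NVIDIA_VISIBLE_DEVICES=$1
    printf '%s\n' "${NVIDIA_VISIBLE_DEVICES:-all}"
}

driver_capabilities() {
    # ${VAR:+$VAR,}graphics: emit "<old value>," only when VAR is
    # set and non-empty, then always append "graphics".
    NVIDIA_DRIVER_CAPABILITIES=$1
    printf '%s\n' "${NVIDIA_DRIVER_CAPABILITIES:+$NVIDIA_DRIVER_CAPABILITIES,}graphics"
}

visible_devices ""                     # prints: all
driver_capabilities ""                 # prints: graphics
driver_capabilities "compute,utility"  # prints: compute,utility,graphics
```

So if the base image sets neither variable, the resulting image should end up with NVIDIA_VISIBLE_DEVICES=all and NVIDIA_DRIVER_CAPABILITIES=graphics.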

My workflow follows this blog post: Running a graphical app in a Docker container, on a remote server. Basically, I use socat as a pipe between a Unix domain socket and TCP port 60xx (where xx is the current $DISPLAY number). I first log in to the remote server using ssh -X user@address. Then, on the server, I execute these commands (the Docker image name is ros-nvidia-gui:1.0):

DISPLAY_NUMBER=$(echo $DISPLAY | cut -d. -f1 | cut -d: -f2)

socat TCP4:localhost:60${DISPLAY_NUMBER} UNIX-LISTEN:/tmp/.X11-unix/X${DISPLAY_NUMBER} &

export DISPLAY=:$(echo $DISPLAY | cut -d. -f1 | cut -d: -f2)

docker run -it --rm \
    -e DISPLAY=${DISPLAY} \
    -v /tmp/.X11-unix:/tmp/.X11-unix \
    -v /home/deq/.Xauthority:/root/.Xauthority \
    --hostname $(hostname) \
    -e QT_X11_NO_MITSHM=1 \
    -e QT_QPA_PLATFORM='offscreen' \
    --runtime=nvidia \
    --gpus all \
    ros-nvidia-gui:1.0
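
For reference, the display-number extraction used in the commands above can be sketched as a pair of small shell functions. The names `display_number` and `x11_tcp_port` are hypothetical. The sketch assumes a $DISPLAY of the form host:N.S, as set by ssh -X, and computes the TCP port as 6000 + N rather than by pasting "60" in front of N.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the original workflow.

display_number() {
    # Drop everything up to the last ':' (the host part), then
    # everything from the first '.' (the screen part).
    d=${1##*:}
    printf '%s\n' "${d%%.*}"
}

x11_tcp_port() {
    # X11 over TCP uses port 6000 + display number.
    printf '%s\n' "$((6000 + $(display_number "$1")))"
}

display_number "localhost:11.0"   # prints 11
x11_tcp_port   "localhost:11.0"   # prints 6011
x11_tcp_port   "localhost:101.0"  # prints 6101
```

For two-digit display numbers such as 11, this matches the 60${DISPLAY_NUMBER} string used above. The two would differ for a display number such as 101, where the string form gives 60101 instead of 6101.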

Then I get into the docker container. When I run roscore & rviz in the container, the following exception is thrown:

root@node3:/# rviz
[ INFO] [1587175060.603895335]: rviz version 1.13.7
[ INFO] [1587175060.603985593]: compiled against Qt version 5.9.5
[ INFO] [1587175060.604014712]: compiled against OGRE version 1.9.0 (Ghadamon)
[ INFO] [1587175060.620394536]: Forcing OpenGl version 0.
[ WARN] [1587175068.907551767]: OGRE EXCEPTION(3:RenderingAPIException): Couldn`t open X display :11 in GLXGLSupport::getXDisplay at /build/ogre-1.9-B6QkmW/ogre-1.9-1.9.0+dfsg1/RenderSystems/GL/src/GLX/OgreGLXGLSupport.cpp (line 832)
terminate called after throwing an instance of 'Ogre::RenderingAPIException'
  what():  OGRE EXCEPTION(3:RenderingAPIException): Couldn`t open X display :11 in GLXGLSupport::getXDisplay at /build/ogre-1.9-B6QkmW/ogre-1.9-1.9.0+dfsg1/RenderSystems/GL/src/GLX/OgreGLXGLSupport.cpp (line 832)
Aborted (core dumped)

It seems something is wrong with the OpenGL library.

So I checked the OpenGL library in the container. When I run glxgears, the three gears pop up on my local computer. This suggests that the whole X11-forwarding-in-docker-on-a-remote-server setup works, and that OpenGL in the container is also fine.

Then I checked glxinfo, and it outputs the following (I only list the lines related to rendering, OpenGL, mesa and omitted others):

name of display: :11
display: :11  screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.4

client glx vendor string: Mesa Project and SGI
client glx version string: 1.4

GLX version: 1.4

Extended renderer info (GLX_MESA_query_renderer):
    Vendor: VMware, Inc. (0xffffffff)
    Device: llvmpipe (LLVM 9.0, 256 bits) (0xffffffff)
    Version: 19.2.8
    Accelerated: no
    Video memory: 257669MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 3.3
    Max compat profile version: 3.1
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.0
OpenGL vendor string: VMware, Inc.
OpenGL renderer string: llvmpipe (LLVM 9.0, 256 bits)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 19.2.8
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 3.1 Mesa 19.2.8
OpenGL shading language version string: 1.40
OpenGL context flags: (none)

OpenGL ES profile version string: OpenGL ES 3.0 Mesa 19.2.8
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
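
Since the goal is GPU acceleration, the lines of this output that matter most are the renderer string and the "Accelerated" flag. As a rough sketch, the filters below pick those lines out of glxinfo output. The function names `glx_summary` and `uses_software_renderer` are hypothetical, and the list of software renderer names is an assumption based on Mesa's usual CPU rasterizers.

```shell
#!/bin/sh
# Hypothetical helpers that read glxinfo output on stdin.

glx_summary() {
    # Keep only the lines that say who is doing the rendering.
    grep -E '^(direct rendering|OpenGL renderer string|OpenGL version string):|^ +Accelerated:'
}

uses_software_renderer() {
    # Succeeds when the renderer string names a Mesa CPU rasterizer
    # (assumed list: llvmpipe, softpipe, swrast).
    grep -Eiq 'OpenGL renderer string:.*(llvmpipe|softpipe|swrast)'
}

# Typical use inside the container (needs mesa-utils and a working DISPLAY):
#   glxinfo | glx_summary
#   glxinfo | uses_software_renderer && echo "software rendering"
```

Applied to the output above, the renderer is llvmpipe with "Accelerated: no", so the test that glxgears passed was running on the CPU rather than on the GPU.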

In short, the whole workflow seems to be fine, except Rviz. The most confusing thing is why Rviz is "forcing OpenGl version 0". I understand mesa's OpenGL is at odds with nvidia's OpenGL, so I have also tried to purge mesa in the docker container, but Rviz is automatically removed with it. This suggests Rviz only uses mesa's OpenGL? So I also tried to run the docker with only mesa, i.e. deleting the --runtime=nvidia and --gpus all in the docker run command, but the same exception is thrown.

Any help would be greatly appreciated! My final goal is to run Rviz in Docker on a remote server and display the GUI on my local computer. I need to use the GPU on the remote server to accelerate Rviz; either Mesa OpenGL or Nvidia OpenGL is fine. Thank you!


Edit:

I narrowed the problem down to Ogre. I followed Rviz's exception to the source code of GLXGLSupport::getXDisplay. It seems the call mXDisplay = XOpenDisplay(displayString); is what fails. I then looked up the manual for XOpenDisplay, and the ":11" display value seems valid. Now I'm really confused.
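
As a rough model of how a display string such as ":11" is resolved, the sketch below maps a $DISPLAY value to the endpoint a client would try to connect to. The function name `x_endpoint` is hypothetical, and this is a simplification that ignores the finer details of Xlib's transport selection: an empty host (or "unix") is treated as the local Unix socket /tmp/.X11-unix/XN, and anything else as TCP port 6000 + N on that host.

```shell
#!/bin/sh
# Hypothetical helper: a simplified model, not Xlib's actual code.

x_endpoint() {
    host=${1%:*}      # text before the last ':'
    rest=${1##*:}     # "N" or "N.S"
    num=${rest%%.*}   # display number N
    if [ -z "$host" ] || [ "$host" = "unix" ]; then
        printf 'unix:/tmp/.X11-unix/X%s\n' "$num"
    else
        printf 'tcp:%s:%s\n' "$host" "$((6000 + num))"
    fi
}

x_endpoint ":11"             # prints unix:/tmp/.X11-unix/X11
x_endpoint "localhost:11.0"  # prints tcp:localhost:6011
```

Under this model, DISPLAY=:11 inside the container only works if /tmp/.X11-unix/X11 exists there and something is still accepting connections on it at the moment rviz starts.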

Geld answered 18/4, 2020 at 3:9 Comment(5)
"The most confusing thing is why Rviz is forcing OpenGl version 0." There is nothing weird about this; it is just rviz's default, and it will lead to requesting a legacy GL context. But your error isn't an OpenGL issue; it is a failure to reach the X11 server at all. - Envelopment
@Envelopment Thanks for the answer about forcing OpenGL version 0! But regarding reaching the X11 server, I think my setup is valid at least for glxgears, which displays successfully on my local computer. Do you mean the setup is not valid for Rviz? Maybe I should dig further into Rviz's display mechanism... - Geld
Not sure if it means something, but your glxinfo example uses display :12, while rviz used :11. - Envelopment
@Envelopment I'm really sorry. I edited my original question. The glxinfo output was from another ssh session, so the display was different. It still does not work even when the displays from glxinfo and rviz are the same. - Geld
X11 forwarding with ssh uses the GLX X11 protocol to run OpenGL. GLX only supports up to OpenGL 1.4, which I believe is too old to run RViz. If you want RViz rendering locally, then run a local RViz and connect it via ROS to your remote computer. Otherwise, if you really want RViz running on the remote machine, you will need a different protocol such as VNC, which will allow for a newer version of OpenGL. - Bresnahan
