Docker's `docker0` device dies repeatedly (`inet addr` disappears)

I'm running Docker version 1.4.1, build 5bc2ff8 on Ubuntu 14.04. When I docker run any container, after a few minutes my docker0 bridge "dies", and the container stops being able to reach the network. Before the connection dies, running ifconfig reports a docker0 device with an inet addr like:

docker0   Link encap:Ethernet  HWaddr 56:84:7a:fe:97:99  
          inet addr:172.17.42.1  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: xxxx::xxxx:xxxx:xxxx:xxxx/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          [... etc.]

But after the connection dies, ifconfig shows that the ipv4 address has gone away:

docker0   Link encap:Ethernet  HWaddr 56:84:7a:fe:97:99 
          inet6 addr: xxxx::xxxx:xxxx:xxxx:xxxx/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8116 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15995 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:2444859 (2.4 MB)  TX bytes:17440729 (17.4 MB)

Restarting docker, e.g. with sudo service docker restart, brings the device back up -- but all my containers die and the problem starts over again. I can't reliably get anything to run for more than a few minutes at a time. Not long enough to even complete a docker build for most projects.

  1. What could be causing this?
  2. How can I diagnose it?
  3. What are some possible solutions?

Thanks!


Update: I can reliably trigger this docker0-dropping behavior simply by starting a container with docker run -t -i ubuntu /bin/bash, and then exiting with ctrl-d. When I do so, here's what I see in /var/log/syslog

myhost kernel:  docker0: port 1(veth80ddeaf) entered disabled state
myhost kernel:  device veth80ddeaf left promiscuous mode
myhost kernel:  docker0: port 1(veth80ddeaf) entered disabled state

myhost dhclient: Internet Systems Consortium DHCP Client 4.2.4
myhost dhclient: Copyright 2004-2012 Internet Systems Consortium.
myhost dhclient: All rights reserved.
myhost dhclient: For info, please visit https://www.isc.org/software/dhcp/
myhost dhclient: 
myhost dhclient: Listening on LPF/docker0/56:84:7a:fe:97:99
myhost dhclient: Sending on   LPF/docker0/56:84:7a:fe:97:99
myhost dhclient: Sending on   Socket/fallback
myhost kernel:  IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready

Update #2: The frequency of failure seems to depend on how long the container runs. For example:

docker run -i -t  ubuntu   sleep 0
--> `docker0` "survives" ~100% of the time

docker run -i -t  ubuntu   sleep 1
--> `docker0` survives ~80% of the time

docker run -i -t  ubuntu   sleep 5
--> `docker0` survives ~0% of the time
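
To put timestamps on an intermittent failure like this, a small watcher can log when docker0 gains or loses its IPv4 address, so the moment of loss can be lined up with syslog entries. This is a minimal sketch, assuming the iproute2 `ip` command is available; `has_ipv4` and `watch_iface` are hypothetical helper names, not part of Docker.

```shell
# Succeeds if the text on stdin (output of `ip -4 -o addr show dev IFACE`)
# contains an IPv4 address entry such as "inet 172.17.42.1/16".
has_ipv4() {
    grep -qE '(^|[[:space:]])inet[[:space:]]+[0-9]+(\.[0-9]+){3}/[0-9]+'
}

# Poll the interface once per second and print a timestamped line
# whenever its state changes. Runs until interrupted with ctrl-c.
watch_iface() {
    local iface="${1:-docker0}" prev="" state
    while true; do
        if ip -4 -o addr show dev "$iface" 2>/dev/null | has_ipv4; then
            state="has-ipv4"
        else
            state="no-ipv4"
        fi
        if [ "$state" != "$prev" ]; then
            printf '%s %s %s\n' "$(date '+%F %T')" "$iface" "$state"
            prev="$state"
        fi
        sleep 1
    done
}
```

Running `watch_iface docker0` in one terminal while starting and stopping containers in another should show when the address goes away.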
Endometrium answered 8/3, 2015 at 6:26 Comment(2)
Try specifying --net="host" as a workaround. Does that work? Dorisdorisa
@maxd specifying --net="host" would prevent me from containerizing the container's networking. So that's not a solution to my problem -- thanks for the suggestion though! Endometrium
R
8

TL;DR

sudo apt remove netscript-2.4

sudo systemctl restart docker

Explanation

I have had a similar problem: every time I restarted docker, docker0 bridge went up, and then, as soon as I executed docker run hello-world and the program exited, it was gone. I could not get the hello-world to work again because docker0 vanished.

So I compared the system log (syslog, viewed via gnome-system-log) from a PC where docker worked normally with the log from the PC that had this problem. The entries produced by the hello-world command were in a slightly different order, but essentially the same. But here's something I noticed: on the problematic PC, docker was using netscript to handle network interfaces. As soon as I removed it with sudo apt remove netscript-2.4 and restarted docker with sudo systemctl restart docker, everything went back to normal.
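
Before removing anything, it may be worth confirming that netscript is actually involved on your machine. The following is a minimal sketch, assuming a Debian/Ubuntu system with dpkg and a syslog file at /var/log/syslog; `netscript_installed` and `count_netscript_lines` are hypothetical helper names.

```shell
# Succeeds only if the netscript-2.4 package is fully installed
# (not merely removed with leftover config files).
netscript_installed() {
    dpkg-query -W -f='${Status}' netscript-2.4 2>/dev/null \
        | grep -q 'install ok installed'
}

# count_netscript_lines FILE: print how many lines of FILE mention netscript.
# The exit status is non-zero when the count is 0 (grep's usual behavior).
count_netscript_lines() {
    grep -c -i 'netscript' "$1"
}
```

For example, `netscript_installed && echo "netscript is installed"` followed by `count_netscript_lines /var/log/syslog` (possibly with sudo) right after a container exits. A non-zero count around the time docker0 loses its address points in the same direction as this answer.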

Recreation answered 3/8, 2021 at 4:4 Comment(2)
The only issue with your response is that it is not very visible :D. It solved my issue like a charm! Duke
I just upgraded from Ubuntu 20.04 to 22.04 and some of my docker-compose services wouldn't come up, and this was the fix. I originally dismissed this as a fix, but I saw Usage: netscript ifup|ifdown|ifqos|ifreload in /var/log/syslog around the docker error, and it seemed like compose or docker was trying to use netscript incorrectly. Crispin
R
3

How can I diagnose it?

When docker0 has an ip address, does it go away if you don't start any containers? If it persists indefinitely until you start a container, I would start by looking at the Docker logs as well as tailing the system logs when you start a container.

Does the ip address disappear at set intervals (e.g., every N minutes)? If so, I would look for logs from cron to see if some periodic task is responsible.

Are you running NetworkManager? Does disabling NetworkManager make the problem go away? I am running Docker on a system with NetworkManager without a problem, but I have no-auto-default=* set in my config, which may have an impact on this sort of thing.

Update

This is very suspicious:

myhost dhclient: Internet Systems Consortium DHCP Client 4.2.4
myhost dhclient: Copyright 2004-2012 Internet Systems Consortium.
myhost dhclient: All rights reserved.
myhost dhclient: For info, please visit https://www.isc.org/software/dhcp/
myhost dhclient: 
myhost dhclient: Listening on LPF/docker0/56:84:7a:fe:97:99
myhost dhclient: Sending on   LPF/docker0/56:84:7a:fe:97:99
myhost dhclient: Sending on   Socket/fallback

There should not be any dhclient process listening on docker0, and this is absolutely what is causing your ip address to disappear. If you are not explicitly running a dhcp client on this interface, this really suggests that NetworkManager is in fact trying to manage this interface. You said you disabled NetworkManager, but did you confirm that the process was stopped? What is the parent process of the dhclient that is listening on docker0? If you stop the dhclient process, does it get restarted? Does the problem go away?
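
If the dhclient process is short-lived, a single ps may miss it, so polling for it and recording its parent can help answer the questions above. This is a rough sketch, assuming a Linux /proc filesystem and pgrep, and assuming the interface name appears on dhclient's command line; `parent_of` and `catch_dhclient` are hypothetical helper names.

```shell
# parent_of PID: print "PPID COMMAND" for the parent of PID, read from /proc.
parent_of() {
    local pid="$1" ppid
    ppid="$(awk '/^PPid:/ {print $2}' "/proc/$pid/status" 2>/dev/null)" || return 1
    [ -n "$ppid" ] || return 1
    printf '%s %s\n' "$ppid" "$(cat "/proc/$ppid/comm" 2>/dev/null)"
}

# catch_dhclient IFACE: poll until a dhclient process whose command line
# mentions IFACE shows up, then report its PID and parent.
catch_dhclient() {
    local iface="${1:-docker0}" pid
    while true; do
        for pid in $(pgrep -x dhclient); do
            if cat "/proc/$pid/cmdline" 2>/dev/null | tr '\0' ' ' \
                    | grep -qw -- "$iface"; then
                printf 'dhclient pid=%s parent: %s\n' "$pid" "$(parent_of "$pid")"
                return 0
            fi
        done
        sleep 0.1
    done
}
```

Start `catch_dhclient docker0` in one terminal, then run and exit a container in another. The parent it reports is the process to investigate. A very short-lived dhclient can still slip between polls.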

Renshaw answered 9/3, 2015 at 1:33 Comment(5)
Thanks @Renshaw -- great suggestions! I can report that the behavior 1) does not occur if I don't start any docker containers, 2) is not time-based, and 3) persists when network-manager is disabled. I've included a relevant syslog dump in an update to my question above. I'd appreciate any additional guidance. Endometrium
See additional update suggesting a race condition (duration of container life appears to affect the probability of failure). Any further thoughts about how to debug? Endometrium
When I run ps aux | grep -i dhclient and ps aux | grep -i network on my host system, I can confirm there are no results (dhclient is not running). But I do see these myhost dhclient entries in syslog -- possibly dhclient is somehow being invoked because of the container's stop event? Is there some way I can investigate this possibility? Endometrium
I'd love to get your take on this -- it appears that no dhclient is running on my host system before/after I start a new docker container. Is it possible that the launch of dhclient is somehow being triggered only when the container exits, or that some other process (called something other than dhclient) is causing the myhost dhclient log lines to appear in my syslog? Endometrium
I got the same problem here (dhclient started right after the docker container exited). I also thought that NetworkManager was the problem, but the problem was in fact related to wicd (a wired and wireless network manager), another network manager installed on my machine. Disabling wicd fixed the problem for me. Cottontail
H
3

The wicd service does indeed seem to be the cause. I found this in the config:

(/etc/wicd/manager-settings.conf): wired_interface = docker0

I changed this to eth0.
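
To check whether your own wicd configuration points at the Docker bridge, something like the following could be used. It is a minimal sketch that assumes the settings file uses plain `key = value` lines as shown above; `wicd_wired_interface` is a hypothetical helper name.

```shell
# wicd_wired_interface FILE: print the value of the first wired_interface
# setting found in a wicd settings file, with surrounding whitespace removed.
wicd_wired_interface() {
    sed -n 's/^[[:space:]]*wired_interface[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' "$1" \
        | head -n 1
}
```

Running `wicd_wired_interface /etc/wicd/manager-settings.conf` and getting docker0 back would match the situation described here.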

I forgot to restart the service, but my issue disappeared when the wicd service was stopped. After making the change above I started it again and had no more issues with it.

Apparently some autoconfig issue with wicd?

To get the bridge working again, you can use:

sudo ip addr add 172.17.0.1/24 dev docker0

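and the bridge will get the IP back. Use the address and prefix length your bridge had before it was lost; in the question, for example, it was 172.17.42.1 with mask 255.255.0.0.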
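
Since ifconfig reports a dotted netmask while ip addr add expects a prefix length, the conversion can be sketched as below. This is only an illustration; `mask_to_prefix` is a hypothetical helper name, and it does not check that the mask bits are contiguous.

```shell
# mask_to_prefix MASK: print the prefix length for a dotted-quad netmask,
# e.g. 255.255.0.0 gives 16. Fails on octets that are not valid mask values.
mask_to_prefix() {
    local mask="$1" bits=0 octet
    local IFS=.
    for octet in $mask; do
        case "$octet" in
            255) bits=$((bits + 8)) ;;
            254) bits=$((bits + 7)) ;;
            252) bits=$((bits + 6)) ;;
            248) bits=$((bits + 5)) ;;
            240) bits=$((bits + 4)) ;;
            224) bits=$((bits + 3)) ;;
            192) bits=$((bits + 2)) ;;
            128) bits=$((bits + 1)) ;;
            0) ;;
            *) return 1 ;;
        esac
    done
    printf '%s\n' "$bits"
}
```

With the values from the question, this would be used as `sudo ip addr add "172.17.42.1/$(mask_to_prefix 255.255.0.0)" dev docker0`.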

Heflin answered 12/9, 2016 at 10:54 Comment(1)
I suspected wicd and uninstalled it, but the problem persisted. Maybe the service was still there, because after I edited this file and rebooted the system, the network is OK now. Transience
S
1

I had this exact same issue and the root cause was wicd. Running:

sudo service wicd stop
sudo service docker restart

...should do the trick.

Sula answered 26/8, 2015 at 13:53 Comment(1)
You are an absolute life saver. I spent 6 hours on this, thank you so much! Chumley
C
0

For Ubuntu 22, in short: sudo apt remove netscript-2.4

Yes, this is the same solution as the earlier answer, but the key point is hard to spot there.

Coppage answered 24/12, 2023 at 7:42 Comment(0)