Ansible stops connecting to the host via ssh [closed]

Introduction

For over a month I've been running the following command:

ansible-playbook -vvvvi host_test rhel-tests.yml

This connected via SSH and ran the tests on a host without any problems. For the last couple of days, however, I've received the following when running it:

fatal: [10.2.16.2]: UNREACHABLE! => {
    "changed": false, 
    "unreachable": true
}

MSG:

Failed to connect to the host via ssh: OpenSSH_7.6p1, LibreSSL 2.6.2
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 48: Applying options for *
debug1: auto-mux: Trying existing master
debug2: fd 3 setting O_NONBLOCK
debug2: mux_client_hello_exchange: master version 4
debug3: mux_client_forwards: request forwardings: 0 local, 0 remote
debug3: mux_client_request_session: entering
debug3: mux_client_request_alive: entering
debug3: mux_client_request_alive: done pid = 35742
debug3: mux_client_request_session: session request sent
debug1: mux_client_request_session: master session id: 2
debug3: mux_client_read_packet: read header failed: Broken pipe
debug2: Control master terminated unexpectedly
Shared connection to 10.2.16.2 closed.

This happens even though I can establish a normal SSH connection to 10.2.16.2 from bash on the same machine that runs the playbook.

Details

The contents of host_test are as follows:

[rhel]
10.2.16.2 node_type=xxx

[rhel:vars]
ansible_become=yes
ansible_become_method=su
ansible_become_user=root
ansible_connection=ssh
ansible_user=yyy
node_name=""


[cisco]

[cisco:vars]
node_name=""

[curtiss-wright]

[zzz]

[other]

[nmap:children]
rhel
cisco
curtiss-wright
other
zzz

[password-test]

Here's my ansible.cfg:

[defaults]
ask_vault_pass = True
filter_plugins = filter_plugins
host_key_checking = False
retry_files_enabled = False
inventory = hosts
stdout_callback = debug

[paramiko_connection]
record_host_keys=False

[ssh_connection]
ssh_args = -o LogLevel=QUIET -o ControlMaster=auto -o ControlPersist=2m -o UserKnownHostsFile=/dev/null
scp_if_ssh = True

My thoughts

  • Configuration changes are happening constantly on the target, so it's possible something was configured in ssh to limit connections in some way.
  • Tests are being added to rhel-tests.yml, so it's possible some sort of timeout is now being triggered that wasn't before. I've tried reverting rhel7 to the version from about a month ago, and the command still fails, so I don't believe this is the cause.
  • I'm using Ansible 2.5.4 installed via brew. I've tried updating to Ansible 2.6.2, but that made no difference.
  • I've tried several other suggestions found online, including using the paramiko_ssh connection type, which also fails.
  • I can run ansible -i host_test -m ping 10.2.16.2 and get a pong back.
  • This question seems pretty close to my issue, but there aren't any lines in rhel-tests.yml that reboot or shut down the host.

Question

What's causing my playbook to fail and how can I fix it?

Acotyledon answered 3/8, 2018 at 15:26

The connection may be dropping due to the lack of output from your play.

Add the following to your ssh_args (docs for v2.4) in ansible.cfg:

-o ServerAliveInterval=50

ServerAliveInterval=50 keeps the SSH connection alive while the play produces no output: if the client has received nothing from the server for 50 seconds, it sends a keepalive message through the encrypted channel and requests a response.
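
As a sketch, assuming the ssh_args line from the question's ansible.cfg is otherwise kept unchanged, the [ssh_connection] section would then look something like this:

```ini
[ssh_connection]
# Options from the question, with ServerAliveInterval appended at the end.
ssh_args = -o LogLevel=QUIET -o ControlMaster=auto -o ControlPersist=2m -o UserKnownHostsFile=/dev/null -o ServerAliveInterval=50
scp_if_ssh = True
```

Because ControlPersist=2m keeps a master connection open in the background, a master started before the change may still be reused with the old options. The new setting should take effect once that master has exited.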

Acotyledon answered 3/8, 2018 at 19:56 Comment(2)
see here for what this does: unix.stackexchange.com/a/3027/21256 - Rambler
Ansible docs: docs.ansible.com/ansible/latest/reference_appendices/… - Rondo
