cephadm: Not able to add nodes to ceph cluster (Error EINVAL: Failed to connect to host) [closed]

I followed the steps from https://docs.ceph.com/en/latest/cephadm/install/ to set up a Ceph cluster on CentOS 8.1:

curl --silent --remote-name --location https://github.com/ceph/ceph/raw/octopus/src/cephadm/cephadm
chmod +x cephadm
./cephadm add-repo --release octopus
./cephadm install

After running the commands above, I found out that Ceph requires either Docker or Podman to run. So I installed the community version of Docker from https://docs.docker.com/engine/install/centos/ and continued with the steps below:

./cephadm install
mkdir -p /etc/ceph
cephadm bootstrap --mon-ip ip_of_the_current_machine   # host1
cephadm install ceph-common
ssh-copy-id -f -i /etc/ceph/ceph.pub root@host2
ceph orch host add host2

The last command fails with the following error:

[root@host1 home]# ceph orch host add host2
INFO:cephadm:Inferring fsid 12345678-2345-6789-1011-000129110013
INFO:cephadm:Inferring config /var/lib/ceph/12345678-2345-6789-1011-000129110013/mon.host1/config
INFO:cephadm:Using recent ceph image ceph/ceph:v15
Error EINVAL: Failed to connect to host2 (host2).
Check that the host is reachable and accepts connections using the cephadm SSH key
 
you may want to run:
> ceph cephadm get-ssh-config > ssh_config
> ceph config-key get mgr/cephadm/ssh_identity_key > key
> ssh -F ssh_config -i key root@host2

I am able to log in to host2 using the steps above. Could someone please tell me if I am doing something wrong? How do I solve this problem?

Cholecystectomy answered 11/11, 2020 at 8:50 Comment(3)
Hello everyone, I would like to point out that any comments are welcome. If you plan to downvote this question, please add a comment so that I can phrase the question better. (Cholecystectomy)
It's unclear to me where your setup could be wrong. I just deployed a cluster with cephadm bootstrap and added a second node successfully. Did you install cephadm on the second node, too? Did you check whether your SSH connection works without a password? I should mention that I installed cephadm directly from the repository (openSUSE Leap 15.2), not with the GitHub script. But it worked flawlessly for me. (Obannon)
Also, does name resolution work? Having both hosts in /etc/hosts should at least be sufficient. (Obannon)
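
As a quick way to act on the /etc/hosts suggestion, here is a minimal sketch that tests whether a name is listed in a hosts file. The function name `host_in_hosts_file` is made up for illustration and is not part of cephadm. It only inspects the file and does not query DNS.

```shell
# Returns 0 if NAME appears as a hostname or alias in the hosts file,
# 1 otherwise. The file defaults to /etc/hosts.
host_in_hosts_file() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v name="$name" '
        { sub(/#.*/, "") }    # strip comments; awk re-splits the fields
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }  # field 1 is the IP
        END { exit (found ? 0 : 1) }
    ' "$file"
}
```

For example, `host_in_hosts_file host2 || echo "host2 is not in /etc/hosts"` could be run on host1 before retrying `ceph orch host add`.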

After days of debugging, I figured out that python3 was missing on the node I wanted to add. All I had to do was check the most recent cephadm log entries with the following command:

ceph log last cephadm

This gave the following log messages:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1036, in _remote_connection
    raise execnet.gateway_bootstrap.HostNotFound(msg)
execnet.gateway_bootstrap.HostNotFound: Can't communicate with remote host `host2`, possibly because python3 is not installed there: cannot send (already closed?)
 
The above exception was the direct cause of the following exception:
 
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 295, in _finalize
    next_result = self._on_complete(self._value)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 103, in <lambda>
    return CephadmCompletion(on_complete=lambda _: f(*args, **kwargs))
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1201, in add_host
    return self._add_host(spec)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1187, in _add_host
    error_ok=True, no_fsid=True)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1104, in _run_cephadm
    with self._remote_connection(host, addr) as tpl:
  File "/lib64/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1055, in _remote_connection
    raise OrchestratorError(msg) from e
orchestrator._interface.OrchestratorError: Failed to connect to host2 (host2).
Check that the host is reachable and accepts connections using the cephadm SSH key
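
Since the log points at python3, one way to confirm this before retrying is to ask the remote host directly, using the same SSH config and key that cephadm uses. This is only a sketch: `remote_has_python3` and the `SSH_BIN` override are hypothetical names, and it assumes `ssh_config` and `key` already exist in the current directory.

```shell
# Assumes these files were created beforehand with the commands from the
# error hint:
#   ceph cephadm get-ssh-config > ssh_config
#   ceph config-key get mgr/cephadm/ssh_identity_key > key
#   chmod 0600 key    # ssh rejects private keys readable by other users
SSH_BIN="${SSH_BIN:-ssh}"    # hypothetical override, useful for dry runs

# Returns 0 if python3 is on the PATH of root@HOST, non-zero otherwise
# (including when the SSH connection itself fails).
remote_has_python3() {
    host="$1"
    "$SSH_BIN" -F ssh_config -i key "root@${host}" 'command -v python3' >/dev/null 2>&1
}
```

A possible use is `remote_has_python3 host2 || echo "python3 not found on host2"`. If it is missing on a CentOS 8 node, installing it there with `dnf install -y python3` should be enough, though the package name is an assumption for other distributions.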

After installing python3 on host2, I added the node with:

ceph orch host add host2 ip_address
Cholecystectomy answered 20/11, 2020 at 12:20 Comment(0)

I faced the same issue, but my top error message was:

2021-01-13T15:21:13.071913+0000 mgr.ha1.qzzjzw (mgr.18492) 167366 : cephadm [ERR] _Promise failed
Traceback (most recent call last):
  File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 48, in bootstrap_exec
    s = io.read(1)
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 402, in read
    raise EOFError("expected %d bytes, got %d" % (numbytes, len(buf)))
EOFError: expected 1 bytes, got 0

The same workaround helped me as well:

ceph orch host add host2 ip_address
Ringo answered 13/1, 2021 at 15:59 Comment(0)

I faced the same issue as Oleg, using cephadm on Debian 10.

The workaround was to pass the host's IP address explicitly:

sudo ./cephadm shell
ceph orch host add host2 ip_address
Added host 'host2'
Adelbert answered 12/5, 2021 at 16:7 Comment(1)
You can also run this without 'cephadm shell', i.e.: $ sudo ceph orch host add host2 ip_address (Mayotte)
