Vagrant "Authentication failure" during up, but "vagrant ssh" can get in just fine

I'm stumped. I'm trying to run a Vagrant/VirtualBox/CoreOS cluster on Windows 8.1 so I can develop the cluster before running it in the cloud. I've tried this on four machines, all running Windows 8.1 with the latest updates, the latest VirtualBox, Vagrant, and Git, and the same Vagrant config. I check the Vagrant config out of a repo on all four systems, so I'm confident the configs are the same on each. I get 2 successes and 2 failures.

Two machines succeed like this:

Bringing machine 'core-01' up with 'virtualbox' provider...
==> core-01: Checking if box 'coreos-stable' is up to date...
(snip)
    core-01: SSH address: 127.0.0.1:2222
    core-01: SSH username: core
    core-01: SSH auth method: private key
    core-01: Warning: Connection timeout. Retrying...
==> core-01: Machine booted and ready!
==> core-01: Setting hostname...
==> core-01: Configuring and enabling network interfaces...

vagrant ssh and vagrant halt both work fine on these two systems.

Two other Windows machines fail like this:

Bringing machine 'core-01' up with 'virtualbox' provider...
==> core-01: Importing base box 'coreos-stable'...
==> core-01: Matching MAC address for NAT networking...
==> core-01: Checking if box 'coreos-stable' is up to date...
==> core-01: Setting the name of the VM: coreos-vm-cluster_core-01_1422899531630_88904
==> core-01: Clearing any previously set network interfaces...
==> core-01: Preparing network interfaces based on configuration...
    core-01: Adapter 1: nat
    core-01: Adapter 2: hostonly
==> core-01: Forwarding ports...
    core-01: 22 => 2222 (adapter 1)
==> core-01: Running 'pre-boot' VM customizations...
==> core-01: Booting VM...
==> core-01: Waiting for machine to boot. This may take a few minutes...
    core-01: SSH address: 127.0.0.1:2222
    core-01: SSH username: core
    core-01: SSH auth method: private key
    core-01: Warning: Connection timeout. Retrying...
    core-01: Warning: Authentication failure. Retrying...
    core-01: Warning: Authentication failure. Retrying...
    core-01: Warning: Authentication failure. Retrying...
    core-01: Warning: Authentication failure. Retrying...
    core-01: Warning: Authentication failure. Retrying...
    core-01: Warning: Authentication failure. Retrying...

Note how both the working and non-working systems experience one timeout connecting, but then the successful ones actually do connect and finish bringing up the VM, whereas the unsuccessful ones just get stuck with an authentication retry loop.

Following the authentication failure, if I leave it to time out or even if I ctrl+C, I can run "vagrant ssh core-01" and it takes me straight in:

CoreOS (stable)
core@localhost ~ $

'vagrant halt' also fails to make an ssh connection on these systems:

==> core-01: Attempting graceful shutdown of VM...
    core-01: Guest communication could not be established! This is usually because
    core-01: SSH is not running, the authentication information was changed,
    core-01: or some other networking issue. Vagrant will force halt, if
    core-01: capable.
==> core-01: Forcing shutdown of VM...

I can successfully use PuTTY or other ssh clients to access the VM using insecure_private_key for authentication, so I'm assuming the VM itself has the correct config and the problem lies with Vagrant's ability to call ssh to get in. If "vagrant up" can't ssh in, it cannot finish the startup config for the VM, so I'd like to solve this primarily for that reason.

This is the ssh config that lets me get in with other ssh clients and I believe should be used by Vagrant:

Host: 127.0.0.1
Port: 2222
Username: core
Private key: C:/Users/Mike/.vagrant.d/insecure_private_key

I have also enabled the GUI for the VMs and the console does not show any errors; it gets all the way to a login prompt just fine (which is also consistent with the fact that I can ssh in and otherwise use the VM).

I believe (but don't know how to verify) that Vagrant is calling the OpenSSH client in C:\Program Files (x86)\Git\bin.

All four machines are running Vagrant 1.7.2, Git 1.9.5, and Ruby 2.0.0p353.

My %PATH% is about 500 chars long. I'm confident Vagrant is finding an ssh client of some sort due to getting at least one or two timeouts followed by an authentication failure.
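For anyone who wants to check which ssh clients are actually visible on the PATH, here is a minimal Ruby sketch. The helper name find_executables is made up for illustration, and this is not Vagrant's own lookup logic; it only walks the PATH entries in order and lists matching executables.

```ruby
# List every executable called `name` found on a PATH-style string, in lookup order.
# path_env and exts are parameters so the search can be run against any PATH value.
def find_executables(name, path_env = ENV.fetch('PATH', ''), exts = nil)
  # On Windows, PATHEXT lists extensions such as .EXE; elsewhere fall back to no extension.
  exts ||= ENV.fetch('PATHEXT', '').split(';')
  exts = [''] if exts.empty?

  path_env.split(File::PATH_SEPARATOR).reject(&:empty?).flat_map do |dir|
    exts.map { |ext| File.join(dir, name + ext) }
        .select { |candidate| File.file?(candidate) && File.executable?(candidate) }
  end
end

# Example: print every ssh client visible on the current PATH.
# find_executables('ssh').each { |path| puts path }
```

Note that this only tells you which external client a plain "ssh" call would probably pick up. The --debug output in the update below mentions net.ssh, which suggests the connection attempted during "vagrant up" goes through a Ruby SSH library rather than that external client.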

Thanks in advance for any ideas!

Update: Buried deep in the output of "vagrant up --debug" is this little gem:

D, [2015-02-02T23:11:10.755468 #3920] DEBUG --
   net.ssh.authentication.session[14661cc]: trying publickey
E, [2015-02-02T23:11:10.756472 #3920] ERROR --
   net.ssh.authentication.key_manager[1473e1c]:
   could not load public key file
   `C:/Users/Mike/.vagrant.d/insecure_private_key': 
   Net::SSH::Exception (public key at
   C:/Users/Mike/.vagrant.d/insecure_private_key.pub is not valid)

That final "insecure_private_key.pub is not valid" seems like a solid clue.

I've tried saving that file with LF-only line endings and with CRLF line endings, and it makes no difference. Visually it looks fine. It's also 100% byte-for-byte identical to the file that's working on one of the other systems. Why would it be invalid? I have verified that the current user has full control permissions on the file and have also tried vagrant up as Administrator. No change in behavior. :(
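One way to back up the "byte-for-byte identical" comparison between machines is to print a checksum plus the things an editor tends to hide (BOMs, CRLF endings, NUL bytes). This is a rough sketch only; describe_key_bytes is a hypothetical helper name, and it says nothing about why net.ssh rejects the file, it just makes two copies easy to compare.

```ruby
require 'digest'

# Summarise byte-level properties of a key file that are invisible in most editors.
def describe_key_bytes(bytes)
  data = bytes.dup.force_encoding(Encoding::BINARY)
  {
    sha256: Digest::SHA256.hexdigest(data),   # compare this value across machines
    size: data.bytesize,
    utf8_bom: data.start_with?("\xEF\xBB\xBF".b),
    utf16_bom: data.start_with?("\xFF\xFE".b) || data.start_with?("\xFE\xFF".b),
    crlf_count: data.scan("\r\n".b).length,
    nul_bytes: data.include?("\x00".b)
  }
end

# Example (path taken from the question):
# p describe_key_bytes(File.binread('C:/Users/Mike/.vagrant.d/insecure_private_key.pub'))
```

If the sha256 values match on a working and a failing machine, the file contents themselves can be ruled out and the difference has to be somewhere else.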

Khabarovsk answered 2/2, 2015 at 18:16 Comment(8)
To help troubleshoot, you can enable GUI mode for the VM. Sometimes this will show the guest OS being stuck at a login or some other step. - Batfowl
Also, if you think it might be an issue with the SSH client being used by Vagrant, see some of the discussions and ideas in this SO thread: "SSH to Vagrant box in Windows?" - Batfowl
Thanks BrianC. I tried this and did not see any errors go by on the console. I have edited above to reflect this additional debug step I took. Note that the VM is booting just fine (I say this because I can ssh into it), but Vagrant can't ssh in to finish its startup scripts. - Khabarovsk
Ahh, I missed your note that a subsequent vagrant ssh does work. Is it always reproducible that 2 Windows machines work and 2 don't (even after a vagrant destroy)? You could try turning on more verbose debug messages with Vagrant, then compare a working with non-working system to see if any differences appear. (Docs: Debugging and Troubleshooting) - Batfowl
@BrianC, yes this problem is 100% consistent, i.e. it works 100% of the time on 2 systems and fails 100% of the time on the other 2 systems. I've done numerous vagrant destroys and the results remain consistent. "vagrant up --debug" was an excellent tip however! Check out the new edit above after "Buried deep in the output...". - Khabarovsk
Are the permissions on the private key correct? - Pediatrics
@Rob, current user has full control of that file according to security properties. I have also tried vagrant up --debug as administrator and get the same "insecure_private_key.pub is not valid" message. :( Checking permissions was a good idea; thanks. - Khabarovsk
I resolved this issue today with the following changes: 1) rm insecure_private_key* 2) comment out: #config.ssh.private_key_path = './insecure_private_key' 3) destroy box and up again. Using Vagrant 1.7.2 now. Not sure if that was related. - Khabarovsk
L
9

Remove
C:/Users/Mike/.vagrant.d/insecure_private_key

Vagrant will create it again the next time it starts (and this time it should be correct).
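If you would rather not delete the key outright, a small Ruby sketch like the following moves it (and any sibling files such as a .pub) into a backup folder first. The helper name and the backup path are assumptions for illustration only.

```ruby
require 'fileutils'

# Move insecure_private_key and any siblings (e.g. .pub, .ppk) into a backup folder
# instead of deleting them. Returns the new locations of the moved files.
def set_aside_insecure_keys(vagrant_home, backup_dir)
  FileUtils.mkdir_p(backup_dir)
  Dir.glob(File.join(vagrant_home, 'insecure_private_key*')).sort.map do |path|
    target = File.join(backup_dir, File.basename(path))
    FileUtils.mv(path, target)
    target
  end
end

# Example (the backup folder is a made-up path; the .vagrant.d path is from the question):
# set_aside_insecure_keys('C:/Users/Mike/.vagrant.d', 'C:/Users/Mike/vagrant-key-backup')
```

Keeping a copy means the old files can still be compared against the regenerated ones afterwards.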

Loosen answered 27/5, 2015 at 6:47 Comment(1)
I use Homestead; in my case I had to copy ~/.ssh/id_rsa to homestead_folder/.vagrant/machines/{my-machine-name}/virtualbox/private_key - Glyptics
B
0

Was the .pub file created by Puttygen (perhaps when creating a private key in Putty's format)? I did that and it prevented vagrant from connecting to the box, but I could connect using Putty and Puttygen's generated .ppk file.

Changing the extension on the Putty public key worked for me, presumably because Vagrant didn't try using it any more.
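To tell the two kinds of .pub file apart, here is a rough Ruby sketch. It assumes (as far as I know) that PuTTYgen's "Save public key" writes the multi-line SSH2 format starting with "---- BEGIN SSH2 PUBLIC KEY ----", while OpenSSH's ssh-keygen writes a single "type base64 comment" line. The helper name public_key_format is made up, and this is not the check net.ssh itself performs.

```ruby
require 'base64'

# Classify the text of a .pub file.
#   :openssh -> one line "type base64 [comment]" whose blob names the same key type
#   :rfc4716 -> multi-line block starting with "---- BEGIN SSH2 PUBLIC KEY ----"
#   :unknown -> anything else
def public_key_format(text)
  lines = text.lines.map(&:strip).reject(&:empty?)
  return :unknown if lines.empty?
  return :rfc4716 if lines.first == '---- BEGIN SSH2 PUBLIC KEY ----'
  return :unknown unless lines.length == 1

  type, blob = lines.first.split(' ', 3)
  return :unknown if blob.nil?

  decoded = Base64.strict_decode64(blob)
  # The decoded blob starts with a 4-byte big-endian length, then the key type string.
  length = decoded.unpack('N').first
  return :unknown if length.nil?

  decoded.byteslice(4, length) == type ? :openssh : :unknown
rescue ArgumentError
  # Raised by strict_decode64 when the second field is not valid base64.
  :unknown
end

# Example (path taken from the question):
# p public_key_format(File.binread('C:/Users/Mike/.vagrant.d/insecure_private_key.pub'))
```

A result of :rfc4716 would be consistent with the Puttygen theory; :openssh would point back at something other than the file format.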

Brassica answered 25/2, 2015 at 19:33 Comment(1)
Good idea, so I decided to archive insecure_private_key* into another folder and recreate the key from scratch using ONLY ssh-keygen. Still no luck, and same error message as before (insecure_private_key.pub is not valid). I then moved insecure_private_key to the local dir and set config.ssh.private_key_path = './insecure_private_key', and now it looks like a simple key mismatch. I don't have time to fully debug right now, but it seems like progress! - Khabarovsk
S
0

When I created the PPK file from the insecure_private_key file, I also (out of habit) created a .pub version. This appeared to cause the problem. Like Jon, when I removed the insecure_private_key.pub file, vagrant up was able to run all the way through.

If you have created an insecure_private_key.pub file using puttygen and run into this problem, I suggest removing it. It is not needed for vagrant and it only got in the way.

Salt answered 2/5, 2015 at 2:9 Comment(0)
