Jenkins Windows agent connection getting terminated with java.nio.channels.ClosedChannelException

While connecting a Windows machine as an agent, I get the following error. I think it's a network-related issue, but I need some help with where to start looking or what a possible solution might be.

INFO: Terminated
Aug 01, 2017 10:15:54 PM hudson.remoting.JarCacheSupport$1 run
WARNING: Failed to resolve a jar 06bcb4519543f5ec83cf9d6da9f6cfbe
java.io.IOException: Failed to write to C:\Users\Administrator\.jenkins\cache\jars\06\BCB4519543F5EC83CF9D6DA9F6CFBE.jar
        at hudson.remoting.FileSystemJarCache.retrieve(FileSystemJarCache.java:133)
        at hudson.remoting.JarCacheSupport$1.run(JarCacheSupport.java:64)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:483)
        at java.util.concurrent.FutureTask.run(FutureTask.java:274)
        at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:110)
        at java.lang.Thread.run(Thread.java:809)
Caused by: java.io.IOException: Backing channel 'JNLP4-connect connection to dr2r4m1p21/172.20.238.41:9001' is disconnected.
        at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:192)
        at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:257)
        at com.sun.proxy.$Proxy4.writeJarTo(Unknown Source)
        at hudson.remoting.FileSystemJarCache.retrieve(FileSystemJarCache.java:98)
        ... 5 more
Caused by: java.nio.channels.ClosedChannelException
        at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:208)
        at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:222)
        at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
        at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:181)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.switchToNoSecure(SSLEngineFilterLayer.java:283)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:503)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:248)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:200)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:166)
        at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
        at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
        at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$1500(BIONetworkLayer.java:48)
        at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:247)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1157)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:627)
        at hudson.remoting.Engine$1$1.run(Engine.java:94)
        ... 1 more

The stack trace above is from the slave (Windows) machine. My Jenkins controller runs on RHEL, where I see the following stack trace.

INFO: Accepted JNLP4-connect connection #113 from /172.20.238.31:60363
Aug 01, 2017 12:45:55 PM jenkins.slaves.DefaultJnlpSlaveReceiver channelClosed
WARNING: Computer.threadPoolForRemoting [#42] for Build_Agent terminated
java.nio.channels.ClosedChannelException
        at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:208)
        at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:222)
        at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
        at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:181)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.switchToNoSecure(SSLEngineFilterLayer.java:283)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:503)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:248)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:200)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doCloseSend(SSLEngineFilterLayer.java:213)
        at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doCloseSend(ProtocolStack.java:800)
        at org.jenkinsci.remoting.protocol.ApplicationLayer.doCloseWrite(ApplicationLayer.java:173)
        at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer$ByteBufferCommandTransport.closeWrite(ChannelApplicationLayer.java:311)
        at hudson.remoting.Channel.close(Channel.java:1295)
        at hudson.remoting.Channel.close(Channel.java:1263)
        at jenkins.slaves.DefaultJnlpSlaveReceiver.afterChannel(DefaultJnlpSlaveReceiver.java:173)
        at org.jenkinsci.remoting.engine.JnlpConnectionState$4.invoke(JnlpConnectionState.java:421)
        at org.jenkinsci.remoting.engine.JnlpConnectionState.fire(JnlpConnectionState.java:312)
        at org.jenkinsci.remoting.engine.JnlpConnectionState.fireAfterChannel(JnlpConnectionState.java:418)
        at org.jenkinsci.remoting.engine.JnlpProtocol4Handler$Handler$1.run(JnlpProtocol4Handler.java:334)
        at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Consist answered 1/8, 2017 at 8:53 Comment(5)
Have you checked that the slave jar is running successfully on the Windows machine? - Hypersensitize
The slave jar runs successfully. When I run slave.jar on the slave machine, I can see the slave trying to connect to the master, and then it stops with the stack trace above and java.nio.channels.ClosedChannelException. - Consist
Have you used a fixed port or a random JNLP port in Jenkins? - Hypersensitize
Fixed, 9001. I think it may be an issue with ports and security, but I do not know where to start looking! - Consist
Restarting my Jenkins server solved this issue. - Oneupmanship
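As a starting point for the "ports and security" question raised in the comments, here is a minimal sketch that only checks whether the fixed agent port is reachable over TCP from the slave machine. The function name `can_reach` is made up for illustration, and a successful connection shows only that the port is open through any firewall in between, not that the JNLP4 handshake itself succeeds.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds.

    This is only a reachability probe (hypothetical helper, not part of
    Jenkins). It does not speak the JNLP4 protocol.
    """
    try:
        # create_connection resolves the host name and connects with a timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Run from the Windows slave, `can_reach("dr2r4m1p21", 9001)` would use the host name and port that appear in the log above; substitute your own values.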
S
14
  • I observed the same error after our Jenkins master was updated. It is likely due to an incompatibility between Java 7 (update 80) and the latest Java 8.
  • Check the Java version used by your master and the Java version used by your slave.
  • In my case, I am running swarm-client-2.0-jar-with-dependencies.jar on a Linux host, and it was using Java 7.

    java version "1.7.0_80"
    Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
    Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

  • Our Jenkins master was upgraded and is now running Java 8.

    java version "1.8.0_121"
    Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

  • When Java on the slave was updated to Java 8, the connection issues disappeared.
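To compare the two sides quickly, a small sketch like the one below can extract the major Java version from `java -version` output on each host. The helper names are made up for illustration; the only assumption about Java itself is that `java -version` prints a line containing `version "..."` to stderr.

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from `java -version` output.

    Handles the legacy "1.x" scheme ("1.7.0_80" -> 7, "1.8.0_121" -> 8)
    and the newer scheme ("17.0.1" -> 17).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        # Legacy naming: the real major version is the second component.
        return int(match.group(2))
    return first


def local_java_major(java_cmd: str = "java") -> int:
    """Run `java -version` locally and return the major version."""
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run([java_cmd, "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr)
```

Running `local_java_major()` on the master and on the slave and comparing the two numbers would have flagged the 7 vs 8 mismatch described above.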
Stereochemistry answered 7/8, 2017 at 1:38 Comment(3)
Note that Java 7 is no longer receiving public updates: java.com/en/download/faq/java_7.xml - Stereochemistry
The topic of this question is the slave agent, but here is some useful info regarding Maven Java requirements between slave and master: wiki.jenkins.io/display/JENKINS/Maven+Project+Plugin, in the "Maven jobs and Java versions compatibility" section. - Stereochemistry
Would it be a problem if the master has 1.8.211 and the slave has 1.8.311 versions? - Venator
P
21

I was experiencing an error similar to the OP's, where the connection to my slave kept dropping. In my case, the root cause was not a mismatch in Java versions between the Jenkins slave and master hosts.

Solution: If you are running Jenkins in an EC2 instance on AWS behind an Elastic Load Balancer (ELB), increase the "idle timeout" value under the "attributes" section from the default 60 seconds. I set the new value to 600 and no longer experienced the error.

It appears that if a single command in your build process takes longer than 60 seconds with no log output, the ELB terminates the session as idle.

Source: https://issues.jenkins-ci.org/browse/JENKINS-44001?focusedCommentId=312412&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-312412
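If you prefer to script the change instead of using the console, the sketch below shows one way it might look with boto3 (a third-party library) against the Classic Load Balancer API. The load balancer name `my-jenkins-elb` and the helper names are hypothetical; the code assumes AWS credentials and a region are already configured.

```python
def idle_timeout_request(load_balancer_name: str, seconds: int = 600) -> dict:
    """Build the request parameters for raising a Classic ELB idle timeout."""
    if seconds <= 0:
        raise ValueError("idle timeout must be a positive number of seconds")
    return {
        "LoadBalancerName": load_balancer_name,
        "LoadBalancerAttributes": {
            "ConnectionSettings": {"IdleTimeout": seconds},
        },
    }


def apply_idle_timeout(load_balancer_name: str, seconds: int = 600) -> dict:
    """Send the change to AWS. Requires boto3 and configured credentials."""
    import boto3  # imported lazily so the request builder works without it

    client = boto3.client("elb")  # "elb" is the Classic Load Balancer client
    return client.modify_load_balancer_attributes(
        **idle_timeout_request(load_balancer_name, seconds)
    )
```

For example, `apply_idle_timeout("my-jenkins-elb", 600)` would apply the same value I set by hand. Application and Network Load Balancers use a different client (`elbv2`) and attribute format, so this sketch does not cover them.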

Puccini answered 26/10, 2017 at 18:43 Comment(3)
This is exactly what my problem was. Thanks! - Maibach
Thank you, I use AWS with a classic ELB and had this exact issue. - Electroencephalogram
I have Jenkins behind an AWS ELB, and this solved my issue as well! - Wire
B
6

I experienced the same issue. I found that the Windows slave was switching to sleep mode, especially when the jobs are not running against a GUI.

  • For Windows, no mouse or keyboard movement means no activity.

To solve it on a Windows 7 slave, here is what I did:

  • Control Panel\Hardware and Sound\Power Options
  • Show additional plans
  • Select High performance
  • Control Panel\Hardware and Sound\Power Options\Edit Plan Settings
  • Turn off the display: Never
  • Change advanced power settings --> Turn off hard disk after: 10000 min

It should be OK after this procedure.
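The same settings can probably be applied from a script with the built-in `powercfg` tool. The sketch below is an approximation of the GUI steps above, not something I ran on the original machine: `SCHEME_MIN` is the powercfg alias for the High performance plan, and the `standby-timeout-ac` line is an extra assumption on my part to disable sleep explicitly.

```python
import platform
import subprocess
from typing import List


def power_commands(disk_timeout_minutes: int = 10000) -> List[List[str]]:
    """Return powercfg invocations approximating the GUI steps above."""
    return [
        # Activate the High performance plan (alias SCHEME_MIN).
        ["powercfg", "/setactive", "SCHEME_MIN"],
        # Never turn off the display while on AC power (0 means never).
        ["powercfg", "/change", "monitor-timeout-ac", "0"],
        # Assumption: also disable sleep on AC power explicitly.
        ["powercfg", "/change", "standby-timeout-ac", "0"],
        # Turn off the hard disk only after a very long idle time.
        ["powercfg", "/change", "disk-timeout-ac", str(disk_timeout_minutes)],
    ]


def apply_power_settings() -> None:
    """Run the commands; meant for an elevated prompt on the Windows slave."""
    if platform.system() != "Windows":
        raise RuntimeError("powercfg is only available on Windows")
    for cmd in power_commands():
        subprocess.run(cmd, check=True)
```

Calling `apply_power_settings()` on the slave would run the four commands in order and stop at the first one that fails.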

Banket answered 2/10, 2017 at 7:59 Comment(0)
D
6

In addition to the error log in the post, I also found the following in the error log under the Jenkins directory on the slave (for me it was C:\jenkins\jenkins-slave.err.log):

JNLP file http://jenkins.domain.com/computer/my_slave_name/slave-agent.jnlp?encrypt=true has invalid arguments: [#####################################, my_slave_name, -workDir, c:\jenkins, -internalDir, remoting, -url, http://jenkins.domain.com/, -headless, -jar-cache, C:\Users\Administrator.jenkins\cache\jars] Most likely a configuration error in the master "-workDir" is not a valid option

My solution:

1) Windows slave level: close the Services console in the GUI for all users. This is a must; for some reason, Windows locks installation/removal of Windows services while it is open.

2) Windows slave level: kill all java and jenkins-slave processes (if they exist).

3) Windows slave level: delete the Jenkins slave service (if it exists) from cmd: sc delete jenkinsslave-c__jenkins /force (in my case).

4) Windows slave level: verify that you have Java 8 installed (I'm using jdk1.8.0_151). Uninstall all old Java versions.

5) Jenkins master UI level: change the way Jenkins connects to the slave under the slave's configuration --> Launch method: Let Jenkins control this Windows slave as a Windows service (instead of Launch agent via Java Web Start).

6) AWS level: increase the AWS ELB idle timeout to 600 (from 60), as @njtman suggested.

7) Jenkins master UI level: relaunch the agent in Jenkins and wait several minutes.

My environment:

Jenkins: 2.89.2, OS: Windows 2012 R2, Java: jdk1.8.0_151

Dingle answered 10/1, 2018 at 14:25 Comment(0)
S
1

On Windows, I found that I needed to add the "-noCertificateCheck" option to the arguments in the jenkins-slave.xml in the workdir. We use a cert from an internal PKI on the master, and this was the easiest way to work around it (everything is on the internal network).

<arguments>-Xrs  -jar "%BASE%\slave.jar" -jnlpUrl https://jenkins.ourdomain.com/computer/Windows%20build%20server%20-%20Bare%20metal/slave-agent.jnlp -secret abc -noCertificateCheck</arguments>

I discovered this by manually running the agent from the command prompt:

java -jar agent.jar -jnlpUrl https://jenkins.ourdomain.com/computer/Windows%20build%20server%20-%20Bare%20metal/slave-agent.jnlp -secret abc -workDir "D:\agentroot" -noCertificateCheck
Seemaseeming answered 2/11, 2018 at 10:21 Comment(0)
L
1

Well... the following solution worked for me:

Mark the node "temporarily offline" and bring it back "online" again.

Reconnect.

Laryngo answered 13/2, 2019 at 18:28 Comment(0)
D
1

The user2015131 suggestion inspired me to find my solution for this issue.

The problem

I'll explain my case; it may work for some people:

  1. I had installed Jenkins as a service on my slave machines a long time ago.
  2. I updated Java on the Jenkins master computer.

So the Jenkins service code stored on the slave was outdated.

The solution

Follow these steps on every slave machine:

  1. Update the Java version.

    Make sure the Java version is the same as, or compatible with, the one installed on the master computer.

  2. Remove the old slave code. It's located inside the folder specified in the Remote Root Directory field under the node's configuration.

    I removed every jenkins-slave.* file, leaving only the jenkins_agent.pid file and the folders "remoting" and "workspace".

  3. Go to the slave node interface on Jenkins from the web browser and click on the button.

    You will download a new JNLP file to install a new (updated) Jenkins service on the slave machine.

  4. Run the downloaded file, go to menu and click on "Install as a service".

Hope it helps!

Domingadomingo answered 5/8, 2019 at 22:25 Comment(0)
C
0

No time to breathe for the virtual slave...

OK, here is how I solved my special case:

I had some VMs running under libvirt/QEMU as slaves. Because the libvirt plugin was too unreliable for me, I started those VMs on my own. I asked myself: "Why does this libvirt plugin have a mandatory delay time?" Impatience...

So when the libvirt client (slave) says hello to Jenkins right after starting up, you should probably wait a few seconds to let the poor guy breathe a bit.

The slave was Windows 7; the host was Ubuntu 18.04.

Chicoine answered 10/7, 2018 at 9:37 Comment(0)
R
0

I faced the same issue; however, the cause was unrelated to slave configuration, as I don't have any slaves configured. I was running Jenkins on Tomcat (9.x), JDK 17, Windows Server 2018. I had both the jenkins.war file and the exploded WAR in Tomcat's webapps directory. Deleting the 'jenkins' folder in webapps (the exploded WAR) and starting Tomcat solved the problem.

I am recording this occurrence in case someone finds it useful.

Riles answered 26/5, 2022 at 19:29 Comment(0)
R
0

I was facing the same issue. The solution was to remove the client cache files. You can find the jar cache location in the output of the java command used to run the agent.
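A minimal sketch of clearing the cache is shown below. The function name is made up for illustration, and the cache path is passed in by the caller; the question's log shows it under `.jenkins\cache\jars` in the user's home directory, but check your own agent output. Stop the agent before deleting anything.

```python
from pathlib import Path


def clear_jar_cache(cache_dir: Path) -> int:
    """Delete every cached .jar under cache_dir and return how many were removed.

    Hypothetical helper: only *.jar files are touched, and the directory
    layout is left in place for the agent to refill on reconnect.
    """
    removed = 0
    # Collect the matches first so deletion does not interfere with traversal.
    for jar in list(cache_dir.rglob("*.jar")):
        jar.unlink()
        removed += 1
    return removed
```

For the default location, something like `clear_jar_cache(Path.home() / ".jenkins" / "cache" / "jars")` would apply; the agent downloads the jars again the next time it connects.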

Rattlebrain answered 19/6, 2022 at 13:6 Comment(0)
A
-1

I was facing the same issue and fixed it with the steps below:

  1. Go to the Jenkins URL -> Manage Jenkins -> Nodes -> your node. There you will find the Custom WorkDir path, for example C:/Jenkins.
  2. Open the WorkDir path and delete the entire remoting directory.
  3. Relaunch the slave.
Amnesty answered 29/9, 2018 at 6:28 Comment(0)
N
-1

It was simple in my case. First, my master node server had restarted, and another DevOps engineer may have restarted the Jenkins agent service on the master. So I had to restart the Jenkins slave service on the slave node, and it just worked.

Naaman answered 24/11, 2021 at 13:12 Comment(0)
