I'm trying to address an issue with an Expect script that logs into a very large number of devices (thousands). The script is about 1500 lines and fairly involved; its job is to audit managed equipment on a network with many thousands of nodes. As a result, it logs into the devices via telnet, runs commands to check on the health of the equipment, logs this information to a file, and then logs out to proceed to the next device.
This is where I'm running into my problem; every expect
in my script includes a timeout and an eof like so:
timeout {
lappend logmsg "$rtrname timed out while <description of expect statement>"
logmessage
close
wait
set session 0
continue
}
eof {
lappend logmsg "$rtrname disconnected while <description of expect statement>"
logmessage
set session 0
continue
}
My final expect
closes each spawn session manually:
-re "OK.*#" {
close
send_user "Closing session... "
wait
set session 0
send_user "closed.\n\n"
continue
}
The continues bring the script back to the while loop that initiates the next spawn session, assuming session = 0.
The set session 0 tracks when a spawn session closes either manually by the timeout or via EOF before a new spawn session is opened, and everything seems to indicate that the spawn sessions are being closed, yet after a thousand or so spawned sessions, I get the following error:
spawn telnet <IP removed>
too many programs spawned? could not create pipe: too many open files
Now, I'm a network engineer, not a UNIX admin or professional programmer, so can someone help steer me towards my mistake? Am I closing telnet spawn sessions but not properly closing a channel? I wrote a second, test script, that literally just connects to devices one by one and disconnects immediately after a connection is formed. It doesn't log in or run any commands as my main script does, and it works flawlessly through thousands of connections. That script is below:
#!/usr/bin/expect -f
#SPAWN TELNET LIMIT TEST
set ifile [open iad.list]
set rtrname ""
set sessions 0
while {[gets $ifile rtrname] != -1} {
set timeout 2
spawn telnet $rtrname
incr sessions
send_user "Session# $sessions\n"
expect {
"Connected" {
close
wait
continue
}
timeout {
close
wait
continue
}
eof {
continue
}
}
In my main script I'm logging every single connection and why they may EOF or timeout (via the logmessage process which writes a specific reason to a file), and even when I see nothing but successful spawned connections and closed connections, I get the same problem with my main script but not the test script.
I've been doing some reading on killing process IDs, but as I understand it, close should be killing the process ID of the current spawn session, and wait should be halting the script until the process is dead. I've also tried using a simple "exit" command from the devices to close the telnet connection, but this doesn't produce any better results.
I may simply need a suggestion on how to better track the opening and closing of my sessions and ensure that, between devices, no spawn sessions remain open. Any help that can be offered will be much appreciated.
Thank you!
continue
inside aneof
clause is a good idea, but my knowledge of Expect is still a bit patchy… – ForgetCtrl + ']'
) to close the telnet session instead ofclose
&wait
? – Egotist