I have an Azure DevOps CI release that runs a massive number of Selenium tests on the same server at the same time. It usually works well, but occasionally my Selenium test task times out with this error:
2020-05-07T15:47:37.0692681Z Completed TestExecution Model...
2020-05-07T15:47:48.6637501Z The STDIO streams did not close within 10 seconds of the exit event from process 'C:\TFSAgent5_work_tasks\VSTest_ef087383-ee5e-42c7-9a53-ab56c98420f9\2.153.9\Modules\DTAExecutionHost.exe'. This may indicate a child process inherited the STDIO streams and has not yet exited.
2020-05-07T16:08:50.9254238Z ##[error]The task has timed out.
This typically occurs on a test rerun, and I see it roughly once in every 100 test runs. It's a serious problem because it locks up the test agent for the full task timeout (30 minutes in my case). A number of other posts point out that this can happen if you're not properly closing your Selenium driver. I believe I am, though, and 99 times out of 100 it works fine. This is the code I use to close my Selenium driver:
[AssemblyCleanup]
public static void Cleanup()
{
    try
    {
        driver.Close();
        driver.Quit();
    }
    catch (Exception e)
    {
        Debug.WriteLine(e.Message);
    }
}
There really aren't a lot of useful suggestions floating around. I think this issue is related to the load the test agent server (or test server) is under: when I run a smaller CI release (nightly), I never see it.
Has anyone experienced this issue under high load before? Where does that "10 seconds" come from, and can it be adjusted? Is there an issue with the code I use to close the driver? Is there a better way of closing it that ensures I can still kill it even when it's locked up, maybe something I could add to my catch block?
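To make the last question concrete, this is roughly the kind of fallback I have in mind. It is only a sketch, not something I have confirmed fixes the problem. `DriverCleanup` and `KillLeftoverProcesses` are hypothetical names of my own, and the process name `chromedriver` is an assumption (substitute whichever driver executable is actually in use). It relies only on `System.Diagnostics.Process`. One obvious concern: because many test runs share the same server, killing by name alone would also take down the drivers of other runs that are still in progress, so the selection would probably need narrowing.

```csharp
using System;
using System.Diagnostics;

public static class DriverCleanup
{
    // Hypothetical helper: force-kills every process with the given name
    // and returns how many were killed.
    // Caution: this matches by name only, so on a shared server it would
    // also kill driver processes that belong to other concurrent test runs.
    public static int KillLeftoverProcesses(string processName, int waitMs = 5000)
    {
        int killed = 0;
        foreach (Process p in Process.GetProcessesByName(processName))
        {
            try
            {
                p.Kill();
                // Give the process a bounded amount of time to exit.
                p.WaitForExit(waitMs);
                killed++;
            }
            catch (Exception e)
            {
                // The process may have exited already, or access was denied.
                Debug.WriteLine(e.Message);
            }
            finally
            {
                p.Dispose();
            }
        }
        return killed;
    }
}
```

The idea would be to call `DriverCleanup.KillLeftoverProcesses("chromedriver")` from the catch block of `Cleanup()` (or from a finally block), so that a failed `driver.Quit()` does not leave a child process holding the STDIO streams open.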