Background (skip to Question below if not interested)
I have a simulator that runs through three states:
- Single threaded startup (I/O ok)
- Multi-threaded in-memory CPU-bound simulation stage (I/O not ok)
- Post-simulation, post-join single threaded stage (I/O ok)
What the heck! During standard testing, CPU usage dropped from 100% down to 20%, and the total run took about 30 times longer than normal (130 s vs 4.2 s).
When Callgrind revealed nothing suspicious, my head buzzed; I was on the precipice of rolling back to the last commit and losing all of my bug fixes.
Discouraged, I walked into the server room during a run and heard nasty grinding sounds, later verified to be writes to MySQL sockets (visible in /proc/PID/fd). It turned out that MySQL code, buried several layers deep in Stage 2, was doing I/O in the middle of the simulation.
Lessons Learned
- Accidental I/O can be lethal to a real-time application
- Unit testing is not enough: I need benchmarking, too
Fix
I will introduce thread-local IOSentinels and assert() calls on ReadAllowed() and WriteAllowed() to ensure that Stage 2 threads never do any I/O.
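A minimal sketch of what such a sentinel might look like, assuming C++11. ReadAllowed() and WriteAllowed() come from my plan above; everything else (ForbidIO, AllowIO, CheckedWrite, WorkerSeesIOForbidden) is an invented name for illustration:

```cpp
#include <cassert>
#include <thread>

// Hypothetical IOSentinel: a thread-local flag marking whether the
// current thread may perform I/O. Per-thread, so Stage 2 workers can
// forbid I/O without affecting the main thread.
class IOSentinel {
public:
    static void ForbidIO() { io_allowed_ = false; }   // call at Stage 2 entry
    static void AllowIO()  { io_allowed_ = true; }    // call after the join
    static bool ReadAllowed()  { return io_allowed_; }
    static bool WriteAllowed() { return io_allowed_; }
private:
    static thread_local bool io_allowed_;
};
thread_local bool IOSentinel::io_allowed_ = true;     // Stages 1/3 default

// Every I/O entry point asserts before touching the descriptor, so a
// Stage 2 thread that sneaks in a write aborts immediately in debug builds.
inline void CheckedWrite(/* buffer, fd, ... */) {
    assert(IOSentinel::WriteAllowed() && "I/O attempted in compute stage");
    // ... actual write ...
}

// Demo: a worker thread forbids I/O for itself; because the flag is
// thread-local, the spawning thread is unaffected.
inline bool WorkerSeesIOForbidden() {
    bool forbidden = false;
    std::thread worker([&] {
        IOSentinel::ForbidIO();
        forbidden = !IOSentinel::ReadAllowed() && !IOSentinel::WriteAllowed();
    });
    worker.join();
    return forbidden;
}
```

The key design point is thread_local storage: the flag flips only for the thread that calls ForbidIO(), so the single-threaded Stage 1/3 code keeps doing I/O freely while every Stage 2 worker is fenced off.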
Question
Has anyone had any luck attaching a benchmarking framework to googletest, or writing one?
Unfortunately, all my googletests passed this time. Had I stepped away for a bit and come back without noticing the run-time, this would have been a disastrous commit, and possibly much harder to fix.
I would like googletest to fail if a run takes more than 2 or 3 times the previous runtime. That last part is tricky: for very quick runs, system state can cause something to take twice as long and still be fine, but for a long simulation run/test I don't expect runtimes to change by much (more than 50% would be unusual).
I am open to suggestions here, but it would be nice to have a low-maintenance check that would work with automated testing so it will be obvious if the system suddenly got slow, even if all the outputs appear to be ok.
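One low-maintenance shape for this check, sketched under my own assumptions (the class name RuntimeBaseline, the baseline file, and the 3x/500 ms thresholds are all invented, not googletest API): store the last runtime in a small file, compare the current run against it with a generous ratio, and skip the comparison entirely below a floor where system noise dominates.

```cpp
#include <chrono>
#include <fstream>
#include <string>
#include <utility>

// Hypothetical baseline check: records each run's wall-clock time in a
// file and fails the next run if it exceeds the stored baseline by more
// than max_ratio. Runs shorter than floor_ms are never failed, since
// system state can double a very quick run that is still fine.
class RuntimeBaseline {
public:
    explicit RuntimeBaseline(std::string file, double max_ratio = 3.0,
                             long long floor_ms = 500)
        : file_(std::move(file)), max_ratio_(max_ratio), floor_ms_(floor_ms),
          start_(std::chrono::steady_clock::now()) {}

    // Call once at the end of the test. Returns false if the run slowed
    // beyond the allowed ratio; also updates the stored baseline.
    bool CheckAndUpdate() {
        using namespace std::chrono;
        long long elapsed =
            duration_cast<milliseconds>(steady_clock::now() - start_).count();
        long long baseline = ReadBaseline();
        WriteBaseline(elapsed);
        if (elapsed < floor_ms_) return true;   // too short to judge reliably
        if (baseline <= 0) return true;         // no baseline recorded yet
        return elapsed <= static_cast<long long>(baseline * max_ratio_);
    }

private:
    long long ReadBaseline() const {
        std::ifstream in(file_);
        long long ms = -1;
        return (in >> ms) ? ms : -1;
    }
    void WriteBaseline(long long ms) const {
        std::ofstream(file_) << ms << '\n';
    }
    std::string file_;
    double max_ratio_;
    long long floor_ms_;
    std::chrono::steady_clock::time_point start_;
};
```

Hooked into googletest, one could construct a RuntimeBaseline in a fixture's SetUp() and assert in TearDown(), e.g. EXPECT_TRUE(baseline.CheckAndUpdate()) << "run slowed beyond 3x the last recorded time"; that keeps the check automated while the per-test baselines maintain themselves.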
EXPECT_LT(TEST_RUNTIME, 1000)
for less than a thousand milliseconds... – Savoirvivre