Volatile isn't good enough, but in practice it will always work, because the operating system scheduler will always take a lock eventually. It will also work well on a processor with a strong memory model, like x86, which burns a lot of juice to keep the caches synchronized between cores.
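Here is a minimal C# sketch of the volatile bool approach. It assumes a worker that does nothing but poll the flag, the same shape as in the measurements below. The class and member names (`VolatileFlagWorker`, `RequestStop`, `Run`, `Iterations`) are hypothetical and do not come from the original.

```csharp
// Hypothetical worker class: polls a volatile stop flag in a hot loop.
public sealed class VolatileFlagWorker
{
    // 'volatile' stops the JIT from caching the field in a register and
    // hoisting the read out of the loop, so each iteration re-reads it.
    private volatile bool _stopRequested;

    // Written only by the worker thread; read it after Join().
    public long Iterations { get; private set; }

    // Called from the control thread.
    public void RequestStop() => _stopRequested = true;

    // Runs on the worker thread until the stop flag is observed.
    public void Run()
    {
        while (!_stopRequested)
        {
            Iterations++; // stand-in for real work
        }
    }
}
```

Usage: create the worker, start `new Thread(worker.Run)`, and later call `worker.RequestStop()` followed by `Join()` from the control thread.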
So all that really matters is how quickly a thread responds to the stop request. That is easy to measure: start a Stopwatch in the control thread when it requests the stop, and record the elapsed time right after the while loop exits in the worker thread. These are the results I measured, taking the average of 1000 samples and repeating that 10 times:
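The measurement described above could be sketched like this in C#. This is an illustrative reconstruction, not the code that produced the numbers below. The names `StopLatency`, `MeasureVolatileOnce` and `AverageNanoseconds` are hypothetical, only the volatile bool case is shown, and a new thread is created per sample for simplicity.

```csharp
using System.Diagnostics;
using System.Threading;

// Sketch of the stop-latency measurement: the control thread starts a
// Stopwatch when it requests the stop; the worker reads it after its loop.
public static class StopLatency
{
    private sealed class VolatileFlag { public volatile bool Stop; }

    // One sample for a volatile bool flag, in nanoseconds.
    public static double MeasureVolatileOnce()
    {
        var flag = new VolatileFlag();
        var sw = new Stopwatch();
        long elapsedTicks = 0;

        var worker = new Thread(() =>
        {
            while (!flag.Stop) { }          // hot loop, no other work
            elapsedTicks = sw.ElapsedTicks; // time right after the while loop
        });
        worker.Start();
        Thread.Sleep(1);                    // give the worker time to reach the loop

        sw.Start();                         // start timing in the control thread...
        flag.Stop = true;                   // ...then request the stop (volatile write)
        worker.Join();                      // Join makes elapsedTicks visible here

        return elapsedTicks * 1e9 / Stopwatch.Frequency;
    }

    // Average over 'samples' measurements (the answer used 1000).
    public static double AverageNanoseconds(int samples)
    {
        double total = 0;
        for (int i = 0; i < samples; i++)
        {
            total += MeasureVolatileOnce();
        }
        return total / samples;
    }
}
```

To compare the other techniques, the `VolatileFlag` check in the loop would be swapped for the event being tested.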
volatile bool, x86: 550 nanoseconds
volatile bool, x64: 550 nanoseconds
ManualResetEvent, x86: 2270 nanoseconds
ManualResetEvent, x64: 2250 nanoseconds
AutoResetEvent, x86: 2500 nanoseconds
AutoResetEvent, x64: 2100 nanoseconds
ManualResetEventSlim, x86: 650 nanoseconds
ManualResetEventSlim, x64: 630 nanoseconds
Beware that the results for volatile bool are very unlikely to look that good on a processor with a weak memory model, like ARM or Itanium. I don't have one to test on.
So it looks like you want to favor ManualResetEventSlim: it gives good perf and a guarantee that the worker will see the stop request.
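A minimal sketch of a worker that uses `ManualResetEventSlim` as its stop signal, assuming the same hot-loop shape as in the measurements. `EventStopWorker` and its members are hypothetical names; `IsSet`, `Set()` and `Dispose()` are the real `ManualResetEventSlim` members.

```csharp
using System;
using System.Threading;

// Hypothetical worker class that uses ManualResetEventSlim as its stop signal.
public sealed class EventStopWorker : IDisposable
{
    private readonly ManualResetEventSlim _stop = new ManualResetEventSlim(false);

    // Written only by the worker thread; read it after Join().
    public long Iterations { get; private set; }

    // Called from the control thread. Set() and IsSet are thread-safe,
    // so the worker is guaranteed to observe the signal.
    public void RequestStop() => _stop.Set();

    // Runs on the worker thread until the event is set.
    public void Run()
    {
        // IsSet is a cheap check that does not block or call into the kernel.
        while (!_stop.IsSet)
        {
            Iterations++; // stand-in for real work
        }
    }

    // Dispose only after the worker thread has finished.
    public void Dispose() => _stop.Dispose();
}
```

If the worker has nothing to do between checks, blocking in `_stop.Wait(timeout)` instead of polling `IsSet` would avoid burning a core while it idles.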
One note about these results: they were measured with the worker thread running a hot loop, constantly testing the stop condition and doing no other work. That's not a good match for real code, where a thread typically won't check the stop condition that often, which makes the differences between these techniques largely inconsequential.