The Answer by John Bollinger and the Answer by Stephen C are both correct and informative. I thought I would add a code example to show:
- How both virtual threads and platform/kernel threads respect Thread.sleep.
- The astounding performance increase that is possible with Project Loom technology.
Benchmarking code
Let's simply write a loop. On each loop, we instantiate a Runnable to perform a task, and submit that task to an executor service. Our task is: do some simple math, a subtraction from the long returned by System.nanoTime. Finally, we print that number to the console.
But the trick is that before the calculation, we sleep the thread performing that task. Since each and every task sleeps for an initial twelve seconds, we should see nothing appear on the console until after at least 12 seconds of dead time. Then the submitted tasks perform their work.
We run this in two ways, by enabling/disabling a pair of commented-out lines:
- ExecutorService executorService = Executors.newFixedThreadPool( 5 )
A conventional pool of conventional threads, using 5 of the 6 real cores (no hyper-threading) on this Mac mini (2018) with a 3 GHz Intel Core i5 processor and 32 gigs of RAM.
- ExecutorService executorService = Executors.newVirtualThreadExecutor()
An executor service backed by the new virtual threads (fibers) provided by Project Loom in this special build of early-access Java 16.
package work.basil.example;

import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TooFast
{
    public static void main ( String[] args )
    {
        TooFast app = new TooFast();
        app.demo();
    }

    private void demo ( )
    {
        System.out.println( "INFO - starting `demo`. " + Instant.now() );
        long start = System.nanoTime();
        try (
                // 5 of 6 real cores, no hyper-threading.
                ExecutorService executorService = Executors.newFixedThreadPool( 5 ) ;
                //ExecutorService executorService = Executors.newVirtualThreadExecutor() ;
        )
        {
            Duration sleep = Duration.ofSeconds( 12 );
            int limit = 100;
            for ( int i = 0 ; i < limit ; i++ )
            {
                executorService.submit(
                        new Runnable()
                        {
                            @Override
                            public void run ( )
                            {
                                try { Thread.sleep( sleep ); } catch ( InterruptedException e ) { e.printStackTrace(); }
                                long x = ( System.nanoTime() - 42 );
                                System.out.println( "x = " + x );
                            }
                        }
                );
            }
        }
        // With Project Loom, the flow-of-control blocks here until all submitted tasks have finished.
        Duration demoElapsed = Duration.ofNanos( System.nanoTime() - start );
        System.out.println( "INFO - demo took " + demoElapsed + " ending at " + Instant.now() );
    }
}
Results
The results are startling.
Firstly, in both cases we see a delay of just over 12 seconds before any console activity. So we know that the Thread.sleep is truly being executed by both platform/kernel threads and virtual threads.
Secondly, the virtual threads complete all the tasks in mere seconds versus minutes, hours or days for the conventional threads.
With 100 tasks:
- Conventional threads take 4 minutes (PT4M0.079402569S).
- Virtual threads take just over 12 seconds (PT12.087101159S).
With 1,000 tasks:
- Conventional threads take 40 minutes (PT40M0.667724055S).
( This makes sense: 1,000 * 12 / 5 / 60 = 40 )
- Virtual threads take 12 seconds (PT12.177761325S).
With 1,000,000 tasks:
- Conventional threads take… well, days.
(I did not actually wait. I had previously experienced a 29-hour run of a half-million loops in an earlier version of this code.)
- Virtual threads take 28 seconds (PT28.043056938S).
(If we subtract the 12 seconds of dead time spent sleeping, a million threads performing all their work in the remaining 16 seconds comes to roughly 1,000,000 / 16 ≈ 62,500 threaded tasks completed per second.)
Conclusion
With conventional threads, we see a repeated burst of several lines suddenly appearing on the console. The platform/kernel threads are actually on the core, blocked, as they wait for their 12-second Thread.sleep to expire. Having all started at about the same moment, all five threads wake up at about the same moment, simultaneously do their math, and write to the console; that cycle repeats every 12 seconds. This behavior is confirmed by the very little usage of the CPU cores we see in the Activity Monitor app.
As an aside: I would assume the host OS notices that our Java threads are actually busy doing nothing, and uses its CPU scheduler to suspend our Java threads while they are blocked, letting other processes such as other apps use the CPU cores. But if so, this is transparent to our JVM. From the JVM's perspective, the sleeping Java threads are taking up the CPU during the entire nap.
With virtual threads, we see dramatically different behavior. Project Loom is designed such that when a virtual thread blocks, the JVM moves that virtual thread off the platform/kernel thread, and puts in its place another virtual thread. This within-JVM swapping of threads is vastly cheaper than is swapping platform/kernel threads. The platform/kernel thread carrying those various virtual threads can stay busy rather than waiting for each block to pass.
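To make that unmounting visible, here is a minimal sketch, assuming the same early-access Loom build as the benchmark above (later builds renamed the factory to newVirtualThreadPerTaskExecutor); the class name CarrierDemo is mine, added only for illustration. In Loom builds, the toString of a virtual thread typically names its current carrier thread after the @ sign (something like VirtualThread[#23]/runnable@ForkJoinPool-1-worker-3, though the exact format varies by build), so printing Thread.currentThread() before and after the nap often shows a different carrier on each side of the sleep, while a platform thread would report the same identity throughout.

package work.basil.example;

import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CarrierDemo
{
    public static void main ( String[] args )
    {
        try (
                ExecutorService executorService = Executors.newVirtualThreadExecutor() ;
        )
        {
            for ( int i = 0 ; i < 10 ; i++ )
            {
                executorService.submit( ( ) -> {
                    // A virtual thread's `toString` names its current carrier thread after the `@` sign.
                    System.out.println( "Before sleep: " + Thread.currentThread() );
                    try { Thread.sleep( Duration.ofSeconds( 2 ) ); } catch ( InterruptedException e ) { e.printStackTrace(); }
                    // After blocking, the virtual thread may have been re-mounted on a different carrier.
                    System.out.println( "After sleep:  " + Thread.currentThread() );
                } );
            }
        } // As in the benchmark above, this close blocks until the submitted tasks finish.
    }
}

A re-mount on a different carrier is not guaranteed on every run, but across ten tasks you will usually see at least a few change carriers after their nap.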
For more info, see any of the recent (late 2020) talks by Ron Pressler of Project Loom at Oracle, and his 2020-05 paper, State of Loom. This behavior of rapidly swapping blocked virtual threads is so efficient that the CPU can be kept busy the entire time. We can confirm this effect in the Activity Monitor app. Here is a screenshot of Activity Monitor running the million tasks with virtual threads. Notice how the CPU cores are virtually 100% busy after all million threads finish napping for 12 seconds.
So all the work is effectively being done immediately: all million virtual threads take their 12-second nap simultaneously, whereas the platform/kernel threads take their naps serially, in groups of five. We see in that screenshot above how the work of the million tasks is done all at once, in a matter of seconds, while the platform/kernel threads do the same amount of work but spread it out over days.
Note that this kind of dramatic performance increase occurs only when your tasks are often blocked. For CPU-bound tasks, such as video encoding, you should use platform/kernel threads rather than virtual threads. Most business apps, though, see much blocking: waiting on calls to the file system, to a database, to other external services, or across the network to remote services. Virtual threads shine in that kind of often-blocked workload.
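As a rough illustration of that guideline, here is a sketch of how one might pick an executor per workload, again assuming the same Loom build where ExecutorService is AutoCloseable; the class name WorkloadChoice and the example.com URL are hypothetical, added only for illustration. The often-blocked work (network calls) goes to a virtual-thread executor, while the pure-math work goes to a fixed pool of platform/kernel threads sized to the cores.

package work.basil.example;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WorkloadChoice
{
    public static void main ( String[] args )
    {
        // Often-blocked work (network calls): virtual threads wait cheaply, so submit many tasks.
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder( URI.create( "https://example.com/" ) ).build();
        try ( ExecutorService blockingWork = Executors.newVirtualThreadExecutor() )
        {
            for ( int i = 0 ; i < 1_000 ; i++ )
            {
                blockingWork.submit( ( ) -> {
                    try
                    {
                        HttpResponse < Void > response = client.send( request , HttpResponse.BodyHandlers.discarding() );
                        System.out.println( "Status: " + response.statusCode() );
                    }
                    catch ( Exception e ) { e.printStackTrace(); }
                } );
            }
        }

        // CPU-bound work (pure math, never blocks): a fixed pool of platform/kernel threads, one per core, is the better fit.
        int cores = Runtime.getRuntime().availableProcessors();
        try ( ExecutorService cpuWork = Executors.newFixedThreadPool( cores ) )
        {
            for ( int i = 0 ; i < cores ; i++ )
            {
                cpuWork.submit( ( ) -> {
                    long sum = 0;
                    for ( long n = 0 ; n < 1_000_000_000L ; n++ ) { sum += n; }
                    System.out.println( "sum = " + sum );
                } );
            }
        }
    }
}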
sleep will park the virtual thread but release the carrier thread; it does not simulate "processing work taking place" but just a blocking operation taking no CPU time at all. This is not what we normally understand by the phrase "processing work taking place". – Dicephalous