We develop a heavy-load, network-traffic-centric server application and have run those servers quite successfully for many, many years under Java 8. Network-traffic-centric means that the server quite often has to handle up to 700 MBit/s.
Now we'd like to switch to Java 21.
While evaluating, we saw that Java 21 performs worse than Java 8, which surprised us quite a lot.
I can confirm that Java 13 behaves performance-wise like Java 8, while Java 21 behaves like Java 14. So a change obviously took place between Java 13 and Java 14. I did my tests using Azul Zulu but also tried another implementation to make sure it is not a Zulu-specific problem.
I created a sample in which you can see the effect:
Main class:
package senderreceiverbenchmark;

import java.io.*;
import java.net.*;
import java.util.concurrent.*;

public class SenderReceiverBenchmark
{
    public static void main(String[] args) throws IOException
    {
        // Single scheduler thread that prints the statistics every 10 seconds.
        ScheduledExecutorService executorService = Executors.newSingleThreadScheduledExecutor();
        Statistics statistics = null;
        switch (args.length)
        {
            case 1: // receiver mode: <port>
            {
                System.out.println( "Receiver waiting at port " + Integer.valueOf(args[0]));
                statistics = new Statistics("Received");
                executorService.scheduleAtFixedRate(statistics, 10, 10, TimeUnit.SECONDS);
                ServerSocket serverSocket = new ServerSocket(Integer.parseInt(args[0]));
                // One receiver task (and therefore one thread) per accepted connection.
                ExecutorService executorServiceReceiver = Executors.newCachedThreadPool();
                Socket socket;
                while((socket = serverSocket.accept()) != null)
                {
                    executorServiceReceiver.submit(new Receiver(socket.getInputStream(), statistics));
                }
                break;
            }
            case 4: // sender mode: <host> <port> <NumberOfConnections> <Framesize KB>
            {
                System.out.println( "Sending to " + args[0] + ":" + Integer.valueOf(args[1]) + " with [" + Integer.valueOf(args[2]) + "] connections and framesize [" + Integer.valueOf(args[3]) + " KB]");
                statistics = new Statistics("Send");
                executorService.scheduleAtFixedRate(statistics, 10, 10, TimeUnit.SECONDS);
                ExecutorService executorServiceSender = Executors.newFixedThreadPool(Integer.parseInt(args[2]));
                long SLEEP_TIME_BETWEEN_SENDING = 50; // milliseconds
                for (int i = 0; i < Integer.parseInt(args[2]); i++) // creating independent senders ...
                {
                    executorServiceSender.submit(new Sender(args[0], Integer.parseInt(args[1]), Integer.parseInt(args[3]), SLEEP_TIME_BETWEEN_SENDING, statistics));
                }
                break;
            }
            default:
                System.out.println( "For Receiver use: LoopbackBenchmark <ServerSocket>" );
                System.out.println( "For Sender use: LoopbackBenchmark <host> <port> <NumberOfConnections> <Framesize KB>" );
                System.exit(-1);
                break;
        }
    }
}
Sender:
package senderreceiverbenchmark;

import java.io.*;
import java.net.Socket;
import java.net.SocketException;
import java.util.concurrent.Callable;

public class Sender implements Callable<Object>
{
    private final OutputStream outputStream;
    private final Statistics statistics;
    // Payload written on every send; its size is fixed at 65535 bytes.
    private final byte[] preallocatedRandomData = new byte[65535];
    private final long sleepTime;

    // Note: framesizeKB is accepted but not used; every write sends preallocatedRandomData.
    public Sender(String host, int port, int framesizeKB, long sleepTimeBetweenSend, Statistics statistics) throws SocketException, IOException
    {
        this.statistics = statistics;
        Socket socket = new Socket( host, port );
        outputStream = socket.getOutputStream();
        this.sleepTime = sleepTimeBetweenSend;
    }

    @Override
    public Object call() throws Exception
    {
        statistics.handledConections.addAndGet(1);
        // Write one buffer, account for it, sleep, repeat until the thread is killed.
        while (true)
        {
            this.outputStream.write(preallocatedRandomData);
            statistics.overallData.addAndGet(preallocatedRandomData.length);
            Thread.sleep(sleepTime);
        }
    }
}
Receiver:
package senderreceiverbenchmark;

import java.io.*;
import java.util.concurrent.Callable;

public class Receiver implements Callable<Object>
{
    private final InputStream inputStream;
    private final Statistics statistics;
    private final byte[] buffer = new byte[65535];

    public Receiver(InputStream inputStream, Statistics statistics)
    {
        this.inputStream = inputStream;
        this.statistics = statistics;
    }

    @Override
    public Object call() throws Exception
    {
        statistics.handledConections.addAndGet(1);
        while (true)
        {
            int readBytes = this.inputStream.read(buffer);
            if( readBytes < 0 )
            {
                // End of stream: the sender closed the connection. Without this
                // check the loop would spin on read() returning -1.
                return null;
            }
            statistics.overallData.addAndGet(readBytes);
        }
    }
}
Some statistics:
package senderreceiverbenchmark;

import java.util.concurrent.atomic.AtomicLong;

public class Statistics implements Runnable
{
    public final AtomicLong overallData = new AtomicLong(0L);
    public final AtomicLong handledConections = new AtomicLong(0L);
    private final String mode;
    private long previousRun = System.currentTimeMillis();

    public Statistics(String tag)
    {
        this.mode = tag;
    }

    @Override
    public void run()
    {
        long now = System.currentTimeMillis();
        long elapsedMillis = Math.max(1, now - previousRun); // guards against division by zero
        // getAndSet reads and resets in one step, so bytes counted in between are not lost.
        long dataPerSecond = overallData.getAndSet(0) * 1000 / elapsedMillis;
        System.out.println(mode + ", Connections: " + handledConections.get() + ", Sent overall: " + dataPerSecond / (1024*1024) + " MB/s" );
        previousRun = now;
    }
}
Forgive me that the sample has no (good) error handling; it should be fine for demonstration purposes.
First start the receiver:
Benchmark.bat 4711
Then start the sender:
Benchmark.bat 127.0.0.1 4711 300 128
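This starts 300 sender threads, each sending a packet of data to the receiver every 50 ms. (The command line asks for 128 KB frames, but the Sender as posted does not use that argument and always writes its 65535-byte buffer.)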
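As a rough cross-check of the load these parameters generate, the offered rate can be estimated from the numbers in the sample: 300 connections, one write of the 65535-byte buffer per connection, and a 50 ms sleep between writes. The class and method names below are made up for illustration, and the estimate ignores the time spent in the write itself, so it is an upper bound rather than a measured value.

```java
public class OfferedLoad {

    // Upper bound on bytes per second for `connections` senders that each write
    // `bytesPerWrite` bytes and then sleep `sleepMillis` (write time is ignored).
    static long bytesPerSecond(int connections, int bytesPerWrite, long sleepMillis) {
        return connections * (long) bytesPerWrite * 1000L / sleepMillis;
    }

    public static void main(String[] args) {
        long bps = bytesPerSecond(300, 65535, 50);
        System.out.println(bps + " bytes/s");                // 393210000 bytes/s
        System.out.println(bps / (1024 * 1024) + " MB/s");   // 374 MB/s
        System.out.println(bps * 8 / 1_000_000 + " Mbit/s"); // 3145 Mbit/s
    }
}
```

On loopback this is well above the 700 MBit/s mentioned at the top, so the sample stresses the socket path harder than the production load does.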
When you first do that with Java 8 as the runtime and then with Java 21 as the runtime, you will see something like this:
The first half is showing the sample application running on Java 8, the second half on Java 21.
Compared to Java 8 the newer Java 21 needs 10%-15% more CPU power.
Can someone explain where this comes from and what I can do about it?
Update: As some of the commenters couldn't reproduce it, I asked colleagues to run the sample to get a wider test range.
Besides my own test, 10 other people DO SEE the effect very clearly. On 2 VMs and one physical machine I can't see the effect.
Anyhow, I don't see a common denominator for why it is there or not. CPUs are from Intel/AMD; OSes were Win 10, Win 11, Server 2012, Server 2019.
Besides the Azul Zulu builds, I also tried the builds from MS and from OpenLogic, but changing the build had no effect.
Solution: The hint about JEP 353 pushed me in the right direction. I still don't understand why Java 13 behaves the same as Java 8 even though JEP 353 was already implemented there, but anyway this hint inspired me.
What I did was change my sample application above.
Instead of
ExecutorService executorServiceReceiver = Executors.newCachedThreadPool();
I used
ExecutorService executorServiceReceiver = Executors.newVirtualThreadPerTaskExecutor();
I did the same for executorServiceSender.
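For readers who have not used the new executor yet, here is a minimal, self-contained sketch of the same swap outside the benchmark. It assumes Java 21; the class name, the method name and the 50 ms sleep (standing in for blocking socket I/O) are placeholders and not part of the sample above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadExecutorSketch {

    // Runs `tasks` blocking tasks on a virtual-thread-per-task executor and
    // returns how many of them actually ran on a virtual thread.
    static int runBlockingTasks(int tasks) {
        AtomicInteger onVirtual = new AtomicInteger();
        // ExecutorService is AutoCloseable since Java 19; close() waits for the tasks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    if (Thread.currentThread().isVirtual()) {
                        onVirtual.incrementAndGet();
                    }
                    Thread.sleep(50); // placeholder for a blocking socket write or read
                    return null;
                });
            }
        }
        return onVirtual.get();
    }

    public static void main(String[] args) {
        System.out.println("tasks on virtual threads: " + runBlockingTasks(300));
    }
}
```

Each submitted task gets its own virtual thread, so the executor needs no pool size, which is why it can replace both the cached and the fixed thread pool in the sample.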
After that I see very clearly that Java 21 behaves better than Java 8.
Have a look at the screenshot: the black rectangle is Java 8, the red rectangle is Java 21 with platform threads, and the green rectangle is Java 21 with virtual threads.
Needless to say, the overall number of platform/OS threads used in the system is much lower.
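The thread-count difference can be made visible with a small sketch like the following (Java 21 assumed; class and method names are hypothetical). It parks 300 tasks at once on each kind of executor and counts the live platform threads in the JVM via Thread.getAllStackTraces(), which does not include virtual threads. The absolute numbers depend on the machine, so none are shown here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PlatformThreadCount {

    // Blocks `tasks` tasks on the given executor at the same time and returns the
    // number of live platform threads in the JVM while all of them are blocked.
    static int platformThreadsWhileBlocked(ExecutorService executor, int tasks) {
        CountDownLatch started = new CountDownLatch(tasks);
        CountDownLatch release = new CountDownLatch(1);
        try (executor) { // close() waits for the submitted tasks (Java 19+)
            try {
                for (int i = 0; i < tasks; i++) {
                    executor.submit(() -> {
                        started.countDown();
                        release.await(); // placeholder for a blocking socket read
                        return null;
                    });
                }
                started.await();
                // Covers platform threads only; virtual threads are not listed.
                return Thread.getAllStackTraces().size();
            } finally {
                release.countDown(); // unblock the tasks before close() waits for them
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int withVirtual = platformThreadsWhileBlocked(Executors.newVirtualThreadPerTaskExecutor(), 300);
        int withPlatform = platformThreadsWhileBlocked(Executors.newCachedThreadPool(), 300);
        System.out.println("platform threads, virtual-thread executor: " + withVirtual);
        System.out.println("platform threads, cached thread pool:      " + withPlatform);
    }
}
```

With the cached pool every blocked task holds its own OS thread, while the blocked virtual threads are unmounted and only a handful of carrier threads remain.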
Thanks for all the constructive comments pushing me in the right direction.