Why do I see a performance degradation in network traffic between Java 8 and Java 21?

We're developing a heavy-load, network-traffic-centric application and have run these servers quite successfully under Java 8 for many, many years. Network-traffic-centric means that the server quite often has to handle up to 700 Mbit/s.

Now we'd like to switch to Java 21.

While evaluating, we saw that Java 21 performs worse than Java 8, which surprised us quite a lot.

I can confirm that Java 13 behaves like Java 8 performance-wise, while Java 21 behaves like Java 14, so the change apparently happened between Java 13 and Java 14. I ran my tests using Azul Zulu but also tried another distribution to make sure it's not a Zulu-specific problem.

I created a sample in which you can see the effect:

Main class

package senderreceiverbenchmark;

import java.io.*;
import java.net.*;
import java.util.concurrent.*;

public class SenderReceiverBenchmark
{
    public static void main(String[] args) throws IOException
    {
        ScheduledExecutorService executorService = Executors.newSingleThreadScheduledExecutor();
        Statistics statistics = null;
        
        switch (args.length)
        {
            case 1: //receiver mode
            {
                System.out.println( "Receiver waiting at port " + Integer.valueOf(args[0]));
        
                statistics = new Statistics("Received");
                executorService.scheduleAtFixedRate(statistics, 10, 10, TimeUnit.SECONDS);
        
                ServerSocket serverSocket = new ServerSocket(Integer.parseInt(args[0]));
                
                ExecutorService executorServiceReceiver = Executors.newCachedThreadPool();
                
                Socket socket;
                while((socket = serverSocket.accept()) != null)
                {
                    executorServiceReceiver.submit(new Receiver(socket.getInputStream(), statistics));
                }
                
                break;
            }
            case 4: //sender mode
            {
                System.out.println( "Sending to " + args[0] + ":" + Integer.valueOf(args[1]) + " with [" + Integer.valueOf(args[2]) + "] connections and framesize [" + Integer.valueOf(args[3]) + " KB]");
        
                statistics = new Statistics("Send");
                executorService.scheduleAtFixedRate(statistics, 10, 10, TimeUnit.SECONDS);
        
                ExecutorService executorServiceSender = Executors.newFixedThreadPool(Integer.parseInt(args[2]));
                long SLEEP_TIME_BETWEEN_SENDING = 50;
                for (int i = 0; i < Integer.parseInt(args[2]); i++) //creating independent senders ...
                {
                    executorServiceSender.submit(new Sender(args[0], Integer.parseInt(args[1]), Integer.parseInt(args[3]), SLEEP_TIME_BETWEEN_SENDING, statistics));
                }
                
                break;
            }
            default:
                System.out.println( "For Receiver use: SenderReceiverBenchmark <port>" );
                System.out.println( "For Sender use: SenderReceiverBenchmark <host> <port> <NumberOfConnections> <Framesize KB>" );
                System.exit(-1);
                break;
        }
    }
}

Sender:

package senderreceiverbenchmark;

import java.io.*;
import java.net.Socket;
import java.net.SocketException;
import java.util.concurrent.Callable;

public class Sender implements Callable<Object>
{
    private final OutputStream outputStream;
    private final Statistics statistics;
    // Zero-filled frame, sized from the framesizeKB argument (e.g. 128 -> 131072 bytes)
    private final byte[] preallocatedRandomData;
    private final long sleepTime;

    public Sender(String host, int port, int framesizeKB, long sleepTimeBetweenSend, Statistics statistics) throws SocketException, IOException
    {
        this.statistics = statistics;
        this.preallocatedRandomData = new byte[framesizeKB * 1024];
        
        Socket socket = new Socket( host, port );

        outputStream = socket.getOutputStream();
        this.sleepTime = sleepTimeBetweenSend;
    }

    @Override
    public Object call() throws Exception
    {
        statistics.handledConections.addAndGet(1);
        
        while (true)
        {
            this.outputStream.write(preallocatedRandomData);
            statistics.overallData.addAndGet(preallocatedRandomData.length);
            Thread.sleep(sleepTime);
        }
    }
}

Receiver:

package senderreceiverbenchmark;

import java.io.*;
import java.util.concurrent.Callable;

public class Receiver implements Callable<Object>
{
    private final InputStream inputStream;
    private final Statistics statistics;
    private final byte[] buffer = new byte[65535];
    
    public Receiver(InputStream inputStream, Statistics statistics)
    {
        this.inputStream = inputStream;
        this.statistics = statistics;
    }
    
    @Override
    public Object call() throws Exception
    {
        statistics.handledConections.addAndGet(1);
        
        int readBytes;
        // read() returns -1 once the sender closes the connection; stop instead of spinning forever
        while ((readBytes = this.inputStream.read(buffer)) != -1)
        {
            statistics.overallData.addAndGet(readBytes);
        }
        return null;
    }
}

A bit statistics:

package senderreceiverbenchmark;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class Statistics implements Runnable
{
    public final AtomicLong overallData = new AtomicLong(0L);
    public final AtomicLong handledConections = new AtomicLong(0L);
    private final String mode;
    private long previousRun = System.currentTimeMillis();
    
    public Statistics(String tag)
    {
        this.mode = tag;
    }

    @Override
    public void run()
    {
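        long now = System.currentTimeMillis();
        // getAndSet(0) reads and resets atomically, so no bytes are lost between the two steps;
        // Math.max guards against a division by zero if less than one second has elapsed
        long bytes = overallData.getAndSet(0);
        long elapsedSeconds = Math.max(1, TimeUnit.MILLISECONDS.toSeconds(now - previousRun));
        long dataPerSecond = bytes / elapsedSeconds;

        System.out.println(mode + ", Connections: " + handledConections.get() + ", Throughput: " + dataPerSecond / (1024*1024) + " MB/s" );

        previousRun = now;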
    }
}

Forgive me that the sample has no (good) error handling; it should be fine for demonstration purposes.

  1. First, start the receiver:

    Benchmark.bat 4711
    
  2. Then start the sender:

    Benchmark.bat 127.0.0.1 4711 300 128
    

This starts 300 sender threads, each sending a 128 KB packet to the receiver every 50 ms.
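To put these sender parameters into numbers, here is a small, hypothetical helper (`OfferedLoad` is not part of the sample) that computes the theoretical maximum offered load. It assumes each write carries the full requested frame size and ignores the time spent inside `write()`, so the real rate is lower.

```java
public class OfferedLoad {

    // Upper bound on the bytes per second all senders together can offer:
    // each sender writes one frame and then sleeps for sleepMs milliseconds.
    // Time spent inside write() is ignored, so the real rate is lower.
    static long maxBytesPerSecond(int connections, int frameKB, long sleepMs) {
        long frameBytes = frameKB * 1024L;
        return connections * frameBytes * 1000L / sleepMs;
    }

    public static void main(String[] args) {
        long bytesPerSecond = maxBytesPerSecond(300, 128, 50);
        System.out.println(bytesPerSecond / (1024 * 1024) + " MiB/s");   // 750 MiB/s
        System.out.println(bytesPerSecond * 8 / 1_000_000 + " Mbit/s");  // 6291 Mbit/s
    }
}
```

With 300 connections, 128 KB frames and a 50 ms sleep, this gives at most 750 MiB/s (about 6.3 Gbit/s) on the loopback interface, considerably more than the 700 Mbit/s mentioned for the real application.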

When you do this first with Java 8 as the runtime and then with Java 21, you will see something like this:

[Screenshot: CPU load, Java 8 vs. Java 21]

The first half shows the sample application running on Java 8, the second half on Java 21.
Compared to Java 8, the newer Java 21 needs 10% to 15% more CPU.

Can someone explain where this comes from and what I can do about it?

Update: As some of the commenters couldn't reproduce it, I asked colleagues to run the sample to get a wider range of test machines.

Besides my own test, 10 other people DO SEE the effect very clearly. On 2 VMs and one physical machine, the effect does not appear.

However, I don't see a common denominator explaining why it appears or not. The CPUs were from Intel and AMD; the operating systems were Windows 10, Windows 11, Windows Server 2012 and Windows Server 2019.

Besides the Azul Zulu builds, I also tried the builds from Microsoft and from OpenLogic, but changing the build had no effect.

Solution: The hint about JEP 353 pushed me in the right direction. I still don't understand why Java 13 behaves the same as Java 8, even though JEP 353 was delivered in Java 13, but the hint inspired me anyway.

I changed the sample application above as follows.

Instead of

ExecutorService executorServiceReceiver = Executors.newCachedThreadPool();

I used

ExecutorService executorServiceReceiver = Executors.newVirtualThreadPerTaskExecutor();

I did the same for executorServiceSender.
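The change can be tried in isolation with the following minimal, self-contained sketch (Java 21). It is not the poster's code: the class name `VirtualThreadLoopback`, the method `run`, and the fixed frame count (used instead of the endless send/sleep loop so that the program terminates) are assumptions for illustration. Every sender and every receiver gets its own virtual thread via `Executors.newVirtualThreadPerTaskExecutor()`, while `accept()` stays on the calling thread, mirroring the accept loop of the sample.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class VirtualThreadLoopback {

    // Opens `connections` loopback connections; each sender writes `frames` frames of
    // `frameSize` bytes. Senders and receivers each run on their own virtual thread.
    // Returns the total number of bytes the receivers read.
    static long run(int connections, int frames, int frameSize) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        AtomicLong received = new AtomicLong();
        try (ServerSocket server = new ServerSocket(0, 50, loopback); // port 0 = any free port
             ExecutorService receivers = Executors.newVirtualThreadPerTaskExecutor()) {
            int port = server.getLocalPort();
            try (ExecutorService senders = Executors.newVirtualThreadPerTaskExecutor()) {
                for (int i = 0; i < connections; i++) {
                    senders.submit(() -> {
                        try (Socket socket = new Socket(loopback, port);
                             OutputStream out = socket.getOutputStream()) {
                            byte[] frame = new byte[frameSize];
                            for (int f = 0; f < frames; f++) {
                                out.write(frame);
                            }
                        }
                        return null;
                    });
                    // Accept on the calling thread, then hand the socket to a virtual thread.
                    Socket accepted = server.accept();
                    receivers.submit(() -> {
                        try (accepted; InputStream in = accepted.getInputStream()) {
                            byte[] buffer = new byte[65535];
                            int n;
                            while ((n = in.read(buffer)) != -1) { // -1 = sender closed the socket
                                received.addAndGet(n);
                            }
                        }
                        return null;
                    });
                }
            } // closing `senders` waits until every sender has finished
        } // closing `receivers` waits until every receiver has drained its socket
        return received.get();
    }

    public static void main(String[] args) throws Exception {
        long bytes = run(4, 100, 64 * 1024);
        System.out.println("Received " + bytes + " bytes"); // expected 26214400
    }
}
```

Because `ExecutorService` is `AutoCloseable` since Java 19, the try-with-resources blocks wait for all submitted tasks before the byte count is read.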

After that, I can see very clearly that Java 21 performs better than Java 8.

Have a look at the screenshot: the black rectangle is Java 8, the red rectangle is Java 21 with platform threads, and the green rectangle is Java 21 with virtual threads.

[Screenshot: CPU load of Java 8, Java 21 with platform threads, and Java 21 with virtual threads]

Needless to say, the overall number of platform (OS) threads in the system is much lower.
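One way to observe the lower platform-thread count is to compare both executors while many tasks are blocked, as a Receiver blocked in `read()` would be. The sketch below is hypothetical (`PlatformThreadCount` and `platformThreadsWhileBlocked` are illustrative names) and blocks on a `CountDownLatch` instead of a socket; it relies on `ThreadMXBean.getThreadCount()`, which in Java 21 counts live platform threads only, not virtual threads.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PlatformThreadCount {

    // Submits `tasks` tasks that all block on a latch (standing in for a Receiver blocked
    // in read()) and returns the number of live platform threads while they are blocked.
    static int platformThreadsWhileBlocked(ExecutorService executor, int tasks) throws InterruptedException {
        CountDownLatch started = new CountDownLatch(tasks);
        CountDownLatch release = new CountDownLatch(1);
        try (executor) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    started.countDown();
                    release.await(); // block until the measurement has been taken
                    return null;
                });
            }
            started.await(); // all tasks are running and blocked now
            // ThreadMXBean counts platform threads only; virtual threads are not included.
            int count = ManagementFactory.getThreadMXBean().getThreadCount();
            release.countDown();
            return count;
        } // close() waits for the released tasks to finish
    }

    public static void main(String[] args) throws InterruptedException {
        int tasks = 300;
        System.out.println("virtual threads: "
                + platformThreadsWhileBlocked(Executors.newVirtualThreadPerTaskExecutor(), tasks));
        System.out.println("cached pool:     "
                + platformThreadsWhileBlocked(Executors.newCachedThreadPool(), tasks));
    }
}
```

With the cached pool, the count is at least the number of blocked tasks, since each one occupies its own OS thread; with virtual threads, it should stay close to the number of carrier threads plus the JVM's own threads. The exact numbers depend on the machine.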

Thanks for all the constructive comments pushing me in the right direction.

Polis answered 29/8, 2024 at 11:43 Comment(13)
What does the runtime performance look like? I would think that using more CPU indicates it is more effectively utilizing the hardware. It's a network application. Those are typically IO bound (not CPU bound). - Trichosis
Not sure if I'm getting your question. When running my sample, you'll see it needs more CPU to do the same work, resulting in the same amount of transferred data, which can be seen as "identical runtime performance". Our real-world application shows the same runtime performance BUT at a higher load when using Java 21, which results in higher costs for the customer because he needs more hardware. It would also mean that when upgrading an existing system to Java 21, he might have to replace the hardware. So at the very least I need to understand why a newer version behaves worse. Hope I got your point. - Polis
You say it "behaves worse". It does not. The operating system reports that it uses more of the available CPU time. That is not a bad thing in and of itself. You need to perform more testing on a wider array of platforms. One change I know of was the switch from the old-fashioned Intel FPU code to SSE instructions for floating-point math, which only affected Intel-based JVMs. But I assume that's what you're using. I don't know. - Trichosis
Well, our application is only used on the Windows platform. We see the same effect on Windows 11 and on the server OS, and on both AMD Ryzen and AMD EPYC. But it might be a good idea to test on Intel explicitly. - Polis
And I disagree... it behaves worse. There is no benefit in the higher CPU load on Java 21; it uses more of the given machine's resources, meaning it can handle less load in total because the CPU hits the ceiling earlier. - Polis
I was thinking ARM. And by Intel I meant x86. The JVM used to use 80-bit extended-precision IEEE-754 by default on x86. That was controllable with strictfp, which is now a no-op. - Trichosis
Use a Java profiler, compare both executions and see if you can identify where the CPU is being consumed differently. - Boyish
Did you study the Java 14 release notes for a change that might impact your app? - Impeccable
A screenshot of Windows' Task Manager is far from a performance analysis, not to speak of all the possible influences on the outcome that you didn't rule out. Even the fact that one test ran after the other could be enough to make the second test run slower (with more CPU consumption), because the CPU is now hotter and can't use its maximum clock speed. - Theocrasy
Don't use Task Manager to evaluate the CPU. Use a tool like Java's VisualVM and observe the actual Java process CPU and memory usage. - Boyish
I tried to keep it quite simple here. Thanks a lot for all the comments, and yes, a Task Manager screenshot is not a performance analysis. But trust me, we have done a lot of testing over months now, and there is in fact different behavior running the same code. VisualVM shows the same. Yes, I read the complete Java 14 changelog and haven't found a hint. We also used a profiler, and not surprisingly, most of the time the application executes read() on the InputStream. But none of that explains why we see higher CPU usage on Java 21 compared to Java 8, and that's the reason for this post. - Polis
Does the problem happen on Windows only? I could not reproduce the described behavior on Linux using your sample code. - Anastaciaanastas
So far I have only tried Windows, because only that platform is used by our customers, which makes only Windows relevant for us. I asked more and more people to run the application, and on some machines there is no effect, but I haven't found a common denominator. We saw it under Windows 10 and Windows 11, on different Intel CPUs and on different AMD CPUs. I have not seen it on Windows Server 2012 but don't know yet whether this is related to the OS. Still investigating; just wanted to keep you up to date. - Polis

I ran your application several times with the IntelliJ profiler but could not consistently reproduce a CPU performance issue solid enough to make a case.

However, your example uses

  • java.net.Socket
  • java.net.ServerSocket

The following change could explain performance differences in specific scenarios between JDK 13 and later versions on the one hand and JDK 8 on the other.

Socket and ServerSocket were reimplemented in JDK 13 by JEP 353 to prepare the ground for Project Loom's virtual threads. If you read JEP 353 closely, you will find the following:

Aside from behavioral differences, the performance of the new implementation may differ to the old when running certain workloads. In the old implementation several threads calling the accept method on a ServerSocket will queue in the kernel. In the new implementation, one thread will block in the accept system call, the others will queue waiting to acquire a java.util.concurrent lock. Performance characteristics may differ in other scenarios too.

Scaremonger answered 30/8, 2024 at 14:33 Comment(3)
Also from the JEP: "The implementation uses the thread stack as the I/O buffer, an approach that has required increasing the default thread stack size on several occasions." Using memory other than stack memory certainly seems like an almost-guaranteed way to run slower. If the new implementation isn't using stack memory, it has to actively get memory from some other source. That's not free. - Isthmian
Hey guys, thanks a lot for running my sample. The hints are valuable, but as you pointed out, the changes took place in Java 13, which shows the same results as Java 8 on my side. I see the changed behavior starting in Java 14. Anyhow, I would like to understand why you can't see this effect. Can you outline a bit which machine you ran it on? Have you also tried without the profiler, just from the command line? (Sometimes I see generally different behavior when comparing code running in a profiler and without.) - Polis
Thanks for that hint. I still don't get it, since JEP 353 was done in Java 13 and there I don't see a difference, but it was obviously groundwork for virtual threads, and that pushed me in the right direction. I'll edit the original post and extend it with the solution I found. Thanks a lot, guys! - Polis

I know how frustrating it can be to get to the bottom of mysterious performance issues. I don’t have an answer but hopefully a clue down some direction.

In a recent video on the Java YouTube channel, there was some elaboration on slowdown effects introduced in newer releases: string concatenation generates classes for each variation. Maybe it has something to do with that.

I’m happy to delete this answer once you do discover the issue. Let me know in the comments if you can rule out the string concatenation cause.

https://youtu.be/ThtrTwooKDc?si=cG3c6cg1J-jDsxAd

Here are some other recent videos from Java that maybe also give us a clue in this quest.

https://youtu.be/52E9bZvoB-g?si=B1OZ5Itl_APNdO-v

https://youtu.be/JI09cs2yUgY?si=gzZNgNYgI8sc_G2F

Ejection answered 4/9, 2024 at 3:55 Comment(1)
Hey Luke, thanks for your comments. As my sample code doesn't do any string operations explicitly, I wouldn't see this as the root cause. Anyhow, very interesting points. I made good progress yesterday and will update the post with my findings later today. - Polis
