Performance: Apache HttpAsyncClient vs multi-threaded URLConnection
I am trying to choose the best approach for making a large number of HTTP requests in parallel. Below are the two approaches I have so far:

  1. Using Apache HttpAsyncClient and CompletableFutures:

    try (CloseableHttpAsyncClient httpclient = HttpAsyncClients.custom()
            .setMaxConnPerRoute(2000).setMaxConnTotal(2000)
            .setUserAgent("Mozilla/4.0")
            .build()) {
        httpclient.start();
        HttpGet request = new HttpGet("http://bing.com/");
        long start = System.currentTimeMillis();
        CompletableFuture.allOf(
                Stream.generate(() -> request).limit(1000).map(req -> {
                    CompletableFuture<Void> future = new CompletableFuture<>();
                    httpclient.execute(req, new FutureCallback<HttpResponse>() {
                        @Override
                        public void completed(final HttpResponse response) {
                            System.out.println("Completed with: " + response.getStatusLine().getStatusCode());
                            future.complete(null);
                        }

                        @Override
                        public void failed(final Exception ex) {
                            future.completeExceptionally(ex);
                        }

                        @Override
                        public void cancelled() {
                            future.cancel(true);
                        }
                    });
                    System.out.println("Started request");
                    return future;
                }).toArray(CompletableFuture[]::new)).get();
        System.out.println("Took " + (System.currentTimeMillis() - start) + " ms");
    }
    
  2. Conventional thread-per-request approach:

    long start1 = System.currentTimeMillis();
    URL url = new URL("http://bing.com/");
    ExecutorService executor = Executors.newCachedThreadPool();

    Stream.generate(() -> url).limit(1000).forEach(requestUrl -> {
        executor.submit(() -> {
            try {
                // URLConnection has no getResponseCode(); cast to HttpURLConnection
                HttpURLConnection conn = (HttpURLConnection) requestUrl.openConnection();
                System.out.println("Completed with: " + conn.getResponseCode());
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        System.out.println("Started request");
    });
    

Across multiple runs, I noticed that the conventional approach was finishing almost twice as fast as the async/future approach.

Although I expected dedicated threads to run faster, is the difference supposed to be this remarkable, or is there something wrong with my async implementation? If it's the latter, what is the right approach here?

Theater answered 7/12, 2018 at 6:31 Comment(9)
I discarded the results just to keep the code simple; you can assume the results are read.Theater
If it is the same destination, I would argue the best performance comes from not overrunning your partner with a large number of parallel connections. With a decent limit on parallelism, synchronous requests are typically faster than the same number of async threads. (And yes, nobody wants to hear that.)Rondel
This was written with the natural assumption that the partner is scaled enough. Say you are writing a service which needs to hit a dependent service before returning its own response, in such cases, would you want 1000 threads to serve 1000 requests or would you want a few threads asynchronously serving multiple requests?Theater
In the first case, you make a request and get an HttpResponse object. In the second case, you only create a URLConnection but do not make any request.Hauler
In my benchmarks I read the response status code before printing the time taken (using HttpResponse::getStatusLine().getStatusCode() and HttpURLConnection::getResponseCode() respectively). However, I just noticed that the time taken is roughly the same the moment you read the response stream as a string. I wonder why this is? Neither should be reading the response body unless the stream is read.Theater
*Update: On further testing, it seems that the threaded version is still faster even when the stream is read.Theater
I'm not sure what you are asking: are you comparing the performance of HttpClient vs. URL.openConnection, or are you comparing the performance of CompletableFutures vs a ThreadPool? You seem to be doing both at once, which is confusing.Erotic
I am comparing the performance of async IO vs threaded IO. I just happened to use these two as reference implementations of each.Theater
You should consume the response entity so the connection can be reused. You should also close the connection in the 2nd case.Labourer
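A minimal sketch of that advice for the URLConnection case: drain the body and call disconnect() so the JVM's keep-alive cache can release or reuse the socket. The local HttpServer exists only so the example runs offline, and the class/method names are made up for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

public class DrainDemo {
    // Reads the status code AND drains the body so the underlying
    // keep-alive connection can be returned to the connection cache.
    static int fetchAndDrain(URL url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        int status = conn.getResponseCode();
        try (InputStream in = conn.getInputStream()) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) { /* drain the body */ }
        }
        conn.disconnect(); // signal we are done with this connection
        return status;
    }

    public static void main(String[] args) throws Exception {
        // Throwaway local server so the sketch runs without internet access
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", ex -> {
            byte[] body = "ok".getBytes();
            ex.sendResponseHeaders(200, body.length);
            ex.getResponseBody().write(body);
            ex.close();
        });
        server.start();
        int port = server.getAddress().getPort();
        System.out.println("Completed with: "
                + fetchAndDrain(new URL("http://localhost:" + port + "/")));
        server.stop(0);
    }
}
```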
The answer here depends on a lot of factors:

  • hardware
  • operating system (and its configuration)
  • JVM implementation
  • Network devices
  • Server behaviour

First question - is the difference supposed to be this remarkable?

It depends on the load, pool size and network, but it could be well beyond the observed factor of 2 in either direction (in favour of the async or the threaded solution). According to your later comment the difference is mostly due to a mistake in the setup, but for the sake of argument I'll explain the possible cases.

Dedicated threads can be quite a burden. (Interrupt handling and thread scheduling are done by the operating system if you are using the Oracle [HotSpot] JVM, as these tasks are delegated.) The OS/system can become unresponsive if there are too many threads, slowing down your batch processing (or other tasks). There are a lot of administrative tasks around thread management, which is why thread (and connection) pooling is a thing. Although a good operating system should be able to handle a few thousand concurrent threads, there is always the chance that some limit or (kernel) event occurs.

This is where pooling and async behaviour come in handy. For example, a pool of 10 physical threads does all the work. If a task blocks (waits for the server response in this case), its thread enters the "Blocked" state (see image) and the next task gets the physical thread to do some work. When a thread is notified (data arrived), it becomes "Runnable" again, from which point the pooling mechanism can pick it up (this could be an OS- or JVM-implemented solution). For further reading on thread states I recommend W3Rescue. To understand thread pooling better I recommend this Baeldung article.
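The pooling behaviour described above can be sketched with plain java.util.concurrent primitives: a fixed pool of 10 threads services 100 "blocking" tasks, where Thread.sleep stands in for waiting on a server. The class and method names are made up for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolDemo {
    // 10 physical threads service many blocking tasks; while one task
    // waits (simulated I/O), another runnable task gets the thread.
    static int runBlockingTasks(int poolSize, int taskCount) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.submit(() -> {
                try {
                    Thread.sleep(20); // stand-in for a blocking network call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                completed.incrementAndGet();
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return completed.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Completed " + runBlockingTasks(10, 100) + " tasks");
    }
}
```

With 10 threads and a 20 ms "wait" per task, the 100 tasks finish in roughly 10 batches rather than needing 100 dedicated threads.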

Thread transitions

Second question - is something wrong with the async implementation? If not, what is the right approach to go about here?

The implementation is OK, there is no problem with it. The behaviour is just different from the threaded way. The main question in these cases is mostly what the SLAs (service level agreements) are. If you are the only "customer" of the service, then basically you have to decide between latency and throughput, but the decision will affect only you. Mostly this is not the case, so I would recommend some kind of pooling, which is supported by the library you are using.
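One library-agnostic way to get that kind of pooling with an async API is to cap the number of in-flight requests with a Semaphore. The sketch below (all names hypothetical) simulates the async completion callback with a scheduler; with the real HttpAsyncClient you would acquire a permit before execute() and release it inside the FutureCallback.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;

public class ThrottleDemo {
    // Caps concurrent in-flight "requests" at `permits`, even though
    // each request itself completes asynchronously via a callback.
    static int runThrottled(int permits, int requests) throws Exception {
        Semaphore gate = new Semaphore(permits);
        ScheduledExecutorService io = Executors.newScheduledThreadPool(4);
        CompletableFuture[] futures = IntStream.range(0, requests)
                .mapToObj(i -> {
                    try {
                        gate.acquire(); // blocks submission once `permits` are in flight
                    } catch (InterruptedException e) {
                        throw new RuntimeException(e);
                    }
                    CompletableFuture<Integer> f = new CompletableFuture<>();
                    // Simulated network callback arriving 10 ms later
                    io.schedule(() -> { f.complete(200); gate.release(); },
                            10, TimeUnit.MILLISECONDS);
                    return f;
                }).toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(futures).get(); // wait for all callbacks
        io.shutdown();
        return futures.length;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Finished " + runThrottled(50, 200) + " requests");
    }
}
```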

Third question - However I just noted that the time taken is roughly the same the moment you read the response stream as a string. I wonder why this is?

The message has most likely arrived completely in both cases (the response is probably not a long stream, just a few HTTP packets), but if you are reading only the header, the response body itself does not need to be parsed and loaded into CPU registers, thus reducing the latency of reading the actual data received. I think this is a cool representation of latencies (source and source): Reach times

This came out as quite a long answer, so TL;DR: scaling is a really hardcore topic, and it depends on a lot of things:

  • hardware: number of physical cores, multi-threading capacity, memory speed, network interface
  • operating system (and its configuration): thread management, interruption handling
  • JVM implementation: thread management (internal or outsourced to OS), not to mention GC and JIT configurations
  • Network devices: some limit the number of concurrent connections from a given IP, some pool non-HTTPS connections and act as proxies
  • Server behaviour: pooled workers or per-request workers etc

Most likely in your case the server was the bottleneck, as both methods gave the same result in the corrected case (HttpResponse::getStatusLine().getStatusCode() and HttpURLConnection::getResponseCode()). To give a proper answer you should measure your server's performance with tools like JMeter or LoadRunner, and then size your solution accordingly. This article is more about DB connection pooling, but the logic is applicable here as well.
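For a quick sanity check before reaching for JMeter, even a naive wall-clock harness makes the two client variants comparable in a repeatable way. This is only a sketch with illustrative names, not a substitute for proper load testing.

```java
public class TimeIt {
    // Naive wall-clock timer; real sizing should use JMeter/LoadRunner,
    // but this is enough for rough A/B comparisons of the two clients.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(() -> {
            try {
                Thread.sleep(50); // stand-in for one of the benchmarked runs
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("took ~" + elapsed + " ms");
    }
}
```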

Poss answered 10/12, 2018 at 20:43 Comment(6)
If the async implementation is correct, do you have a theory for why or when async is favored over the other? Maybe a case where async would be faster and a case where sync is faster? Also, I don't think the server was the bottleneck, since the server wouldn't bias the latencies depending on whether the client hit it with multiple threads or in an async manner.Theater
"Most likely in your case the server was the bottleneck as both methods gave the same result in the corrected case". To be clear, when just the status code was read, the async version was 2x slower. On further testing, even reading the response is faster in the threaded version, albeit with less than 2x differenceTheater
We mostly use the async pattern when you can do some useful work while waiting for the response: for example, telling the user that the message was sent and a reply is pending, or doing other tasks. I just realized that the code is not really async, as you are calling .get(), which waits synchronously for the Future<?> to complete.Poss
I am doing a get() on an aggregate CompletableFuture (meaning each future individually is run in a non blocking manner), essentially I expect this to give me a P100 for 1000 requests which came in parallel. I am actually looking for a specific reason for the first to be slower than the other. General principles of async programming and arbitrary listing of commonly known factors for performance is not really helping hereTheater
Isn't a .parallel() missing in this case?Poss
I'm sorry, I didn't catch that; maybe a code example would help? If you are referring to Stream.generate(...).parallel(), that is not really required here, as I am spawning threads within the forEach(). So you are basically asking to parallelize the creation of threads.Theater
