Highly Concurrent Apache Async HTTP Client IOReactor issues

Application description:

  • I'm using Apache HTTP Async Client (version 4.1.1) wrapped by Comsat's Quasar FiberHttpClient (version 0.7.0) to run a highly concurrent Java application that uses fibers to send HTTP requests internally to multiple HTTP endpoints.
  • The application runs on top of Tomcat (however, fibers are used only for internal request dispatching; Tomcat servlet requests are still handled in the standard blocking way).
  • Each external request opens 15-20 fibers internally; each fiber builds an HTTP request and uses the FiberHttpClient to dispatch it (a minimal sketch of this dispatch path follows the setup details below).
  • I'm using a c4.4xlarge server (16 cores) to test my application.
  • The endpoints I'm connecting to preempt keep-alive connections, meaning that if I try to reuse sockets, connections get closed during request execution attempts. I therefore disable connection recycling.
  • Given the above, here's the tuning for my fiber HTTP client (of which, of course, I'm using a single instance):

    // Imports (HttpAsyncClient 4.x / HttpCore NIO, plus Comsat's comsat-httpclient module):
    import org.apache.http.client.config.RequestConfig;
    import org.apache.http.impl.NoConnectionReuseStrategy;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager;
    import org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor;
    import org.apache.http.impl.nio.reactor.IOReactorConfig;
    import co.paralleluniverse.fibers.httpclient.FiberHttpClientBuilder;

    // One I/O dispatcher thread per core; socket keep-alive, linger and address reuse are
    // disabled because the endpoints preempt keep-alive connections anyway.
    // Note: the DefaultConnectingIOReactor constructor throws the checked IOReactorException.
    PoolingNHttpClientConnectionManager connectionManager =
            new PoolingNHttpClientConnectionManager(
                    new DefaultConnectingIOReactor(IOReactorConfig.custom()
                            .setIoThreadCount(16)
                            .setSoKeepAlive(false)
                            .setSoLinger(0)
                            .setSoReuseAddress(false)
                            .setSelectInterval(10)
                            .build()));

    // Effectively unbounded pool limits.
    connectionManager.setDefaultMaxPerRoute(32768);
    connectionManager.setMaxTotal(131072);

    // The builder produces a fiber-blocking CloseableHttpClient; connection reuse is disabled.
    CloseableHttpClient fiberHttpClient = FiberHttpClientBuilder.create()
            .setDefaultRequestConfig(RequestConfig.custom()
                    .setSocketTimeout(1500)
                    .setConnectTimeout(1000)
                    .build())
            .setConnectionReuseStrategy(NoConnectionReuseStrategy.INSTANCE)
            .setConnectionManager(connectionManager)
            .build();
    
  • ulimits for open files are set very high (131072 for both the soft and hard limits)

  • Eden is set to 18GB, total heap size is 24GB
  • The OS TCP stack is also well tuned:

    kernel.printk = 8 4 1 7
    kernel.printk_ratelimit_burst = 10
    kernel.printk_ratelimit = 5
    net.ipv4.ip_local_port_range = 8192 65535
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    net.core.rmem_default = 16777216
    net.core.wmem_default = 16777216
    net.core.optmem_max = 40960
    net.ipv4.tcp_rmem = 4096 87380 16777216
    net.ipv4.tcp_wmem = 4096 65536 16777216
    net.core.netdev_max_backlog = 100000
    net.ipv4.tcp_max_syn_backlog = 100000
    net.ipv4.tcp_max_tw_buckets = 2000000
    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.tcp_tw_recycle = 1
    net.ipv4.tcp_fin_timeout = 10
    net.ipv4.tcp_slow_start_after_idle = 0
    net.ipv4.tcp_sack = 0
    net.ipv4.tcp_timestamps = 1
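
For context, here is a minimal sketch of what one of those internal dispatch fibers looks like. The `dispatch` helper name and the URL are illustrative; it assumes Quasar's Fiber/SuspendableRunnable API and that the `fiberHttpClient` built above can be used as an ordinary, fiber-blocking CloseableHttpClient (which is what comsat-httpclient is meant to provide):

    import java.io.IOException;

    import co.paralleluniverse.fibers.Fiber;
    import co.paralleluniverse.fibers.SuspendExecution;
    import co.paralleluniverse.strands.SuspendableRunnable;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.util.EntityUtils;

    // Hypothetical helper: each external request spawns 15-20 of these fibers.
    static void dispatch(final CloseableHttpClient fiberHttpClient, final String url) {
        new Fiber<Void>(new SuspendableRunnable() {
            @Override
            public void run() throws SuspendExecution, InterruptedException {
                // Fiber-blocking call: the fiber parks while the async client performs the I/O.
                try (CloseableHttpResponse response = fiberHttpClient.execute(new HttpGet(url))) {
                    EntityUtils.consume(response.getEntity());
                } catch (final IOException e) {
                    // Per-request failure handling / logging goes here.
                }
            }
        }).start();
    }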

Problem description

  • Under low to medium load all is well: connections are leased, closed, and the pool replenishes.
  • Beyond some concurrency point, the I/O reactor threads (16 of them) seem to stop functioning properly, prior to dying.
  • I've written a small thread that fetches the pool stats and prints them every second (a minimal sketch appears after this list). At around 25K leased connections, actual data is no longer sent over the socket connections, and the Pending stat climbs to a sky-rocketing 30K pending connection requests.
  • This situation persists and basically renders the application useless. At some point the I/O reactor threads die; I'm not sure when, and I haven't been able to catch the exceptions so far.
  • lsof-ing the Java process, I can see it has tens of thousands of file descriptors, almost all of them in CLOSE_WAIT (which makes sense, as the I/O reactor threads die/stop functioning and never get around to actually closing them).
  • While the application is breaking down, the server is not heavily overloaded or CPU-stressed.
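
A minimal sketch of such a stats-printing thread, assuming the `connectionManager` instance from the configuration above (the one-second interval and the log format are illustrative):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    import org.apache.http.pool.PoolStats;

    // Prints leased / pending / available / max once per second.
    ScheduledExecutorService poolMonitor = Executors.newSingleThreadScheduledExecutor();
    poolMonitor.scheduleAtFixedRate(() -> {
        PoolStats stats = connectionManager.getTotalStats();
        System.out.println("leased=" + stats.getLeased()
                + " pending=" + stats.getPending()
                + " available=" + stats.getAvailable()
                + " max=" + stats.getMax());
    }, 0, 1, TimeUnit.SECONDS);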

Questions

  • I'm guessing I'm hitting some sort of boundary somewhere, though I'm rather clueless as to what or where it may reside, except for the following:
  • Is it possible I'm reaching some OS port limit (all applicative requests originate from a single internal IP, after all), and that this produces an error which causes the I/O reactor threads to die (something similar to open-files-limit errors)?
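
As a back-of-the-envelope check: with net.ipv4.ip_local_port_range = 8192 65535 there are 65535 - 8192 + 1 = 57,344 ephemeral ports, and every concurrent connection to the same destination IP:port needs its own local port, so roughly 25K leased connections plus tens of thousands of lingering CLOSE_WAIT sockets against a small set of endpoints could plausibly be approaching that ceiling.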
Saddlebag answered 21/10, 2016 at 15:53

Forgot to answer this, but I figured out what was going on roughly a week after posting the question:

  1. There was some sort of misconfiguration that caused the I/O reactor to spawn with only two threads.

  2. Even after providing more reactor threads, the issue persisted. It turns out that our outgoing requests were mostly SSL. Apache's SSL connection handling delegates the core work to the JVM's SSL facilities, which are simply not efficient enough to handle thousands of SSL connection requests per second. More specifically, some methods inside SSLEngine (if I recall correctly) are synchronized. Thread dumps taken under high load show the IOReactor threads blocking each other while trying to open SSL connections.

  3. Even trying to create a pressure-release valve in the form of a connection lease timeout didn't work, because the backlogs it created were too large, rendering the application useless (see the sketch after this list for where such a timeout is usually configured).

  4. Offloading outgoing SSL request handling to nginx performed even worse: because the remote endpoints terminate the requests preemptively, the SSL client session cache could not be used (the same goes for the JVM implementation).
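
In HttpClient terms, the lease timeout mentioned in point 3 is typically expressed as the connection-request timeout, i.e. how long a caller may wait for a connection to be leased from the pool. A minimal sketch (the 500 ms value is illustrative; the other timeouts mirror the configuration from the question):

    import org.apache.http.client.config.RequestConfig;

    // Fail fast when no pooled connection can be leased within 500 ms,
    // instead of letting callers queue up indefinitely behind a stalled pool.
    RequestConfig requestConfig = RequestConfig.custom()
            .setConnectionRequestTimeout(500)  // the "lease timeout" against the connection manager
            .setConnectTimeout(1000)
            .setSocketTimeout(1500)
            .build();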

I wound up putting a semaphore in front of the entire module, limiting the whole thing to ~6000 in-flight requests at any given moment (sketched below), which solved the issue.
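
A minimal sketch of such a valve, assuming the limit is applied at the module boundary and that the wrapped task performs the request synchronously (fiber-blocking), so the permit is held for the full duration of the request; the names are illustrative. If the acquire happens inside a fiber, a strand-aware semaphore would be preferable to the plain java.util.concurrent one, so the fiber parks rather than its carrier thread:

    import java.util.concurrent.Semaphore;

    // Global valve in front of the dispatching module:
    // at most ~6000 outgoing requests may be in flight at any given moment.
    private static final Semaphore OUTBOUND_PERMITS = new Semaphore(6000);

    void executeWithLimit(Runnable dispatchTask) throws InterruptedException {
        OUTBOUND_PERMITS.acquire();      // blocks the caller once the cap is reached
        try {
            dispatchTask.run();          // build and dispatch the HTTP request here
        } finally {
            OUTBOUND_PERMITS.release();  // always return the permit, even on failure
        }
    }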

Saddlebag answered 6/3, 2017 at 6:22 Comment(7)
Hi, I'm trying to find out how to configure the number of IOReactor threads. My application always starts with just one of them. The dispatcher threads are created properly through IOReactorConfig.custom().setIoThreadCount(). Can you share your config? – Phlogistic
Not sure I got your issue; IOReactorConfig.custom().setIoThreadCount() is what sets the reactor thread count. If that works properly, you should see as many threads as set by this method. Please share your entire configuration. – Saddlebag
IOReactorConfig.custom().setIoThreadCount() sets the number of dispatcher threads. I was trying to increase the number of threads that iterate through the event loop, but that doesn't seem to be the idea behind the reactor pattern, right? It uses a single thread for that. My specific issue is that my application's throughput is very poor. I have free CPU and memory, but it seems that the http-client keeps working at the same rate regardless of the number of dispatcher threads. – Phlogistic
It seems my throughput problem was related to the backend server performing poorly. After correcting that, NIO's performance became very good. – Phlogistic
@Phlogistic it's great that you resolved your issue. What you referred to was the "acceptor thread", I believe, and yes, you usually don't see more than one or two of those. – Saddlebag
@Saddlebag can you shed more light on point 4? What does "performed even worse" refer to here? – Palstave
Nginx (and the JVM) has an SSL session cache optimization on the server side. Nginx shares this cache between its workers to optimize performance even further. AFAIK, the SSL session cache works for both initiator and terminator (client and server). However, this optimization only helps in scenarios where the client connects as part of a long-lived session (say, when the client is a real user browsing a secure site as part of an authenticated session). For my use case these were server-to-server connections, where the remote end kept closing the SSL connection very swiftly, so nginx gave up rather quickly. – Saddlebag
