Can Boost ASIO be used to build low-latency applications?

Can Boost ASIO be used to build low-latency applications, such as HFT (High Frequency Trading)?

  • Boost.Asio uses the optimal platform-specific demultiplexing mechanism: IOCP, epoll, kqueue, poll_set, /dev/poll.

  • An Ethernet adapter with a TOE (TCP/IP offload engine) can also be used, as can OpenOnload (kernel-bypass BSD sockets).

But can a low-latency application actually be built using Boost.Asio + TOE + OpenOnload?
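
For context, here is a minimal sketch of the kind of Boost.Asio code in question (the address and port are made up). Because Asio ultimately issues ordinary BSD socket calls, a drop-in kernel-bypass library loaded via LD_PRELOAD, such as OpenOnload, can intercept them without any change to the application code:

    // Minimal Boost.Asio TCP sender (hypothetical gateway address/port).
    #include <boost/asio.hpp>
    #include <array>

    int main()
    {
        boost::asio::io_context io;   // io_service in older Boost versions
        boost::asio::ip::tcp::socket socket(io);

        // Hypothetical market-data/order gateway endpoint.
        boost::asio::ip::tcp::endpoint gateway(
            boost::asio::ip::make_address("192.0.2.10"), 9000);
        socket.connect(gateway);
        socket.set_option(boost::asio::ip::tcp::no_delay(true));  // disable Nagle

        const std::array<char, 8> order{{'N','E','W','O','R','D','E','R'}};
        boost::asio::write(socket, boost::asio::buffer(order));
    }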

Arrington answered 8/6, 2017 at 23:19 Comment(0)

I evaluated Boost Asio for use in high frequency trading a few years ago. To the best of my knowledge the basics are still the same today. Here are some reasons why I decided not to use it:

  1. Asio relies on bind()-style callbacks. There is some overhead here (see the sketch after this list).
  2. It is not obvious how to arrange certain low-level operations to occur at the right moment or in the right way.
  3. There is rather a lot of complex code in an area which is important to optimize. It is harder to optimize complex, general code for specific use cases. Thinking that you will not need to look under the covers would be a mistake.
  4. There is little to no need for portability in HFT applications. In particular, having "automatic" selection of a multiplexing mechanism is contrary to the mission, because each mechanism must be tested and optimized separately--this creates more work rather than reducing it.
  5. If a third-party library is to be used, others such as libev, libevent, and libuv are more battle-hardened and avoid some of these downsides.
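
As an illustration of point 1 (the type and handler names here are made up), here is the same read written with a std::bind-style callback and with a dedicated handler, in this case a lambda:

    #include <boost/asio.hpp>
    #include <array>
    #include <cstddef>
    #include <functional>

    struct session
    {
        explicit session(boost::asio::io_context& io) : socket(io) {}

        boost::asio::ip::tcp::socket socket;
        std::array<char, 4096> buf{};

        // (a) bind-style callback: a member function adapted with std::bind.
        void start_bind_style()
        {
            socket.async_read_some(
                boost::asio::buffer(buf),
                std::bind(&session::on_read, this,
                          std::placeholders::_1, std::placeholders::_2));
        }

        // (b) dedicated handler: a lambda capturing `this`, no binder object.
        void start_lambda_style()
        {
            socket.async_read_some(
                boost::asio::buffer(buf),
                [this](const boost::system::error_code& ec, std::size_t n)
                {
                    on_read(ec, n);
                });
        }

        void on_read(const boost::system::error_code& /*ec*/, std::size_t /*n*/) {}
    };

As the comments below note, modern Asio code rarely needs bind at all; lambdas or dedicated handler types avoid the binder object entirely.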

Related: C++ Socket Server - Unable to saturate CPU

Flit answered 8/6, 2017 at 23:50 Comment(7)
I can see pts. 5. and (somewhat) 4. About 2. I agree when it comes to correctly handling strands in combination with custom handler types. Is that what you mean by "in the right way"? I don't really see how (1.) bind style callbacks are required, let alone where the minimal overhead could be avoided with async completions. (3.) Are you referring to library code or calling code? Maybe you have a compelling example?Vocable
@sehe: I don't have example code using Boost.Asio - we discarded the prototypes we built with it when we realized it wasn't going to work. Back then libuv was not viable, but libev and libevent were--still we chose to go our own way. Some of what I mean by "in the right way" is the need to (a) use specific sockopts/ioctls at specific times, (b) have complete control over prioritization (including willful starvation) of multiple connections, and (c) use specific, nonportable APIs like accept4() (which Asio even today doesn't use, hence cannot do atomic SOCK_CLOEXEC).Flit
Ah. By "in the right way" you meant in relation to external requirements (I had assumed you meant "for Asio"). Atomic SOCK_CLOEXEC is one of those that I ran into and had to work around. I don't expect any sample code, just examples like these that clarify the scenarios you have in mind with the abstract bullet texts :)Vocable
Starvation is indeed an issue. In fact, I hate that I cannot control handler queue depth (without fancy extensions or intermediate queuing - which would be a lot of overhead). FWIW I rarely (never?) use bind with Asio. I don't see the (non-tutorial level) documentation suggesting it a lot. When you look at e.g. sample composed operations, these are usually dedicated handler types, which I don't think incur any overhead either.Vocable
About SOCK_CLOEXEC, do you mean this problem #21530040 or something else? I.e. is it not possible to set SOCK_CLOEXEC for an Asio socket even by using native_handle_type sock_fd = asio_socket.native_handle(); fcntl(sock_fd, F_SETFD, fcntl(sock_fd, F_GETFD) | FD_CLOEXEC); boost.org/doc/libs/1_53_0/doc/html/boost_asio/reference/ip__tcp/…Arrington
@Alex: Yes the SOCK_CLOEXEC problem is the same one described in your link. The workaround code you show is sort of OK but not atomic so not safe in some multi-threaded applications.Flit
One crucial issue would be being able to prevent Asio from doing any heap allocations.Myrmeco
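
A tidied-up version of the workaround discussed in the comments above, with the caveat noted there: it is not atomic, so between socket creation inside Asio and this call, a concurrent fork()/exec() in another thread can still leak the descriptor (which is exactly why accept4()/SOCK_CLOEXEC matter):

    #include <boost/asio.hpp>
    #include <fcntl.h>

    // Set FD_CLOEXEC on an already-open Asio socket via its native descriptor.
    // Not atomic with respect to socket creation, as noted above.
    void set_cloexec(boost::asio::ip::tcp::socket& s)
    {
        const int fd = s.native_handle();
        const int flags = ::fcntl(fd, F_GETFD);
        if (flags != -1)
            ::fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
    }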

This is the advice from the Asio author, posted to the public SG-14 Google Group (which unfortunately has had issues, and the group has since moved to another mailing-list system):

I do work on ultra low latency financial markets systems. Like many in the industry, I am unable to divulge project specifics. However, I will attempt to answer your question.

In general:

  • At the lowest latencies you will find hardware based solutions.

  • Then: Vendor-specific kernel bypass APIs. For example where you encode and decode frames, or use a (partial) TCP/IP stack implementation that does not follow the BSD socket API model.

  • And then: Vendor-supplied drop-in (i.e. LD_PRELOAD) kernel bypass libraries, which re-implement the BSD socket API in a way that is transparent to the application.

Asio works very well with drop-in kernel bypass libraries. Using these, Asio-based applications can implement standard financial markets protocols, handle multiple concurrent connections, and expect median 1/2 round trip latencies of ~2 usec, low jitter and high message rates.

My advice to those using Asio for low latency work can be summarised as: "Spin, pin, and drop-in".

Spin: Don't sleep. Don't context switch. Use io_service::poll() instead of io_service::run(). Prefer single-threaded scheduling. Disable locking and thread support. Disable power management. Disable C-states. Disable interrupt coalescing.
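
As an illustration, a spin loop along those lines might look like this (assuming a current Boost.Asio, where io_context has replaced io_service; the concurrency hint of 1 tells Asio the context is driven from a single thread):

    #include <boost/asio.hpp>

    int main()
    {
        // Single-threaded io_context: the concurrency hint of 1 lets Asio
        // skip internal synchronisation. Compiling with
        // -DBOOST_ASIO_DISABLE_THREADS removes thread support entirely.
        boost::asio::io_context io(1);

        // ... open sockets and issue async operations against `io` here ...

        // Busy-poll ready handlers rather than blocking inside run():
        // no sleeping, no context switch while waiting for work.
        for (;;)
            io.poll();
    }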

Pin: Assign CPU affinity. Assign interrupt affinity. Assign memory to NUMA nodes. Consider the physical location of NICs. Isolate cores from general OS use. Use a system with a single physical CPU.
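
A sketch of the thread-pinning part on Linux, using the GNU pthread affinity API (the core number is arbitrary; in practice it would be a core isolated from the OS scheduler and on the same NUMA node as the NIC):

    #include <pthread.h>
    #include <sched.h>

    // Pin the calling thread to a single core. With g++ on Linux,
    // _GNU_SOURCE is defined by default, exposing pthread_setaffinity_np.
    bool pin_to_core(int core)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        return ::pthread_setaffinity_np(::pthread_self(), sizeof(set), &set) == 0;
    }

Interrupt affinity and core isolation are configured outside the process, for example via /proc/irq/<n>/smp_affinity and the isolcpus kernel parameter.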

Drop-in: Choose NIC vendors based on performance and availability of drop-in kernel bypass libraries. Use the kernel bypass library.

This advice is decoupled from the specific protocol implementation being used. Thus, as a Beast user you could apply these techniques right now, and if you did you would have an HTTP implementation with ~10 usec latency (N.B. number plucked from air, no actual benchmarking performed). Of course, a specific protocol implementation should still pay attention to things that may affect latency, such as encoding and decoding efficiency, memory allocations, and so on.

As far as the low latency space is concerned, the main things missing from Asio and the Networking TS are:

  • Batching datagram syscalls (i.e. sendmmsg, recvmmsg).

  • Certain socket options.

These are not included because they are (at present) OS-specific and not part of POSIX. However, Asio and the Networking TS do provide an escape hatch, in the form of the native_*() functions and the "extensible" type requirements.

Cheers, Chris
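
As an illustration of the escape hatch mentioned above, batching UDP sends with the Linux-specific sendmmsg() call through the socket's native descriptor might look like this (the caller is assumed to have filled in the mmsghdr array):

    #include <boost/asio.hpp>
    #include <sys/socket.h>

    // Send a batch of prepared datagrams through an Asio UDP socket's
    // underlying file descriptor in a single syscall (Linux-specific).
    int send_batch(boost::asio::ip::udp::socket& s,
                   mmsghdr* msgs, unsigned int count)
    {
        return ::sendmmsg(s.native_handle(), msgs, count, 0);
    }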

Frescobaldi answered 23/2, 2019 at 22:44 Comment(2)
This seems like it should be part of your other answer, not a separate answer.Flit
Thanks! If someone is interested in the "Spin, Pin and Drop-In" technique with specific examples of commands and devices: alexeyab.com/2017/04/the-fastest-interconnect-for-hundreds.htmlArrington