In HFT, does it make sense to try to parallelize order processing?

I assume this is more of a theoretical question for those familiar with HFT. I receive orders from a FAST feed and process them, at a rate of about 2-3 thousand orders per second. The question is whether I should process them synchronously or asynchronously.

Every time I receive the next order I need to do the following:

  • update the order book of the corresponding instrument
  • update the indexes and indicators that depend on that order
  • update strategies and schedule actions if required (buy or sell something, etc.)

To do that synchronously I have about 200-300 µs per order (to be able to process 3000 orders per second). That should be enough, I think.

Just scheduling an asynchronous task costs me roughly 30 µs, I think.
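To put the numbers above side by side, here is a small back-of-the-envelope sketch. The message rates and the 30 µs scheduling cost are the figures from the question; the function names are made up for illustration. Note that the result is an average budget, so a bursty feed leaves less room per message, which fits the more conservative 200-300 µs estimate.

```python
# Back-of-the-envelope latency budget for strictly sequential processing.
# Rates and the 30 µs scheduling cost are taken from the question;
# the helper names are hypothetical.

def per_message_budget_us(messages_per_second: float) -> float:
    """Average time available per message, in microseconds."""
    return 1_000_000 / messages_per_second


def scheduling_share(schedule_cost_us: float, messages_per_second: float) -> float:
    """Fraction of the per-message budget spent handing work to another thread."""
    return schedule_cost_us / per_message_budget_us(messages_per_second)


if __name__ == "__main__":
    for rate in (2000, 3000):
        budget = per_message_budget_us(rate)
        share = scheduling_share(30, rate)
        print(f"{rate} msg/s: {budget:.0f} µs per message, scheduling share {share:.0%}")
```

At 3000 messages per second the average budget is about 333 µs, so a 30 µs hand-off eats roughly a tenth of it before any real work is done.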

Pros and cons:

Synchronous:

  • ++ no need to synchronize anything!
  • ++ the delay between "order is received" and "action is taken" is smaller, because there is no need to schedule tasks or pass data/work to another process (very important in HFT!).
  • -- however, "order is received" itself may be delayed, because a message can sit in the socket buffer waiting for the previous order to be processed

Asynchronous:

  • ++ ability to use the power of modern servers (my server has 24 cores, for example)
  • ++ faster in some scenarios, because a message does not wait while the previous one is processed
  • ++ can process more messages, or do more "complex" things per message
  • -- need to synchronize a lot of things, which can slow the program down

Example of synchronization: we receive an MSFT order update and then an INTC order update, and process them in different threads. Both trigger a recalculation of the NASDAQ index, so the NASDAQ index calculation has to be synchronized. This particular problem can be worked around to avoid synchronization; it is just an example of where synchronization may be needed.

So the question is whether I should process order updates synchronously or asynchronously. So far I process them asynchronously, with a dedicated thread per instrument: two updates for different instruments (MSFT and INTC) can be processed asynchronously, but two updates for the same instrument (MSFT) must be processed synchronously.
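The ordering rule in the last paragraph (same instrument in order, different instruments in parallel) does not strictly need one thread per instrument. A minimal sketch of one alternative, assuming a fixed pool of workers and a stable hash from symbol to worker, is shown below. This is not the setup described in the question; `NUM_WORKERS`, `worker_index` and `run_sharded` are hypothetical names, and `queue.Queue` is a locking queue used only to keep the example short.

```python
import queue
import threading
import zlib

NUM_WORKERS = 4  # hypothetical; size it to the cores you want to dedicate


def worker_index(symbol: str, num_workers: int = NUM_WORKERS) -> int:
    """Stable mapping so that one instrument always lands on the same worker."""
    return zlib.crc32(symbol.encode()) % num_workers


def run_sharded(updates, num_workers: int = NUM_WORKERS):
    """Process (symbol, payload) updates; return payloads per symbol in processing order."""
    queues = [queue.Queue() for _ in range(num_workers)]
    seen = {}                     # symbol -> list of payloads
    seen_lock = threading.Lock()  # guards only the creation of new dict entries

    def worker(q):
        while True:
            item = q.get()
            if item is None:      # sentinel: shut down
                return
            symbol, payload = item
            with seen_lock:
                history = seen.setdefault(symbol, [])
            history.append(payload)  # only this worker ever touches this symbol

    threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for t in threads:
        t.start()
    for symbol, payload in updates:
        queues[worker_index(symbol, num_workers)].put((symbol, payload))
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    updates = [("MSFT", 1), ("INTC", 1), ("MSFT", 2), ("INTC", 2), ("MSFT", 3)]
    print(run_sharded(updates))
```

Because each queue has exactly one consumer and is FIFO, updates for one symbol are applied in arrival order, while different symbols may still run on different cores.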

Agribusiness answered 3/7, 2012 at 7:19 Comment(1)
Have you considered using a thread pool? - Aircondition

I receive orders from FAST and process them. I receive about 2-3 thousand orders per second

Really? You work at an exchange? Because seriously, I get data from 5 exchanges, but those are not orders ;) I suggest you get your terms in line: you get 2-3 thousand EVENTS, but I really doubt you get ORDERS.

Have you ever thought of a multi-stage processing setup? I.e. you receive data in 2 threads, hand it over to another thread to find the instrument (ids instead of strings), hand it over to another thread to update the order book, hand it over to another thread to do indicators, and hand it over to X threads to do strategies?

No need to schedule tasks all the time, just synchronized queues with one task processing the messages on each of them. This can be super efficient with a no-lock approach.
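As a rough illustration of the staged setup described above, here is a minimal two-stage sketch: one long-running thread per stage, each reading from its own queue and passing results to the next. The stage handlers, `SYMBOL_IDS` and the dict standing in for an order book are all hypothetical, and `queue.Queue` is a locking queue, so this shows the shape of the pipeline rather than the no-lock implementation the answer has in mind.

```python
import queue
import threading

STOP = object()  # sentinel passed down the pipeline to shut every stage down

# Hypothetical lookup table and book state, for illustration only.
SYMBOL_IDS = {"MSFT": 0, "INTC": 1}
books = {}  # instrument id -> last price in ticks; stand-in for a real order book


def resolve_instrument(msg):
    """Stage 1: replace the symbol string with a compact integer id."""
    symbol, price = msg
    return SYMBOL_IDS[symbol], price


def update_book(msg):
    """Stage 2: apply the update; only this stage's thread writes to `books`."""
    instrument_id, price = msg
    books[instrument_id] = price
    return instrument_id, price


def stage(handler, inbox, outbox):
    """Run one stage: take a message, transform it, pass it on."""
    while True:
        msg = inbox.get()
        if msg is STOP:
            outbox.put(STOP)
            return
        outbox.put(handler(msg))


def run_pipeline(messages):
    handlers = [resolve_instrument, update_book]
    queues = [queue.Queue() for _ in range(len(handlers) + 1)]
    threads = [
        threading.Thread(target=stage, args=(h, queues[i], queues[i + 1]))
        for i, h in enumerate(handlers)
    ]
    for t in threads:
        t.start()
    for m in messages:
        queues[0].put(m)
    queues[0].put(STOP)
    results = []
    while True:
        out = queues[-1].get()
        if out is STOP:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([("MSFT", 3010), ("INTC", 2650), ("MSFT", 3011)]))
```

With a single thread per stage and FIFO queues between them, messages leave the pipeline in the order they entered, which is what keeps the processing repeatable.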

Brutally speaking: I am all for multithreading, but all core processing must maintain cardinality, so classical multithreading is out. Why? I need fully repeatable processing, so that unit tests get deterministic output.

So far I process them asynchronously and I have a dedicated thread per instrument

You do not trade a LOT, right? I mean, I track about 200,000 instruments (5 complete exchanges). Allocating 200,000 threads would be - ah - prohibitive ;)

Go with a staged pipeline: the core loops can be small, and you can distribute them over enough cores that you are a lot more scalable. Then optimize properly. For example, it is quite common for an update of one instrument to be followed by another update for the SAME instrument (for example, multiple executions while a large order fills). Take advantage of that.
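One possible reading of "take advantage of that" is to apply every update in a run but recompute derived values (indicators, signals) only once per run of the same instrument. The sketch below assumes that skipping the intermediate recomputations is acceptable for the strategy, which the answer does not state; `coalesce_runs`, `process_batch` and the callback parameters are hypothetical.

```python
from itertools import groupby


def coalesce_runs(batch):
    """Group consecutive updates for the same instrument, preserving arrival order.

    `batch` is a sequence of (instrument, payload) pairs.
    Yields (instrument, [payload, ...]) for each consecutive run.
    """
    for instrument, run in groupby(batch, key=lambda update: update[0]):
        yield instrument, [payload for _, payload in run]


def process_batch(batch, apply_update, recompute):
    """Apply every update, but recompute derived values once per run.

    Returns the number of recomputations performed.
    """
    recomputes = 0
    for instrument, payloads in coalesce_runs(batch):
        for payload in payloads:
            apply_update(instrument, payload)
        recompute(instrument)  # once per run instead of once per update
        recomputes += 1
    return recomputes


if __name__ == "__main__":
    batch = [("MSFT", 1), ("MSFT", 2), ("MSFT", 3), ("INTC", 1), ("MSFT", 4)]
    count = process_batch(batch, lambda i, p: None, lambda i: None)
    print(count, "recomputations for", len(batch), "updates")
```

Only consecutive updates are merged, so the relative order of updates across instruments is unchanged.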

Imamate answered 3/7, 2012 at 7:26 Comment(12)
It seems you know a lot about the subject :) Could you suggest how I should design my application in more detail, and maybe add some links? Yes, I do not process many instruments: fewer than 100, from only one exchange so far. However, I need minimal delays, as I trade HFT arbitrage and everything else is already optimized (best colocation, fast hardware, etc.). I think I will introduce a separate thread for the main index calculation. I think I should not notify it every time a stock from the index changes; instead I will simply recalculate it continuously, probably 10,000 times per second. - Agribusiness
I'm not working on a major exchange, so we don't have as many orders as CME, NASDAQ, etc. have :) - Agribusiness
"Have you ever thought of a multi-stage processing setup? I.e. you receive data in 2 threads, hand it over to another thread to find the instrument (ids instead of strings), hand it over to another thread to update the order book, hand it over to another thread to do indicators, and hand it over to X threads to do strategies? No need to schedule tasks all the time, just synchronized queues with one task processing the messages on each of them. This can be super efficient with a no-lock approach." Do you do it that way in your program? Any examples, references, articles? - Agribusiness
Yes, I do it like that. No, not many examples out there, sadly. And no, I won't write any for you outside of a quite expensive contract job. It is really quite easy, though: just define logical steps and detach them. The first step should JUST be "take message from FIX interface" so that this loop is as tight as possible, then go from there in stages. My last high-performance system had 6 stages, and one of them used 140 threads; then we resynced again afterwards ;) - Imamate
What do you use to extract objects from the synchronized queue? Do you use an "infinite loop"? Have you considered using BlockingCollection? For example, look at my code here: #11096669. How do you guard against processing two updates for one instrument at the same time? You have to use a lock for that. Or do you have a per-instrument queue? - Agribusiness
I mostly run infinite loops; alternatively I use a thread pool, depending on usage. BlockingCollection is highly inefficient; I have my own circular buffer classes that use a non-locking approach. I also write the API to avoid any locking if possible; for example, the target thread may take 10 items out of the queue in one call, then loop over them. Also, regarding your other code, I use a PROFILER to FIND OUT WHAT IS WRONG. You should have 3 of those servers anyway (double redundancy), so set one up with profiling tools, finished. - Imamate
Thanks. One more question, if you have some time :) Why do you say that BlockingCollection is inefficient? It takes about 10 µs to add an element to the collection and receive it in a different thread. Is that too much? - Agribusiness
I manage to take items out of my queue with very few instructions, basically just moving pointers around. If you do HFT, you fight for every cycle, seriously. Your scenario should normally NOT have locks and should not require waiting (scale the input queue properly for that). In this case a spinlock has pretty much ZERO overhead, with the exception of, sadly, the mandatory memory barrier for memory synchronization (which sucks, but Intel won't fix that for another 2-3 generations, until they introduce a transactional memory bus). You DO NOT WANT LOCKING AND BLOCKING - context switches are slow. - Imamate
I see. That's amazing. How many µs do you spend in total from when an order update is received until all actions are performed? I guess I am a lot behind you :) - Agribusiness
Not really. We do trading stuff here, but we stay away from HFT for a good reason: it is a race to the bottom. I mostly need performance because I need to quickly filter out the quotes we do NOT process from a 5-exchange feed, and some people pay me for stuff like that. We mostly do "real" trading software, i.e. taking directional market positions. - Imamate
I see. One more question: by SpinLock do you mean System.Threading.SpinLock? Are you using them because they have "zero overhead"? - Agribusiness
They are extremely efficient when they do NOT block, and when you design the queues right, that is pretty much always the case. In that case they "spin" the thread reading a variable and do NOT force a thread context switch. Linux does a lot of that stuff: lock-free kernel, etc. - Imamate
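The comments above mention a circular buffer from which the consumer takes up to 10 items in one call, "just moving pointers around". The answerer's own classes are not shown, so the following is only a guess at the general shape: a single-producer/single-consumer ring where the producer advances one index and the consumer the other. `RingBuffer`, `try_put` and `take_batch` are hypothetical names. Python is used purely to show the index arithmetic; in C# or C++ the index updates would also need the appropriate memory barriers or acquire/release semantics, which is the cost the comment about the mandatory memory barrier refers to.

```python
class RingBuffer:
    """Fixed-capacity single-producer/single-consumer circular buffer (illustrative only)."""

    def __init__(self, capacity: int):
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self._slots = [None] * capacity
        self._mask = capacity - 1
        self._head = 0  # next position to read; advanced only by the consumer
        self._tail = 0  # next position to write; advanced only by the producer

    def try_put(self, item) -> bool:
        """Producer side: store one item, or return False if the buffer is full."""
        if self._tail - self._head == len(self._slots):
            return False
        self._slots[self._tail & self._mask] = item
        self._tail += 1  # publish only after the slot has been written
        return True

    def take_batch(self, max_items: int = 10) -> list:
        """Consumer side: remove and return up to `max_items` items in one call."""
        count = min(self._tail - self._head, max_items)
        items = [self._slots[(self._head + i) & self._mask] for i in range(count)]
        self._head += count
        return items


if __name__ == "__main__":
    rb = RingBuffer(8)
    for n in range(5):
        rb.try_put(n)
    for item in rb.take_batch():  # one call, then loop over the batch
        print(item)
```

Taking a batch per call means the consumer touches the shared indices once per batch rather than once per item, which is the point of the "take 10 items, then loop" remark.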
