What are some good data structures to store a large orderbook?
Asked Answered
K

1

6

I'm writing a Bitcoin trader app that is fetching orderbooks from exchanges. A typical orderbook looks like this: https://www.bitstamp.net/api/order_book/ (it has two parts, "bids" and "asks", they should be stored separately but in identical data structures). One solution would be to store only part of this large orderbook, that would solve the access efficiency problem, but it introduces a set of other problems that have to do with consistency and update restrictions. So for now, it seems that a better solution is to fetch an orderbook and keep updating it.

Now this trader app later updates this fetched orderbook with new orders and with removed orders. For example, if you have an order at $900 to buy 1.5BTC in an orderbook, it may be cancelled completely or it may be updated to either contain more or less BTC. Also, a new order may be added below or above that price.

There are two critical operations:

  1. quickly find an order with exactly the same price (in the case of an update or cancelling)

  2. quickly find an order with the price closest to the one provided, but below it

In the case of an update we may not actually know it is an update, so we may start doing (2) and end up doing (1).

I'm not an expert in data structures, so I started looking through most common ones and for now I have a sense it should be some kind of tree, but I'm not sure which one. My most uneducated guess would be a data structure in which each node is a digit in a price, so, for example, to quickly find all nodes with the price of $900 we do items['9']['0'] and then look for leaf nodes. It's still a mess in my head for now, so please don't judge me too harsh. Any advice would be great.

Kummerbund answered 19/1, 2014 at 20:19 Comment(2)
How large is large? Thousands of entries? Millions? Billions?Octane
Thousands, a typical orderbook is currently 4-5k entries. But remember, we're talking browser javascript performance here, as this is what I'm writing in.Kummerbund
H
5

It sounds like you want a simple binary search tree (BST): (well, a self-balancing one)

A binary search tree (BST) is a node-based binary tree data structure which has the following properties:

  • The left subtree of a node contains only nodes with keys less than the node's key.
  • The right subtree of a node contains only nodes with keys greater than the node's key.
  • The left and right subtree each must also be a binary search tree.
  • There must be no duplicate nodes (an easy constraint to get around though, if need be).

A BST allows you to efficiently do both of your operations - to find an element matching some value, or one where the value is closest, but smaller.

The running time of both of these operations are O(log n), and, more specifically, the number of comparisons are quite close to log2n, which is around 12 for n = 5000, which is pretty much nothing (and there's a bit of work to rebalance the tree, but that should be a similar amount of work).

Haulage answered 19/1, 2014 at 23:29 Comment(1)
Fitting into memory is not a problem. Insertion and search time are.Kummerbund

© 2022 - 2024 — McMap. All rights reserved.