Database design for large amounts of data
I would like to store stock trading data for 1000 symbols. The data is actually converted from text files so there is no need for inserts and updates; only read-only access will be required.

The data is basically grouped like this: each symbol has many records: {timestamp, price, quantity}, each record represents a trade.

An approximate upper bound for one symbol is 5 records/second over an 8-hour working day, i.e. 5 × 60 × 60 × 8 = 144K records per day. So 1K symbols would generate about 144M records per day.

Most operations on the data would be something like:

  • give me all records for a symbol for the period Date D1, Time T1 to Date D2, Time T2
  • find the min/max/avg of price or quantity for the period [D1, T1...D2, T2]
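For concreteness, the two query shapes above can be sketched against a hypothetical one-table layout (here in SQLite purely for illustration; the table and column names are assumptions, not a recommended design):

```python
import sqlite3

# Hypothetical schema for illustration: one row per trade.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE trades (
        symbol   TEXT,
        ts       TEXT,      -- ISO-8601 timestamp, sorts chronologically
        price    REAL,
        quantity INTEGER
    )
""")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?, ?)",
    [("ABC", "2013-10-25 09:30:00", 10.0, 100),
     ("ABC", "2013-10-25 09:30:01", 10.5, 200),
     ("ABC", "2013-10-26 09:30:00", 11.0, 150)],
)

# 1) All records for a symbol between [D1 T1] and [D2 T2]
rows = conn.execute(
    "SELECT ts, price, quantity FROM trades "
    "WHERE symbol = ? AND ts BETWEEN ? AND ? ORDER BY ts",
    ("ABC", "2013-10-25 00:00:00", "2013-10-25 23:59:59"),
).fetchall()

# 2) min/max/avg of price over the same period
stats = conn.execute(
    "SELECT MIN(price), MAX(price), AVG(price) FROM trades "
    "WHERE symbol = ? AND ts BETWEEN ? AND ?",
    ("ABC", "2013-10-25 00:00:00", "2013-10-25 23:59:59"),
).fetchone()
print(len(rows), stats)  # 2 (10.0, 10.5, 10.25)
```

Both queries are served well by a composite index on (symbol, ts), whatever engine is chosen.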

Now the question: what would be the best design for a database in this case?

  • Can I store all trades for a symbol in a single table? Tables would quickly grow too big in that case, though.
  • Shall I create a separate table per day/week/month, e.g. 2013-10-25_ABC (ABC = symbol name)? In that case we may get 1K new tables per day/week/month.
  • Or maybe plain text files would be enough? E.g., keeping each symbol's data as a file under a 2013-10-15 folder, resulting in 1K files per folder.

The database may be either MS SQL or MySQL. The total time period is up to 5 years. Thank you!

Germangermana answered 29/10, 2013 at 10:34

The third option is the best one: you need high read performance with almost negligible writes.

Your requirements are best suited to NoSQL databases: a single table with no relationships, so MySQL would be overkill. More info: NoSQL databases.

Jemena answered 29/10, 2013 at 10:45

That's a whole lot of data. Do look at NoSQL.

Using SQL, here are some basic ideas:

Put all price data in one table, using the smallest data types possible: a SymbolId (int) to reference the symbol, the smallest datetime type that covers the range, and the smallest monetary type needed.
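A back-of-the-envelope sketch of why small types matter at this volume (the 16-byte row layout below is an assumption for illustration, not any specific engine's storage format):

```python
import struct

# Assumed compact row: 4-byte symbol id, 4-byte unix timestamp,
# 4-byte price in cents, 4-byte quantity -> 16 bytes per trade.
ROW = struct.Struct("<IIiI")

record = ROW.pack(42, 1382690040, 1234, 500)  # symbol_id, ts, cents, qty
assert len(record) == 16

rows_per_day = 144_000_000          # 1K symbols x 144K trades/day
bytes_per_day = rows_per_day * ROW.size
print(bytes_per_day / 2**30)        # ~2.1 GiB/day before indexes
```

Doubling any field width adds gigabytes per week, so shaving bytes per column pays off quickly over a 5-year retention window.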

Do denormalize. Make a second table with min/max/avg per day and SymbolId.

Research horizontal partitioning and use indexes.
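The denormalization idea above can be sketched as a rollup table populated once per day (SQLite here for a self-contained example; table and column names are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trades (symbol_id INTEGER, ts TEXT,
                         price REAL, quantity INTEGER);
    CREATE TABLE daily_stats (symbol_id INTEGER, day TEXT,
                              min_price REAL, max_price REAL,
                              avg_price REAL);
""")
conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?)", [
    (1, "2013-10-25 09:30:00", 10.0, 100),
    (1, "2013-10-25 16:00:00", 12.0, 300),
    (1, "2013-10-26 09:30:00", 11.0, 200),
])

# Build the denormalized rollup once, after the day's load.
conn.execute("""
    INSERT INTO daily_stats
    SELECT symbol_id, date(ts), MIN(price), MAX(price), AVG(price)
    FROM trades GROUP BY symbol_id, date(ts)
""")

# Period queries over whole days now read 1 row/day instead of ~144K.
row = conn.execute(
    "SELECT min_price, max_price FROM daily_stats "
    "WHERE symbol_id = 1 AND day = '2013-10-25'"
).fetchone()
print(row)  # (10.0, 12.0)
```

Since the source data is read-only, the rollup never goes stale and can be rebuilt from scratch at any time.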

Specialistic answered 31/10, 2013 at 20:13

Since you'll be running queries from one datetime to another, I wouldn't split tables up at all. Instead, learn more about sharding. Below is the schema I would use:

CREATE TABLE symbols (
    id      VARCHAR(6) PRIMARY KEY,  -- MSFT, GOOG, etc.
    name    VARCHAR(50)              -- Microsoft, Google, etc.
    -- ...
);

CREATE TABLE trades (
    id          BIGINT UNSIGNED PRIMARY KEY,
    symbol_id   VARCHAR(6),
    qwhen       DATETIME,
    price       DOUBLE,
    quantity    DOUBLE,
    -- ...
    FOREIGN KEY (symbol_id) REFERENCES symbols (id)
);
Annuitant answered 29/10, 2013 at 16:47