InfluxDB for a financial application

I'm migrating my financial analysis application's data from MongoDB to InfluxDB because both the data and the analysis are growing exponentially.

My current scenario is:

1) Get the tick every second from the exchanges and store it in a measurement called 'tick';

2) Have a continuous query running every 10 seconds grouping this 'tick' data by minute into a measurement called 'ohlc' (candlestick data);

And here come my doubts. When I was using Mongo as my database, I transformed the ticks into candlestick data as soon as I received them, calculated some indicators (MACD, EMA, BB, RSI) and stored everything.

I see that InfluxDB has Kapacitor as its data processor. Is there a way to write scripts in Kapacitor to calculate these indicators, or should I stream the data to NodeJS and calculate them myself?

If I have to stream the data, what are the best practices for doing it?

Clipping answered 23/5, 2018 at 15:58

There are a few options when you're using InfluxDB. With Kapacitor, you can incorporate user-defined functions in any language that has protocol buffer support or you can write a TICKscript to do the data transformation.

You can also use the database's Continuous Queries feature, although these can be expensive depending on what the query computes and how often it runs.

If you want to write your own function in NodeJS, you basically write some code that listens on a Unix domain socket; Kapacitor connects to that socket, and data is then exchanged over that connection (full docs here).
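
As a rough sketch of the socket side only: the code below listens on a Unix domain socket and cuts the incoming byte stream into frames. It assumes each message is prefixed with its length as a varint and carries a protobuf payload, which is my understanding of the UDF protocol and should be checked against the Kapacitor docs. The names `readVarint`, `splitFrames`, `listen` and `socketPath` are made up for this example, and decoding the protobuf payload itself is left out.

```javascript
const net = require('net');

// Decode one unsigned varint starting at `offset`.
// Returns { value, bytes }, or null if the buffer ends in the middle of the varint.
function readVarint(buf, offset) {
  let value = 0;
  let shift = 0;
  for (let i = offset; i < buf.length; i++) {
    const byte = buf[i];
    // Multiply instead of shifting so large lengths do not overflow 32 bits.
    value += (byte & 0x7f) * 2 ** shift;
    if ((byte & 0x80) === 0) return { value, bytes: i - offset + 1 };
    shift += 7;
  }
  return null;
}

// Pull every complete frame out of `buf`.
// Returns the frames plus the unread remainder to keep for the next chunk.
function splitFrames(buf) {
  const frames = [];
  let offset = 0;
  while (offset < buf.length) {
    const header = readVarint(buf, offset);
    if (header === null) break; // length prefix not fully received yet
    const start = offset + header.bytes;
    const end = start + header.value;
    if (end > buf.length) break; // frame body not fully received yet
    frames.push(buf.subarray(start, end));
    offset = end;
  }
  return { frames, rest: buf.subarray(offset) };
}

// Hypothetical wiring: `socketPath` is a placeholder, and `onFrame` would have to
// decode the payload with a protobuf library of your choice.
function listen(socketPath, onFrame) {
  const server = net.createServer((conn) => {
    let pending = Buffer.alloc(0);
    conn.on('data', (chunk) => {
      const { frames, rest } = splitFrames(Buffer.concat([pending, chunk]));
      pending = rest;
      frames.forEach((frame) => onFrame(frame, conn));
    });
  });
  server.listen(socketPath);
  return server;
}
```

Keeping the framing in a pure function (`splitFrames`) means it can be tested without opening a socket.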

If you want to write a TICKscript, here is an example template:

// {alert_name}

// metric: {alert_metric}
// available_fields: [[other_telegraf_fields]]

// TELEGRAF CONFIGURATION
// [inputs.{plugin}]
//   # full configuration

// DEFINE: kapacitor define {alert_name} -type stream -tick {plugin}/{alert_name}.tick -dbrp telegraf.autogen
// ENABLE: kapacitor enable {alert_name}

// Parameters
var info = {info_level} 
var warn = {warn_level}
var crit = {crit_level}
var infoSig = 2.5
var warnSig = 3
var critSig = 3.5
var period = 10s
var every = 10s

// Dataframe
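// Names passed to measurement(), mean() and as() are string literals, so they take single quotes.
var data = stream
  |from()
    .database('telegraf')
    .retentionPolicy('autogen')
    .measurement('{plugin}')
    .groupBy('host')
  |window()
    .period(period)
    .every(every)
  |mean('{alert_metric}')
    .as('stat')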

// Thresholds
var alert = data
  |eval(lambda: sigma("stat"))
    .as('sigma')
    .keep()
  |alert()
    .id('{{ index .Tags "host"}}/{alert_metric}')
    .message('{{ .ID }}:{{ index .Fields "stat" }}')
    .info(lambda: "stat" > info OR "sigma" > infoSig)
    .warn(lambda: "stat" > warn OR "sigma" > warnSig)
    .crit(lambda: "stat" > crit OR "sigma" > critSig)

// Alert
alert
  .log('/tmp/{alert_name}_log.txt')
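
To make the threshold logic in that script easier to follow, here is a small JavaScript sketch of the same idea: a running "number of standard deviations from the mean" score plus the info/warn/crit comparison. It is an approximation for illustration, not Kapacitor's implementation; in particular, it updates the running statistics before scoring each point, and whether Kapacitor's `sigma()` does exactly that is an assumption. `createSigma` and `level` are hypothetical names.

```javascript
// Running mean and variance via Welford's algorithm; returns a scoring function.
function createSigma() {
  let n = 0;
  let mean = 0;
  let m2 = 0; // sum of squared deviations from the running mean
  return function sigma(x) {
    n += 1;
    const delta = x - mean;
    mean += delta / n;
    m2 += delta * (x - mean);
    if (n < 2) return 0; // no spread to measure yet
    const std = Math.sqrt(m2 / (n - 1));
    return std === 0 ? 0 : Math.abs(x - mean) / std;
  };
}

// Mirrors the .info/.warn/.crit lambdas: the most severe matching level wins.
function level(stat, sig, t) {
  if (stat > t.crit || sig > t.critSig) return 'CRITICAL';
  if (stat > t.warn || sig > t.warnSig) return 'WARNING';
  if (stat > t.info || sig > t.infoSig) return 'INFO';
  return 'OK';
}

// Example thresholds: the *Sig values come from the script, the rest are placeholders.
const thresholds = { info: 1, warn: 10, crit: 20, infoSig: 2.5, warnSig: 3, critSig: 3.5 };
const sigma = createSigma();
const scores = [2, 4, 4, 4, 5, 5, 7, 9].map((x) => sigma(x));
const lastLevel = level(9, scores[scores.length - 1], thresholds);
```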

I hope that helps!

Philanthropic answered 30/5, 2018 at 18:57

Q: InfluxDB has Kapacitor as its data processor, which is driven by TICKscripts. How does that compare with writing a simple NodeJS application that does the calculation and writes the results back to InfluxDB? Which is better?

A: Depends.

It all boils down to how complicated the calculation is expected to be, how much data there is, and whether you are adventurous enough to learn TICKscript.

In short, Kapacitor is definitely the way to go, as it is designed to handle complicated calculations at scale. Its downsides are:

  1. TICKscript has a steep learning curve.
  2. It is still a relatively new technology; if your calculation involves something fancy that Kapacitor doesn't support, you will have to build your own UDF.
  3. There is a higher chance of hitting unknown bugs.

When you use Kapacitor you are basically making use of its pipeline-style framework for data processing. What is this "pipeline" style? I won't go too deeply into it, but in short, the nodes you define in your TICKscript are joined up into a sequential chain of data processing nodes. During execution, data flows through the individual stations simultaneously and without stopping (for most nodes).
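
To give a feel for the chaining, here is a toy version in JavaScript using generators. It only illustrates how stages hand results to the next stage as soon as they have them; it does not reproduce Kapacitor's concurrency, and the window here is count-based rather than time-based to keep it short. The stage names (`fromMeasurement`, `windowOf`, `meanOf`) and the sample data are invented for this sketch.

```javascript
// Stage 1: keep only points from one measurement.
function* fromMeasurement(points, measurement) {
  for (const p of points) {
    if (p.measurement === measurement) yield p;
  }
}

// Stage 2: group points into batches of `size` (count-based for simplicity).
function* windowOf(points, size) {
  let batch = [];
  for (const p of points) {
    batch.push(p);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
}

// Stage 3: reduce each batch to the mean of one field.
function* meanOf(batches, field) {
  for (const batch of batches) {
    const sum = batch.reduce((acc, p) => acc + p[field], 0);
    yield { time: batch[batch.length - 1].time, stat: sum / batch.length };
  }
}

const ticks = [
  { measurement: 'tick', time: 1, price: 10 },
  { measurement: 'other', time: 1, price: 99 },
  { measurement: 'tick', time: 2, price: 20 },
  { measurement: 'tick', time: 3, price: 30 },
  { measurement: 'tick', time: 4, price: 50 },
];

// The chain reads inside-out: from -> window -> mean.
const results = [...meanOf(windowOf(fromMeasurement(ticks, 'tick'), 2), 'price')];
```

Each stage pulls from the previous one lazily, so the first mean is produced as soon as the first window is full rather than after all the input has been read.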

This framework is also a large part of why Kapacitor is so fast.

NodeJS, on the other hand, needs basically zero time investment if you are already familiar with it. JavaScript is pretty easy, and there are plenty of references, unlike for TICKscript.

The biggest disadvantage of NodeJS is that JavaScript is single-threaded. That is, only one data point or one data bucket can be processed at a time. If your calculation involves an expensive computation, NodeJS is not recommended.

However, if the calculation is of the straightforward, single-step kind (e.g. if X is true, then write Y back to InfluxDB), you should be fine.
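
As an example of what such a simple calculation could look like for the use case in the question, here is a sketch that rolls ticks up into one-minute candles and computes an EMA over the closes. It assumes ticks arrive sorted by time with a millisecond `time` and a numeric `price`, and it seeds the EMA with the first value (other conventions seed with a simple average). `toCandles` and `ema` are hypothetical helpers, and writing the results back to InfluxDB is left out.

```javascript
const MINUTE_MS = 60 * 1000;

// Group time-sorted ticks into one-minute OHLC candles.
function toCandles(ticks) {
  const candles = [];
  let current = null;
  for (const { time, price } of ticks) {
    const bucket = Math.floor(time / MINUTE_MS) * MINUTE_MS;
    if (current === null || current.time !== bucket) {
      // First tick of a new minute opens a new candle.
      current = { time: bucket, open: price, high: price, low: price, close: price };
      candles.push(current);
    } else {
      current.high = Math.max(current.high, price);
      current.low = Math.min(current.low, price);
      current.close = price;
    }
  }
  return candles;
}

// Exponential moving average with smoothing factor 2 / (period + 1),
// seeded with the first value.
function ema(values, period) {
  const alpha = 2 / (period + 1);
  const out = [];
  values.forEach((value, i) => {
    out.push(i === 0 ? value : alpha * value + (1 - alpha) * out[i - 1]);
  });
  return out;
}

const candles = toCandles([
  { time: 0, price: 10 },
  { time: 1000, price: 12 },
  { time: 59000, price: 9 },
  { time: 60000, price: 11 },
]);
const closesEma = ema(candles.map((c) => c.close), 3);
```

Work like this is a single pass over the data, so it fits comfortably in one NodeJS process; the scaling concern only starts once the per-point work gets heavy.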

So, NodeJS or not? I think that, for a start, if your calculation is expected to be simple, a quick and dirty JavaScript application should do it and will save you from TICKscript headaches. But mind you, if you intend to do some crazy calculation at a later stage, your NodeJS application may not scale well, and you might end up going back to Kapacitor.

Cancer answered 1/6, 2018 at 14:29
