rrdtool alternative for high volume

I am interested in knowing if there is any alternative to rrdtool for logging time series data. I am looking for something that can scale to a large number of devices to monitor.

From what I read on this subject, rrdtool is I/O bound when you hit it with large amounts of data. Since I envision this scaling to a very large number of monitored devices, I am curious whether there is any alternative that would not choke on I/O. Preferably SQL based, but not necessarily.

Thanks

Burdock answered 3/3, 2009 at 5:51 Comment(2)
If it's I/O bound, wouldn't that be good? It means you can throw a hardware solution at it, such as RAID, solid-state disks, or multiple machines tracking unrelated data. – Pokey
My point as well ... the question is just how well the hardware is being used ... the rrdcached approach is quite optimal ... a database (at the end of the day) also has to write stuff to disk, but since it is much more general purpose I doubt it will be able to do it as efficiently as rrdtool ... – Galantine

If I/O performance is the main worry then you want to look into something like rrdcached, which is available in the current version (1.4) of RRDtool.

The I/O overhead is not a function of the amount of data being written; after all, each value is only 8 bytes per data source. The bandwidth comes from the fact that a whole sector (typically 4k) has to be read in before it can be written back out, so to record 8 bytes you end up reading and writing 8k. Scaled up, updating 10,000 RRD files in one polling pass stores only about 80 KB of new values but pushes on the order of 80 MB through the I/O subsystem.

rrdcached coalesces these writes together, so when an RRD is eventually updated the ratio of useful data (actual DS values) to wasted I/O (the spare bytes in each sector) is much improved.

All the RRDtool utilities will automatically work with rrdcached when they detect it running (via the RRDCACHED_ADDRESS environment variable). This allows them to trigger flushes when needed, for example when generating a graph from the data.

Switching to an SQL based solution may help, but consider the extra I/O that will be required to support SQL. Since you don't tend to access RRD data in that kind of random access pattern, a database is a bit of a sledgehammer for the problem. Sticking with RRDtool keeps access to the whole ecosystem of tools that understand and can work with the files, which is useful, especially if you are already familiar with it.
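
To make the flow concrete, here is a rough sketch of wiring rrdcached in; the socket path, base directory, timing values and the file/DS names are illustrative assumptions, not something this answer prescribes:

# start the daemon: -w = seconds a file's cached updates may sit before being flushed,
# -z = random delay to spread those flushes out, -b/-B = restrict it to the RRD directory
rrdcached -l unix:/var/run/rrdcached.sock -w 300 -z 60 -b /var/lib/rrd -B

# the RRDtool utilities pick the daemon up from this environment variable
export RRDCACHED_ADDRESS=unix:/var/run/rrdcached.sock
rrdtool update /var/lib/rrd/mydata.rrd N:42    # queued in the daemon rather than hitting disk immediately
rrdtool graph graph.png DEF:v=/var/lib/rrd/mydata.rrd:value:AVERAGE LINE1:v#0000FF    # forces a flush of just this file first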

Gonad answered 4/2, 2010 at 11:34 Comment(0)

There are some time series databases which have high availability and/or scalability as goals.

Maybe have a look at

  • rrdcached, a caching layer on top of rrd
  • whisper, the database engine behind graphite (see the quick example after this list)
  • opentsdb, a distributed, scalable time series database (TSDB) built on top of HBase
  • reconnoiter, although its focus is more on monitoring
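
As a taste of how lightweight feeding graphite/whisper is, carbon's plaintext listener takes one "metric value timestamp" line per sample on port 2003; the hostname and metric name here are made up for illustration:

echo "devices.router1.ifInOctets 123456 $(date +%s)" | nc graphite.example.com 2003
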
Precritical answered 25/8, 2011 at 13:39 Comment(0)

A friend of mine did some work a while ago on a SQL backend to store round robin data: http://rrs.decibel.org

However, I suspect that since you're asking about "devices to monitor", you may be looking for a more complete solution.

Ocampo answered 3/3, 2009 at 6:2 Comment(2)
I found that in my research. It didn't look to be maintained, so I was a bit reluctant to consider it. – Burdock
I just found that too; it seems like the last update was 2005. That doesn't mean it wouldn't work now, I just didn't take the time to extract the tarball. :-/ – Opulence

If I/O operations per second is your main bottleneck and you're using Linux, there's an easy hack that only costs you memory. Use a tmpfs mount to stage your RRD writes.

All the I/O operations will be done in memory and won't incur any of the bottlenecks found in doing disk I/O (this is even faster than using solid-state disks). You can then use a cron job and rsync to copy only the changed RRDs to disk once every few minutes.


Create the directories

bash-4.2# mkdir /mnt/rrd-reads
bash-4.2# mkdir /mnt/rrd-writes

Create a 500MB-maximum RAM filesystem with appropriate options

bash-4.2# mount -t tmpfs -o size=500m,mode=0750,uid=collectd,gid=collectd none /mnt/rrd-writes
bash-4.2# echo "none /mnt/rrd-writes tmpfs size=500m,mode=0750,uid=collectd,gid=collectd 0 0" >> /etc/fstab

Copy the old RRD files into the new mount point

bash-4.2# cp -a /var/lib/collectd/rrd/* /mnt/rrd-writes

Configure your rrd-writing application to write to the new mount point

bash-4.2# sed -i -e 's/DataDir "\/var\/lib\/collectd\/rrd"/DataDir "\/mnt\/rrd-writes"/' /etc/collectd/collectd.conf

Set up a cron job to sync only the changed RRDs to disk once every 2 minutes

bash-4.2# echo "*/2 * * * * collectd rsync -a /mnt/rrd-writes/* /mnt/rrd-reads/ ; sync" > /etc/cron.d/rrd-sync

Don't forget to copy your saved RRD files into the mount point before you start your rrd-writing application! You may need to edit the init script for that service to make sure the files are there before it starts. If it starts without the files in place, new bare ones will be created and you'll be very confused once the read directory gets overwritten with empty RRDs.
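
One way to handle that restore step is a small helper script that the init script (or a systemd ExecStartPre=) runs before collectd starts; the script path here is just a suggestion:

bash-4.2# cat > /usr/local/sbin/rrd-restore <<'EOF'
#!/bin/sh
# repopulate the tmpfs mount from the on-disk copies before collectd starts
rsync -a /mnt/rrd-reads/ /mnt/rrd-writes/
EOF
bash-4.2# chmod +x /usr/local/sbin/rrd-restore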

If at some point you need to resize the tmpfs mount, you can do that on the fly:

bash-4.2# mount -t tmpfs -o remount,size=850m /mnt/rrd-writes
Foray answered 12/7, 2013 at 18:36 Comment(0)
