Remove data from RRDTool

I have several graphs created by RRDTool that collected bad data over a period of a couple of hours.

How can I remove the data for that time period from the RRDs so that it no longer displays?

Goode asked 24/4, 2012 at 13:08

The best method I found to do this:

  1. Use rrdtool dump to export the RRD file to XML.
  2. Open the XML file, then find and edit the bad data.
  3. Restore the RRD file from the edited XML using rrdtool restore.
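The three steps above can be scripted. The sketch below is an illustration under stated assumptions, not the answerer's own method: instead of deleting rows it sets every value in the bad time window to NaN, so the graphs show a gap while each archive keeps its original number of rows. It relies on the "<!-- date / timestamp -->" comment that rrdtool dump writes before each row; the names blank_rows and blank_rrd are hypothetical.

```python
import os
import re
import subprocess
import tempfile

# rrdtool dump precedes every row with a comment such as
#   <!-- 2014-10-17 20:04:00 BST / 1413572640 --> <row><v>...</v></row>
# The number after the slash is the row's Unix timestamp.
ROW_RE = re.compile(r"/\s*(\d+)\s*-->\s*<row>")
VALUE_RE = re.compile(r"<v>[^<]*</v>")


def blank_rows(xml_text, start, end):
    """Set every value in rows timestamped within [start, end] to NaN."""
    out = []
    for line in xml_text.splitlines(True):
        match = ROW_RE.search(line)
        if match and start <= int(match.group(1)) <= end:
            line = VALUE_RE.sub("<v>NaN</v>", line)
        out.append(line)
    return "".join(out)


def blank_rrd(rrd_in, rrd_out, start, end):
    """Dump rrd_in, blank the window, and restore the result into rrd_out."""
    xml_text = subprocess.check_output(["rrdtool", "dump", rrd_in]).decode("utf-8")
    fd, xml_path = tempfile.mkstemp(suffix=".xml")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(blank_rows(xml_text, start, end))
        # rrd_out should be a new file name; the original RRD is left untouched
        subprocess.check_call(["rrdtool", "restore", xml_path, rrd_out])
    finally:
        os.remove(xml_path)
```

For example, blank_rrd("foo.rrd", "foo_fixed.rrd", 1413572640, 1413580000) would write a copy of foo.rrd with that window marked unknown; once the graphs look right, the fixed file can replace the original.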
Goode answered 25/4, 2012 at 14:11

I had a similar problem where I wanted to discard the most recent few hours from my RRDtool databases, so I wrote a quick script to do it (apologies for the unconventional variable names - coding style inherited from work, sigh):

#!/usr/bin/env python
"""
Modify XML data generated by `rrdtool dump` such that the last update was at
the unixtime specified (decimal). Data newer than this is simply omitted.
Runs under Python 2 or Python 3.

Sample usage::

    rrdtool dump foo.rrd \
       | python remove_samples_newer_than.py 1414782122 \
       | rrdtool restore - foo_trimmed.rrd
"""

import sys

assert sys.argv[1:], "Must specify maximum Unix timestamp in decimal"

iMaxUpdate = int(sys.argv[1])

for rLine in iter(sys.stdin.readline, ''):
    if "<lastupdate>" in rLine:
        # <lastupdate>1414782122</lastupdate> <!-- 2014-10-31 19:02:02 GMT -->
        _, _, rData = rLine.partition("<lastupdate>")
        rData, _, _ = rData.partition("</lastupdate")
        iLastUpdate = int(rData)
        # Only trim if the RRD actually contains data newer than iMaxUpdate
        assert iLastUpdate > iMaxUpdate, "Last update in RRD is not newer " \
                                         "than the time you provided, nothing to do"
        sys.stdout.write("<lastupdate>{0}</lastupdate>\n".format(iMaxUpdate))
    elif "<row>" in rLine:
        # <!-- 2014-10-17 20:04:00 BST / 1413572640 --> <row><v>9.8244774011e+01</v>...</row>
        # The row's Unix timestamp sits between the "/" and the "-->"
        rData, _, _ = rLine.partition("<row>")
        _, _, rData = rData.partition("/")
        rData, _, _ = rData.partition("--")
        iUpdate = int(rData.strip())
        # Keep rows up to and including the new last update
        if iUpdate <= iMaxUpdate:
            sys.stdout.write(rLine)
    else:
        sys.stdout.write(rLine)

Worked for me. Hope it helps someone else.
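A caveat, based on my understanding of rrdtool restore rather than anything stated in this answer: restore appears to size each archive from the rows present in the XML, so dropping <row> elements leaves the restored archives shorter than the originals. The hypothetical helper rows_per_rra below counts the rows in each archive of a dump, so the original and trimmed XML can be compared before restoring.

```python
import re

# Each archive in rrdtool dump output has the shape
#   <rra> ... <database> <row>...</row> ... </database> </rra>
DATABASE_RE = re.compile(r"<database>(.*?)</database>", re.S)


def rows_per_rra(xml_text):
    """Return the number of <row> elements in each archive, in file order."""
    return [block.count("<row>") for block in DATABASE_RE.findall(xml_text)]
```

For example, compare rows_per_rra on the output of rrdtool dump foo.rrd with the same call on the trimmed XML; if the two lists differ, the trimmed file's archives have shrunk.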

Raynaraynah answered 31/10, 2014 at 19:35

If you want to avoid dumping and editing an XML file, since that may take several file I/O calls (depending on how much bad data you have), you can also read the entire RRD into memory using fetch and update the values in memory.

I did a similar task using Python + rrdtool and ended up doing the following:

  1. Read the RRD into an in-memory dictionary.
  2. Fix the values in the dictionary.
  3. Delete the existing RRD file.
  4. Create a new RRD with the same name.
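Steps 2 and 4 can be sketched without the rrdtool Python bindings. The example assumes the data has already been loaded into a dict mapping each Unix timestamp to a list of values, with None for unknown (this layout and the name to_update_args are hypothetical). It marks the bad window as unknown and emits arguments in the timestamp:value:value form that rrdtool update accepts, using U for unknown values. Feeding values back as updates only reproduces them faithfully for GAUGE data sources; for COUNTER or DERIVE sources the stored numbers are already rates, so they would be differentiated a second time.

```python
import math


def to_update_args(data, bad_start, bad_end):
    """Turn {timestamp: [values]} into `rrdtool update` arguments.

    Values with timestamps in [bad_start, bad_end] are replaced by "U"
    (unknown), as are values that are already None or NaN.
    """
    args = []
    for ts in sorted(data):  # rrdtool update needs increasing timestamps
        values = data[ts]
        if bad_start <= ts <= bad_end:
            values = [None] * len(values)
        fields = ["U" if v is None or math.isnan(v) else repr(v) for v in values]
        args.append("%d:%s" % (ts, ":".join(fields)))
    return args
```

The new RRD would then be created with a --start earlier than the first timestamp and filled with calls such as subprocess.check_call(["rrdtool", "update", "new.rrd"] + args[i:i + 500]), batching the arguments to keep each command line short.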
Ki answered 1/10, 2012 at 19:11
If you open-sourced your solution, I bet you'd help out a lot of people! - Goode

The only person who proposed exactly what to edit was RobM. I tried his solution, and it did not work for me with rrdtool 1.4.7.

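My database uses the AVERAGE, MAX and MIN consolidation functions and contains DERIVE, GAUGE and COMPUTE data sources. Its archives have per-second (70 rows), per-minute (70 rows), per-hour (25 rows) and per-day (367 rows) resolution. My task: delete the most recent part of the data (typical reason: the clock was moved back).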

I applied RobM's solution: set the last update to my new end time and delete all rows after it. The restored database seemed normal, but it did not accept new updates. I then examined a newly created, empty database and found that it contained 70 per-second rows filled with NaN, and the same for the minute and hour archives.

So my working solution is this: if I delete records at the end of an archive, I add the same number of NaN records at its beginning, with correctly decreasing timestamps. The exception is the daily archive, whose records are only deleted, with nothing added. If an archive becomes empty after the deletions, I fill it with NaN records ending at my new end time (rounded to the archive's period boundary).
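A minimal sketch of this padding scheme for a single archive, assuming each row is a (timestamp, values) pair, oldest first, and step is the archive's row interval in seconds (pdp_per_row times the RRD step). The function name trim_and_pad and the data layout are hypothetical; pass pad=False for archives that should only be trimmed, as described above for the daily records.

```python
def trim_and_pad(rows, step, new_end, pad=True):
    """Drop rows newer than new_end from one archive.

    rows: list of (timestamp, [values]) tuples, oldest first.
    With pad=True, the same number of NaN rows is prepended, with timestamps
    stepping backwards from the oldest kept row, so the row count is unchanged.
    """
    kept = [row for row in rows if row[0] <= new_end]
    removed = len(rows) - len(kept)
    if not pad or removed == 0:
        return kept
    width = len(rows[0][1])
    if kept:
        # First timestamp after the padding: the oldest row we keep
        anchor = kept[0][0]
    else:
        # Archive emptied: NaN rows end at new_end rounded down to a step boundary
        anchor = new_end - new_end % step + step
    padding = [(anchor - step * i, [float("nan")] * width)
               for i in range(removed, 0, -1)]
    return padding + kept
```

Keeping the row count unchanged matters because, as observed above, a restored database with fewer rows than a freshly created one did not accept new updates.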

Estaminet answered 28/5, 2015 at 23:18

© 2022 - 2024 — McMap. All rights reserved.