Optimising a group of dc.js line graphs
I have a group of graphs visualizing a bunch of data for me (here), based on a CSV with approximately 25,000 lines of data, each with 12 parameters. However, any interaction (such as selecting a range with the brush on any of the graphs) is slow and unwieldy, completely unlike the dc.js demo found here, which also deals with thousands of records yet maintains smooth animations, or crossfilter's demo here, which has ten times as many records (flights) as I do.

I know the main resource hogs are the two line charts, since they have data points every 15 minutes for about 8 solid months. Removing either of them makes the charts responsive again, but they're the main feature of the visualizations, so is there any way I can make them show less fine-grained data?

The code for the two line graphs specifically is below:

    var lineZoomGraph = dc.lineChart("#chart-line-zoom")
        .width(1100)
        .height(60)
        .margins({top: 0, right: 50, bottom: 20, left: 40})
        .dimension(dateDim)
        .group(tempGroup)
        .x(d3.time.scale().domain([minDate, maxDate]));

    var tempLineGraph = dc.lineChart("#chart-line-tempPer15Min")
        .width(1100)
        .height(240)
        .dimension(dateDim)
        .group(tempGroup)
        .mouseZoomable(true)
        .rangeChart(lineZoomGraph)  // the small chart's brush controls this chart's focus
        .brushOn(false)
        .x(d3.time.scale().domain([minDate, maxDate]));

A separate but related question: how do I modify the y-axis on the line charts? By default it doesn't encompass the highest and lowest values found in the dataset, which seems odd.
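
(dc.js exposes two knobs for this; a minimal sketch using the same d3 v3 API as the chart code above, with made-up domain bounds:)

    // recompute the y domain from the currently filtered data on each redraw:
    tempLineGraph.elasticY(true);
    // or pin the domain explicitly (example bounds, not taken from the dataset):
    tempLineGraph.y(d3.scale.linear().domain([10, 35]));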

Edit: some code I wrote to try to solve the problem:

    var graphWidth = 1100;
    var dataPerPixel = data.length / graphWidth;

    var tempGroup = dateDim.group().reduceSum(function(d) {
        if (d.pointNumber % Math.ceil(dataPerPixel) === 0) {
            return d.warmth;
        }
    });

d.pointNumber is a unique, incrementing ID for each data point, running from 0 to roughly 22,000. Now, however, the line graph shows up blank. I checked the group's data using tempGroup.all(): every 21st data point has a temperature value, but all the others are NaN. I haven't succeeded in reducing the group's size at all; it's still around 22,000 entries. I wonder if this is the right approach...
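
For context on why the chart goes blank: crossfilter's reduceSum adds whatever the accessor returns for every record, so returning undefined for the skipped points poisons each running sum into NaN. A minimal repair, sketched below, keeps the values numeric, but note that the number of bins is determined by the dimension's distinct keys, so it still doesn't shrink the group:

    var tempGroup = dateDim.group().reduceSum(function(d) {
        // contribute 0 instead of undefined for skipped points,
        // so the per-bin sums stay numeric instead of becoming NaN
        return (d.pointNumber % Math.ceil(dataPerPixel) === 0) ? d.warmth : 0;
    });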

Edit 2: found a different approach. I create the tempGroup normally but then create another group which filters the existing tempGroup even more.

    var tempGroup = dateDim.group().reduceSum(function(d) { return d.warmth; });
    var filteredTempGroup = {
        all: function () {
            return tempGroup.top(Infinity).filter( function (d) {
                if (d.pointNumber % Math.ceil(dataPerPixel) === 0) return d.value;
            } );
        }
    };

The problem here is that d.pointNumber isn't accessible (the group's entries are key/value pairs, not the raw records), so I can't tell whether a given entry is the Nth data point or a multiple of it. If I assign it to a var it'll just be a fixed value anyway, so I'm not sure how to get around that...

Holleyholli answered 12/4/2015 at 20:39. Comments (11):
The "fake group" approach in your second edit seems reasonable. Since your data is probably in date order anyway (?), the index should be pretty much the same as the pointNumber, so adding a parameter to your filter callback function should give you an index you can use: .filter( function (d, i) { return (i % Math.ceil(dataPerPixel) === 0); } ). Also note that the filter callback function should return a boolean not a value.Holloway
OK, so that works, kind of. I get a much more manageable 1071 results, but the results are also out of order, which confuses me. If you look at the live website now you'll see what I mean. The group's objects start off correctly at the first few data points, then jump ahead a few days and then jump back... so the points are fine, just somehow disordered.Holleyholli
Um, yeah, you probably want to use .all() instead of .top(Infinity), for obvious reasons. Missed that.Holloway
That did the trick, excellent :) What exactly is the reason behind that? Furthermore, I'd like the resolution change to adapt to the zoom level because right now when I zoom in there's not enough points and it looks a bit blocky... Is there a way to know how many data points are currently being shown in the graph at any time/zoom level?Holleyholli
.all() is sorted on the key, and .top() on the value. You yourself are defining the number of data points that dc.js sees, but you might use chart.x().range() and the number of observations per unit of time to figure out how many data points there are to sample from.Holloway
I'd need to do that whenever a zoom event is triggered by the smaller graph, which isn't exposed to me using the dc library. Using xAxisMax - xAxisMin gives me a time range in seconds is perfect, but to use that properly in a calculation I need a zoom event function...Holleyholli
I think .on('preRedraw', function() { ...}) should work. github.com/dc-js/dc.js/blob/master/web/docs/…Holloway
That it does. Two issues... one is that xAxisMax - xAxisMin gives a constant result regardless of chart zoom level (and so does chart.xAxisLength and chart.x().range()). Also once I have a ratio I want to use, I essentially have to redefine the filteredTempGroup that I have, which I'm not sure is possible on the fly...Holleyholli
I'm sorry, I meant .domain() - i.e. the seconds input, whereas .range() is the pixels output. Notice that filteredTempGroup is always generated dynamically (on pull), so it should be fine to return different data every time it is called.Holloway
You'll have to calculate on the two date objects in the domain array and write to a variable with scope that filteredTempGroup can access.Holloway
OK, so I have something that works now, but it's still a little sluggish since 5 times every time I zoom it's recalculating a new datapoint/pixel ratio. Is there a reason the preRedraw event is called 5 times every time I do a zoom? In theory though, this has been solved.Holleyholli
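
Pulling the comment thread together, a sketch of what the finished fake group might look like; the names follow the question's code, and the four-readings-per-hour figure comes from the one-point-per-15-minutes data described in the question:

    // initial full-extent ratio, as in the question; rewritten on every zoom
    var dataPerPixel = data.length / graphWidth;

    var filteredTempGroup = {
        all: function () {
            var interval = Math.max(1, Math.ceil(dataPerPixel));
            // .all() is sorted by key, so the sampled points stay in date order
            return tempGroup.all().filter(function (d, i) {
                return i % interval === 0;
            });
        }
    };

    tempLineGraph.on('preRedraw', function (chart) {
        var domain = chart.x().domain();            // two Date objects: the visible extent
        var hours = (domain[1] - domain[0]) / 36e5; // milliseconds to hours
        dataPerPixel = (hours * 4) / graphWidth;    // 4 readings per hour (one per 15 minutes)
    });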
When dealing with performance problems in d3-based charts, the usual culprit is the number of DOM elements, not the size of the data. Notice that the crossfilter demo has lots of rows of data, but only a couple hundred bars.

It looks like you might be attempting to plot all the points instead of aggregating them. I guess that since you are plotting a time series it may feel unintuitive to aggregate the points, but consider that your chart can only display 1,100 points (its width in pixels), so it is pointless to overwork the SVG engine by plotting 25,000.

I'd suggest bringing it down to somewhere between 100 and 1,000 bins, e.g. by averaging within each day:

    var daysDim = data.dimension(function(d) { return d3.time.day(d.time); });

    // _.isLegitNumber is a helper used by the FAQ snippet (not built into
    // underscore); it checks that the value is a usable, finite number
    function reduceAddAvg(attr) {
      return function(p,v) {
        if (_.isLegitNumber(v[attr])) {
          ++p.count;
          p.sums += v[attr];
          p.averages = (p.count === 0) ? 0 : p.sums/p.count; // guard against dividing by zero
        }
        return p;
      };
    }
    function reduceRemoveAvg(attr) {
      return function(p,v) {
        if (_.isLegitNumber(v[attr])) {
          --p.count;
          p.sums -= v[attr];
          p.averages = (p.count === 0) ? 0 : p.sums/p.count;
        }
        return p;
      };
    }
    function reduceInitAvg() {
      return {count:0, sums:0, averages:0};
    }
    ...
    // average a parameter (column) named "param"
    var daysGroup = daysDim.group().reduce(reduceAddAvg('param'), reduceRemoveAvg('param'), reduceInitAvg);

(reusable average reduce functions from the FAQ; the add/remove pair is needed because crossfilter updates groups incrementally, adjusting each bin as records enter and leave the current filters)

Then specify your xUnits to match, and use elasticY to auto-calculate the y axis (this also answers your side question about the y-axis range):

    chart.xUnits(d3.time.days)
        .elasticY(true);
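
One detail the snippet leaves implicit: because the reduce produces an object per bin ({count, sums, averages}) rather than a plain number, the chart also needs a value accessor pointed at the average. A sketch, reusing the names from the code above:

    chart.dimension(daysDim)
        .group(daysGroup)
        .valueAccessor(function (p) { return p.value.averages; });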
Holloway answered 13/4/2015 at 15:41. Comments (3):
Thanks for the suggestion... I went looking for other people using resampling functions on line graphs in d3, but there are no concrete examples out there, which I find strange, since someone must have had my problem at some point... I attempted to solve it myself with a function which I've put in an edit to my original question, but the problem is that it doesn't get rid of the original data; it only makes every Nth data point have a value, and the rest are NaN. – Holleyholli
Did you use the reduce functions Gordon suggested? – Sparker
Sampling is really a data technique rather than a charting technique, if that helps your googling. Your attempt looked mostly OK to me, but I haven't tried it. Personally, I'd stick with averages so as not to lose data, since this amount of data should be fine in JavaScript. – Holloway
