Is it possible to group by multiple dimensions in crossfilter?

For example, suppose we have data on books, authors, and dates. Can we build a crossfilter that counts how many books each author has per month?

Toxinantitoxin answered 27/5, 2013 at 5:46

In pseudo-SQL terms, what you are trying to do is:

SELECT COUNT(book)
GROUP BY author, month

The way I approach this type of problem is to combine the fields into a single dimension. In your case, I would concatenate the month and author into one dimension key.

Let this be our test data:

var cf = crossfilter([
{ date:"1 jan 2014", author: "Mr X", book: "Book 1" },
{ date:"2 jan 2014", author: "Mr X", book: "Book 2" },
{ date:"3 feb 2014", author: "Mr X", book: "Book 3" },
{ date:"1 mar 2014", author: "Mr X", book: "Book 4" },
{ date:"2 apr 2014", author: "Mr X", book: "Book 5" },
{ date:"3 apr 2014", author: "Mr X", book: "Book 6"},
{ date:"1 jan 2014", author: "Ms Y", book: "Book 7" },
{ date:"2 jan 2014", author: "Ms Y", book: "Book 8" },
{ date:"3 jan 2014", author: "Ms Y", book: "Book 9" },
{ date:"1 mar 2014", author: "Ms Y", book: "Book 10" },
{ date:"2 mar 2014", author: "Ms Y", book: "Book 11" },
{ date:"3 mar 2014", author: "Ms Y", book: "Book 12" },
{ date:"4 apr 2014", author: "Ms Y", book: "Book 13" }
]);  

The dimension is defined as follows:

var dimensionMonthAuthor = cf.dimension(function (d) {
  var thisDate = new Date(d.date);
  return 'month='+thisDate.getMonth()+';author='+d.author;
});

Now we can simply count the records in each group to get the number of books per author, per month (i.e. per dimension key). Note that reduceCount() takes no arguments:

var monthAuthorCount = dimensionMonthAuthor.group().reduceCount().all();

And the results are as follows:

{"key":"month=0;author=Mr X","value":2}
{"key":"month=0;author=Ms Y","value":3}
{"key":"month=1;author=Mr X","value":1}
{"key":"month=2;author=Mr X","value":1}
{"key":"month=2;author=Ms Y","value":3}
{"key":"month=3;author=Mr X","value":2}
{"key":"month=3;author=Ms Y","value":1}
Fernando answered 26/11, 2013 at 15:4

I didn't find the accepted answer all that helpful.

I used the following instead.

I first made a dimension keyed on the field to group by (in your case, month). This assumes each record has a numeric month field and a books count:

   var byMonth = cf.dimension(function (d) {
     return +d['month'];
   });

Next, I used a custom reduce on that dimension's group to aggregate the values for each key.

The group call:

var monthTotals = byMonth.group().reduce(reduceAddbooks, reduceRemovebooks, reduceInitialbooks).all();

The reduce functions:

function reduceAddbooks(p, v) {
    // Remember the author and add this record's books to the running total
    p.author = v['author'];
    p.books += +v['books'];
    return p;
}

function reduceRemovebooks(p, v) {
    // Undo the add when a record is filtered out
    p.books -= +v['books'];
    return p;
}

function reduceInitialbooks() {
    // Starting value for every group
    return {
        author: null,
        books: 0
    };
}
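
A custom reduce only stays correct under filtering if the remove function undoes exactly what the add function did. The sketch below illustrates that contract in plain JavaScript without loading crossfilter. The function names and the per-author count object are my own assumptions, not the answerer's code. With crossfilter, the same three functions would be passed to group().reduce(add, remove, initial) on a month dimension.

```javascript
// Plain-JavaScript sketch; crossfilter itself is not loaded here.
// The functions use the (p, v) shape that group.reduce() expects:
// p is the group's running value, v is the record being added or removed.
function reduceAdd(p, v) {
  p[v.author] = (p[v.author] || 0) + 1;
  return p;
}

function reduceRemove(p, v) {
  p[v.author] -= 1;
  if (p[v.author] === 0) delete p[v.author]; // drop empty entries
  return p;
}

function reduceInitial() {
  return {};
}

// Illustrative records that all fall into the same month group.
var records = [
  { month: 0, author: 'Mr X' },
  { month: 0, author: 'Mr X' },
  { month: 0, author: 'Ms Y' }
];

// Simulate all records entering the group...
var p = records.reduce(reduceAdd, reduceInitial());
var afterAdd = JSON.stringify(p);

// ...and then one record being filtered out again.
reduceRemove(p, records[2]);
var afterRemove = JSON.stringify(p);

console.log(afterAdd, afterRemove);
```

Each month group then carries an object of per-author counts, which is another way to get books per author per month without concatenating keys.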
Hydrophobic answered 18/1, 2015 at 21:8

I want to update an old answer with a newer workaround described in https://github.com/dc-js/dc.js/pull/91

Its performance hasn't been tested on large datasets.

  var cf = crossfilter([
  { date:"1 jan 2014", author: "Mr X", book: "Book 1" },
  { date:"2 jan 2014", author: "Mr X", book: "Book 2" },
  { date:"3 feb 2014", author: "Mr X", book: "Book 3" },
  { date:"1 mar 2014", author: "Mr X", book: "Book 4" },
  { date:"2 apr 2014", author: "Mr X", book: "Book 5" },
  { date:"3 apr 2014", author: "Mr X", book: "Book 6"},
  { date:"1 jan 2014", author: "Ms Y", book: "Book 7" },
  { date:"2 jan 2014", author: "Ms Y", book: "Book 8" },
  { date:"3 jan 2014", author: "Ms Y", book: "Book 9" },
  { date:"1 mar 2014", author: "Ms Y", book: "Book 10" },
  { date:"2 mar 2014", author: "Ms Y", book: "Book 11" },
  { date:"3 mar 2014", author: "Ms Y", book: "Book 12" },
  { date:"4 apr 2014", author: "Ms Y", book: "Book 13" }
  ]);

  var dimensionMonthAuthor = cf.dimension(function (d) {
    var thisDate = new Date(d.date);
    //stringify() and later, parse() to get keyed objects
    return JSON.stringify ( { date: thisDate.getMonth() , author: d.author } ) ;
  });

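  var group = dimensionMonthAuthor.group();
  // Build parsed copies instead of mutating the entries that group.all()
  // returns; this pass could be expensive on large groups.
  var results = group.all().map(function (d) {
    // parse the JSON string created above
    return { key: JSON.parse(d.key), value: d.value };
  });

  console.log(results);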

Results in:

[ { key: { date: 0, author: 'Mr X' },
    value: 2 },
  { key: { date: 0, author: 'Ms Y' },
    value: 3 },
  { key: { date: 1, author: 'Mr X' },
    value: 1 },
  { key: { date: 2, author: 'Mr X' },
    value: 1 },
  { key: { date: 2, author: 'Ms Y' },
    value: 3 },
  { key: { date: 3, author: 'Mr X' },
    value: 2 },
  { key: { date: 3, author: 'Ms Y' },
    value: 1 } ]
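
With object keys, the flat result list can be pivoted into a nested month-to-author lookup. A minimal sketch in plain JavaScript follows; the results array is copied from the output above, and the byMonth name is my own.

```javascript
// Result entries copied from the output above (keys already parsed).
var results = [
  { key: { date: 0, author: 'Mr X' }, value: 2 },
  { key: { date: 0, author: 'Ms Y' }, value: 3 },
  { key: { date: 1, author: 'Mr X' }, value: 1 },
  { key: { date: 2, author: 'Mr X' }, value: 1 },
  { key: { date: 2, author: 'Ms Y' }, value: 3 },
  { key: { date: 3, author: 'Mr X' }, value: 2 },
  { key: { date: 3, author: 'Ms Y' }, value: 1 }
];

// Pivot into byMonth[month][author] = count.
var byMonth = {};
results.forEach(function (d) {
  var month = d.key.date; // zero-based month from getMonth()
  if (!byMonth[month]) byMonth[month] = {};
  byMonth[month][d.key.author] = d.value;
});

console.log(byMonth);
```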
Candescent answered 21/3, 2016 at 16:22
