Does Crossfilter require a flat data structure?

All the examples of Crossfilter I've found use a flat structure like this:

[
  { name: "Rusty",  type: "human", legs: 2 },
  { name: "Alex",   type: "human", legs: 2 },
  ...
  { name: "Fiona",  type: "plant", legs: 0 }
]

or

"date","open","high","low","close","volume","oi"
11/01/1985,115.48,116.78,115.48,116.28,900900,0
11/04/1985,116.28,117.07,115.82,116.04,753400,0
11/05/1985,116.04,116.57,115.88,116.44,876800,0
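Once parsed, the CSV form is the same kind of flat array of records as the JSON example. A minimal sketch, assuming simple CSV with no quoted commas inside fields (for real files, a proper CSV parser such as the one in d3 is safer); `parseSimpleCsv` is a hypothetical helper:

```javascript
// Hypothetical helper: turn simple CSV text into an array of record objects.
// Assumes no field contains a comma; header names may be wrapped in quotes.
function parseSimpleCsv(text) {
  const lines = text.trim().split("\n");
  // Strip the surrounding double quotes from each header name.
  const headers = lines[0].split(",").map((h) => h.replace(/^"|"$/g, ""));
  return lines.slice(1).map((line) => {
    const cells = line.split(",");
    const record = {};
    headers.forEach((h, i) => {
      // Keep the date as a string; convert every other column to a number.
      record[h] = h === "date" ? cells[i] : Number(cells[i]);
    });
    return record;
  });
}

const csv = `"date","open","high","low","close","volume","oi"
11/01/1985,115.48,116.78,115.48,116.28,900900,0
11/04/1985,116.28,117.07,115.82,116.04,753400,0
11/05/1985,116.04,116.57,115.88,116.44,876800,0`;

const rows = parseSimpleCsv(csv);
// rows[0] -> { date: "11/01/1985", open: 115.48, high: 116.78, low: 115.48, close: 116.28, volume: 900900, oi: 0 }
```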

I have hundreds of MB of flat files, which I process into a 1-2 MB JSON object with roughly this structure:

{
  "meta": {"stuff": "here"},
  "data": {
    "accountName": {
      // rolled up by week
      "2013-05-20": {
        // any of several "dimensions"
        "byDay": {
          "2013-05-26": {
            "values": {
              "thing1": 1,
              "thing2": 2,
              "etc": 3
            }
          },
          "2013-05-27": {
            "values": {
              "thing1": 4,
              "thing2": 5,
              "etc": 6
            }
          }
          // and so on for each day
        },
        "bySource": {
          "sourceA": {
            "values": {
              "thing1": 2,
              "thing2": 6,
              "etc": 7
            }
          },
          "sourceB": {
            "values": {
              "thing1": 3,
              "thing2": 1,
              "etc": 2
            }
          }
        }
      }
    }
  }
}

Which I'd like to display as a table like:

Group: byDay* || bySource || byWhatever

           | thing1 | thing2 | etc
2013-05-26 |      1 |      2 |   3
2013-05-27 |      4 |      5 |   6

or:

Group: byDay || bySource* || byWhatever

           | thing1 | thing2 | etc
sourceA    |      2 |      6 |   7
sourceB    |      3 |      1 |   2
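Both views are tables read straight out of the nested object for one week of one account. A minimal sketch without Crossfilter, assuming the structure shown above; `tableFor` is a hypothetical helper and the sample data is the example JSON:

```javascript
// Hypothetical helper: build the rows of the requested table for one
// selected group ("byDay", "bySource", ...) of one account and one week.
function tableFor(data, account, week, groupName) {
  const group = data[account][week][groupName];
  const rowKeys = Object.keys(group);
  // Column names come from the first row's "values" object.
  const columns = rowKeys.length ? Object.keys(group[rowKeys[0]].values) : [];
  const rows = rowKeys.map((key) => [key, ...columns.map((c) => group[key].values[c])]);
  return { columns, rows };
}

// The "data" part of the example JSON.
const data = {
  accountName: {
    "2013-05-20": {
      byDay: {
        "2013-05-26": { values: { thing1: 1, thing2: 2, etc: 3 } },
        "2013-05-27": { values: { thing1: 4, thing2: 5, etc: 6 } },
      },
      bySource: {
        sourceA: { values: { thing1: 2, thing2: 6, etc: 7 } },
        sourceB: { values: { thing1: 3, thing2: 1, etc: 2 } },
      },
    },
  },
};

const byDay = tableFor(data, "accountName", "2013-05-20", "byDay");
// byDay.columns -> ["thing1", "thing2", "etc"]
// byDay.rows    -> [["2013-05-26", 1, 2, 3], ["2013-05-27", 4, 5, 6]]
```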

Flattening this JSON structure would be difficult and yield a very large object.

I'd love to take advantage of Crossfilter's wonderful features, but I'm unsure if it's possible.

Is it possible for me to define/explain my current structure to Crossfilter? Or is there another way I could approach this? I'll readily admit that I don't have a good grasp of dimensions and many other key Crossfilter concepts.

Asked by Lamprey, 14/6, 2013 at 18:14

Crossfilter works on an array of records, with each element of the array being mapped to one or more values via dimensions (which are defined using accessor functions).
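To make "dimension" concrete: roughly, a dimension is an accessor function plus an index of the records ordered by the value that accessor returns. The toy below only illustrates that idea in plain JavaScript; it is not Crossfilter's implementation. With the library itself, the rough equivalents would be `crossfilter(records).dimension(d => d.legs)`, `dimension.filterRange([1, 3])` and `dimension.group().all()`.

```javascript
// Toy illustration of Crossfilter's "dimension" idea; NOT the library's code.
// An accessor maps each record to a value, and the dimension keeps the
// records sorted by that value. (This toy scans instead of binary-searching.)
function toyDimension(records, accessor) {
  const sorted = records
    .map((record) => ({ value: accessor(record), record }))
    .sort((a, b) => (a.value < b.value ? -1 : a.value > b.value ? 1 : 0));

  return {
    // Records with lo <= value < hi, the same half-open range
    // that dimension.filterRange([lo, hi]) uses.
    filterRange(lo, hi) {
      return sorted.filter((e) => e.value >= lo && e.value < hi).map((e) => e.record);
    },
    // Count of records per distinct value, shaped like dimension.group().all()
    // with the default count reduce. This toy ignores all filters.
    groupCounts() {
      const counts = new Map();
      for (const e of sorted) counts.set(e.value, (counts.get(e.value) || 0) + 1);
      return [...counts].map(([key, value]) => ({ key, value }));
    },
  };
}

const records = [
  { name: "Rusty", type: "human", legs: 2 },
  { name: "Alex", type: "human", legs: 2 },
  { name: "Fiona", type: "plant", legs: 0 },
];

const legs = toyDimension(records, (d) => d.legs);
legs.filterRange(1, 3).map((d) => d.name); // ["Rusty", "Alex"]
legs.groupCounts(); // [{ key: 0, value: 1 }, { key: 2, value: 2 }]
```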

Even if your data contains aggregated results, you can still use it with Crossfilter. Note, however, that it's technically impossible to combine data that has been aggregated across different dimensions, such as combining the "by day" and "by source" data in your example above: the per-day totals no longer record which source each value came from, so you can't filter days by source. You could create a Crossfilter for each aggregated dimension, e.g. one for "by day", and run queries and groups on each, but I'm not sure how useful that would be compared with what you already have.
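One Crossfilter per aggregated dimension means building a separate flat array for each group type (one for "byDay", another for "bySource") and never mixing them. A minimal sketch under assumed field names (`account`, `week`, `key`, `values`), with hypothetical two-account sample data and a plain-JS stand-in for a sum-reducing group:

```javascript
// Build a flat record array for ONE aggregated dimension (e.g. "byDay"),
// as you would if each dimension got its own Crossfilter.
// Field names (account, week, key, values) are hypothetical choices.
function recordsFor(data, groupName) {
  const records = [];
  for (const [account, weeks] of Object.entries(data)) {
    for (const [week, groups] of Object.entries(weeks)) {
      const group = groups[groupName] || {}; // an account may lack this group
      for (const [key, entry] of Object.entries(group)) {
        records.push({ account, week, key, values: entry.values });
      }
    }
  }
  return records;
}

// Plain-JS stand-in for group().reduceSum(): sums one metric per key.
// With the library this would be roughly
//   crossfilter(records).dimension((d) => d.key).group().reduceSum((d) => d.values.thing1)
function sumByKey(records, metric) {
  const totals = {};
  for (const r of records) totals[r.key] = (totals[r.key] || 0) + r.values[metric];
  return totals;
}

// Hypothetical sample in the question's shape, with two accounts.
const data = {
  accountA: {
    "2013-05-20": {
      byDay: { "2013-05-26": { values: { thing1: 1 } } },
      bySource: { sourceA: { values: { thing1: 2 } } },
    },
  },
  accountB: {
    "2013-05-20": {
      byDay: { "2013-05-26": { values: { thing1: 4 } } },
    },
  },
};

const byDayRecords = recordsFor(data, "byDay");
const totals = sumByKey(byDayRecords, "thing1"); // { "2013-05-26": 5 }
```

Summing per-day totals across accounts is valid because they share the same breakdown; the "bySource" records stay in their own array.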

As for memory usage, are you sure flattening your nested structure would really be that problematic? Bear in mind that each record (element of the flattened array) can hold references to strings and other objects in your nested structure, so it wouldn't necessarily use up much memory.
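A flattening pass can reuse the objects already in the nested structure: each record points at the existing `values` object instead of copying its numbers. A minimal sketch, assuming the structure from the question; the record field names are hypothetical:

```javascript
// Flatten the nested structure into one array of small records.
// Each record references the existing "values" object (no copy is made).
// Field names (account, week, groupName, key, values) are hypothetical.
function flatten(data) {
  const records = [];
  for (const [account, weeks] of Object.entries(data)) {
    for (const [week, groups] of Object.entries(weeks)) {
      for (const [groupName, entries] of Object.entries(groups)) {
        for (const [key, entry] of Object.entries(entries)) {
          records.push({ account, week, groupName, key, values: entry.values });
        }
      }
    }
  }
  return records;
}

// The "data" part of the example JSON.
const data = {
  accountName: {
    "2013-05-20": {
      byDay: {
        "2013-05-26": { values: { thing1: 1, thing2: 2, etc: 3 } },
        "2013-05-27": { values: { thing1: 4, thing2: 5, etc: 6 } },
      },
      bySource: {
        sourceA: { values: { thing1: 2, thing2: 6, etc: 7 } },
        sourceB: { values: { thing1: 3, thing2: 1, etc: 2 } },
      },
    },
  },
};

const records = flatten(data); // 4 records
// Same object, not a copy:
// records[0].values === data.accountName["2013-05-20"].byDay["2013-05-26"].values  -> true
```

Whether the repeated key strings are also stored only once is up to the JavaScript engine, but the shared `values` objects mean the extra cost is roughly one small object per leaf.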

Answered by Lowercase, 14/6, 2013 at 21:35

Comments:
Lamprey: I've edited my question to show some of the views I'd like from the data. I'm not sure how I'd flatten the data structure. It includes rolled-up/summed values (not raw ones). Would the above example flatten to something like gist.github.com/jfsiii/5786087? Sorry for any notification spam. I'm learning that the comment text field behaves differently from the question text field.
Lowercase: Your example views are simply tabular forms of your data. Can you give an example of the kinds of queries (groups or filters) you want Crossfilter to show?
Lowercase: I've updated the answer to address your question about combining aggregates.
Lamprey: Thanks for the clarification on combining aggregated dimensions. Fortunately, I don't need to do that. I'm just looking to group by dimensions like date, source, etc., then, within that view, find the top K by clicks or sort by impressions. Are day and source the dimensions, in Crossfilter parlance? I don't doubt that I'm asking how to drive this car while facing backwards in the passenger seat. I'm trying to get my bearings, but I'm having a difficult time because my initial data structure is so different from all the examples. Can we chat on IRC/IM? I'll limit the time to whatever you wish.
Tessler: Does Crossfilter by default use string references for keys and values? Are my repeated field names all pointing at a single instance, or would I be well advised to use 1-2 character field names?
Tessler: @JasonDavies - I meant to tag you in that last question.
Lowercase: Perhaps you should post that as a new question…
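The use case from the comments (group by a view such as date or source, then take the top K by clicks or sort by impressions) can be sketched in plain JavaScript on flattened records. The metric names `clicks` and `impressions` come from the comment, but the record shape, the `topK` helper and the numbers are hypothetical. In Crossfilter itself, one possible approach is a dimension on the view name filtered with `dimension.filter("bySource")` plus a second dimension on `clicks`, whose `top(k)` returns the highest-click records that pass the filter.

```javascript
// Hypothetical helper: within one view (groupName), return the top k
// records by a metric, highest first.
function topK(records, groupName, metric, k) {
  return records
    .filter((r) => r.groupName === groupName) // restrict to one view (new array)
    .sort((a, b) => b[metric] - a[metric])    // descending by metric
    .slice(0, k);
}

// Hypothetical flattened records with made-up numbers.
const records = [
  { groupName: "byDay", key: "2013-05-26", clicks: 10, impressions: 300 },
  { groupName: "byDay", key: "2013-05-27", clicks: 25, impressions: 200 },
  { groupName: "bySource", key: "sourceA", clicks: 7, impressions: 900 },
  { groupName: "bySource", key: "sourceB", clicks: 12, impressions: 150 },
];

topK(records, "bySource", "clicks", 1).map((r) => r.key); // ["sourceB"]
topK(records, "byDay", "impressions", 2).map((r) => r.key); // ["2013-05-26", "2013-05-27"]
```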
