Reduce timeseries array and aggregate data
Asked Answered
B

4

6

I see many posts/questions addressing this issue, so I assume it is not trivial. I am quite a beginner, looking for a more elegant solution.

I need to reduce this kind of array containing 1-minute detailed data into 5-minute data. Just computing the sum of consecutive values for 5 minutes, and then recreating a shorter array. Then the timestamp "created_at" should be the timepoint of the end of the 5-minute period.

let array = [
    { steps: 40, created_at: '2022-09-03T11:36:00.000Z' },
    { steps: 13, created_at: '2022-09-03T11:37:00.000Z' },
    { steps: 40, created_at: '2022-09-03T11:38:00.000Z' },
    { steps: 40, created_at: '2022-09-03T11:39:00.000Z' },
    { steps: 34, created_at: '2022-09-03T11:40:00.000Z' },
    { steps: 86, created_at: '2022-09-03T11:41:00.000Z' },
    { steps: 23, created_at: '2022-09-03T11:42:00.000Z' },
    { steps: 78, created_at: '2022-09-03T11:43:00.000Z' },
    { steps: 67, created_at: '2022-09-03T11:44:00.000Z' },
    { steps: 80, created_at: '2022-09-03T11:45:00.000Z' },
    { steps: 34, created_at: '2022-09-03T11:46:00.000Z' },
    { steps: 64, created_at: '2022-09-03T11:47:00.000Z' },
    { steps: 32, created_at: '2022-09-03T11:48:00.000Z' },
    { steps: 78, created_at: '2022-09-03T11:49:00.000Z' },
    { steps: 45, created_at: '2022-09-03T11:50:00.000Z' }
    ]

My solution is too complex I think:

const moment = require(`moment`);
const newArray = array.map(
    (item)=> {       
      const timestamp = moment(item.created_at).valueOf();
      const timeStampgroup =  Math.ceil((timestamp)/300000);
      return {...item, timeStampgroup: timeStampgroup}    
        }
    );
//console.log(newArray);  

const reducedArray = Array.from(newArray.reduce(
    (m, {timeStampgroup: timeStampgroup, steps}) => m.set(timeStampgroup, (m.get(timeStampgroup) || 0) + steps), new Map
  ), ([timeStampgroup, steps]) => ({timeStampgroup, steps}));
//console.log(reducedArray);

const result = reducedArray.map(entry => ({steps : entry.steps, created_at : moment(entry.timeStampgroup*300000).toISOString()}));
console.log(result);

[
 { steps: 167, created_at: '2022-09-03T11:40:00.000Z' },
 { steps: 334, created_at: '2022-09-03T11:45:00.000Z' },
 { steps: 253, created_at: '2022-09-03T11:50:00.000Z' }
]

Does anyone see a less complicated way to achieve the same result, in one pass maybe ?

Thanks a lot !

Lorenzo

Blacktail answered 3/9, 2022 at 17:23 Comment(1)
What should happen if the last element has created_at something like 2022-09-03T11:53:00.000Z. In other words, where should the steps for minutes 51, 52, 53 be displayed? in one pass maybe ? --> Assume this means with just one loop. Yes, it is possible with just one loop (one such solution using .reduce() has been posted below)Joanjoana
F
1

Assuming there are no gaps in the sequence, and that the array is sorted, you won't get a whole lot faster than this:

const results = [];

for (let i = 0; i < array.length; i += 5) {
    const created_at = array[i].created_at;
    const steps = array.slice(i, i + 5).reduce((acc, item) => acc + item.steps, 0);
    results.push({ created_at, steps });
}

Update:

This might actually be faster than my initial answer, I haven't given it full thought, but it's certainly faster than using Date or moment.js. More importantly, it's now correct (see update 2).

const groups = new Map();

for (let i = 0; i < array.length; i++) {
    const { created_at, steps } = array[i];
    const ten_min = created_at.slice(0, 15);
    const five_min = created_at.slice(15, 16) >= '5' ? '5:00.000Z' : '0:00.000Z'
    const group_id = ten_min + five_min;
    if (groups.has(group_id)) {
        groups.set(group_id, groups.get(group_id) + steps)
    } else {
        groups.set(group_id, steps)
    }
}

const aggregated = Array.from(groups, ([created_at, steps]) => ({ created_at, steps }));

Update 2:

Rounding up the interval is a gotcha I didn't initially think of, and it requires you encode a lot more knowledge of dates. I know you wanted eloquence, but others have already answered with concise solutions, so I just updated my answer's approach to make it correct while trying to stay performant, for posterity.

function NextInterval(iso_ts) {

    // changing date is a complicated edge case, 
    // best not try figure this one out with strings.
    // could be further optimsed to only create a Date for times >= 23:55,
    // but that requires a bit more string manipulation below.
    const min = iso_ts.slice(14, 16);
    if (min > '55') {
        let item = new Date(iso_ts);
        item.setMinutes(60, 0, 0); // next hour (also handles date changes)
        return item.toISOString();
    }

    const single_min = iso_ts.slice(15, 16)
    if (single_min > '5') {
        const next_ten = parseInt(iso_ts.slice(14, 15)) + 1;
        return iso_ts.slice(0, 14) + next_ten + '0:00.000Z'
    }

    // we're group upto and including
    if (single_min === '0') {
        return iso_ts
    }

    return iso_ts.slice(0, 15) + '5:00.000Z'
}

function Aggregate(items) {
    const groups = new Map();

    for (let i = 0; i < items.length; i++) {
        const { created_at, steps } = items[i];
        const group_id = NextInterval(created_at)
        if (groups.has(group_id)) {
            groups.set(group_id, groups.get(group_id) + steps)
        } else {
            groups.set(group_id, steps)
        }
    }
    
    // converted by to an array, but Maps are iterable too...
    return Array.from(groups, ([created_at, steps]) => ({ created_at, steps }));
}

Featured answered 3/9, 2022 at 19:6 Comment(6)
If you look close enough, the timestamp in my resulting array is 11:40, 11:45, 11:50, e.g. the end of the 5 min period, when the steps are counted. Your solution returns 11:36, 11:41, 11:46, or the beginning of the 5 min period. The original array will always be sorted, but I cannot exclude a missing value now and then, even if the original data should be completed with zeros for missing step values. That could be an issue, and I think my use of moment to regroup the available values in 5 min "timeStampgroups" was my way to solve it.Blacktail
@LorenzoSandini, I think your counts are wrong, I get the same counts in my version at 36, 41, and 46 minutes, as you do at 40, 45, 50. Anyway, I'll update with a better version that groups them into rounded 5 minute intervals.Featured
Your count at min 36 should be identical to my count at min 40, since it's the total of 5 values (36+37+38+39+40). You "labeled" it according to the first value, and I labeled it according to the last... but thanks anyway for your help, much appreciated !Blacktail
Ah yes, if course, that makes much more sense. I didn't even think about which side of the interval it was. I'll reverse the loop and that should do the trick. Will update when back at my computerFeatured
@LorenzoSandini when I last commented I thought I had a way to avoid introducing knowledge of time beyond minutes, but unfortunately that's just not possible. I updated my answer to keep the same performant approach, as this might be useful for processing lots of data, but I know it's not exactly eloquent.Featured
I had done a bit of slicing myself, but couldn't get around the change of date and ended up using moment. Working without dependencies surely is a plus.Blacktail
J
1

Presented below is one possible way to achieve the desired objective.

Code Snippet

const myTransform = arr => (
  arr.reduce(
    (acc, {steps: crst, created_at: crtd}, idx) => {
      const dgt = +(crtd?.[15] || 0);
      acc.cnt += +crst;
      if (
        [0, 5].includes(dgt) ||
        idx === arr.length - 1
      ) {
        acc.resArr.push({
          steps: acc.cnt, created_at: crtd
        });
        acc.cnt = 0;
      }
      return acc;
    },
    {cnt: 0, resArr: []}
  )?.resArr
);

/*
// Code Explanation
// Method to transform array as required
const myTransform = arr => (      // "arr" is the arg/param array
  arr.reduce(             // use ".reduce()" to iterate over arr
    // "acc" -> accumulator (is an object with 2 props "cnt" and "resArr")
    (acc, {steps: crst, created_at: crtd}, idx) => {
    // destructure iterator & rename props
    // steps -> crst (current-steps)  | created_at -> crtd (current-create-dt)
      
      // extract the 16th elt of date-time-stamp (this is the units position of the minute)
      const dgt = +(crtd?.[15] || 0);

      // accumulate crst to accumulator's count variable
      acc.cnt += +crst;
      
      // if current element's timestamp is minute-5 or minute-10 - OR -
      // the current elt being processed is the last
      if (
        [0, 5].includes(dgt) ||
        idx === arr.length - 1 
      ) {
      
        // push current count & create-dt to "resArr"
        acc.resArr.push({
          steps: acc.cnt, created_at: crtd
        });
        
        // reset the accumulator's count back to zero
        // (so it begins to count for the next 5 minute interval)
        acc.cnt = 0;
        
      }
      
      // always return "acc"
      return acc;
    },

    // initially set the "acc" as an object with 2 props below
    {cnt: 0, resArr: []}
    
  )?.resArr       // extract only the result-array
);
*/

const myArray = [{
    steps: 40,
    created_at: '2022-09-03T11:36:00.000Z'
  },
  {
    steps: 13,
    created_at: '2022-09-03T11:37:00.000Z'
  },
  {
    steps: 40,
    created_at: '2022-09-03T11:38:00.000Z'
  },
  {
    steps: 40,
    created_at: '2022-09-03T11:39:00.000Z'
  },
  {
    steps: 34,
    created_at: '2022-09-03T11:40:00.000Z'
  },
  {
    steps: 86,
    created_at: '2022-09-03T11:41:00.000Z'
  },
  {
    steps: 23,
    created_at: '2022-09-03T11:42:00.000Z'
  },
  {
    steps: 78,
    created_at: '2022-09-03T11:43:00.000Z'
  },
  {
    steps: 67,
    created_at: '2022-09-03T11:44:00.000Z'
  },
  {
    steps: 80,
    created_at: '2022-09-03T11:45:00.000Z'
  },
  {
    steps: 34,
    created_at: '2022-09-03T11:46:00.000Z'
  },
  {
    steps: 64,
    created_at: '2022-09-03T11:47:00.000Z'
  },
  {
    steps: 32,
    created_at: '2022-09-03T11:48:00.000Z'
  },
  {
    steps: 78,
    created_at: '2022-09-03T11:49:00.000Z'
  },
  {
    steps: 45,
    created_at: '2022-09-03T11:50:00.000Z'
  }
];

console.log(
  'transforming to 5 minute intervals...\n',
  myTransform(myArray)
);
.as-console-wrapper { max-height: 100% !important; top: 0 }

Explanation

Inline comments added to the snippet above.

Joanjoana answered 4/9, 2022 at 15:21 Comment(4)
If there were array-elts for minutes 51, 52, 53 (ie, the last elt is not minute 50, but min 53), then the solution will accumulate the steps from 51, 52, 53 and show it as minute 53. We can easily make it show as min 55 (if required, instead of min 53). I think it can be done without needing any date manipulation at all, too.Joanjoana
Oooh, nice. This is way out of my league but thanks for putting inline comments. Indeed having data rounded to the next multiple of 5 could be handy, if I were handling "past" data. But in case the next multiple of 5 would be in the future, it could break other time-dependent processes.Blacktail
Okay, if it is guaranteed that the last elt will always be minute 5 or 10, then the snippet would always end with it. Kudos to you, Dr. Sandini - for teaching yourself JavaScript. Please let me know if any questions or clarifications.Joanjoana
I now made sure that my original array contains data that are at least 5 min old. So in the case the last elts of the array were for minutes 51, 52, 53, it would be now ok to round the created_at datestring to minute 55 (or the next multiple of 5 min) and make it show as such. I think I can round a digit, but if the last elt is at 11:58PM, the whole datestring would be affected...Blacktail
R
0

Like this?

const moment = require(`moment`)
require('moment-round')

const array = [
  { steps: 40, created_at: '2022-09-03T11:36:00.000Z' },
  { steps: 13, created_at: '2022-09-03T11:37:00.000Z' },
  { steps: 40, created_at: '2022-09-03T11:38:00.000Z' },
  { steps: 40, created_at: '2022-09-03T11:39:00.000Z' },
  { steps: 34, created_at: '2022-09-03T11:40:00.000Z' },
  { steps: 86, created_at: '2022-09-03T11:41:00.000Z' },
  { steps: 23, created_at: '2022-09-03T11:42:00.000Z' },
  { steps: 78, created_at: '2022-09-03T11:43:00.000Z' },
  { steps: 67, created_at: '2022-09-03T11:44:00.000Z' },
  { steps: 80, created_at: '2022-09-03T11:45:00.000Z' },
  { steps: 34, created_at: '2022-09-03T11:46:00.000Z' },
  { steps: 64, created_at: '2022-09-03T11:47:00.000Z' },
  { steps: 32, created_at: '2022-09-03T11:48:00.000Z' },
  { steps: 78, created_at: '2022-09-03T11:49:00.000Z' },
  { steps: 45, created_at: '2022-09-03T11:50:00.000Z' },
]

const result = Array.from(
  array.reduce((m, item) => {
    const key = moment(item.created_at).ceil(5, 'minutes').toISOString()
    return m.set(key, (m.get(key) || 0) + item.steps)
  }, new Map()),
  ([created_at, steps]) => ({ created_at, steps })
)

console.log(result)

Reggy answered 3/9, 2022 at 17:42 Comment(0)
B
0

I am thankful for all the comments. Using native JS instead of moment is a good idea, and in this case is quite easy to write down.

let array = [
    { steps: 40, created_at: '2022-09-03T11:36:00.000Z' },
    { steps: 13, created_at: '2022-09-03T11:37:00.000Z' },
    { steps: 40, created_at: '2022-09-03T11:38:00.000Z' },
    { steps: 40, created_at: '2022-09-03T11:39:00.000Z' },
    { steps: 34, created_at: '2022-09-03T11:40:00.000Z' },
    { steps: 86, created_at: '2022-09-03T11:41:00.000Z' },
    { steps: 23, created_at: '2022-09-03T11:42:00.000Z' },
    { steps: 78, created_at: '2022-09-03T11:43:00.000Z' },
    { steps: 67, created_at: '2022-09-03T11:44:00.000Z' },
    { steps: 80, created_at: '2022-09-03T11:45:00.000Z' },
    { steps: 34, created_at: '2022-09-03T11:46:00.000Z' },
    { steps: 64, created_at: '2022-09-03T11:47:00.000Z' },
    { steps: 32, created_at: '2022-09-03T11:48:00.000Z' },
    { steps: 78, created_at: '2022-09-03T11:49:00.000Z' },
    { steps: 45, created_at: '2022-09-03T11:50:00.000Z' }
    ];

const newArray = array.map(
  (item)=> {       
    const timestamp2UTC = new Date(item.created_at);
    const timestamp2epoch = timestamp2UTC.getTime();
    const timeStampGroup2 =  Math.ceil((timestamp2epoch)/300000);
    return {...item, timestamp2 : timestamp2UTC, timestamp2epoch : timestamp2epoch, timeStampGroup2: timeStampGroup2}    
      }
  );
//console.log(newArray);  

const reducedArray = Array.from(newArray.reduce(
  (m, {timeStampGroup2: timeStampGroup2, steps}) => m.set(timeStampGroup2, (m.get(timeStampGroup2) || 0) + steps), 
  new Map), ([timeStampGroup2, steps]) => ({timeStampGroup2, steps}));
//console.log(reducedArray);

const result2 = reducedArray.map(entry => ({steps : entry.steps, created_at : new Date(entry.timeStampGroup2*300000).toISOString()}));
console.log(`result2:`, result2);

Again this requires 3 steps, but it works. My incoming data is always ordered, but otherwise array sorting (by date) should be done first.

Using moment, the execution of the script above takes around 19ms on my computer, and with native JS around 4ms.

Blacktail answered 4/9, 2022 at 14:30 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.