Group by date intervals
I have a collection with documents like this:

{ datetime: new Date(), count: 1234 }

I want to get the sums of count over the last 24 hours, 7 days, and 30 days.

The result should be like:

{ "sum": 100,  "interval": "day" }
{ "sum": 700,  "interval": "week" }
{ "sum": 3000, "interval": "month" }

In more abstract terms, I need to group results by multiple conditions (in this case, multiple time intervals).

The MySQL equivalent would be:

SELECT 
    IF (time>CURRENT_TIMESTAMP() - INTERVAL 24 HOUR, 1, 0) last_day,
    IF (time>CURRENT_TIMESTAMP() - INTERVAL 168 HOUR, 1, 0) last_week,
    IF (time>CURRENT_TIMESTAMP() - INTERVAL 720 HOUR, 1, 0) last_month,
    SUM(count) count
FROM table
GROUP BY    last_day,
            last_week,
            last_month
Whereby answered 3/1, 2015 at 1:55 Comment(0)
R
3

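There are two different ways to do this. One is to issue a separate query for each of the ranges: count() if you only need the number of documents, or a small aggregate() with $match and $group if you need the sum of the count field. This is pretty easy, and if the datetime field is indexed, it will be fast.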
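A minimal sketch of the one-query-per-range idea, assuming the field names datetime and count from the question. Because the question asks for the sum of the count field rather than the number of documents, the sketch builds an aggregate() pipeline per range. The helper name sumSincePipeline and the WINDOWS table are made up for illustration; the db.collection call is left as a comment so the snippet runs without a server.

```javascript
// Hypothetical helper: builds a pipeline that sums `count` for documents
// whose `datetime` lies within the last `windowMs` milliseconds before `now`.
function sumSincePipeline(now, windowMs) {
  const cutoff = new Date(now.getTime() - windowMs);
  return [
    { $match: { datetime: { $gte: cutoff } } },        // range filter, can use an index on datetime
    { $group: { _id: null, sum: { $sum: "$count" } } } // add up the count field
  ];
}

const HOUR_MS = 60 * 60 * 1000;
// Same windows as the SQL in the question: 24, 168 and 720 hours.
const WINDOWS = { day: 24 * HOUR_MS, week: 168 * HOUR_MS, month: 720 * HOUR_MS };

// In the mongo shell, one query per interval (not executed here):
// for (const [interval, ms] of Object.entries(WINDOWS)) {
//   printjson(db.collection.aggregate(sumSincePipeline(new Date(), ms)).toArray());
// }
```

Each query returns at most one document, so attaching the interval name on the client side gives the three rows the question asks for.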

The second way is to combine them all into one query using a method similar to your SQL example. To do this, use the aggregate() method with a pipeline that has a $project stage to create the 0 or 1 values for the new "last_day", "last_week", and "last_month" fields, followed by a $group stage to do the sums.
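The single-query variant could look roughly like the sketch below, again assuming the question's datetime and count fields. The helper name flaggedSumPipeline is hypothetical, and the pipeline is only built here, not run against a server.

```javascript
// Hypothetical helper: flags every document per window ($project),
// then groups on the combination of flags ($group), like the SQL GROUP BY.
function flaggedSumPipeline(now) {
  const hourMs = 60 * 60 * 1000;
  // 1 if the document is newer than `hours` hours before `now`, else 0.
  const flag = (hours) => ({
    $cond: [{ $gt: ["$datetime", new Date(now.getTime() - hours * hourMs)] }, 1, 0]
  });
  return [
    { $project: {
        count: 1,
        last_day: flag(24),
        last_week: flag(168),
        last_month: flag(720)
    } },
    { $group: {
        _id: { last_day: "$last_day", last_week: "$last_week", last_month: "$last_month" },
        count: { $sum: "$count" }
    } }
  ];
}

// In the mongo shell (not executed here):
// db.collection.aggregate(flaggedSumPipeline(new Date()))
```

As with the SQL, this yields one row per distinct combination of flags, so the buckets are disjoint (for example "within the last week but not the last day"). Cumulative totals for 7 and 30 days would need the matching rows added together afterwards.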

Rollerskate answered 3/1, 2015 at 4:58 Comment(0)
A
18

MongoDB's aggregation framework provides date aggregation operators. For example, the $dayOfYear operator extracts the day of the year from a date so it can be used as a grouping key:

db.collection.aggregate([
    { "$group": {
        "_id": { "$dayOfYear": "$datetime" },
        "total": { "$sum": "$count" }
    }}
])

Or you can use a date math approach instead. Subtracting the epoch date from a date object yields a number (milliseconds since the epoch) to which the math can be applied:

db.collection.aggregate([
    { "$group": {
        "_id": { 
            "$subtract": [
                { "$subtract": [ "$datetime", new Date("1970-01-01") ] },
                { "$mod": [
                    { "$subtract": [ "$datetime", new Date("1970-01-01") ] },
                    1000 * 60 * 60 * 24
                ]}
            ]
        },
        "total": { "$sum": "$count" }
    }}
])
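To see what that grouping key works out to, here is the same arithmetic applied to a single value in plain JavaScript. The function name floorToUtcDay and the sample timestamp are made up for illustration.

```javascript
const DAY_MS = 1000 * 60 * 60 * 24;

// Same arithmetic as the $subtract / $mod expression, for one date.
function floorToUtcDay(date) {
  const ms = date.getTime();   // milliseconds since 1970-01-01 (the inner $subtract)
  return ms - (ms % DAY_MS);   // drop the time-of-day remainder (the $mod)
}

const key = floorToUtcDay(new Date("2015-03-01T04:58:00Z"));
console.log(key === Date.UTC(2015, 2, 1));   // true
console.log(new Date(key).toISOString());    // 2015-03-01T00:00:00.000Z
```

The resulting _id is therefore a number, the millisecond timestamp of midnight UTC on that day, and can be turned back into a date with new Date(_id) on the client.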

If what you are after is intervals measured back from the current point in time, then what you want is basically the date math approach combined with conditionals via the $cond operator:

db.collection.aggregate([
    { "$match": {
        "datetime": { 
            "$gte": new Date(new Date().valueOf() - ( 1000 * 60 * 60 * 24 * 365 ))
        }
    }},
    { "$group": {
        "_id": null,
        "24hours": { 
            "$sum": {
                "$cond": [
                    { "$gt": [
                        { "$subtract": [ "$datetime", new Date("1970-01-01") ] },
                        new Date().valueOf() - ( 1000 * 60 * 60 * 24 )
                    ]},
                    "$count",
                    0
                ]
            }
        },
        "30days": { 
            "$sum": {
                "$cond": [
                    { "$gt": [
                        { "$subtract": [ "$datetime", new Date("1970-01-01") ] },
                        new Date().valueOf() - ( 1000 * 60 * 60 * 24 * 30 )
                    ]},
                    "$count",
                    0
                ]
            }
        },
        "OneYear": { 
            "$sum": {
                "$cond": [
                    { "$gt": [
                        { "$subtract": [ "$datetime", new Date("1970-01-01") ] },
                        new Date().valueOf() - ( 1000 * 60 * 60 * 24 * 365 )
                    ]},
                    "$count",
                    0
                ]
            }
        }
    }}
])

It's essentially the same approach as the SQL example, where the query conditionally evaluates whether the date value falls within the required range and decides whether or not to add the value to the sum.

The one addition here is the $match stage, which restricts the query to only those items that could possibly fall within the largest range used here (one year). That makes it a bit better than the presented SQL, in that an index can be used to filter those values out and you don't need to "brute force" through non-matching data in the collection.

It is always a good idea to restrict the input with $match when using an aggregation pipeline.
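The pipeline above returns a single document with one field per interval. If one entry per interval is wanted, as in the question's expected output, the document can be reshaped on the client. This is only a sketch: the helper name toIntervalRows is hypothetical and the sample values are made up.

```javascript
// Hypothetical helper: turns { _id: null, "24hours": 100, ... }
// into [{ sum: 100, interval: "24hours" }, ...].
function toIntervalRows(doc) {
  return Object.entries(doc)
    .filter(([key]) => key !== "_id")               // skip the grouping key
    .map(([interval, sum]) => ({ sum, interval })); // one row per interval field
}

// Made-up result document in the shape produced by the $group stage.
const sample = { _id: null, "24hours": 100, "30days": 3000, "OneYear": 36500 };
const rows = toIntervalRows(sample);
console.log(rows);
```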

Alleviative answered 3/1, 2015 at 2:5 Comment(4)
I don't need sums of every day. I need sums of only these three time intervals. - Whereby
@Whereby Then what do you think you do? What are you expecting? All sums in a single query result or is running separate queries okay? You need to explain what you expect as a result. - Alleviative
I want to get three entries for each interval (day, week, month) as a result. Ideally, in a single query. - Whereby
@Whereby That is very possible once you explain it clearly. The approach with MongoDB is very much the same. - Alleviative
B
1

Starting in MongoDB 5.0, this is a nice use case for the $dateDiff operator in combination with a $facet stage:

// { date: ISODate("2021-12-04"), count: 3  } <= today
// { date: ISODate("2021-11-29"), count: 5  } <= last week
// { date: ISODate("2021-11-24"), count: 1  } <= last month
// { date: ISODate("2021-11-12"), count: 12 } <= last month
// { date: ISODate("2021-10-04"), count: 8  } <= too old
db.collection.aggregate([

  { $set: {
    diff: { $dateDiff: { startDate: "$$NOW", endDate: "$date", unit: "day" } }
  }},

  { $facet: {
    lastMonth: [
      { $match: { diff: { $gt: -30 } } },
      { $group: { _id: null, total: { $sum: "$count" } } }
    ],
    lastWeek: [
      { $match: { diff: { $gt: -7 } } },
      { $group: { _id: null, total: { $sum: "$count" } } }
    ],
    lastDay: [
      { $match: { diff: { $gt: -1 } } },
      { $group: { _id: null, total: { $sum: "$count" } } }
    ]
  }},

  { $set: {
    lastMonth: { $first: "$lastMonth.total" },
    lastWeek: { $first: "$lastWeek.total" },
    lastDay: { $first: "$lastDay.total" }
  }}
])
// { lastMonth: 21, lastWeek: 8, lastDay: 3 }

This:

  • first computes (with $dateDiff) the number of days of difference between today ("$$NOW") and the document's date

    • if the date is 3 days ago, diff will be set to -3

    • the intermediate result being:

      { date: ISODate("2021-12-04"), count: 3,  diff: 0   }
      { date: ISODate("2021-11-29"), count: 5,  diff: -5  }
      { date: ISODate("2021-11-24"), count: 1,  diff: -10 }
      { date: ISODate("2021-11-12"), count: 12, diff: -22 }
      { date: ISODate("2021-10-04"), count: 8,  diff: -61 }
      
  • then performs a $facet stage that allows us to run multiple aggregation pipelines within a single stage on the same set of input documents. Each sub-pipeline has its own field in the output document where its result is stored as an array of documents.

    • this way, we can create a lastMonth field that'll contain the sum of counts ($sum: "$count") for documents whose date is less than 30 days before today ({ $match: { diff: { $gt: -30 } } })

    • while we do the same for lastWeek and lastDay.

    • the intermediate result being:

      {
        lastMonth: [{ _id: null, total: 21 }],
        lastWeek: [{ _id: null, total: 8 }],
        lastDay: [{ _id: null, total: 3 }]
      }
      
  • and finally cleans up the $facet output with a $set stage to get fields in a nice format:

    { lastMonth: 21, lastWeek: 8, lastDay: 3 }
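One caveat, as far as I understand the documented behaviour: with unit "day", $dateDiff counts the day boundaries crossed (in UTC by default) rather than elapsed 24-hour periods. The plain JavaScript sketch below emulates that counting to show the effect on the lastDay bucket. The function dayBoundaryDiff and the sample timestamps are made up for illustration.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Emulates $dateDiff with unit "day" in UTC: number of midnights between the two dates.
function dayBoundaryDiff(start, end) {
  return Math.floor(end.getTime() / DAY_MS) - Math.floor(start.getTime() / DAY_MS);
}

const now = new Date("2021-12-04T01:00:00Z");
const doc = new Date("2021-12-03T23:00:00Z"); // only 2 hours before `now`

console.log(dayBoundaryDiff(now, doc));                          // -1
console.log((doc.getTime() - now.getTime()) / (60 * 60 * 1000)); // -2 (elapsed hours)
```

So with { $gt: -1 }, lastDay effectively means "the same UTC calendar day" and would skip this document. If a rolling 24-hour window is needed, one option is to compute the diff with unit "hour" and match on { $gt: -24 }.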
    
Brotherton answered 4/12, 2021 at 17:18 Comment(0)
