Calculate the median in MongoDB aggregation framework
Asked Answered
S

7

31

Is there a way to calculate the median using the MongoDB aggregation framework?

Spidery answered 8/12, 2013 at 16:57 Comment(2)
AFAIK there is no such thing as $median so probably you'll have to use map-reduce for this.Scup
There is an open feature request to add support for a $median accumulator. Please upvote/watch SERVER-4929 in the MongoDB issue tracker.Excurrent
C
42

The median is somewhat tricky to compute in the general case, because it involves sorting the whole data set, or using a recursion with a depth that is also proportional to the data set size. That's maybe the reason why many databases don't have a median operator out of the box (MySQL doesn't have one, either).

The simplest way to compute the median would be with these two statements (assuming the attribute on which we want to compute the median is called a and we want it over all documents in the collection, coll):

count = db.coll.count();
db.coll.find().sort( {"a":1} ).skip(count / 2 - 1).limit(1);

This is the equivalent to what people suggest for MySQL.

Causeuse answered 11/3, 2014 at 14:50 Comment(2)
I know it is not allowed to put a comment just to say thanks.. but this is beautiful :)Spokeswoman
warning: this is code light, but server heavyHatband
M
12

It's possible to do it in one shot with the aggregate framework.

Sort => put in Array sorted values => get Size of array => divide size by two => get Int value of the division (left side of median) => add 1 to left side ( right side) => get array element at left side and right side => average of the two elements

This is a sample with Spring java mongoTemplate :

The model is a list of book with the login of the author ("owner"), the objective is to get the median of book by users :

        GroupOperation countByBookOwner = group("owner").count().as("nbBooks");

    SortOperation sortByCount = sort(Direction.ASC, "nbBooks");

    GroupOperation putInArray = group().push("nbBooks").as("nbBooksArray");

    ProjectionOperation getSizeOfArray = project("nbBooksArray").and("nbBooksArray").size().as("size");

    ProjectionOperation divideSizeByTwo = project("nbBooksArray").and("size").divide(2).as("middleFloat");

    ProjectionOperation getIntValueOfDivisionForBornLeft = project("middleFloat", "nbBooksArray").and("middleFloat")
            .project("trunc").as("beginMiddle");

    ProjectionOperation add1ToBornLeftToGetBornRight = project("beginMiddle", "middleFloat", "nbBooksArray")
            .and("beginMiddle").project("add", 1).as("endMiddle");

    ProjectionOperation arrayElementAt = project("beginMiddle", "endMiddle", "middleFloat", "nbBooksArray")
            .and("nbBooksArray").project("arrayElemAt", "$beginMiddle").as("beginValue").and("nbBooksArray")
            .project("arrayElemAt", "$endMiddle").as("endValue");

    ProjectionOperation averageForMedian = project("beginMiddle", "endMiddle", "middleFloat", "nbBooksArray",
            "beginValue", "endValue").and("beginValue").project("avg", "$endValue").as("median");

    Aggregation aggregation = newAggregation(countByBookOwner, sortByCount, putInArray, getSizeOfArray,
            divideSizeByTwo, getIntValueOfDivisionForBornLeft, add1ToBornLeftToGetBornRight, arrayElementAt,
            averageForMedian);

    long time = System.currentTimeMillis();
    AggregationResults<MedianContainer> groupResults = mongoTemplate.aggregate(aggregation, "book",
            MedianContainer.class);

And here the resulting aggregation :

{
"aggregate": "book" ,
"pipeline": [
    {
        "$group": {
            "_id": "$owner" ,
            "nbBooks": {
                "$sum": 1
            }
        }
    } , {
        "$sort": {
            "nbBooks": 1
        }
    } , {
        "$group": {
            "_id": null  ,
            "nbBooksArray": {
                "$push": "$nbBooks"
            }
        }
    } , {
        "$project": {
            "nbBooksArray": 1 ,
            "size": {
                "$size": ["$nbBooksArray"]
            }
        }
    } , {
        "$project": {
            "nbBooksArray": 1 ,
            "middleFloat": {
                "$divide": ["$size" , 2]
            }
        }
    } , {
        "$project": {
            "middleFloat": 1 ,
            "nbBooksArray": 1 ,
            "beginMiddle": {
                "$trunc": ["$middleFloat"]
            }
        }
    } , {
        "$project": {
            "beginMiddle": 1 ,
            "middleFloat": 1 ,
            "nbBooksArray": 1 ,
            "endMiddle": {
                "$add": ["$beginMiddle" , 1]
            }
        }
    } , {
        "$project": {
            "beginMiddle": 1 ,
            "endMiddle": 1 ,
            "middleFloat": 1 ,
            "nbBooksArray": 1 ,
            "beginValue": {
                "$arrayElemAt": ["$nbBooksArray" , "$beginMiddle"]
            } ,
            "endValue": {
                "$arrayElemAt": ["$nbBooksArray" , "$endMiddle"]
            }
        }
    } , {
        "$project": {
            "beginMiddle": 1 ,
            "endMiddle": 1 ,
            "middleFloat": 1 ,
            "nbBooksArray": 1 ,
            "beginValue": 1 ,
            "endValue": 1 ,
            "median": {
                "$avg": ["$beginValue" , "$endValue"]
            }
        }
    }
]

}

Murrelet answered 27/12, 2016 at 10:20 Comment(1)
I don't know if this works, just upvoting for the dedicationLordly
K
8

Starting Mongo 4.4, the $group stage has a new aggregation operator $accumulator allowing custom accumulations of documents as they get grouped, via javascript user defined functions.

Thus, in order to find the median:

// { "a" : 25, "b" : 12 }
// { "a" : 89, "b" : 7  }
// { "a" : 25, "b" : 17 }
// { "a" : 25, "b" : 24 }
// { "a" : 89, "b" : 15 }
db.collection.aggregate([
  { $group: {
    _id: "$a",
    median: {
      $accumulator: {
        accumulateArgs: ["$b"],
        init: function() { return []; },
        accumulate: function(bs, b) { return bs.concat(b); },
        merge: function(bs1, bs2) { return bs1.concat(bs2); },
        finalize: function(bs) {
          bs.sort(function(a, b) { return a - b });
          var mid = bs.length / 2;
          return mid % 1 ? bs[mid - 0.5] : (bs[mid - 1] + bs[mid]) / 2;
        },
        lang: "js"
      }
    }
  }}
])
// { "_id" : 25, "median" : 17 }
// { "_id" : 89, "median" : 11 }

The accumulator:

  • accumulates on the field b (accumulateArgs)
  • is initialised to an empty array (init)
  • accumulates b items in an array (accumulate and merge)
  • and finally performs the median calculation on b items (finalize)
Keening answered 15/3, 2020 at 15:27 Comment(1)
hi, this is a very good way to solve the problem. But I'm working on a Java project that needs the same thing. Do you know where I can find related docs about this. Thank youKellie
K
5

While maxiplay's answer is not accurate, it did lead me in the correct direction. The problem with the given solution is that it only works when the number of records is even. Because for odd number of records, one simply needs to take the value at midpoint without having to calculate an average.

This is how I got it to work.

db.collection.aggregate([
{ "$match": { "processingStatus": "Completed" } },
{ "$sort": { "value": 1 } },
{ 
    "$group": {
        "_id": "$userId",
        "valueArray": {
            "$push": "$value"
        }
    } 
},
{
    "$project": {
        "_id": 0,
        "userId": "$_id",
        "valueArray": 1,
        "size": { "$size": ["$valueArray"] }
    }
},
{
    "$project": {
        "userId": 1,
        "valueArray": 1,
        "isEvenLength": { "$eq": [{ "$mod": ["$size", 2] }, 0 ] },
        "middlePoint": { "$trunc": { "$divide": ["$size", 2] } }
    }
},
{
    "$project": {
        "userId": 1,
        "valueArray": 1,
        "isEvenLength": 1,
        "middlePoint": 1,
        "beginMiddle": { "$subtract": [ "$middlePoint", 1] },
        "endMiddle": "$middlePoint"
    }
},
{
    "$project": {
        "userId": 1,
        "valueArray": 1,
        "middlePoint": 1,
        "beginMiddle": 1,
        "beginValue": { "$arrayElemAt": ["$valueArray", "$beginMiddle"] },
        "endValue": { "$arrayElemAt": ["$valueArray", "$endMiddle"] },
        "isEvenLength": 1
    }
},
{
    "$project": {
        "userId": 1,
        "valueArray": 1,
        "middlePoint": 1,
        "beginMiddle": 1,
        "beginValue": 1,
        "endValue": 1,
        "middleSum": { "$add": ["$beginValue", "$endValue"] },
        "isEvenLength": 1
    }
},
{
    "$project": {
        "userId": 1,
        "valueArray": 1,
        "median": { 
            "$cond": { 
                if: "$isEvenLength", 
                then: { "$divide": ["$middleSum", 2] },
                else:  { "$arrayElemAt": ["$valueArray", "$middlePoint"] }
            } 
        }
    }
}
])
Ky answered 24/4, 2019 at 12:26 Comment(4)
there is no stepsArray in the pipeline the result of this is always nullThibaud
There was an issue with the pipeline where $sort was after $group and it should be before. Let me know if you still get null result.Ky
@Thibaud if you change $stepsArray to $valueArray this query should work for youIlonailonka
Using $let could've made using a single $project possibleGusti
A
2

The aggregation framework doesn't support median out-of-the-box. So you will have to write something on your own.

I would recommend you to do this on the application level. Retrieve all your documents with a normal find(), sort the result sets (either on the datbase by using the .sort() function of the cursor or sorting them in the application - your decision) and then getting the element size / 2.

When you really want to do it on the database level, you could do it with map-reduce. The map-function would emit key and an array with a single value - the value you want to get the median of. The reduce-function would just concatenate the arrays of the results it receives, so each key ends up with an array with all values. The finalize-function would then compute the median of that array, again by by sorting the array and then get the element number size / 2.

Acea answered 8/12, 2013 at 18:45 Comment(0)
P
1

Since mongoDB version 7.0 there is a $median accumulator. For example:

db.collection.aggregate([
  {$group: {
      _id: null,
      median: {
        $median: {
          input: "$rating",
          method: "approximate"
        }
      }
    }
  }
])

See how it works on the playground since version 7.0

Presidency answered 24/8, 2023 at 19:0 Comment(0)
G
0

My solution to this problem is quite similar to Taher's answer but with fewer $project stages.

// { "value" : 1 }
// { "value" : 2  }
// { "value" : 4 }
// { "value" : 5 }
db.median_values.aggregate([
  // Sort the values
  { $sort: { value: 1 } },
  // Get an array of all the values
  { $group: { _id: null, valuesArray: { $push: "$value" } } },
  // Get if the array has an even or odd number of elements
  {
    $project: {
      _id: 0,
      valuesArray: 1,
      isEven: { $eq: [{ $mod: [{ $size: "$valuesArray" }, 2] }, 0] },
      dividedByTwoIndex: { $divide: [{ $size: "$valuesArray" }, 2] },
    },
  },
  // Get the left value and right value if the array has an even or odd number of elements
  {
    $project: {
      _id: 0,
      left: {
        $cond: {
          if: "$isEven",
          then: {
            $arrayElemAt: [
              "$valuesArray",
              { $subtract: ["$dividedByTwoIndex", 1] },
            ],
          },
          else: {
            $arrayElemAt: ["$valuesArray", { $floor: "$dividedByTwoIndex" }],
          },
        },
      },
      right: {
        $cond: {
          if: "$isEven",
          then: {
            $arrayElemAt: ["$valuesArray", "$dividedByTwoIndex"],
          },
          else: {
            $arrayElemAt: ["$valuesArray", { $floor: "$dividedByTwoIndex" }],
          },
        },
      },
    },
  },

  // Compute the median value
  { $project: { median: { $avg: ["$left", "$right"] } } },
]);

// Output:
// { "median" : 3 }
Genitals answered 8/10, 2022 at 8:32 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.