Following is the aggregation query :
[
{
"$match": {
"UserId": {
"$in": [
5
]
},
"WorkflowStartTime": {
"$gte": ISODate('2015-04-09T00:00:00.000Z'),
"$lte": ISODate('2015-04-16T00:00:00.000Z')
}
}
},
{
"$group": {
"_id": {
"Task": "$TaskId",
"WorkflowId": "$WorkflowInstanceId"
},
"TaskName": {
"$first": "$Task"
},
"StartTime": {
"$first": "$StartTime"
},
"EndTime": {
"$last": "$EndTime"
},
"LastExecutionTime": {
"$last": "$StartTime"
},
"WorkflowName": {
"$first": "$WorkflowName"
}
}
},
{
"$project": {
"_id": 1,
"LastExecutionTime": 1,
"TaskName": 1,
"AverageExecutionTime": {
"$subtract": [
"$EndTime",
"$StartTime"
]
},
"WorkflowName": 1
}
},
{
"$group": {
"_id": "$_id.Task",
"LastExecutionTime": {
"$last": "$LastExecutionTime"
},
"AverageExecutionTime": {
"$avg": "$AverageExecutionTime"
},
"TaskName": {
"$first": "$TaskName"
},
"TotalInstanceCount": {
"$sum": 1
},
"WorkflowName": {
"$first": "$WorkflowName"
}
}
},
{
"$project": {
"Id": "$_id",
"_id": 0,
"Name": "$TaskName",
"LastExecutionDate": {
"$substr": [
"$LastExecutionTime",
0,
30
]
},
"AverageExecutionTimeInMilliSeconds": "$AverageExecutionTime",
"TotalInstanceCount": "$TotalInstanceCount",
"WorkflowName": 1
}
}
]
My collection documents are as follows :
{
"_id" : ObjectId("550ff07ce4b09bf056df4ac1"),
"OutputData" : "xyz",
"InputData" : null,
"Location" : null,
"ChannelName" : "XYZ",
"UserId" : 5,
"TaskId" : 95,
"ChannelId" : 5,
"Status" : "Success",
"TaskTypeId" : 7,
"WorkflowId" : 37,
"Task" : "XYZ",
"WorkflowStartTime" : ISODate("2015-03-23T05:09:26Z"),
"EndTime" : ISODate("2015-03-23T05:22:44Z"),
"StartTime" : ISODate("2015-03-23T05:22:44Z"),
"TaskType" : "TRIGGER",
"WorkflowInstanceId" : "23-3-2015-95d17f17-2580-4fe3-b627-12e862af08ce",
"StackTrace" : null,
"WorkflowName" : "XYZ data workflow"
}
I have a index on {WorkflowStartTime:1,UserId:1, StartTime:1}
Their are hardly 900000 records in collection, and as it is i am using a subset of data while quering using date range still it taking around 1.5 to 1.7 seconds. I have used aggregation framework with other collections with huge data and the performance is very good. Don't know what is wrong with this query as its showing very slow output, i expect it to be in mills as its a real time analytics query. Any pointer on it appreciated.
Output when {explain : true } added to aggregation query
{
"stages": [
{
"$cursor": {
"query": {
"UserId": {
"$in": [
5
]
},
"WorkflowStartTime": {
"$gte": "ISODate(2015-04-09T00:00:00Z)",
"$lte": "ISODate(2015-04-16T00:00:00Z)"
}
},
"fields": {
"EndTime": 1,
"StartTime": 1,
"Task": 1,
"TaskId": 1,
"WorkflowInstanceId": 1,
"WorkflowName": 1,
"_id": 0
},
"plan": {
"cursor": "BtreeCursor ",
"isMultiKey": false,
"scanAndOrder": false,
"indexBounds": {
"WorkflowStartTime": [
[
"ISODate(2015-04-16T00:00:00Z)",
"ISODate(2015-04-09T00:00:00Z)"
]
],
"UserId": [
[
5,
5
]
]
},
"allPlans": [
{
"cursor": "BtreeCursor ",
"isMultiKey": false,
"scanAndOrder": false,
"indexBounds": {
"WorkflowStartTime": [
[
"ISODate(2015-04-16T00:00:00Z)",
"ISODate(2015-04-09T00:00:00Z)"
]
],
"UserId": [
[
5,
5
]
]
}
}
]
}
}
},
{
"$group": {
"_id": {
"Task": "$TaskId",
"WorkflowId": "$WorkflowInstanceId"
},
"TaskName": {
"$first": "$Task"
},
"StartTime": {
"$first": "$StartTime"
},
"EndTime": {
"$last": "$EndTime"
},
"LastExecutionTime": {
"$last": "$StartTime"
},
"WorkflowName": {
"$first": "$WorkflowName"
}
}
},
{
"$project": {
"_id": true,
"LastExecutionTime": true,
"TaskName": true,
"AverageExecutionTime": {
"$subtract": [
"$EndTime",
"$StartTime"
]
},
"WorkflowName": true
}
},
{
"$group": {
"_id": "$_id.Task",
"LastExecutionTime": {
"$last": "$LastExecutionTime"
},
"AverageExecutionTime": {
"$avg": "$AverageExecutionTime"
},
"TaskName": {
"$first": "$TaskName"
},
"TotalInstanceCount": {
"$sum": {
"$const": 1
}
},
"WorkflowName": {
"$first": "$WorkflowName"
}
}
},
{
"$project": {
"_id": false,
"Id": "$_id",
"Name": "$TaskName",
"LastExecutionDate": {
"$substr": [
"$LastExecutionTime",
{
"$const": 0
},
{
"$const": 30
}
]
},
"AverageExecutionTimeInMilliSeconds": "$AverageExecutionTime",
"TotalInstanceCount": "$TotalInstanceCount",
"WorkflowName": true
}
}
],
"ok": 1
}
BtreeCursor
, that should be fine: A query that uses an index has a cursor of type BtreeCursor ( openmymind.net/Speedig-Up-Queries-Understanding-Query-Plans ). Could you execute your query without aggregation framework and tell us how many matching document it finds?...
maybe it could also help to reverse the index to{UserId:1,WorkflowStartTime:1}
, because your$match
query has firstUserId
and thenWorkflowStartTime
. But could also lead to no benefits at all. – Intervalecount()
with your query. just want to see how many documents the aggregation will operate on. – Intervale