Convert timestamps to datetime for use in Elasticsearch aggregations
Asked Answered
U

1

5

I have an index of SendGrid event data:

"_source": {
    "externalId": "9283cc1d-b003-xxxx-a5af-84fcf31c4181",
    "email": "[email protected]",
    "timestamp": 1616515214,
    "event": "processed",
    "uid": null,
    "id": null,
    "sendgridEventId": null,
    "smtpId": null,
    "sgMessageId": null,
    "sgEventId": null,
    "sendgridEvent": null,
    "type": null,
    "category": [],
    "reason": null,
    "status": null,
    "url": null,
    "useragent": null,
    "ip": null,
    "response": null,
    "tls": null,
    "attempt": null,
    "sendAt": null,
    "asmGroupId": null
}

Now I like to aggregate all of these events for a given day using the timestamp attribute.

GET /sendgridevententity/_search
{
  "query":
   {
    "match_all": {}
   },
   "aggs": {
     "amount_per_day": {
       "date_histogram": {
         "field": "timestamp",
         "calendar_interval": "1d"
       }
     }
   }
}  

Unfortunately, this just yields all the single events as they all have a different timestamp and the aggregation does not group them by day.

How can I convert the timestamps to date and then run the aggregation?

Urba answered 26/3, 2021 at 12:47 Comment(0)
P
6

You can take advantage of a multi field mapping. Here's how it works.

  1. Update the existing mapping with a new date "sub-field". I'm assuming timestamp was originally mapped as a long. I'm also assuming the timestamps are in epoch seconds, thereby the explicitly set format:
POST sendgridevententity/_mapping
{
  "properties": {
    "timestamp": {
      "type": "long",
      "fields": {
        "as_date": {
          "type": "date",
          "format": "epoch_second"
        }
      }
    }
  }
}
  1. This new property now needs to be picked up and your data needs to be reindexed. You can trigger a reindex call via a cool little trick — sending an empty _update_by_query request:
POST sendgridevententity/_update_by_query
  1. After the reindex operation finishes, you can target the new date field through the dot notation:
GET /sendgridevententity/_search
{
  "size": 0, 
  "query": {
    "match_all": {}
  },
  "aggs": {
    "amount_per_day": {
      "date_histogram": {
        "field": "timestamp.as_date",
        "format": "yyyy-MM-dd", 
        "calendar_interval": "1d"
      }
    }
  }
}

⚠️ Depending your index size and many other factors, the _update_by_query request may appear to time out. It's possible to set wait_for_completion=false which'll trigger an asynchronous background task instead.

💡 Note that I used size: 0 in the final request. It's a handy tool for returning only the aggregation results.

Platitudinous answered 26/3, 2021 at 14:30 Comment(2)
One additional question: I like to use it in a Kibana Visualization. But for the Bucket I can not use the format and it just states that there is no field of type "date". Is there a way to use this field there as well?Urba
Yes, it should be. It may be necessary to recreate the Index Pattern though so that it picks up the date field too. BTW it's not date -- it's timestamp.as_date and that's what should be able to choose from the dropdown when the index pattern is up to date.Platitudinous

© 2022 - 2024 — McMap. All rights reserved.