Upsert in Kusto DB
Asked Answered
H

2

5

I have an ADF which writes output of a Kusto Function to a Kusto Table daily. I need to upsert the data daily into the table. I did not find a way to update the existing data in Kusto DB. Is there any way I can insert row if data not exists with the grain else update the existing row through ADF?

Set or replace will be really expensive for me since the table consist of a huge set of data

If that is not possible Can I truncate the last 2 months data using the datetime stamp I have and reload it from function

Hagerman answered 24/7, 2019 at 7:43 Comment(0)
N
7

ADX/Kusto is built for analytics, rather than OLTP, scenarios. Therefore, its design trade-offs favor very fast bulk Create (supporting high rates of inserts/appends of new records) and very fast bulk Read (supporting queries over large amounts of data). ADX/Kusto's support for Delete scenarios focuses on bulk-delete (mainly for retention period), and per-record deletion is not supported. Likewise, Updates of existing records is not supported in ADX/Kusto.

There are several techniques for dealing with "duplicate" data, which may allow you to simply ingest new records without doing what you call an upsert (which isn't an operation supported by ADX/Kusto, as explained above): https://learn.microsoft.com/en-us/azure/data-explorer/dealing-with-duplicates

Another option for you to consider is replacing data in bulk - by tagging and replacing data shards (extents), using extent-level control commands:

Nephoscope answered 24/7, 2019 at 16:5 Comment(0)
C
0

As others have said, updating individual rows is not necessarily possible, but there are workarounds.

you can .append records to the end of the table, and when you want to retrieve only the latest ones, select the your row based on which one was written most recently. to do this, use an arg_max based on the ingestion_time(). your queries would look something like:

MyRecordsTable
| summarize arg_max(ingestion_time(), RecordIdentifier, RecordValue1, RecordValue2) by RecordIdentifier

If you know that all of the required fields hare updated regularly, you can optimize your query by leveraging Kusto's default time index. Every record is sorted based on time, so adding a

| where ingestion_time> ago(7d)

filter before the 'summarize' filter allows you to keep things efficient

Your final query would look something like

MyRecordsTable
| where ingestion_time> ago(7d)
| summarize arg_max(ingestion_time(), RecordIdentifier, RecordValue1, RecordValue2) by RecordIdentifier
Cassation answered 30/6, 2023 at 19:50 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.