Dynamically return columns from a kusto function
Asked Answered
R

2

3

I have a set of telemetry data stored in a table in the below format. Lets call this table as RawTelemetryData

device_id TIME ABC DEF GHI LMN
123 2021-04-20 00:00:00.0000000 1 2 3 4
121 2021-04-20 00:00:00.0000000 1 2 3 4

The above table consist of telemetry data which is coming from several IoT device every 10 seconds. I have a client program which needs different columns to be selected based on device_id.

For example. For device_id 123, client program needs ABC,GHI column. Where as for device_id 121 client program expects to have only ABC,DEF.

Initially I was thinking of writing a function which accepts device_id and the columns to be selected. like the below. Where the first argumet is the device id and second argument is the columns to be selected. Say I need to select ABC,GHI of device id 123 ill call the function like this.

Approach 1

getDataByDeviceId(123,"ABC,GHI") This function should only project only ABC and GHI

 .create-or-alter function with (folder = "getData", skipvalidation = "true") getDataByDeviceId(device_id:int,columns:string) {  
  
}

But I couldn't get this working.

Approach 2

In this approach I preconfigured columns for each device_id in a separate table. In this case I tried to write a function which accepts only device id, and project columns which are configured in the below table.

DEVICE_COLUMN_MAPPING_TABLE

device_id columns
123 ABC
123 DEF
123 GHI
123 LMN
121 ABC
121 DEF
121 GHI
 .create-or-alter function with (folder = "getData", skipvalidation = "true") getDataByDeviceId(device_id:int) {  
 //program should look at  DEVICE_COLUMN_MAPPING_TABLE for colums for this particular device id and project only that.
}

But unfortunately I couldn't get this also working. :(

I want to understand is it possible to get any of the above approach working. If not i have only 1 solution which I can think of , that is construct the query with proper project statement from the source program(A java program, which calls kusto function) based on device_id and invoke to get the results. I was trying to avoid doing this and make all my logic on ADX itself. Any other approach to solve the problem also welcomed :)

Remunerate answered 5/5, 2021 at 5:39 Comment(2)
I don't quite understand what you're trying to achieve. What's the use for the 1st table? Please elaborate the question (maybe by adding an example), and ping me to look into it by replying to this comment. By the way, please note that you won't be able to use this function in the context of extend, as the function will have to use toscalar (in order to return a scalar), and toscalar can't be used in per-row context.Trish
@Slavik I have edited the question and elaborated. Can you please check and let me know if its making sense now :)Remunerate
S
3

Here is a solution based on your second approach:

let device_mappings = datatable(device_id:long, columns_output:string)[
         123,    "ABC",
         123,   "GHI",
         121,   "ABC",
         121,   "DEF",
         122,    "LMN"];
let dt = datatable(device_id:long,TIME:datetime,ABC:long,DEF:long,GHI:long,LMN:long) 
         [123,datetime(2021-04-20 00:00:00.0000000),1,2,3,4,
          123,datetime(2021-04-21 00:00:00.0000000),10,20,30,40,
          121,datetime(2021-04-20 00:00:00.0000000),1,2,3,4,
          122,datetime(2021-04-20 00:00:00.0000000),1,2,3,4];
let func = view(id:long) {
let idKeys = toscalar(device_mappings | where device_id == id | summarize make_set(columns_output));
let otherKeys = toscalar(device_mappings | where device_id != id | summarize make_set(columns_output));
let keysToRemove = set_difference(otherKeys,idKeys);
dt
| where device_id == id
| extend rec = pack_all()
| project filteredRec = bag_remove_keys(rec, keysToRemove)
| evaluate bag_unpack(filteredRec)
};
func(123)

Output: enter image description here

Schmooze answered 5/5, 2021 at 19:49 Comment(0)
A
3

Elaborating on Avnera's answer: there are several pitfalls to be aware of. Below I discuss them and propose workarounds.

Pitfall 1: pack_all() loses column information on no rows

If you run this:

datatable(strCol:string, realCol:real, longCol:long, dynamicCol:dynamic)
[
    "strVal", 1.2, 33, dynamic({"foo":"bar"})
]
| project packed = pack_all()

you will get one row with packed column having the value of {"strCol":"strVal","realCol":1.2,"longCol":33,"dynamicCol":{"foo":"bar"}}.

But if you run this:

datatable(strCol:string, realCol:real, longCol:long, dynamicCol:dynamic) []
| project packed = pack_all()

You will get a table with column packed and no rows.

Pitfall 2: bag_unpack() loses columns that have only null values

If you run this:

datatable(strCol:string, realCol:real, longCol:long, dynamicCol:dynamic)
[
    "strVal", 1.2, 33, dynamic({"foo":"bar"})
]
| project packed = pack_all()
| evaluate bag_unpack(packed)

you will get

enter image description here

but if you run

datatable(strCol:string, realCol:real, longCol:long, dynamicCol:dynamic)
[
    //"strVal", 1.2, 33, dynamic({"foo":"bar"})
]
| union (datatable(extra:string) ["extra"])
| project packed = pack_all()
| evaluate bag_unpack(packed)

you will get:

enter image description here

Pitfall 3: column order is not retained after bag_unpack()

This is already evident from the examples above.

How to work around these pitfalls

There are two main ideas. First, is to add an extra row to retain the column information even on no rows, and then get rid of it. Second, is to use project with column_ifexists to retain column ordering.

A full example follows:

let data = datatable(strCol:string, realCol:real, longCol:long, dynamicCol:dynamic)
[
    // Empty data, to simulate no rows
    // Uncomment the line below to verify the output table is correct when data is present.
    // "strVal", 1.2, 33, dynamic({"foo":"bar"}),
];
// The extra row that will allow us to retain column information
let columnsPreserver = datatable(strCol:string, realCol:real, longCol:long, dynamicCol:dynamic, columnsPreserverToremove:string)
[
    // Note that real column has 0.0, not 0, so that bag_unpack deduces the type correctly.
    // Similarly, the dynamic column has to have {"a":1}, otherwise bag_unpack will deduce 
    // the type to be string, not dynamic.
    "", 0.0, 0, dynamic({"a":1}), "columnsPreserverToremove"
];
union
data
// Here you can also union more tables conditionally, per https://mcmap.net/q/1430167/-kusto-performing-operations-based-on-a-condition
,columnsPreserver
| project packed = pack_all()
// Here you can also remove the unwanted keys, per Avnera's answer
| evaluate bag_unpack(packed)
| project
column_ifexists("strCol",""),
column_ifexists("realCol",""),
column_ifexists("longCol",""),
column_ifexists("dynamicCol",""),
columnsPreserverToremove
// Here we get rid of the extra columns that came from column_ifexists calls,
// because column_ifexists returns "Column1", "Column2" etc. on missing columns.
| project-away Column*
// Here we get rid of the extra row that came from "columnsPreserver" inclusion in the union.
| where isempty(columnsPreserverToremove)
| project-away columnsPreserverToremove

Alternative to bag_remove_keys

Instead of bag_remove_keys one can first nullify the unwanted columns with a call like:

| extend fooColumn = iff(includeColumn, fooColumn, "")

and then call:

pack_all(ignore_null_empty=true)

This approach might be simpler if you are dealing with e.g. only two possible sets of columns, like here: Kusto, Performing operations based on a condition

Regarding these pitfalls, I filed a related issue against the official doc on union operator and also a user voice item.

Aindrea answered 2/7, 2022 at 21:28 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.