Export Custom Event Dimensions to SQL from Application Insights using Stream Analytics
Asked Answered
E

5

5

I'm following the example walkthrough Export to SQL from Application Insights using Stream Analytics. I am trying to export custom event dimensions (context.custom.dimensions in the JSON example below) which get added as a nested JSON array in the data file. How do I flatten the dimensions array at context.custom.dimensions for export to SQL?

JSON...

{
  "event": [
    {
      "name": "50_DistanceSelect",
      "count": 1
    }
  ],
  "internal": {
    "data": {
      "id": "aad2627b-60c5-48e8-aa35-197cae30a0cf",
      "documentVersion": "1.5"
    }
  },
  "context": {
    "device": {
      "os": "Windows",
      "osVersion": "Windows 8.1",
      "type": "PC",
      "browser": "Chrome",
      "browserVersion": "Chrome 43.0",
      "screenResolution": {
        "value": "1920X1080"
      },
      "locale": "unknown",
      "id": "browser",
      "userAgent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"
    },
    "application": {},
    "location": {
      "continent": "North America",
      "country": "United States",
      "point": {
        "lat": 38.0,
        "lon": -97.0
      },
      "clientip": "0.115.6.185",
      "province": "",
      "city": ""
    },
    "data": {
      "isSynthetic": false,
      "eventTime": "2015-07-15T23:43:27.595Z",
      "samplingRate": 0.0
    },
    "operation": {
      "id": "2474EE6F-5F6F-48C3-BA43-51636928075A"
    },
    "user": {
      "anonId": "BA05C4BE-1C42-482F-9836-D79008E78A9D",
      "anonAcquisitionDate": "0001-01-01T00:00:00Z",
      "authAcquisitionDate": "0001-01-01T00:00:00Z",
      "accountAcquisitionDate": "0001-01-01T00:00:00Z"
    },
    "custom": {
      "dimensions": [
        {
          "CategoryAction": "click"
        },
        {
          "SessionId": "73ef454d-fa39-4125-b4d0-44486933533b"
        },
        {
          "WebsiteVersion": "3.0"
        },
        {
          "PageSection": "FilterFind"
        },
        {
          "Category": "EventCategory1"
        },
        {
          "Page": "/page-in-question"
        }
      ],
      "metrics": []
    },
    "session": {
      "id": "062703E5-5E15-491A-AC75-2FE54EF03623",
      "isFirst": false
    }
  }
}
Exchequer answered 20/7, 2015 at 23:48 Comment(1)
the only way possible right now is writing your own worker Role and parse data each hour. github.com/Azure/azure-content/blob/master/articles/…Waspish
J
6

A slightly more dynamic solution is to set up a temp table:

WITH ATable AS (
SELECT
     temp.internal.data.id as ID
    ,dimensions.ArrayValue.CategoryAction as CategoryAction
    ,dimensions.ArrayValue.SessionId as SessionId 
    ,dimensions.ArrayValue.WebsiteVersion as WebsiteVersion 
    ,dimensions.ArrayValue.PageSection as PageSection 
    ,dimensions.ArrayValue.Category as Category 
    ,dimensions.ArrayValue.Page as Page  
FROM [analyticseventinputs] temp 
CROSS APPLY GetElements(temp.[context].[custom].[dimensions]) as dimensions)

and then doing joins based on a unique key

FROM [analyticseventinputs] Input 
Left JOIN ATable CategoryAction on 
    Input.internal.data.id = CategoryAction.ID AND
    CategoryAction.CategoryAction <> "" AND
     DATEDIFF(day, Input, CategoryAction) BETWEEN 0 AND 5 

The rather annoying bit is the requirement for the datediff, because the joins are intended to combine 2 streams of data but in this case you are only joining on the unique key. So I set it to a large value of 5 days. This really only protects against the custom params not coming in ordered compared to the other solution.

Jandel answered 27/7, 2015 at 16:47 Comment(1)
This works! Thanks. Hopefully the array processing that is coming in August will make this easier.Exchequer
R
5

Most tutorials online use CROSS APPLY or OUTER APPLY however this is not what you are looking for because it will put each property on a different row. To over come this use the functions: GetRecordPropertyValue and GetArrayElement as demoed below. This will flatten out the properties into a single row.

SELECT
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 0), 'CategoryAction') AS CategoryAction,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 1), 'SessionId') AS SessionId,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 2), 'WebsiteVersion') AS WebsiteVersion,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 3), 'PageSection') AS PageSection,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 4), 'Category') AS Category,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 5), 'Page') AS Page
INTO
  [outputstream]
FROM
  [inputstream] MySource
Rosewater answered 17/12, 2015 at 18:36 Comment(8)
Please, add some comments to your answer.Disapprove
What if order or amount of "dimensions" varies?Lubricator
I marked this as the answer because 1) The Join solution by Xinco24 seems less efficient than this approach - esp if you have many fields, 2) It is more compact than using a CASE statement as in the solution by Konstantin Zoryn and 3) The field order of the dimension array on the JSON input won't change.Dactylology
Eugene D.Gubenkov if the number of dimensions can vary, then you can use a combination of the method I described above along with Konstantin Zoryn case statement example.Rosewater
The problem I'm facing is that the order of the dimensions varies, so I can't access by index.Flinty
@MarioLevrero: Did you find a solution with the varying ordering of the dimensions? I have the same problem!Detrital
@davenewza: When I developed my query initially, the dimensions didn't change order but they did today. I had to use the JOIN approach from Xinco24Dactylology
@davenewza, I use case/else to check that the dimensions are in the right order, but the query becomes ugly exponentially :( I also asked in AI forum without answerFlinty
A
2

What schema do you have in SQL? Do you want a single row in SQL with all the dimensions as columns?

This might not be possible today. However there will be more Array/Record functions in Azure Stream Analytics after July 30.

Then you will be able to do something like this:

SELECT 
    CASE 
        WHEN GetArrayLength(A.context.custom.dimensions) > 0
            THEN GetRecordPropertyValue(GetArrayElement(A.context.custom.dimensions, 0), 'CategoryAction')
        ELSE ''
        END AS CategoryAction 
    CASE 
        WHEN GetArrayLength(A.context.custom.dimensions) > 1
            THEN GetRecordPropertyValue(GetArrayElement(A.context.custom.dimensions, 1), 'WebsiteVersion')
        ELSE ''
        END AS WebsiteVersion 
    CASE 
        WHEN GetArrayLength(A.context.custom.dimensions) > 2
            THEN GetRecordPropertyValue(GetArrayElement(A.context.custom.dimensions, 2), 'PageSection')
        ELSE ''
        END AS PageSection
FROM input

If you want to have separate rows per dimension then you can use CROSS APPLY operator.

Accelerant answered 22/7, 2015 at 21:56 Comment(4)
Yes - I want a single row in SQL with all the dimensions as columns. Guess I'll have to wait :-(Exchequer
With this solution seems that you know the exact position of the custom params in the array. what if custom params are not ordered?Waspish
sorry - had to update the answer - Xinco24 provided a working solution.Exchequer
This might be obvious to some, but it caused me some grief today. If your dimension names have periods in their names, then you need to use the normal SQL syntax where you bracket the names in your call to GetRecordPropertyValue. i.e '[AppName.Metric]'Allgood
C
1

A very convenient way to do this as proposed by Alex Raizman is to do some aggregations to the fields you want to flatten, grouping by the remaining select list, assuming that

  • you know the set of possible objects in dimensions, and
  • you do not have duplicate objects in this array, and
  • there is something you can uniquely identify your initial rows (like the id )

    SELECT
      CategoryAction= MIN(CAST(GetRecordPropertyValue(d.arrayvalue, 'CategoryAction') AS
      NVARCHAR(MAX))),
      SessionId= min(CAST(GetRecordPropertyValue(d.arrayvalue, 'SessionId') AS
      NVARCHAR(MAX))),
      WebsiteVersion= MIN(CAST(GetRecordPropertyValue(d.arrayvalue, 'WebsiteVersion') AS
      NVARCHAR(MAX))),
      PageSection= MIN(CAST(GetRecordPropertyValue(d.arrayvalue, 'PageSection') AS
      NVARCHAR(MAX))),
      Category= MIN(CAST(GetRecordPropertyValue(d.arrayvalue, 'Category') AS
      NVARCHAR(MAX))),    
      Page= MIN(CAST(GetRecordPropertyValue(d.arrayvalue, 'Page') AS NVARCHAR(MAX))) 
    INTO  
      [outputstream] 
    FROM [inputstream] MySource 
    CROSS APPLY GetArrayElements(MySource.[context].[custom].[dimensions]) d 
    GROUP BY System.Timestamp, MySource.id
    

We also group by System.Timestamp to create one temporal window as expected from Stream Analytics to perform set-based operations like counts or aggregations.

Crawly answered 27/9, 2018 at 9:16 Comment(0)
B
0

Although question is old. But this is how achieved the single row for customdimensions. It can get ugly as number of customdimensions increase.

    SELECT
    A.internal.data.id,        
    eventFlat.ArrayValue.name as eventName,
    A.context.operation.name as operation,
    A.context.data.eventTime,
    a1.company,
    a2.userId,
    a3.feature,        
    A.context.device,    
    A.context.location         
FROM [YourInputAlias] A   
OUTER APPLY GetArrayElements(A.event) eventFlat  
LEFT JOIN (
        SELECT 
        A1.internal.data.id as id,   
        customDimensionsFlat.ArrayValue.company
      FROM [YourInputAlias] A1  
      OUTER APPLY GetArrayElements(A1.context.custom.dimensions) customDimensionsFlat   
      where  customDimensionsFlat.ArrayValue.company IS NOT NULL
      ) a1 ON a.internal.data.id = a1.id AND datediff(day, a, a1) between 0 and 5
LEFT JOIN (
        SELECT 
        A2.internal.data.id as id,   
        customDimensionsFlat.ArrayValue.userid     
      FROM [YourInputAlias] A2  
      OUTER APPLY GetArrayElements(A2.context.custom.dimensions) customDimensionsFlat    
      where  customDimensionsFlat.ArrayValue.userid  IS NOT NULL
      ) a2 ON a.internal.data.id = a2.id AND datediff(day, a, a2) between 0 and 5
LEFT JOIN (
        SELECT 
        A3.internal.data.id as id,   
        customDimensionsFlat.ArrayValue.feature     
      FROM [YourInputAlias] A3
      OUTER APPLY GetArrayElements(A3.context.custom.dimensions) customDimensionsFlat    
      where  customDimensionsFlat.ArrayValue.feature  IS NOT NULL
      ) a3 ON a.internal.data.id = a3.id AND datediff(day, a, a3) between 0 and 5
Bidding answered 20/5, 2020 at 3:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.