How can I apply aggregate functions to data extracted from JSON in Google BigQuery?
Asked Answered
R

1

6

I am extracting JSON data out of a BigQuery column using JSON_EXTRACT. Now I want to extract lists of values and run aggregate functions (like AVG) against them. Testing the JsonPath expression .objects[*].v succeeds on http://jsonpath.curiousconcept.com/. But the query:

SELECT
  JSON_EXTRACT(json_column, "$.id") as id,
  AVG(JSON_EXTRACT(json_column, "$.objects[*].v")) as average_value
FROM [tablename]

throws a JsonPath parse error on BigQuery. Is this possible on BigQuery? Or do I need to preprocess my data in order to run aggregate functions against data inside of my JSON?

My data looks similar to this:

# Record 1
{
  "id": "abc",
  "objects": [
    {
      "id": 1,
      "v": 1
    },
    {
      "id": 2,
      "v": 3
    }
  ]
}
# Record 2
{
  "id": "def",
  "objects": [
    {
      "id": 1,
      "v": 2
    },
    {
      "id": 2,
      "v": 5
    }
  ]
}

This is related to another question.

Update: The problem can be simplified by running two queries. First, run JSON_EXTRACT and save the results into a view. Secondly, run the aggregate function against this view. But even then I need to correct the JsonPath expression $.objects[*].v to prevent the JSONPath parse error.

Radioluminescence answered 28/10, 2014 at 18:42 Comment(2)
Anything from jsonpath that can return a list of things (eg: [*], [:2], ...) doesn't seem to work.Whitver
Asking specifically about the *: #28720380Ecclesiasticism
R
6

Leverage SPLIT() to pivot repeatable fields into separate rows. Also might be easier/cleaner to put this into a subquery and put AVG outside:

SELECT id, AVG(v) as average 
FROM (
SELECT 
    JSON_EXTRACT(json_column, "$.id") as id, 
    INTEGER( 
      REGEXP_EXTRACT(
        SPLIT(
          JSON_EXTRACT(json_column, "$.objects")
          ,"},{"
          )
        ,r'\"v\"\:([^,]+),')) as v FROM [mytable] 
)
GROUP BY id;
Razo answered 28/10, 2014 at 23:12 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.