I am using Firehose and Glue to ingest data and convert JSON to Parquet files in S3.
This works for flat JSON (no nested objects or arrays), but it fails for JSON that contains a nested array. Here is what I have done.
The JSON structure:
{
  "class_id": "test0001",
  "students": [{
    "student_id": "xxxx",
    "student_name": "AAAABBBCCC",
    "student_gpa": 123
  }]
}
The Glue schema:
- class_id : string
- students : array
ARRAY<STRUCT<student_id:STRING,student_name:STRING,student_gpa:INT>>
I receive the following error:
The schema is invalid. Error parsing the schema: Error: type expected at the position 0 of 'ARRAY<STRUCT<student_id:STRING,student_name:STRING,student_gpa:INT>>' but 'ARRAY' is found.
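For comparison, here is a minimal Python sketch that derives a Hive-style type string from the sample record above, so the hand-written schema can be checked against a generated one. The helper name `hive_type` is hypothetical, and the lowercase spelling of the type names is an assumption based on how Glue and Athena usually display column types. It is not confirmed that letter case is what triggers the error.

```python
import json


def hive_type(value):
    """Return a Hive-style type string for a parsed JSON value (hypothetical helper)."""
    # bool must be checked before int, because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        if not value:
            raise ValueError("cannot infer the element type of an empty array")
        # Assumption: every element has the same shape as the first one
        return "array<" + hive_type(value[0]) + ">"
    if isinstance(value, dict):
        fields = ",".join(f"{key}:{hive_type(val)}" for key, val in value.items())
        return "struct<" + fields + ">"
    raise TypeError(f"unsupported JSON value: {value!r}")


sample = json.loads("""
{
  "class_id": "test0001",
  "students": [{
    "student_id": "xxxx",
    "student_name": "AAAABBBCCC",
    "student_gpa": 123
  }]
}
""")

# One type string per top-level column, in the order the keys appear
columns = {name: hive_type(value) for name, value in sample.items()}

for name, type_string in columns.items():
    print(f"{name} : {type_string}")
```

The type string generated for `students` has the same structure as the one entered in the Glue schema and differs only in letter case, which makes case one thing worth ruling out.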
Any suggestions are appreciated.