I recently had some problems setting a decimal type on a Glue table schema, and I ended up creating my schema via the AWS CLI.
My case was a little different: the data was Parquet on my S3 data lake.
The following CLI command creates the table schema from a JSON file:
aws glue create-table --database-name example_db --table-input file://example.json
The following example.json references Parquet files on s3://my-datalake/example/{dt}/, where dt is a partition of my table and dec_col is a column with decimal(10,2) type:
{
  "Name": "example",
  "Retention": 0,
  "StorageDescriptor": {
    "Columns": [
      {
        "Name": "id",
        "Type": "int"
      },
      {
        "Name": "dec_col",
        "Type": "decimal(10,2)"
      }
    ],
    "Location": "s3://my-datalake/example/",
    "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
    "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
    "Compressed": false,
    "NumberOfBuckets": 0,
    "SerdeInfo": {
      "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
      "Parameters": {
        "serialization.format": "1"
      }
    },
    "SortColumns": [],
    "StoredAsSubDirectories": false
  },
  "PartitionKeys": [
    {
      "Name": "dt",
      "Type": "date"
    }
  ],
  "TableType": "EXTERNAL_TABLE",
  "Parameters": {
    "classification": "parquet"
  }
}
This way you can define the column type as decimal with an explicit precision and scale (decimal(10,2) means 10 digits in total, 2 of them after the decimal point), which is what you're looking for.
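
If you would rather generate example.json than write it by hand, something like the following might help keep the decimal type string consistent. This is only a sketch: decimal_type and build_table_input are hypothetical helper names of my own, not part of any AWS SDK, and the 1..38 precision range is the Hive decimal limit, which I am assuming applies here as well.

```python
import json

# Assumption: Hive-style decimal limits (precision 1..38, scale 0..precision).
MAX_PRECISION = 38


def decimal_type(precision, scale):
    """Return a type string such as 'decimal(10,2)' (hypothetical helper)."""
    if not 1 <= precision <= MAX_PRECISION:
        raise ValueError(f"precision must be in 1..{MAX_PRECISION}, got {precision}")
    if not 0 <= scale <= precision:
        raise ValueError(f"scale must be in 0..{precision}, got {scale}")
    # No space after the comma, matching the type string used in example.json.
    return f"decimal({precision},{scale})"


def build_table_input(name, location, columns, partition_keys):
    """Build a dict with the same shape as the example.json shown above."""
    return {
        "Name": name,
        "Retention": 0,
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in columns],
            "Location": location,
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "Compressed": False,
            "NumberOfBuckets": 0,
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
                "Parameters": {"serialization.format": "1"},
            },
            "SortColumns": [],
            "StoredAsSubDirectories": False,
        },
        "PartitionKeys": [{"Name": n, "Type": t} for n, t in partition_keys],
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "parquet"},
    }


# Same table as in the answer: an int id, a decimal(10,2) column, partitioned by dt.
table_input = build_table_input(
    name="example",
    location="s3://my-datalake/example/",
    columns=[("id", "int"), ("dec_col", decimal_type(10, 2))],
    partition_keys=[("dt", "date")],
)

# Print the JSON; redirect it to example.json to use it with the CLI command.
print(json.dumps(table_input, indent=2))
```

Saving the printed output as example.json should give you a file you can pass to the same aws glue create-table command with --table-input file://example.json.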