Setting the number of decimal places when updating Glue Table Schema
Asked Answered
A

2

6

I'm trying to update a CSV table definition that has been created by the Glue Data Crawler. One of the columns contains decimal data that is currently being classified as double precision.

I'm finding that when I use the console to change the schema, I can't set any additional properties that might be associated with the data types (for example, if I select Decimal I get Decimal(10,0) with no way to change the size of the number or the number of decimal places).

What's the recommended way to update this schema so that it has the correct data-type, including additional properties?

Archaeological answered 8/3, 2020 at 4:29 Comment(0)
H
6

I had some problems setting a decimal on a Glue Table Schema recently. I had to create my schema via the AWS cli.

What I had was a little different, it was a parquet on my s3 datalake.

The following cli command creates the schema based on a json:

aws glue create-table --database-name example_db --table-input file://example.json

The following example.json references a parquet files on s3://my-datalake/example/{dt}/ where dt is a partition of my table. And dec_col is a column with decimal(10,2) type:

{
    "Name": "example",
    "Retention": 0,
    "StorageDescriptor": {
        "Columns": [
          {
            "Name": "id",
            "Type": "int"
        },
        {
            "Name": "dec_col",
            "Type": "decimal(10,2)"
        }
        ],
        "Location": "s3://my-datalake/example/",
        "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
        "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
        "Compressed": false,
        "NumberOfBuckets": 0,
        "SerdeInfo": {
            "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
            "Parameters": {
                "serialization.format": "1"
            }
        },
        "SortColumns": [],
        "StoredAsSubDirectories": false
    },
    "PartitionKeys": [
        {
            "Name": "dt",
            "Type": "date"
        }
    ],
    "TableType": "EXTERNAL_TABLE",
    "Parameters": {
        "classification": "parquet"
    }
}

This way you can define the type as decimal with scale and precision, which is what you're looking for.

Hort answered 11/3, 2020 at 17:11 Comment(0)
H
3

I found another way of changing the decimal data type in the Glue console.

When you open an existing Glue table in the Glue console there is an "Edit schema as JSON" button next to the "Edit schema" button. Using that "Edit schema as JSON", button you can directly edit the JSON and change the data type from decimal to decimal(10,2).

Hadley answered 10/1, 2023 at 9:49 Comment(1)
is there any way to do the same thing using aws-cli or boto3?Sherer

© 2022 - 2024 — McMap. All rights reserved.