Glue dynamic frame is not populating from S3 bucket

I have a Glue job that is failing because the dynamic frame is not populating from a Parquet file in S3.

I have pointed it directly at an object that contains data, but the dynamic frame is still empty.

Example below:

input_dyf = glueContext.create_dynamic_frame.from_options("s3", {
        "paths": ['s3://dev/.test/load_year=2023/load_month=2/load_day=22/.test.parquet'],
        "recurse": False,
        "groupFiles": "inPartition",
    },
    format = "parquet",
    transformation_ctx = "DataSource0"
)

I have similar Glue jobs with the same configuration (and bookmarks off), and this is the only one failing.

Unesco answered 22/3, 2023 at 16:52 Comment(0)

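I've tested this on my end with a similar filename and path. What I found was that the filename can't start with a period (.). This is most likely because the Hadoop/Spark file listing that Glue relies on treats names beginning with . or _ as hidden and skips them. A directory in the S3 path can start with a period (as .test does here), but the Parquet file itself cannot. Working example: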

input_dyf = glueContext.create_dynamic_frame.from_options("s3", {
        "paths": ['s3://dev/.test/load_year=2023/load_month=2/load_day=22/test.parquet'],
        "recurse": False,
        "groupFiles": "inPartition",
    },
    format = "parquet",
    transformation_ctx = "DataSource0"
)

Removing the leading . from .test.parquet solved the issue in my test. Please test on your end and let me know.
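
If you want to catch this before the job runs, here is a minimal sketch of a pre-flight check in plain Python. It assumes the skipped names are the ones whose final component starts with . or _ (the usual Hadoop hidden-file convention); the helper names has_hidden_filename and visible_path are hypothetical and not part of the Glue API.

```python
import posixpath
from urllib.parse import urlparse

# Assumption: prefixes that Hadoop-style file listing treats as hidden.
HIDDEN_PREFIXES = (".", "_")


def has_hidden_filename(s3_path: str) -> bool:
    """Return True if the object name (last path component) starts with a hidden prefix."""
    filename = posixpath.basename(urlparse(s3_path).path)
    return filename.startswith(HIDDEN_PREFIXES)


def visible_path(s3_path: str) -> str:
    """Suggest the same path with leading '.' and '_' stripped from the object name only."""
    parsed = urlparse(s3_path)
    directory, filename = posixpath.split(parsed.path)
    cleaned = filename.lstrip("._")
    return f"{parsed.scheme}://{parsed.netloc}{posixpath.join(directory, cleaned)}"


if __name__ == "__main__":
    path = "s3://dev/.test/load_year=2023/load_month=2/load_day=22/.test.parquet"
    if has_hidden_filename(path):
        print("Object name looks hidden; consider renaming it to:", visible_path(path))
```

This only inspects the path string; it does not rename anything in S3. You would still need to copy the object to the new key (or change whatever writes it) before pointing the job at it.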

Hunger answered 22/3, 2023 at 16:56 Comment(1)
This seems to have worked, thank you! Unesco
