Pipeline from AWS RDS to S3 using Glue

You can't pull data from AWS RDS to S3 using Athena. Athena is a query engine over S3 data. To be able to extract data from RDS to S3, you can run a Glue job to read from a particular RDS table and create S3 dump in parquet format which will create another external table pointing to S3 data. Then you can query that S3 data using Athena. A sample code snippet to read from RDS using Glue catalog and write parquet in S3 will look like below. There are some Glue predefined template which you can use to experiment. Start with a small table first. Please let me know if it worked out for you or any further questions/issues.

datasource0 = glueContext.create_dynamic_frame.from_options(connection_type="postgresql", connection_options = 
{"url": "jdbc-url/database",
"user": "user_name",
"password": "password",
"dbtable": "table_name"},
transformation_ctx = "datasource0")

   datasink4 = glueContext.write_dynamic_frame.from_options(frame = datasource0, connection_type = "s3", connection_options = {"path": "s3://aws-glue-tpcds-parquet/"+ tableName + "/"}, format = "parquet", transformation_ctx = "datasink4")

Recommended topics

Hot tags