I have a csv file in S3, which does not have any quotes. eg.
dVsdfsCcn7j6,r:werwerwerwerwerwerwerwer,_User$SSSSSBFwJ,login,password,false,2011-10-27 10:46:55,d24c2465e-9945645c5-4645509-a745741c7-ba9bcd1a7cfd,2046-11-27 15:46:55,2016-10-27 10:46:55
When I give this file as an input to the glue script / job (in which I intend to remove the _User prefix), the ETL output creates a csv file which has quotes attached to some attributes, eg:
dVsdfsCcn7j6,"r:werwerwerwerwerwerwerwer","_User$SSSSSBFwJ",login,password,false,"2011-10-27 10:46:55","d24c2465e-9945645c5-4645509-a745741c7-ba9bcd1a7cfd","2046-11-27 15:46:55","2016-10-27 10:46:55"
I have set the quoteChar to -1 as per the documentation, https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format.html
datasink2 = glueContext.write_dynamic_frame.from_options(frame = applymapping1, connection_type = "s3", connection_options = {"path": "s3://test-bucket/staging/final_data"}, format = "csv", format_options = {"quoteChar":'-1'}, transformation_ctx = "datasink2")
My end goal is to make this csv data queryable using Athena.
Has anyone faced such an issue during transformation? Is it something related to the SerDe information of the data catalog? Any help will be appreciated!
Thanks.
Link to question posted on AWS forums - https://forums.aws.amazon.com/thread.jspa?threadID=278073&tstart=0
format_options = { "quoteChar": -1 }
with no ' around the -1 – Nympho