I am trying to run a batch transform inference job using a parquet data file, but I could not find anything on how to do it. Everywhere it says batch transform accepts only text/csv or JSON as the content type. As a test, I used a Lambda function inside my AWS account to launch a batch transform job on the parquet data, but the job never succeeded; it fails with ClientError: 400, Error parsing data.
import boto3

client = boto3.client("sagemaker")

def lambda_handler(event, context):
    # batch_job_name, model_name, batch_input, and batch_output are defined elsewhere
    request = {
        "TransformJobName": batch_job_name,
        "ModelName": model_name,
        "BatchStrategy": "MultiRecord",
        "TransformOutput": {
            "S3OutputPath": batch_output
        },
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": batch_input
                }
            },
            # parquet content type is what I am trying to get working
            "ContentType": "application/x-parquet",
            "SplitType": "Line",
            "CompressionType": "None"
        },
        "TransformResources": {
            "InstanceType": "ml.m4.xlarge",
            "InstanceCount": 1
        }
    }
    client.create_transform_job(**request)
    return "Done"
Currently I am trying to run the SageMaker batch transform job locally with a parquet data file. I have the Docker image, which I can run with 'serve' in my local terminal, and I can call it from Postman at "localhost:8080/invocations" using the "Binary" body option to upload the parquet file. That works fine and I can see the data populating in the Postman body. However, I am still not able to use parquet data for batch transform.
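For reference, the same local call also works outside Postman; here is a minimal sketch using the requests library (the file name is just an example):

import requests

# POST the raw parquet bytes to the locally served model,
# mirroring the Postman "Binary" upload.
with open("test.parquet", "rb") as f:
    response = requests.post(
        "http://localhost:8080/invocations",
        data=f,
        headers={"Content-Type": "application/x-parquet"},
    )
print(response.status_code, response.text)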
Has anyone successfully used a parquet file to make predictions with SageMaker batch transform?