Amazon's S3 Select allows a user to write SQL against S3 objects, but there's insufficient documentation around what standard SQL functionality is supported.
According to the documentation, Amazon S3 Select supports AVG
, COUNT
, MAX
, MIN
, and SUM
. But when I run any aggregate query I get an error like
botocore.exceptions.ClientError: An error occurred (UnsupportedSqlOperation) when calling the SelectObjectContent operation: Unsupported SQL operation GROUP BY. Please check the service documentation for supported operations.
Here's some code to reproduce:
import boto3
client = boto3.client('s3')
response = client.select_object_content(Bucket='my-bucket', Key='object.csv',
ExpressionType='SQL', Expression="select ID, count(*) from s3object group by s.ID ",
InputSerialization = {'CSV':{"FileHeaderInfo": "Use"}, 'CompressionType': 'GZIP'},
OutputSerialization = {'CSV':{}})
event_stream = response['Payload']
with open('output', 'wb') as f:
for event in event_stream:
if 'Records' in event:
data = event['Records']['Payload']
f.write(data)
GROUP BY
. This would aggregate all the results into just one line. (I haven't tried it.) – Embryologist