Has anyone figured out how to skip rows in S3 Select? I'm after something like one of these:
SELECT S.* FROM s3object S SKIP 100 LIMIT 200
--or
SELECT * from s3object s LIMIT 5, 10
--or
SELECT * from s3object s limit 5 OFFSET 10
It looks like you can limit the number of records returned:
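import boto3

s3 = boto3.client('s3')
bucket = 'my-bucket'  # placeholder: your bucket name
key = 'data.csv'      # placeholder: your object key
sql_stmt = """SELECT S.* FROM s3object S LIMIT 10"""
req = s3.select_object_content(
    Bucket=bucket,
    Key=key,
    ExpressionType='SQL',
    Expression=sql_stmt,
    InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}},
    OutputSerialization={'CSV': {}},
)
# The Payload is an event stream; the selected rows arrive in 'Records' events
for event in req['Payload']:
    if 'Records' in event:
        print(event['Records']['Payload'].decode('utf-8'), end='')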
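Since S3 Select only gives you LIMIT, one workaround (not a true server-side skip) is to request skip + take rows and throw away the first skip rows on the client. Below is a minimal Python sketch, assuming CSV output with the default newline record delimiter and no newlines inside quoted fields; `build_limit_query`, `iter_csv_rows` and `skip_and_take` are hypothetical helper names, not part of boto3, and the bucket/key in the usage comment are placeholders.

```python
import csv
import itertools
from typing import Iterable, Iterator, List


def build_limit_query(skip: int, take: int) -> str:
    """S3 Select has LIMIT but no OFFSET, so ask for skip + take rows."""
    return f"SELECT * FROM s3object s LIMIT {skip + take}"


def iter_csv_rows(events: Iterable[dict]) -> Iterator[List[str]]:
    """Yield parsed CSV rows from a select_object_content Payload stream.

    A 'Records' chunk can end in the middle of a row, so bytes are buffered
    until a newline arrives. Assumes newline-delimited records and no
    newlines embedded inside quoted fields.
    """
    buffer = b""
    for event in events:
        if "Records" not in event:
            continue  # ignore Stats, Progress, Cont and End events
        buffer += event["Records"]["Payload"]
        # Everything before the last newline is complete; keep the tail.
        *complete, buffer = buffer.split(b"\n")
        for line in complete:
            if line:
                yield next(csv.reader([line.decode("utf-8")]))
    if buffer:  # final row without a trailing newline
        yield next(csv.reader([buffer.decode("utf-8")]))


def skip_and_take(events: Iterable[dict], skip: int, take: int) -> List[List[str]]:
    """Drop the first `skip` rows client-side and keep the next `take`."""
    return list(itertools.islice(iter_csv_rows(events), skip, skip + take))


# Usage (placeholder bucket and key):
# s3 = boto3.client('s3')
# resp = s3.select_object_content(
#     Bucket='my-bucket', Key='data.csv', ExpressionType='SQL',
#     Expression=build_limit_query(100, 200),
#     InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}},
#     OutputSerialization={'CSV': {}})
# rows = skip_and_take(resp['Payload'], skip=100, take=200)
```

The skipped rows are still scanned and sent back by S3, so the data transferred grows with the offset; this only moves the bookkeeping into one place.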
There was also a request to add OFFSET/SKIP to the s3api, but it was closed.
You can also specify a ScanRange in bytes, but what happens if the object is compressed?
Is the range in bytes of the compressed object or of the uncompressed data?
If uncompressed, how does S3 Select handle partial records?
Update: you cannot use ScanRange on a gzip file:
botocore.exceptions.ClientError: An error occurred (UnsupportedScanRangeInput) when calling the SelectObjectContent operation: Scan range queries are not supported on objects with type GZIP.
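For uncompressed objects, ScanRange lets you skip ahead by bytes rather than by rows. The sketch below assumes the behavior described in the SelectObjectContent API reference: `Start` and `End` are inclusive byte offsets counted from zero, and a record is processed by the range that contains its first byte, so a row starting near the end of a range should come back whole. `scan_ranges` is a hypothetical helper, `'my-bucket'`/`'data.csv'` are placeholders, and how `FileHeaderInfo` behaves for ranges after the first is not addressed here.

```python
from typing import Dict, List


def scan_ranges(object_size: int, chunk_size: int) -> List[Dict[str, int]]:
    """Split an uncompressed object into contiguous, inclusive byte ranges.

    Each dict has the shape passed as ScanRange to select_object_content.
    Contiguous ranges mean every record's first byte falls in exactly one
    range, so (per the assumption above) each record is processed once.
    """
    if object_size <= 0 or chunk_size <= 0:
        raise ValueError("object_size and chunk_size must be positive")
    ranges = []
    for start in range(0, object_size, chunk_size):
        end = min(start + chunk_size, object_size) - 1  # End is inclusive
        ranges.append({"Start": start, "End": end})
    return ranges


# Usage (placeholder bucket and key, uncompressed CSV only):
# s3 = boto3.client('s3')
# size = s3.head_object(Bucket='my-bucket', Key='data.csv')['ContentLength']
# for scan_range in scan_ranges(size, 1024 * 1024):
#     resp = s3.select_object_content(
#         Bucket='my-bucket', Key='data.csv', ExpressionType='SQL',
#         Expression='SELECT * FROM s3object s',
#         InputSerialization={'CSV': {'FileHeaderInfo': 'NONE'}},
#         OutputSerialization={'CSV': {}},
#         ScanRange=scan_range)
```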