SKIP/OFFSET/ScanRange of rows in AWS S3 Select

Has anyone figured out how to skip rows in S3 Select? For example, with something like one of these:

SELECT S.* FROM s3object S SKIP 100 LIMIT 200
--or
SELECT * from s3object s LIMIT 5, 10
--or
SELECT * from s3object s limit 5 OFFSET 10

It looks like you can only limit the number of records returned:

import boto3

s3 = boto3.client('s3')
bucket = 'my-bucket'      # placeholder: your bucket name
key = 'path/to/file.csv'  # placeholder: your object key

sql_stmt = """SELECT S.* FROM s3object S LIMIT 10"""

req = s3.select_object_content(
    Bucket=bucket,
    Key=key,
    ExpressionType='SQL',
    Expression=sql_stmt,
    InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}},
    OutputSerialization={'CSV': {}},
)

There was also a request made to add OFFSET/SKIP to s3api, but it was closed.
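
Since there seems to be no server-side OFFSET, one possible workaround is to skip rows on the client while reading the event stream. The sketch below is only an illustration, not an official approach: `iter_rows` and `skip_and_take` are hypothetical helper names, and it assumes CSV output with the default newline record delimiter and no newlines embedded in quoted fields. Skipped rows are still scanned and transferred.

```python
from itertools import islice


def iter_rows(events, encoding="utf-8"):
    """Yield complete text lines from S3 Select event dicts.

    Assumes each row ends with a newline and that a row may be split
    across two consecutive 'Records' events, so bytes are buffered.
    """
    buffer = b""
    for event in events:
        records = event.get("Records")
        if records is None:
            continue  # Stats, Progress, Cont and End events carry no rows
        buffer += records["Payload"]
        # Everything before the last newline is complete; keep the tail.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            yield line.decode(encoding)
    if buffer:
        yield buffer.decode(encoding)


def skip_and_take(events, offset, count):
    """Return `count` rows after skipping the first `offset` rows."""
    return list(islice(iter_rows(events), offset, offset + count))
```

With the request above, this could be called as `skip_and_take(req['Payload'], 100, 200)`. Putting `LIMIT 300` (offset plus count) in the SQL statement should at least stop the scan early.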

You can also specify a ScanRange in bytes, but what happens if the object is compressed?

Is it a byte range of the compressed object or of the uncompressed data?

If uncompressed, how does S3 Select handle partial records?

Update: you cannot use ScanRange on a gzip file:

botocore.exceptions.ClientError: An error occurred (UnsupportedScanRangeInput) when calling the SelectObjectContent operation: Scan range queries are not supported on objects with type GZIP.
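
For uncompressed objects, ScanRange can still serve as a coarse way to page through the data. As far as I understand the AWS documentation, Start and End are inclusive byte offsets and a record is processed when its first byte falls inside the range, so adjacent non-overlapping ranges should not split or duplicate records. The sketch below only computes the ranges; `scan_ranges` is a hypothetical helper name and the chunk size is arbitrary.

```python
def scan_ranges(object_size, chunk_size):
    """Yield non-overlapping ScanRange dicts covering object_size bytes.

    Assumes Start and End are inclusive, zero-based byte offsets.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, object_size, chunk_size):
        end = min(start + chunk_size, object_size) - 1
        yield {"Start": start, "End": end}
```

Each dict could then be passed as `ScanRange=...` to `select_object_content`, with the object size taken from, for example, the `ContentLength` returned by `head_object`. Note that this pages by bytes, not by row number.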

Humdinger answered 18/3, 2020 at 15:44
