How do I use awswrangler to read only the first few N rows of a parquet file stored in S3?

I am trying to use awswrangler to read into a pandas dataframe an arbitrarily-large parquet file stored in S3, but limiting my query to the first N rows due to the file's size (and my poor bandwidth).

I cannot see how to do it, or whether it is even possible without relocating.

Could I use chunked=INTEGER and abort after reading the first chunk, say, and if so how?

I have come across this incomplete solution (last N rows ;) ) using pyarrow - Read last N rows of S3 parquet table - but a time-based filter would not be ideal for me and the accepted solution doesn't even get to the end of the story (helpful as it is).

Or is there another way without first downloading the file (which I could probably have done by now)?

Thanks!

import awswrangler as wr df = wr.s3.select_query( sql="SELECT * FROM s3object s limit 5", path="s3://amazon-reviews-pds/parquet/product_category=Gift_Card/part-00000-495c48e6-96d6-4650-aa65-3c36a3516ddd.c000.snappy.parquet", input_serialization="Parquet", input_serialization_params={}, use_threads=True, )

Recommended topics

Hot tags