How does S3 Select pricing work? What do "data returned" and "data scanned" mean in S3 Select?

I have 1M rows of CSV data. If I select 10 rows, will I be billed for only those 10 rows? What do "data returned" and "data scanned" mean in S3 Select?

There is little documentation on these S3 Select terms.

Undersell answered 26/10, 2018 at 4:29
Tagging this as prestodb, since it applies to Presto itself as well (github.com/prestodb/presto/pull/11033). - Nomadize

To keep things simple, let's set aside for a moment that S3 Select can read data in a columnar way. Suppose you have the following data:

| City       | Last Updated Date   |
|------------|---------------------|
| London     | 1st Jan             |
| London     | 2nd Jan             |
| New Delhi  | 2nd Jan             |

A query that fetches the records with the latest update date

  • forces S3 to scan all 3 records
  • but returns only 2 records (those whose last updated date is 2nd Jan)
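The row counting above can be mimicked with a toy Python model. This is a sketch under stated assumptions, not how S3 Select works internally: `run_query` is a hypothetical helper, the table is inlined as CSV, and dates are compared as plain strings with "2nd Jan" hard-coded as the latest date.

```python
import csv
import io

# The sample table from above, inlined as CSV (hypothetical file contents).
DATA = """City,Last Updated Date
London,1st Jan
London,2nd Jan
New Delhi,2nd Jan
"""

def run_query(text, predicate):
    """Toy filter: every row is read (scanned), only matching rows are returned."""
    scanned = 0
    returned = []
    for row in csv.DictReader(io.StringIO(text)):
        scanned += 1  # the WHERE clause can only be evaluated by reading the row
        if predicate(row):
            returned.append(row)
    return scanned, returned

# Assumption: "2nd Jan" is treated as the latest date by hand.
scanned, rows = run_query(DATA, lambda r: r["Last Updated Date"] == "2nd Jan")
print(scanned, len(rows))  # 3 2
```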

A query that selects the city where the last updated date is 1st Jan

  • will scan all 3 rows
  • but return only 1 string: "London".
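The same idea can be measured in bytes rather than rows. In this hypothetical sketch (`select_city_where_date` is an illustrative name), the whole CSV counts as scanned, while only the projected `City` values count as returned, serialized one per line as an assumed CSV output format.

```python
import csv
import io

# The sample table from above, inlined as CSV (hypothetical file contents).
DATA = """City,Last Updated Date
London,1st Jan
London,2nd Jan
New Delhi,2nd Jan
"""

def select_city_where_date(text, date):
    """Toy model of: SELECT City ... WHERE "Last Updated Date" = date.

    Returns (bytes_scanned, bytes_returned, cities).
    """
    bytes_scanned = len(text.encode("utf-8"))  # every byte of the object is read
    cities = [row["City"] for row in csv.DictReader(io.StringIO(text))
              if row["Last Updated Date"] == date]
    # Assumed output format: one CSV record (just the city) per line.
    out = "".join(city + "\n" for city in cities)
    return bytes_scanned, len(out.encode("utf-8")), cities

print(select_city_where_date(DATA, "1st Jan"))  # (71, 7, ['London'])
```

Filtering and projecting together shrink the returned data (7 bytes, `London\n`) far below the scanned data (the full 71-byte object).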

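Hence, depending on your query, S3 Select may scan more data (3 rows) than it returns (2 rows). Both quantities are billed, so selecting 10 rows from a 1M-row CSV still incurs a charge for the data scanned, not just for the 10 rows returned.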

Hopefully this makes the difference between Data Scanned and Data Returned clear.
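To turn the two quantities into a rough bill, the sketch below multiplies each by its own per-GB rate. The rates, the 100-byte average row size and the 2**30-byte GB are all placeholder assumptions (check the current Amazon S3 pricing page), `select_cost` is a hypothetical helper, and per-request charges are ignored. With the question's scenario of 1M rows scanned and 10 returned, nearly the whole estimate comes from scanning.

```python
GB = 1024 ** 3  # assumption: 1 GB = 2**30 bytes for this estimate

# Placeholder per-GB rates for illustration only; not authoritative prices.
SCAN_RATE_PER_GB = 0.002
RETURN_RATE_PER_GB = 0.0007

def select_cost(bytes_scanned, bytes_returned, scan_rate_per_gb, return_rate_per_gb):
    """Estimate data charges: scanned and returned bytes are priced separately."""
    return ((bytes_scanned / GB) * scan_rate_per_gb
            + (bytes_returned / GB) * return_rate_per_gb)

# Hypothetical scenario from the question: 1M CSV rows, 10 rows returned.
ROW_BYTES = 100  # assumed average row size in bytes
scanned = 1_000_000 * ROW_BYTES
returned = 10 * ROW_BYTES

cost = select_cost(scanned, returned, SCAN_RATE_PER_GB, RETURN_RATE_PER_GB)
scan_part = select_cost(scanned, 0, SCAN_RATE_PER_GB, RETURN_RATE_PER_GB)
print(f"total: ${cost:.8f}, scanning part: ${scan_part:.8f}")
```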

Forecast answered 25/1, 2019 at 7:3
In what situations would you query your S3 data? Only with Athena? Or are there other situations? - Ionia
@Ionia This is for the AWS service called S3 Select (docs.aws.amazon.com/AmazonS3/latest/userguide/…), which lets you filter data in buckets using SQL. - Shrunk
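When S3 Select is called through boto3, `select_object_content` returns an event stream under `Payload`; a `Stats` event in that stream carries `BytesScanned`, `BytesProcessed` and `BytesReturned`, the quantities discussed above. The helper below (`collect_select_results` is a hypothetical name) gathers records and stats from such a stream. To keep it testable without AWS credentials it accepts any iterable of event dicts; the bucket, key and the numbers in the fake stream are made-up placeholders.

```python
def collect_select_results(events):
    """Gather record bytes and Stats details from an S3 Select event stream.

    `events` yields dicts shaped like boto3's select_object_content()["Payload"]
    events, e.g. {"Records": {"Payload": b"..."}} or {"Stats": {"Details": {...}}}.
    """
    records = bytearray()
    stats = {}
    for event in events:
        if "Records" in event:
            records.extend(event["Records"]["Payload"])
        elif "Stats" in event:
            stats = event["Stats"]["Details"]  # BytesScanned / BytesProcessed / BytesReturned
    return bytes(records), stats

# Real usage (needs boto3 and AWS credentials; bucket and key are placeholders):
# import boto3
# s3 = boto3.client("s3")
# resp = s3.select_object_content(
#     Bucket="my-bucket", Key="cities.csv",
#     ExpressionType="SQL",
#     Expression="SELECT s.City FROM S3Object s WHERE s.\"Last Updated Date\" = '1st Jan'",
#     InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
#     OutputSerialization={"CSV": {}},
# )
# data, stats = collect_select_results(resp["Payload"])

# Fake stream with made-up numbers, for illustration only.
fake_stream = [
    {"Records": {"Payload": b"London\n"}},
    {"Stats": {"Details": {"BytesScanned": 71, "BytesProcessed": 71, "BytesReturned": 7}}},
    {"End": {}},
]
data, stats = collect_select_results(fake_stream)
print(data, stats["BytesScanned"], stats["BytesReturned"])  # b'London\n' 71 7
```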
