I am trying to understand the difference between the AWS Athena service and the newly released S3 Select (still in preview).
How do their use cases differ? It seems both help in selecting partial data from S3.
It also looks like we are missing one major difference:
S3 Select operates on only one object at a time, while Athena runs queries across multiple paths, including all files within those paths.
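To make the single-object limitation concrete, here is a minimal Python sketch of an S3 Select call using boto3's `select_object_content`. It assumes you pass in a boto3 S3 client and that the object is an uncompressed CSV file with a header row. The helper name `select_from_object` is hypothetical. Note that the request takes exactly one `Key`, not a prefix.

```python
def select_from_object(s3_client, bucket, key, expression):
    """Run an S3 Select query against ONE object and return matching records as text.

    Assumptions (not from the original answer): s3_client is a boto3 S3 client,
    and the object is an uncompressed CSV file whose first row is a header.
    """
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,  # a single object key; S3 Select has no notion of a prefix
        ExpressionType="SQL",
        Expression=expression,
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "NONE"},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    # The payload is an event stream; only "Records" events carry result bytes
    # ("Stats", "Progress", and "End" events are skipped here).
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")
```

For example, `select_from_object(boto3.client("s3"), "my-bucket", "logs/2017-11-30.csv", "SELECT s.ip FROM s3object s WHERE s.status = '500'")` would return only the matching `ip` values as CSV text (bucket, key, and column names are made up). Taking the client as a parameter keeps the sketch testable without AWS credentials.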
You can think of AWS S3 Select as a cost-efficient storage optimization: it retrieves only the data that matches a predicate, filtering inside S3 and Glacier (also known as push-down filtering).
AWS Athena is a fully managed analytics service that runs arbitrary ANSI SQL-compliant queries, including GROUP BY, HAVING, window and geospatial functions, and SQL DDL and DML.
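For comparison, an Athena query is submitted asynchronously against a table (which can cover many objects) and then polled until it finishes. Below is a minimal sketch using boto3's Athena client calls `start_query_execution`, `get_query_execution`, and `get_query_results`. The helper name, the `weblogs` database, and the `access_logs` table mentioned below are hypothetical, and only the first page of results is read (pagination via `NextToken` is omitted).

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(athena_client, sql, database, output_location, poll_seconds=1.0):
    """Start an Athena query, wait for it to finish, and return the first page of rows.

    Assumption: athena_client is a boto3 Athena client. Only the first page of
    results is returned; NextToken pagination is deliberately left out.
    """
    started = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state is reached.
    while True:
        execution = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    results = athena_client.get_query_results(QueryExecutionId=query_id)
    # Each row is a list of {"VarCharValue": ...} cells; for SELECT queries
    # the first row holds the column names.
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]
```

A query such as `SELECT status, count(*) AS hits FROM access_logs GROUP BY status HAVING count(*) > 100` is the kind of statement Athena accepts but S3 Select does not, since S3 Select has no GROUP BY or HAVING.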
Amazon Athena: Amazon Athena is a query service that makes it easy to analyze data stored in S3 using standard SQL. Athena is serverless, so there is no infrastructure to set up or manage, and you pay only for the queries you run. It scales automatically, executing queries in parallel, which produces fast results even with large datasets and complex queries.
Use cases: Athena can be used to process logs, perform ad-hoc analysis, and run interactive queries and joins. It runs queries across multiple paths, including all the files under those paths.
S3 Select: S3 Select is an S3 feature that retrieves a subset of an object's data (using simple SQL expressions) instead of the entire object, which can be up to 5 terabytes in size. S3 Select runs queries on a single object at a time in the S3 bucket.
Conclusion:
Athena can be used for complex queries on the files, and those queries can span multiple folders in an S3 bucket.
S3 Select can be used for simple queries on a single object.
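If you did need to query many objects with S3 Select, you would have to do the fan-out yourself, which is roughly what Athena handles for you. The sketch below (Python, boto3-style client, hypothetical helper name) lists the keys under a prefix with the `list_objects_v2` paginator and runs one S3 Select request per object. It assumes every object is an uncompressed CSV file with a header row; any aggregation or join across the per-object results would still be up to your code.

```python
def select_across_prefix(s3_client, bucket, prefix, expression):
    """Emulate a multi-object query by running S3 Select once per object under a prefix.

    Illustrative only: S3 Select has no notion of a prefix or table, so the caller
    lists the keys and collects results itself. Assumes every object under the
    prefix is an uncompressed CSV file with a header row.
    """
    results = {}
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip zero-byte "folder" placeholder objects
            response = s3_client.select_object_content(
                Bucket=bucket,
                Key=key,
                ExpressionType="SQL",
                Expression=expression,
                InputSerialization={"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "NONE"},
                OutputSerialization={"CSV": {}},
            )
            # Keep only the result bytes from "Records" events in the stream.
            chunks = [e["Records"]["Payload"] for e in response["Payload"] if "Records" in e]
            results[key] = b"".join(chunks).decode("utf-8")
    return results
```

Each object costs a separate request, and combining the per-key results (for example, counting rows across files) happens client-side, which is where Athena's table abstraction saves work.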
Athena is (from the little I've used it) more intended as a business reporting or analysis tool backed by S3.
S3 Select appears to use the same sort of technology, but I would guess it is aimed more at direct use by applications to filter or shard their data sets.
S3 Select makes it easy to retrieve specific data from the contents of an object using simple SQL expressions, with no need to retrieve the entire object. It can be used with Lambda to build serverless apps and can be integrated with big data frameworks such as Apache Spark and Presto. It can improve performance by up to 400%.
Amazon Athena is an interactive, serverless query service. There is no need to load data into Athena. It is built on Presto, runs standard SQL, and is mainly used to analyze big data.
To give an overview, as I understand it:
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
The major advantage of Athena at the moment is:
Athena is integrated out of the box with the AWS Glue Data Catalog. You can also use Glue's fully managed ETL capabilities to transform data or convert it into columnar formats to reduce cost and improve performance.
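The answer mentions Glue ETL for converting data to a columnar format. A different technique, not mentioned in the answer, is Athena's own CREATE TABLE AS SELECT (CTAS) statement, which writes a query result as Parquet to an S3 location. The minimal Python sketch below only builds the SQL string; the helper name, table names, and S3 location are hypothetical, and the statement would be submitted like any other Athena query (for example with the Athena client's `start_query_execution`).

```python
def build_parquet_ctas(source_table, target_table, external_location):
    """Build an Athena CTAS statement that rewrites a table's data as Parquet.

    Table names are inserted verbatim and are assumed to be trusted identifiers;
    nothing here escapes or validates SQL.
    """
    if not external_location.startswith("s3://"):
        raise ValueError("external_location must be an s3:// URI")
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (format = 'PARQUET', external_location = '{external_location}')\n"
        f"AS SELECT * FROM {source_table}"
    )
```

Queries against the new Parquet table read only the columns they reference, which is the cost and performance benefit of columnar formats the answer refers to.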
Now, as far as S3 Select goes:
At present, there is no charge for using S3 Select while it is in preview, and pricing has not been defined yet. However, you will need to apply for access at the reference.
While in preview, S3 Select supports CSV, JSON, and Parquet files, with or without GZIP compression. During the preview, objects that are encrypted at rest are not supported.
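When calling S3 Select, the file format and compression are described in the `InputSerialization` parameter of `select_object_content`. Here is a hedged Python sketch of a helper that builds it for the three formats listed above; the helper name and its flags are hypothetical. One assumption based on the current S3 documentation rather than the preview notes: for Parquet, GZIP applies to columns inside the file, so the helper does not request whole-object compression for Parquet and rejects `gzip=True`.

```python
def input_serialization(file_format, gzip=False, csv_has_header=True, json_lines=True):
    """Build an InputSerialization dict for S3 Select (sketch, hypothetical helper).

    CSV/JSON: whole-object GZIP is expressed with CompressionType.
    Parquet: assumed to carry its own column compression, so no CompressionType is set.
    """
    fmt = file_format.upper()
    if fmt == "CSV":
        # "USE" lets queries reference columns by header name; "NONE" means no header row.
        spec = {"CSV": {"FileHeaderInfo": "USE" if csv_has_header else "NONE"}}
    elif fmt == "JSON":
        # "LINES" = one JSON record per line; "DOCUMENT" = a single JSON document.
        spec = {"JSON": {"Type": "LINES" if json_lines else "DOCUMENT"}}
    elif fmt == "PARQUET":
        if gzip:
            raise ValueError("whole-object GZIP is not used for Parquet objects")
        return {"Parquet": {}}
    else:
        raise ValueError(f"unsupported format: {file_format}")
    spec["CompressionType"] = "GZIP" if gzip else "NONE"
    return spec
```

The returned dict would be passed as `InputSerialization=...` alongside the bucket, key, and SQL expression.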
Because S3 Select is still in preview, AWS doesn't have internal use cases to verify how the service is being used. However, I found a blog reference that might interest you.
You may also find this Twitch video helpful.
In addition to @abc123's answer, AWS S3 Select only supports SELECT.
Reference: SELECT Command
Amazon S3 Select supports only the SELECT SQL command. The following ANSI standard clauses are supported for SELECT:
- SELECT list
- FROM clause
- WHERE clause
- LIMIT clause
Note: Amazon S3 Select queries currently do not support subqueries or joins.
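Because of these restrictions, it can help to sanity-check an expression before sending it. The Python sketch below is only a crude keyword scan (a hypothetical helper, not an AWS API): it flags a non-SELECT statement, a second SELECT (a likely subquery), joins, and GROUP BY, HAVING, and ORDER BY, which are not in the supported clause list quoted above. It does not parse SQL, so keywords inside string literals cause false positives.

```python
import re

# Constructs outside the SELECT list / FROM / WHERE / LIMIT subset quoted above.
_UNSUPPORTED = {
    "JOIN": re.compile(r"\bJOIN\b", re.IGNORECASE),
    "GROUP BY": re.compile(r"\bGROUP\s+BY\b", re.IGNORECASE),
    "HAVING": re.compile(r"\bHAVING\b", re.IGNORECASE),
    "ORDER BY": re.compile(r"\bORDER\s+BY\b", re.IGNORECASE),
}
_SELECT = re.compile(r"\bSELECT\b", re.IGNORECASE)


def unsupported_constructs(expression):
    """Return a list of constructs that S3 Select would likely reject.

    Heuristic keyword scan only: no SQL parsing, so keywords inside
    string literals are reported as false positives.
    """
    problems = []
    if not expression.lstrip().upper().startswith("SELECT"):
        problems.append("not a SELECT statement")
    if len(_SELECT.findall(expression)) > 1:
        problems.append("subquery")
    for name, pattern in _UNSUPPORTED.items():
        if pattern.search(expression):
            problems.append(name)
    return problems
```

An empty list means the expression stays within the quoted subset as far as this scan can tell; it is not a guarantee that S3 Select will accept it.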
© 2022 - 2024 — McMap. All rights reserved.