What is the difference between AWS S3 Select and AWS Athena?
S

7

68

I am trying to understand the difference between the AWS Athena service and the newly released S3 Select (still in preview).

How do the use cases for the two differ? Both seem to help with selecting partial data from S3.

Scherzo answered 5/3, 2018 at 2:16 Comment(1)
Note that Athena requires you to define your data schema before you can issue queries. S3 Select queries are ad hoc. Pudgy
B
44

It also looks like we are missing one major thing:

S3 Select operates on only one object, while Athena runs queries across multiple paths, which include all files within those paths.
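
As a rough sketch of that difference, the request parameters for the two services could look like the following with boto3. The bucket, key, database, and table names are made up for illustration, and the Athena query assumes a `logs` table has already been defined over an S3 prefix. The sketch only builds the parameter dictionaries, so it runs without AWS credentials.

```python
def s3_select_request(bucket, key, expression):
    """Parameters for the S3 client's select_object_content call.

    Note the single Bucket/Key pair: one request targets exactly one object.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }


def athena_request(database, query, output_location):
    """Parameters for the Athena client's start_query_execution call.

    There is no object key here: the query names a table, and the table
    definition points at an S3 prefix that may hold many files.
    """
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical names, for illustration only.
select_params = s3_select_request(
    "example-bucket",
    "logs/2020/08/26/part-0000.csv",
    "SELECT s.status FROM S3Object s WHERE s.status = '500'",
)
athena_params = athena_request(
    "example_db",
    "SELECT status, count(*) FROM logs GROUP BY status",
    "s3://example-bucket/athena-results/",
)

# To actually send them (requires boto3 and AWS credentials):
#   boto3.client("s3").select_object_content(**select_params)
#   boto3.client("athena").start_query_execution(**athena_params)
```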

Beauvoir answered 26/8, 2020 at 4:22 Comment(1)
I think that's the fundamental difference that separates them most of all. With Athena you can perform a bucket-wide search, while S3 Select requires you to know the specific object to query. I think S3 Select might find its application in some serverless apps (where Athena surely has no place), but that would depend heavily on the performance of such queries. Catharine
P
17

You can think of AWS S3 Select as a cost-efficient storage optimization that retrieves only the data matching a predicate from S3 and Glacier, also known as push-down filtering.

AWS Athena is a fully managed analytical service that runs arbitrary ANSI SQL-compliant queries: GROUP BY, HAVING, window and geo functions, SQL DDL and DML.
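
To make the push-down idea concrete, here is a small local simulation. It does not call AWS at all: `RAW_OBJECT` stands in for a stored CSV object, and the two functions are hypothetical helpers that only compare how many bytes would cross the wire when the filter runs on the client versus next to the storage.

```python
import csv
import io

# Stand-in for the contents of one stored CSV object (made-up data).
RAW_OBJECT = "id,status\n1,200\n2,500\n3,200\n4,500\n"


def fetch_then_filter(raw, status):
    """Client-side filtering: the whole object is transferred, then filtered."""
    transferred = len(raw.encode("utf-8"))
    rows = [r for r in csv.DictReader(io.StringIO(raw)) if r["status"] == status]
    return rows, transferred


def filter_then_fetch(raw, status):
    """Push-down filtering: the predicate runs next to the data,
    so only the matching rows are transferred."""
    rows = [r for r in csv.DictReader(io.StringIO(raw)) if r["status"] == status]
    out = io.StringIO()
    csv.DictWriter(out, fieldnames=["id", "status"]).writerows(rows)
    transferred = len(out.getvalue().encode("utf-8"))
    return rows, transferred


client_rows, client_bytes = fetch_then_filter(RAW_OBJECT, "500")
pushed_rows, pushed_bytes = filter_then_fetch(RAW_OBJECT, "500")
print(f"client-side: {client_bytes} bytes, pushed down: {pushed_bytes} bytes")
```

Both paths return the same rows; the only thing that changes is how much data has to leave the storage layer.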

Ptolemy answered 22/4, 2018 at 19:33 Comment(0)
M
11

Amazon Athena: Amazon Athena is a query service that makes it easy to analyze data stored in S3 using standard SQL. Athena is serverless, so there is no infrastructure to set up or manage, and you pay only for the queries you run. It scales automatically, executing queries in parallel, which lets it return results quickly even with large datasets and complex queries.

Use cases: Athena can be used to process logs, perform ad hoc analysis, and run interactive queries and joins. It runs queries across multiple paths, covering all the files under each path.

S3 Select: S3 Select is an S3 feature that retrieves a subset of an object’s data (using simple SQL expressions) instead of the entire object, which can be up to 5 terabytes in size. S3 Select runs queries on a single object at a time in the S3 bucket.

Conclusion: Athena can be used for complex queries on files, and it can span multiple folders in an S3 bucket.
S3 Select can be used for simple queries on a single object.

Mini answered 5/4, 2021 at 16:38 Comment(0)
M
7

Athena is (from the little I've used it) more intended as a business reporting or analysis tool backed by S3.

S3 Select appears to use the same sort of technology, but I would guess it's aimed more at direct use by applications to filter or shard their data sets.

Maragretmarala answered 5/3, 2018 at 9:34 Comment(0)
C
5

S3 Select makes it easy to retrieve specific data from the contents of an object using simple SQL expressions. There is no need to retrieve the entire object. It can be used with Lambda to build serverless apps and can be tied in with Big Data frameworks like Apache Spark and Presto. It can improve performance by up to 400%.
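
A minimal sketch of the Lambda use case might look like this. The shape of the incoming `event` (keys `bucket`, `key`, `expression`) is an assumption made for illustration, not something Lambda or S3 defines. The response from `select_object_content` carries a `Payload` event stream, and the row data arrives in `Records` events, so the helper below simply concatenates those chunks.

```python
def collect_records(event_stream):
    """Join the Records chunks of an S3 Select event stream into one string.

    Other event types (for example Stats and End) are skipped.
    """
    chunks = []
    for event in event_stream:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Decode once at the end, since a chunk boundary may split a character.
    return b"".join(chunks).decode("utf-8")


def handler(event, context):
    """Hypothetical Lambda handler; the event keys are assumed, not standard."""
    import boto3  # imported lazily so the helper above works without boto3

    s3 = boto3.client("s3")
    response = s3.select_object_content(
        Bucket=event["bucket"],
        Key=event["key"],
        ExpressionType="SQL",
        Expression=event["expression"],
        InputSerialization={"JSON": {"Type": "LINES"}},
        OutputSerialization={"JSON": {}},
    )
    return {"rows": collect_records(response["Payload"])}
```

The helper can be tried locally with a hand-built list of events, for example `collect_records([{"Records": {"Payload": b"a,b\n"}}, {"End": {}}])`.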

Amazon Athena is an interactive query service. It is serverless. No need to load data into Athena. Built on Presto and runs standard SQL. Mainly used to analyze Big Data.

Cantus answered 1/6, 2018 at 3:41 Comment(0)
F
3

To give an overview, as I understand it:

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

Its major advantage as of now is:

Athena is integrated out of the box with the AWS Glue Data Catalog. You can also use Glue’s fully managed ETL capabilities to transform data or convert it into columnar formats to optimize cost and improve performance.

As for S3 Select:

  • At present, there is no charge for using S3 Select while it is in preview, and pricing has not been defined. However, you will need to apply at the reference

  • While in preview, S3 Select supports CSV, JSON, and Parquet files with or without GZIP compression. During the preview, objects that are encrypted at rest are not supported.

  • Because S3 Select is still in preview, AWS doesn't have internal cases to verify how the service is being used. However, I could find a reference from a blog that might interest you.

You may also want to watch this Twitch video, which can help you a lot.

Frazzle answered 5/3, 2018 at 11:24 Comment(1)
Would you know of any link showing the capabilities of each product side by side in some detail? Amazon uses non-meaningful names for its services (as do many other companies)... Thanks. Qianaqibla
K
3

In addition to @abc123's answer, AWS S3 Select only supports SELECT.

Reference: SELECT Command

Amazon S3 Select supports only the SELECT SQL command. The following ANSI standard clauses are supported for SELECT:

  • SELECT list
  • FROM clause
  • WHERE clause
  • LIMIT clause

Note: Amazon S3 Select queries currently do not support subqueries or joins.
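
As an illustration of how small that surface is, the four clauses can be assembled with a helper like the one below. `build_select_expression` is a hypothetical function, not part of any AWS SDK, and it interpolates the `where` text as-is, so it is only a sketch for trusted input.

```python
def build_select_expression(columns, where=None, limit=None, alias="s"):
    """Assemble an S3 Select expression from the four supported clauses:
    SELECT list, FROM, WHERE, and LIMIT."""
    # An empty column list falls back to selecting everything.
    select_list = ", ".join(f"{alias}.{c}" for c in columns) if columns else "*"
    parts = [f"SELECT {select_list}", f"FROM S3Object {alias}"]
    if where:
        parts.append(f"WHERE {where}")
    if limit is not None:
        parts.append(f"LIMIT {int(limit)}")
    return " ".join(parts)


expression = build_select_expression(["name", "age"], where="s.age > '30'", limit=5)
print(expression)
```

The resulting string is what would be passed as the `Expression` parameter of a `select_object_content` request.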

Konrad answered 26/11, 2020 at 11:15 Comment(0)
