I'll give you my two cents about this because I've been wondering the same thing.
Glue
As per AWS documentation, AWS Glue is "Simple, scalable, and serverless data integration". Glue can be used for a variety of things: as a metadata repository, automatic schema discovery, code generation, and run ETL pipelines to prepare data. Glue takes care of providing and managing the computation resources needed to run your data pipelines. Glue is a serverless service, so you don't need to create and manage the infrastructure, because Glue does it for you.
If we focus only on the processing feature and discard the Glue-specific features (schema discovery, code generation, etc) then EMR Serverless and Glue services look almost identical. One of the key advantages of both services is the ability to run Spark or Hive serverless applications.
What advantage will EMR Serverless have over Glue Spark jobs?
To run Glue, you must either specify MaxCapacity
(for Glue version 1.0 or earlier jobs) or Worker type
and the Number of workers
(for Glue version 2.0 jobs). Both options assume, first, that there is some understanding of the data and workload per cluster, and second, that the workload during job execution will be uniform, i.e., there will be no over- or under- utilization of the provisioned resources.
EMR Serverless
EMR Serverless is a new deployment option for AWS EMR. With EMR Serverless, you don't need to configure, optimize, protect, or manage clusters to run applications on these platforms. EMR Serverless helps you avoid over- or under-allocation of resources to process jobs at the individual stage level.
EMR Serverless automatically identifies the resources needed by jobs, provisions those resources to run the jobs, and releases them when the jobs are completed. In cases where applications require a response within seconds, such as interactive data analysis, the engineer can pre-initialize the necessary resources during application creation. This provides easy initialization, fast job startup, automatic capacity management, and simple cost control.
More info: https://luminousmen.com/post/emr-serverless-a-400level-guide