Am I fully utilizing my EMR cluster?

  • Total instances: I created an EMR cluster with 11 nodes in total (1 master instance, 10 core instances).
  • Job submission: spark-submit myApplication.py

[screenshot from the EMR monitoring pane]

  • Graph of containers: Next, I've got these graphs, which refer to "containers". I'm not entirely sure what containers are in the context of EMR, so it isn't obvious what the graphs are telling me:

[screenshot: EMR monitoring graphs of containers]

  • Actual running executors: Then I've got this in my Spark history UI, which shows that only 4 executors were ever created.
  • Dynamic allocation: I've got spark.dynamicAllocation.enabled=True, and I can see that in my environment details.
  • Executor memory: Also, the default executor memory is 5120M.

  • Executors: Next, I've got my Executors tab, which shows what looks like 3 active executors and 1 dead executor: [screenshot of the Executors tab in the Spark Web UI]

So, at face value, it appears to me that I'm not using all my nodes or available memory.
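
For context on the "containers" graphs: on EMR, Spark runs on YARN, and each executor lives inside one YARN container. A rough sketch of the memory arithmetic follows. It assumes the Spark 2.x default executor overhead of 10% of executor memory with a 384 MB floor, ignores any rounding YARN applies to container requests, and uses a made-up per-node YARN memory figure (`node_yarn_mb`) purely for illustration; the function names are hypothetical.

```python
def container_mb(executor_mb, overhead_percent=10, min_overhead_mb=384):
    """Approximate YARN container size for one executor: heap plus memory overhead.

    Assumes the overhead is max(384 MB, 10% of executor memory); check the
    value your Spark version actually uses.
    """
    overhead = max(min_overhead_mb, executor_mb * overhead_percent // 100)
    return executor_mb + overhead


def executors_per_node(node_yarn_mb, executor_mb):
    """How many executor containers fit in the memory YARN manages on one node."""
    return node_yarn_mb // container_mb(executor_mb)


# 5120M is the executor memory from the question; 23424 MB is a hypothetical
# amount of YARN-managed memory per node, not a value taken from the question.
print(container_mb(5120))
print(executors_per_node(node_yarn_mb=23424, executor_mb=5120))
```

Comparing that per-node figure, multiplied by the number of core nodes, with the 4 executors actually seen gives a rough sense of how much headroom is left.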

  1. How do I know if I'm using all the resources I have available?
  2. If I'm not using all available resources to their full potential, how do I change what I'm doing so that they are?
Subcartilaginous answered 22/1, 2017 at 1:8 Comment(2)
Correctly sizing a cluster is a trade-off between providing services to your users and the cost of those services. Why did you configure it as you did (10 nodes, chosen instance type)? Are your users complaining about it being too slow at times? If you were to downsize the instance type or count, would your users be negatively impacted? Have you tried Spark's standard monitoring tools (Accessing the Spark Web UIs)? Back
Yes, the executors table screenshot is from the Spark Web UI on EMR, and the other screenshots are from the EMR monitoring pane. Also, this question is purely about the utilization of the nodes within the cluster. Over the last hour, I've been going through what it means to enable maximizeResourceAllocation, and the 4 settings that it provides defaults for were completely untouched by me, so the answer to my question is "No". It also seems clear now that if I don't manually set those settings and don't enable maximizeResourceAllocation, then my cluster is being used like a 2 node cluster. Subcartilaginous
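
For reference, maximizeResourceAllocation is switched on through the `spark` configuration classification supplied when the cluster is created. A minimal sketch that builds that configuration in Python and prints it as JSON; the variable name `spark_config` is just for illustration, and how you pass the JSON to EMR (console, CLI, or SDK) is up to you.

```python
import json

# EMR configuration object that enables maximizeResourceAllocation for Spark.
# It has to be supplied at cluster creation time.
spark_config = [
    {
        "Classification": "spark",
        "Properties": {"maximizeResourceAllocation": "true"},
    }
]

print(json.dumps(spark_config, indent=2))
```

With this option enabled, EMR derives the executor settings from the instance type of the core nodes instead of leaving them at the small defaults.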

Another way to see how much of each node's resources is being used is the Ganglia web tool.

It is published on the master node and shows a graph of each node's resource usage. The catch is that this only works if you enabled Ganglia at cluster creation time as one of the tools available on the EMR cluster.

Once it is enabled, you can go to the web page and see how much each node is being utilized.
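
If Ganglia was not enabled, a possible alternative is the YARN ResourceManager REST API, which reports cluster-wide allocated versus total memory and vcores. The sketch below assumes the ResourceManager is reachable on the master node at port 8088 and that you are running it from somewhere with network access to it. The helper names `fetch_cluster_metrics` and `utilization` are hypothetical, and the sample numbers are made up to show the calculation.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_host, port=8088):
    """Fetch the clusterMetrics object from the YARN ResourceManager REST API."""
    url = f"http://{rm_host}:{port}/ws/v1/cluster/metrics"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)["clusterMetrics"]


def utilization(metrics):
    """Summarise how much of the cluster YARN has currently handed out."""
    return {
        "memory_pct": 100.0 * metrics["allocatedMB"] / metrics["totalMB"],
        "vcores_pct": 100.0
        * metrics["allocatedVirtualCores"]
        / metrics["totalVirtualCores"],
        "containers": metrics["containersAllocated"],
    }


# Made-up sample values, only to demonstrate the calculation offline.
sample = {
    "allocatedMB": 22528,
    "totalMB": 234240,
    "allocatedVirtualCores": 4,
    "totalVirtualCores": 80,
    "containersAllocated": 4,
}
print(utilization(sample))
```

On a live cluster you would call `utilization(fetch_cluster_metrics("<master-public-dns>"))` while the job is running; low percentages suggest the job is not requesting everything the cluster can offer.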

Tussock answered 15/2, 2017 at 13:35 Comment(0)
