The Redshift leader node is the same size and class of compute as the compute nodes. Typically this means that the leader is over provisioned for the role it plays but since its role is so important and impactful if things slows down, it is good that it is over provisioned. The leader needs to compile and optimized the queries and perform final steps in queries (final sort for example). It communicates with the session clients and handles all their requests. If the leader becomes overloaded all these activities slow down creating significant performance issues. It is not good that your leader is hitting 100% CPU often enough for you to notice. I bet the seems sluggish when this happens.
There are a number of ways I've seen "leader abuse" and it usually becomes a problem when bad patterns are copied between users. In no particular order:
- Large data literals in queries (INSERT ... VALUES ...). This puts your data through the query compiler on the leader node. This is not what it is design to do and is very expensive for the leader. Use the COPY command to bring data into the cluster. (Just bad, don't do this)
- Over use of COMMIT. A commits cause an update to the coherent state of the database and needs to run through the "commit queue" and creates work for the leader and the compute nodes. Having COMMITs every other statement can cause this queue to back up and work to generally back up.
- Too many slots defined in the WLM. Redshift can typically only efficiently run between 1 and 2 dozen queries at once. Setting the total slot count very high (like 50) can lead to very inefficient operation and high CPU loads. Depending on workload this can show up for compute or occasionally the lead node.
- Large data output through SELECT statements. SELECTs return data but when this data is many GBs in size the management of this data movement (and sorting) is done by the leader node. If large amounts of data need to be extracted from Redshift it should be done with an UNLOAD statement.
- Overuse of large cursors. Cursors can be an important tool and needed for many BI tools but cursors are located on the leader and overuse can lead to reduced leader attention on other tasks.
- Many / large UNLOADs with parallel off. UNLOADs generally come from the compute nodes straight to S3 but with "parallel off" all the data is routed to the leader node where it is combined (sorted) and sent to S3.
While none of the above of problems in and of themselves, it is when these are overused, used in ways they are not intended, or all at once that the leader starts to be impacted. It also comes down to what you intend to do with your cluster - if it support BI tools then you may have a lot of cursors but this load on the leader is part of the cluster's intent. Issue often arise when the cluster's intent is to all things to everybody.
If your workload for Redshift is leader function heavy and you are efficiently using the leader node (no large literals, using COPY and UNLOAD, etc.) then high leader workload is what you want. You're getting the most out of the critical resource. However, most use Redshift to perform analytics on large data which is the function of the compute nodes. A highly loaded leader can detract significantly from this mission and needs to be addressed.
Another way that leader can get stressed is when clusters are configured with many smaller node types instead of fewer bigger nodes. Since the leader is the same size as the compute nodes many smaller nodes means you have a small leader doing the work. Something to consider but I'd make sure you don't have unneeded leader node stressers before investing in a resize.