The following query returns global memory usage for all the running pods in K8S:
sum(container_memory_usage_bytes{container!=""})
This query uses the sum() aggregate function to add up memory usage across all containers running in K8S. The container!="" filter is needed to exclude redundant metrics related to the cgroups hierarchy. See this answer for details.
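To see why the filter matters, here is a minimal Python sketch of the double counting. The pod and container names and the byte values are made up, and it assumes for simplicity that the pod-level cgroup series (the one with an empty container label) equals the sum of its children:

```python
# Hypothetical samples shaped like container_memory_usage_bytes series.
# The row with container="" stands for the pod-level cgroup, which
# already includes the memory of the containers below it.
samples = [
    {"pod": "web-1", "container": "", "value": 300},
    {"pod": "web-1", "container": "nginx", "value": 200},
    {"pod": "web-1", "container": "sidecar", "value": 100},
]


def total(rows, drop_empty_container):
    """Sum values, optionally skipping rows with an empty container label."""
    return sum(
        r["value"]
        for r in rows
        if not (drop_empty_container and r["container"] == "")
    )


unfiltered = total(samples, drop_empty_container=False)  # counts the pod twice
filtered = total(samples, drop_empty_container=True)     # like container!=""
print(unfiltered, filtered)
```

In this toy data the unfiltered sum is twice the real usage, which is the kind of inflation the container!="" filter avoids.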
The following query returns global memory usage for the K8S cluster as a percentage:
100 * (
sum(container_memory_usage_bytes{container!=""})
/
sum(kube_node_status_capacity{resource="memory"})
)
Note that some nodes in K8S can have much higher percentage memory usage than others because of scheduling policies. The following query returns the top 3 nodes with the highest memory usage in percentage:
topk(3,
100 * (
sum(container_memory_usage_bytes{container!=""}) by (node)
/ on(node)
kube_node_status_capacity{resource="memory"}
)
)
This query uses the topk function to limit the number of returned time series to 3. Note that the query may return more than 3 time series on a graph in Grafana, since topk returns up to k unique time series per each point on the graph. If you need a graph with no more than k time series with the maximum values, then take a look at the topk_* functions in MetricsQL, such as topk_max, topk_avg or topk_last.
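The per-point behaviour of topk can be sketched in Python. The node names and values below are invented, and the second half is only a rough analogue of the idea behind topk_max (rank whole series by their maximum over the range), not MetricsQL's actual implementation:

```python
# Hypothetical per-node values at three graph points (t0, t1, t2).
series = {
    "node-a": [90, 10, 10],
    "node-b": [80, 20, 85],
    "node-c": [10, 95, 20],
    "node-d": [20, 70, 88],
}


def topk_at(k, t):
    """Names of the k series with the highest value at point t."""
    ranked = sorted(series, key=lambda name: series[name][t], reverse=True)
    return set(ranked[:k])


k = 2

# topk is evaluated independently at every point, so the set of series
# that appear anywhere on the graph can grow beyond k.
shown = set()
for t in range(3):
    shown |= topk_at(k, t)

# Ranking whole series by their maximum keeps the graph at k series.
by_max = sorted(series, key=lambda name: max(series[name]), reverse=True)[:k]
print(sorted(shown), by_max)
```

With k = 2 the per-point selection ends up showing all four nodes, while the rank-by-maximum variant keeps only two.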
The query also uses the on() modifier for the / operation. This modifier limits the set of labels used for finding pairs of time series on the left and right sides of / with identical label values. Prometheus then applies the / operation to each such pair individually. See these docs for details.
The following query returns the number of CPU cores used by all the pods in Kubernetes:
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m]))
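The result is measured in CPU cores because container_cpu_usage_seconds_total is a counter of consumed CPU seconds, so its per-second rate is "CPU seconds per second". A minimal Python sketch with invented samples; unlike the real rate(), it ignores counter resets and extrapolation at the window edges:

```python
# Hypothetical counter samples: (timestamp in seconds, cumulative CPU seconds).
samples = [
    (0, 100.0),
    (60, 130.0),
    (120, 160.0),
    (180, 190.0),
    (240, 220.0),
    (300, 250.0),
]


def simple_rate(points):
    """Average per-second increase between the first and last sample."""
    (t_first, v_first), (t_last, v_last) = points[0], points[-1]
    return (v_last - v_first) / (t_last - t_first)


# 150 CPU seconds consumed over a 300 second window.
cores = simple_rate(samples)
print(cores)
```

A value of 0.5 means the container kept half a core busy on average over the 5 minute window; summing these rates across containers gives the total number of cores in use.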
The following query returns global CPU usage for the K8S cluster as a percentage:
100 * (
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m]))
/
sum(kube_node_status_capacity{resource="cpu"})
)
Some nodes may be loaded much more heavily than the rest of the nodes in a Kubernetes cluster. The following query returns the top 3 nodes with the highest CPU load:
topk(3,
100 * (
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (node)
/ on(node)
kube_node_status_capacity{resource="cpu"}
)
)
sum(avg(kube_node_status_allocatable_memory_bytes) by (node)) might be a better way to get the total memory that K8S can use. – Brookite