Count k8s cluster CPU/memory usage with Prometheus

Asked 25/2, 2019 at 12:58 Answered 28/4, 2022 at 11:5

I want to measure the CPU/memory usage of the whole k8s cluster (not per-pod usage) with Prometheus, so that I can show it in Grafana.

I use sum(container_memory_usage_bytes{id="/"}) to get the memory used by the k8s cluster, and topk(1, sum(kube_node_status_capacity_memory_bytes) by (instance)) to get the total memory of the cluster, but I cannot divide one by the other because topk returns a vector, not a single value.

How can I do this?

Brookite asked 25/2, 2019 at 12:58 Comment(0)

My main question was that topk(1, sum(kube_node_status_capacity_memory_bytes) by (instance)) does not return a single value, but I have now found that wrapping it in sum() works. The whole query is as follows:

sum(sum(container_memory_usage_bytes{id="/"}) by (instance)) / sum(topk(1, sum(kube_node_status_capacity_memory_bytes) by (instance))) * 100

Brookite answered 26/2, 2019 at 2:28 Comment(1)

Using sum(avg(kube_node_status_allocatable_memory_bytes) by (node)) could be a better way to get the total memory that k8s can use. – Brookite 5/3, 2019 at 1:46

I installed Prometheus on Google Cloud through the default gcloud applications. The dashboards were deployed automatically with the installation. These are the queries they use for cluster memory and CPU usage:

CPU usage by namespace:

sum(irate(container_cpu_usage_seconds_total[1m])) by (namespace)

Memory usage (no cache) by namespace:

sum(container_memory_rss) by (namespace)

CPU request commitment:

sum(kube_pod_container_resource_requests_cpu_cores) / sum(node:node_num_cpu:sum)

Memory request commitment:

sum(kube_pod_container_resource_requests_memory_bytes) / sum(node_memory_MemTotal)

Partida answered 25/2, 2019 at 16:17 Comment(1)

I cannot use node_memory_MemTotal because 1) only some of my nodes are in the k8s cluster, and 2) it is a host-level total, not what the k8s cluster can use. – Brookite 26/2, 2019 at 2:2

The following query returns global memory usage for all the running pods in K8S:

sum(container_memory_usage_bytes{container!=""})

This query uses the sum() aggregate function to add up memory usage across all the containers running in K8S.

The container!="" filter is needed to exclude redundant series that come from the cgroups hierarchy. See this answer for details.
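To illustrate why the filter matters, here is a small plain-Python sketch (not PromQL). It assumes, as described above, that series with an empty container label belong to parent cgroups whose usage already includes their children. The ids, pod names and values are hypothetical.

```python
# Hypothetical cAdvisor-style samples for one pod with two containers.
# Assumption: parent cgroups report usage that already includes their children.
MIB = 1024 ** 2

samples = [
    ({"id": "/", "container": ""}, 900 * MIB),                # root cgroup (whole node)
    ({"id": "/kubepods/pod1", "container": ""}, 300 * MIB),   # pod-level cgroup
    ({"id": "/kubepods/pod1/c1", "container": "app"}, 200 * MIB),
    ({"id": "/kubepods/pod1/c2", "container": "proxy"}, 100 * MIB),
]

# Like sum(container_memory_usage_bytes): every level of the hierarchy is added.
unfiltered = sum(value for _, value in samples)

# Like sum(container_memory_usage_bytes{container!=""}): real containers only.
filtered = sum(value for labels, value in samples if labels["container"] != "")

print(unfiltered // MIB, "MiB without the filter")
print(filtered // MIB, "MiB with the filter")
```

In this made-up data the unfiltered sum counts the same memory at several levels of the hierarchy, while the filtered sum counts each container once.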

The following query returns global memory usage for k8s cluster in percentage:

100 * (
  sum(container_memory_usage_bytes{container!=""})
    /
  sum(kube_node_status_capacity{resource="memory"})
)

Note that some nodes in K8S can have a much higher memory usage percentage than others because of scheduling policies. The following query returns the top 3 nodes by memory usage percentage:

topk(3,
  100 * (
    sum(container_memory_usage_bytes{container!=""}) by (node)
      / on(node)
    kube_node_status_capacity{resource="memory"}
  )
)

This query uses the topk function to limit the number of returned time series to 3. Note that a graph in Grafana may still show more than 3 time series, since topk returns up to k unique time series for each point on the graph. If you need a graph with no more than k time series with the maximum values, take a look at the topk_* functions in MetricsQL, such as topk_max, topk_avg or topk_last.
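The per-point behaviour of topk can be shown with a small plain-Python emulation (not PromQL). The node names and percentages are hypothetical; the point is only that picking the top k at each timestamp separately can add up to more than k distinct series over the whole graph.

```python
# Emulates topk(k, ...) evaluated independently at each graph point.
# Series names and values are hypothetical.

def topk_at(samples, k):
    """Return the names of the k series with the largest values at one timestamp."""
    ranked = sorted(samples.items(), key=lambda item: item[1], reverse=True)
    return {name for name, _ in ranked[:k]}

# Memory usage in percent per node at three consecutive graph points.
points = [
    {"node-a": 90, "node-b": 80, "node-c": 70, "node-d": 10},
    {"node-a": 90, "node-b": 80, "node-c": 20, "node-d": 75},
    {"node-a": 15, "node-b": 80, "node-c": 85, "node-d": 75},
]

k = 3
per_point = [topk_at(point, k) for point in points]
drawn = set().union(*per_point)  # every series that appears somewhere on the graph

print([sorted(names) for names in per_point])
print(sorted(drawn))
```

Each point has exactly 3 series, yet 4 different nodes end up on the graph.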

The query also uses the on() modifier for the / operation. This modifier limits the set of labels used to find pairs of time series on the left and the right side of / with identical label values. Prometheus then applies the / operation to each such pair individually. See these docs for details.
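As a rough illustration of that matching, the following plain-Python sketch (not PromQL) pairs samples by the node label only and divides each pair. The label sets and values are hypothetical, and the helper divide_on is an illustrative name, not a Prometheus API.

```python
# Emulates one-to-one vector matching with on(node) for the / operation.
# Label sets and values are hypothetical.

def divide_on(left, right, labels):
    """Pair samples whose values for `labels` are identical, then divide."""
    def key(sample_labels):
        # A missing label is treated like an empty value.
        return tuple(sample_labels.get(name, "") for name in labels)

    # Real PromQL rejects duplicate keys on either side; this sketch does not check.
    right_by_key = {key(lbls): value for lbls, value in right}
    result = {}
    for lbls, value in left:
        k = key(lbls)
        if k in right_by_key:  # samples without a partner are dropped
            result[k] = value / right_by_key[k]
    return result

GIB = 1024 ** 3
used = [  # stands in for sum(container_memory_usage_bytes{...}) by (node)
    ({"node": "node-a"}, 6 * GIB),
    ({"node": "node-b"}, 2 * GIB),
    ({"node": "node-c"}, 1 * GIB),
]
capacity = [  # stands in for kube_node_status_capacity{resource="memory"}
    ({"node": "node-a", "resource": "memory"}, 8 * GIB),
    ({"node": "node-b", "resource": "memory"}, 8 * GIB),
]

percent = {k: 100 * v for k, v in divide_on(used, capacity, ["node"]).items()}
print(percent)
```

Without on(node), the extra resource label on the right side would prevent any pair from matching. Here node-c has no capacity sample, so it is absent from the result.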

The following query returns the number of CPU cores used by all the pods in Kubernetes:

sum(rate(container_cpu_usage_seconds_total{container!=""}[5m]))
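The reason this yields CPU cores is that the metric is a counter of CPU-seconds, so its per-second increase is "CPU-seconds per second", i.e. cores. Below is a simplified plain-Python sketch with hypothetical samples. The real rate() function additionally handles counter resets and extrapolates to the window edges, which this sketch ignores.

```python
# Simplified view of rate() over a CPU-seconds counter.
# Container names and sample values are hypothetical.

def simple_rate(samples):
    """Average per-second increase between the first and last (timestamp, value) sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# container_cpu_usage_seconds_total for two containers over a 5-minute (300 s) window.
window = {
    "app": [(0, 1000.0), (150, 1075.0), (300, 1150.0)],   # 150 CPU-seconds used
    "sidecar": [(0, 40.0), (150, 80.0), (300, 115.0)],    # 75 CPU-seconds used
}

# Like sum(rate(...[5m])): add the per-container rates to get cores in use.
cores_used = sum(simple_rate(samples) for samples in window.values())
print(cores_used, "cores")
```

Dividing such a value by the total number of cores in the cluster gives the usage fraction.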

The following query returns global CPU usage for k8s cluster in percentage:

100 * (
  sum(rate(container_cpu_usage_seconds_total{container!=""}[5m]))
    /
  sum(kube_node_status_capacity{resource="cpu"})
)

Some nodes may be loaded much more than the rest of the nodes in a Kubernetes cluster. The following query returns the top 3 nodes with the highest CPU load:

topk(3,
  100 * (
    sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (node)
      / on(node)
    kube_node_status_capacity{resource="cpu"}
  )
)

Joub answered 28/4, 2022 at 11:5 Comment(0)
