Time series cardinality is the number of unique time series actually stored in the database. That's it!
Let's start with the basics. A time series contains a series of (timestamp, value) pairs ordered by timestamp. Each time series has a name (in the InfluxDB line protocol, the name is constructed from the measurement and the field name). Additionally, a time series can have a set of key=value tags (they are called labels in some systems, such as Prometheus). All fields on a single line of the InfluxDB line protocol share the set of tags defined on that line. A time series is uniquely identified by its name plus its set of tags. For example, temperature{city="Paris",country="France"} and temperature{city="Marseille",country="France"} are different time series, since they contain different values for the tag city.
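The identity rule above can be sketched in a few lines of Python. This is only an illustration, not how any particular database stores series: the `series_key` helper and the sample values are hypothetical, and tags are sorted so that the same tag set written in a different order maps to the same series.

```python
def series_key(name, tags):
    """Build a canonical identity for a series: its name plus sorted tag pairs."""
    return (name, tuple(sorted(tags.items())))


# Hypothetical samples: (name, tags, timestamp, value).
samples = [
    ("temperature", {"city": "Paris", "country": "France"}, 1700000000, 11.5),
    # Same series as above: tag order does not matter, only the tag set.
    ("temperature", {"country": "France", "city": "Paris"}, 1700000060, 11.7),
    ("temperature", {"city": "Marseille", "country": "France"}, 1700000000, 16.2),
]

# Cardinality is the number of distinct (name, tags) identities, not the number of samples.
unique_series = {series_key(name, tags) for name, tags, _ts, _value in samples}
cardinality = len(unique_series)
print(cardinality)  # 2
```

Three samples are ingested here, but only two series exist, because two of the samples carry the same name and the same tag set.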
Let's calculate the maximum possible cardinality for time series with the name temperature, given the following restrictions:
- The number of cities in the world is 10000
- The number of countries in the world is 250
A naive estimate of the maximum possible cardinality would then be 10000*250=2.5 million. But this calculation is incorrect, since each city belongs to exactly one country. So the maximum possible cardinality is limited by the number of cities, i.e. 10000. In practice the cardinality is usually lower, since it is limited by the cities actually stored in the database.
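The difference between the naive product and the real upper bound can be checked on a toy data set. The sketch below assumes a hypothetical `city_to_country` mapping with four cities and two countries; the numbers are illustrative only and much smaller than the figures used above.

```python
from itertools import product

# Hypothetical toy data: each city belongs to exactly one country.
city_to_country = {
    "Paris": "France",
    "Marseille": "France",
    "Berlin": "Germany",
    "Munich": "Germany",
}

cities = sorted(city_to_country)
countries = sorted(set(city_to_country.values()))

# Naive bound: every city combined with every country.
naive_bound = len(list(product(cities, countries)))

# Realistic bound: only the (city, country) pairs that can actually occur.
valid_pairs = set(city_to_country.items())
realistic_bound = len(valid_pairs)

print(naive_bound)      # 8
print(realistic_bound)  # 4
```

Because the country tag is fully determined by the city tag, it adds no new combinations, and the bound collapses to the number of cities.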
There are two types of time series cardinality:
- The number of active time series, i.e. time series with recently ingested samples.
- The total number of time series stored in the database.
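The two counts can be told apart with a small sketch. It assumes a hypothetical in-memory map from series identity to the timestamp of its last ingested sample, and a one-hour window as the definition of "recently"; real databases define the active window in their own way.

```python
def count_series(last_seen, now, active_window):
    """Return (active, total) for a mapping of series key -> last sample timestamp."""
    total = len(last_seen)
    # A series counts as active if its newest sample falls inside the window.
    active = sum(1 for ts in last_seen.values() if now - ts <= active_window)
    return active, total


# Hypothetical data: Unix timestamps (seconds) of the last sample per series.
last_seen = {
    ("temperature", (("city", "Paris"),)): 1_700_000_000,
    ("temperature", (("city", "Marseille"),)): 1_700_003_000,
    ("temperature", (("city", "Lyon"),)): 1_699_000_000,  # stale series
}

active, total = count_series(last_seen, now=1_700_003_600, active_window=3600)
print(active, total)  # 2 3
```

The stale series still counts toward the total cardinality, since it remains stored in the database, but not toward the active cardinality.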
Some time series databases may consume memory proportional to the total number of time series (for example, InfluxDB). Others may consume memory only for active time series (for example, VictoriaMetrics). There are also databases that consume zero additional memory for each new time series (for example, TimescaleDB or ClickHouse). All these databases have different tradeoffs, performance characteristics and resource usage (CPU, disk, RAM). So it is recommended to evaluate them against your particular use case before selecting the best one for the given workload.