What is the definition of cardinality in SQL
Asked Answered
E

4

33

My school book Database Systems defines cardinality as follows:

The cardinality of a relation is the number of tuples it contains. By contrast, the number of tuples is called the cardinality of the relation and this changes as tuples are added or deleted. High-cardinality - many tuples, low-cardinality - few tuples.

While the Wikipedia article on Cardinality (SQL statements), defines it as follows:

Cardinality refers to the uniqueness of data values contained in a particular column (attribute) of a database table. The lower the cardinality, the more duplicated elements in a column. There are 3 types of cardinality: high-cardinality, normal-cardinality, and low-cardinality.

They might both be right, but i can't connect the two definitions as related definitions. A rephrase would be appriciated!

Elston answered 28/8, 2014 at 11:47 Comment(2)
Cardinality is the number of unique values/number of records.I think that book simplifies the issue.Maximum cardinality is 1.Coridon
Does this answer your question? What is cardinality in Databases?Aq
A
41

They are speaking the same thing and it has to do with tuples (relational algebra) or rows (layman's term).

When it says high-cardinality are possible values of particular attribute (or field) that are unique and therefore the number of rows or tuples are higher:

Example:

 StudentID   Lastname Firstname  Gender
 101         Smith    John       M
 102         Jones    James      M
 103         Mayo     Ann        F
 104         Jones    George     M
 105         Smith    Suse       F

As far as as StudentID the cardinality is high because it is unique. In this it has five (5) tuples/rows.

On the other hand Lastname has normal cardinality, in particular there are only three (3) unique tuples/rows. Thus it has normal cardinality.

And finally Gender has only two possible unique tuples thus Low Cardinality.

You probably confuse Cardinality here with Degree of a relation which has something to do of the number of attributes/fields in a relation (or table).

On the other hand the textbook for Database when speaking of Cardinality normally has to do with an entity in relation to another entity, that is, the number of possible relations occurences for an entity participating in a given relationship type. Thus for example for a binary relationship cardinality could be either one-to-one, one-to-many or many-to-many.

Anabatic answered 28/8, 2014 at 12:18 Comment(2)
Regarding the Lastname column, don't you mean there are only (3) distinct rows? learnsql.com/blog/unique-vs-distinctKanishakanji
@ShawnEary Yes, three (3) distinct rows.Anabatic
M
5

Both definitions are trying to say that cardinality is the "number of rows". The difference is whether the comparison is "in the table" or "in a particular column".

The version in your database text book focuses on relational algebra and the structure of tables ("relations" in that lingo).

The Wikipedia entry is more practical. It encompasses the textbook definition, assuming the table has a primary key (the cardinality of the primary key is the same as the table). However, it can be applied to, say, a flag column as well. If the flag only takes on two values (0 versus 1), then we can say that the cardinality of the column is 2.

This is important for optimizing queries. Cardinality is one component of choosing the best methods for joining, aggregating, and selecting data. In practice, most databases use more information than the cardinality, so-called "statistics" about columns and their values for optimization.

Madelinemadella answered 28/8, 2014 at 11:51 Comment(0)
S
2

There are two concepts

  1. Index cardinality
  2. Cardinality

This I believe is referring to index cardinality, which is VERY different https://www.ibm.com/developerworks/data/library/techarticle/dm-1309cardinal/

  • Index cardinality is considered the number of unique values in the index
  • The term is used to discuss creating indexes, table scans, index access vs table access, how it affects inserts, updates, deletes, storage space

Here is another example, https://en.wikipedia.org/wiki/Cardinality_(SQL_statements)

  • In SQL (Structured Query Language), the term cardinality refers to the uniqueness of data values contained in a particular column (attribute) of a database table.
  • The lower the cardinality, the more duplicated elements in a column. Thus, a column with the lowest possible cardinality would have the same value for every row. SQL databases use cardinality to help determine the optimal query plan for a given query.

Just the word cardinality, I believe focuses on relationships between tables In particular, it is not a term used to discuss a single table or uniqueness of data

IBM documentation (if you search for the word unique, it is not mentioned) https://www.ibm.com/support/knowledgecenter/en/SSEP7J_10.2.2/com.ibm.swg.ba.cognos.ug_cog_rlp.10.2.2.doc/c_cog_rlp_rel_cardinality.html When you interpret cardinality, you must consider the notation that displays at both ends of the relationship. Possible end labels are shown in the following list:

  • 0..1 (zero or one match)
  • 1..1 (exactly one match)
  • 0..n (zero or more matches)
  • 1..n (one or more matches)

In mathematics, the cardinality of a set is a measure of the "number of elements of the set". (no mention of unique btw) https://en.wikipedia.org/wiki/Cardinality

In database design, the cardinality or fundamental principle of one data aspect with respect to another is a critical feature. The relationship of one to the other must be precise and exact between each other in order to explain how each aspect links together. In the relational model, tables can be related as any of "one-to-many", "many-to-many" "one-to-zero-or-one", etc.. This is said to be the cardinality of a given table in relation to another. https://en.wikipedia.org/wiki/Cardinality_(data_modeling)

Stringhalt answered 24/6, 2019 at 14:39 Comment(0)
B
0

If we have tables A and B, think of cardinality as the number of rows of table B that would be related to a row from table A. If the tables are PERSON and VEHICLE, and the relation is RODE_ON then the cardinality is high because most persons rode on lots of different vehicles in the past, and most vehicles drove many persons. If the relation is OWNS then the cardinality is low - most persons own one vehicle, some own none, and a vehicle usually has one or two owners, not more.

Note that cardinality from one side of a relation is not equal to cardinality from the other side. If the tables are PERSON and FINGER, and the relation is BELONG TO, then a person has many fingers, but each finger belongs to only one person.

Bumper answered 28/8, 2014 at 11:55 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.