Does an empty SQL table have a superkey? Does every SQL table have one?
Asked Answered
H

3

0

I know what a superkey means in SQL but I have the following questions:

  1. Does an empty SQL table always have a superkey?
  2. Does every SQL table always have a superkey?
Holston answered 3/9, 2017 at 15:20 Comment(4)
A key is not a property of a record, it's a property of the table.Faveolate
SQL doesn't demand keys, allowing tables with duplicate records to be created.Aube
Please edit a definition of "superkey" you have tried to use into your question. (For SQL and/or the relational model.) Maybe you don't really want to know about superkeys, but other things; and it can help us show you how to read/remember/write/use definitions.Velvety
Superkey is an abstract concept. SQL is just an implementation. You are mixing layers of abstraction here, IMO.Stumper
V
4

TL;DR "Superkey" is a RM (Relational Model of Data) term. There's no standard use in SQL. The superkeys of an SQL table might reasonably informally be said to be the column sets that you could declare primary key or unique not null, plus maybe {} when a table holds at most one row (although you can't declare it). "Reasonably informally" because SQL tables are not RM relations. But if a table holds no duplicate rows and no nulls then we can reasonably say that it is a relation, and that, like every relation, it has one or more superkeys. The definition of superkey of a base relation or relation expression takes into account all possible values it can hold, so its current value doesn't affect what its superkeys are. Per the definition of superkey, in an empty relation value every subset of attributes is a superkey.

Relational "superkey"

In mathematics one meaning of "relation" is a table-like set of row-like "tuples" that are lists of values. It represents a relation(ship)/association--also called a "relation" in mathematics. That is where the "R" in "RM" comes from, which is where the term "relational database" comes from. (Codd 1970) (Date 2015) Similarly "ERM" (Entity-Relationship Model) comes from "relationship" as relation/association. (Chen 1976) In a RM context a "relation" is also table-like but usually holds a set of "tuples" that are sets of pairs of "attribute" names & values. (Or it might be a math relation or a mix.) There are two RM senses of "superkey"--of a relation value & of a relation variable or expression. A superkey of a relation value is a set of attributes where a relation does not contain two rows with that subtuple. A superkey of a relation variable or expression is a set of attributes where in every situation/state it does not contain two rows with that subtuple. So a variable has a certain superkey when all the values it can hold have that superkey.

(Find a definition in a published academic textbook. Note that when definitions say "for every" or "for all" values for a name they mean that such a condition is satisfied when there is no such value. Similarly when "for some" & "there exist(s)" refer to named values they don't mean that the names necessarily name different values.)

An empty value happens to have every subset of attributes as a superkey. The definition of superkey of a variable or expression involving variables takes into account all possible values it can evaluate to, so its current value doesn't affect what its superkeys are.

Every relation has one or more superkeys: A relation holds a set of tuples, so a tuple value appears at most once, so a value for a subtuple on all attributes appears at most once, so the set of all attributes is a superkey.

SQL vs Relational

An SQL table is not a relation. It is reminiscent of a jumble of math & attribute relations with duplicates and nulls allowed. So SQL databases are called "relational" but they poorly embody the RM.

Because of the resemblance of SQL tables to relations, terms that involve relations get sloppily applied to tables. But although you can borrow terms and give them SQL meanings--value, table, FD (functional dependency), superkey, CK (candidate key), PK (primary key), FK (foreign key), join, and, predicate, NF (normal form), normalize, 1NF, etc--you can't just substitute those SQL meanings for those words in RM definitions, theorems or algorithms and get something sensible or true. Moreover SQL presentations of RM notions almost never actually tell you how to soundly apply RM notions to an SQL database. They just parrot RM presentations, oblivious to whether their use of SQL meanings for terms makes things nonsensical or invalid. ("Almost" because I hope there are some.)

If you replace "relation" by "table" (duplicates and/or nulls allowed) in certain RM superkey definitions then you get a definition for SQL superkey as a column set that satisfies a primary key or unique not null constraint. For certain other RM superkey definitions you get those sets plus {} when a table holds at most one row. (Since it "identifies" any row.) (You will probably only find people who use a 2nd-style phrasing yet think it defines what a 1st-style phrasing does. And they won't know they are misusing definitions by misinterpreting terms.) Some might just use the constraint definition. You might find "UK" (unique key) used per any of the three.

When a table holds neither duplicate rows nor nulls we can interpret it as a relation, with rows as tuples & columns as attributes. Then we can reasonably say the table's superkeys are the relation's superkeys.

"1NF" has no single meaning. Nor does "normalized" or "unnormalized" or "UNF" or "0NF" or for that matter "relation".
What to do with null values when modeling and normalizing?

PS: "CK" Don't confuse superkeys with CKs. A CK is a superkey that contains no smaller superkey. (Hence, we say that a CK is a "minimal" or "irreducible" superkey.) A relation can have multiple superkeys & CKs. A PK is some CK chosen to be distinguished as PK. SQL primary key & unique not null declare what we could call a SQL superkey, but not necessarily a minimal one, what we could call an SQL CK. So when you hear "PK" in an SQL context you have to find out whether it means "(SQL superkey) column list declared via primary key (maybe or maybe not a SQL CK)" and/or "distinguished SQL superkey (maybe or maybe not declared via primary key)" and/or "distinguished SQL CK (maybe or maybe not declared via primary key)". And you always have to ask what "key" means. Usually, SQL superkey--whatever that means.

PS: "Relation(ship)" Get straight what you mean by each of "relation" & "relationship"--association? table? FK? In a RM database every relation value (of a variable or expression) represents a relation(ship)/association. But "relationship" (sometimes, "relation") is also (mis)used (in an entrenched way) for FK--not in the RM or ERM, but in pseudo-RM & -ERM methods that misinterpret/misunderstand/misrepresent them, whose roots predate them. (Unfortunately there is very poor RM education in the database industry.) FKs, PKs, CKs, superkeys & other constraints are not needed to query & update. (They are for integrity.)

Velvety answered 4/9, 2017 at 1:5 Comment(5)
This is a strange definition for "relation". In a database (and the question is about SQL, in other words: relational databases) a relation usually describes how two tables are related. Your description "when an SQL table value has no duplicate rows [...] then we can reasonably call it a relation" is very different from the generally accepted definition and hence quite confusing.Elodiaelodie
Please see my extensively renovated answer. I hope that it (more) clearly, simply & directly addresses the SQL view & terms but also their connection to, dependence on & difference from the original RM (Relational Model) ones. If you dispute RM content, I hope you will follow the links & otherwise learn more about the RM & its poor embodiment in SQL DBMSs & many (non-"fact oriented") information modeling & database design methods.Velvety
That's a very thorough exlanation now. I must admit, while I find it easy to build proper normalized databases with appropriate primary and foreign keys, I don't bother much with the theories on how to get from NF2 to NF3, how to find super and candidate keys and all this. I also use "relations" rather than "relationships" for relations between entities (tables usually), as this corresponds much better with how we use the words in English. This is also how I understand the term ERM - the relations between entities. Few database people would use the (confusing) term "relation" to mean a table.Elodiaelodie
I'd agree mathematicians, at least, & I, use "relation" as relationship/association represented by a math relation-as-table. If you read the early parts of the original RM (Codd) & ERM (Chen) papers it is quite clear that relation = table representing relation(ship)/association and relation(ship) <> FK. That should give you some idea that there has been extensive ongoing use (& abuse) since 1970 & 1976. You just learned via a pseuto-ERM. See my answer links.Velvety
@ThorstenKettner, “In a database (and the question is about SQL, in other words: relational databases) a relation usually describes how two tables are related.” — That's incorrect. What you describe is a “relationship” between tables. The “relation” is one table, a term from the relational model of databases. philipxy is using the term correctly.Rentier
E
0

The superkey is simply the column or group of columns that uniquely identify a record. Such as the employee number in an employee table.

  1. When that table is empty (i.e. no employees have been entered, yet), the table's key is still the employee number. A key is a property of the table regardless of the records within.
  2. There are situations when a table doesn't have a superkey. These situations are rare. Most often these are lists, like "at what hour did an employee call which number". You'd store the employee number, the hour and the phone number. And if an employee calls the same number within the same hour twice, there are two similar records, hence no unique key. Most often, however, such situations can be avoided (in the example given, we could store the complete time down to seconds or even milliseconds instead of hours and thus get a unique key consisting of employee number plus call time).
Elodiaelodie answered 3/9, 2017 at 16:38 Comment(2)
@philipxy: I know "superkey" to mean the minimal unique key for a database table, i.e. the key that uniquely identifies a row in a database table. All Internet sites I checked agree on this definition. Maybe you know the term "superkey" to mean something else, something about relations, but this is not the common definition then. I think my answer covers the two questions completely and you are only complicating things with explaning things that have not been asked.Elodiaelodie
Your answer many times refers to 'the' superkey/column/group as if there were always just one. The only time an SQL table can have just one is when there is one column--every superset of a superkey is a superkey. See my edited answer for table vs relation vs relationship vs FK and re SQL vs relational.Velvety
C
0

Here are a few precise definitions from Ronald Fagin’s 1981 A Normal Form for Relational Databases that Is Based on Domains and Keys (the paper which defined domain–key normal form, D.K.N.F.):

  • Attributes: a set X. E.g. {city, country}.
  • X-tuple: a function of domain the set of attributes X. E.g. {(city, Paris), (country, France)}.
  • X-relation: a set of X-tuples. E.g. {{(city, Paris), (country, France)}, {(city, Berlin), (country, Germany)}}.
  • X-constraint: a function of domain the set of X-relations and codomain {0, 1}. For an X-constraint σ that maps an X-relation R to 1, it is said that R obeys σ and that σ holds in R.
  • Relation schema: a tuple of 1st entry a set of attributes X and 2nd entry a set of X-constraints.
  • Database schema: a set of relation schemata.
  • Relation schema instance: an X-relation that has the attributes and obeys the constraints of a relation schema.
  • Functional dependency (FD) constraint: AB where A, BX holds in X-relations R such that for all t1, t2 in R, (t1[A] = t2[A] ⇒ t1[B] = t2[B]).
  • Key dependency (KD) constraint: KEY(A) where AX holds in X-relations R such that AX holds in R. A is said to be a superkey (also called a key when it is irreducible) of the relation schemata (and of their instances) that have the key dependency constraint KEY(A).
  1. Does an empty SQL table always have a superkey?
  2. Does every SQL table always have a superkey?
  1. Yes, because the empty X-relation obeys any key dependency constraints KEY(A) where AX.

    Proof. For all t1, t2 in {}, (t1[A] = t2[A] ⇒ t1[X] = t2[X]), which is equivalent to for all t1, t2, (t1, t2 in {} ⇒ (t1[A] = t2[A] ⇒ t1[X] = t2[X])), which is a vacuous truth as the antecedent t1, t2 in {} of the first conditional is false.

  2. Yes, because every X-relation obeys the key dependency constraint KEY(X).

    Proof. Let R be an X-relation. For all t1, t2 in R, (t1[X] = t2[X] ⇒ t1[X] = t2[X]).

Caddie answered 3/1, 2021 at 13:11 Comment(4)
But SQL does not have relations, so this doesn't (directly) apply to SQL. PS This paraphrase of the paper is not quite self-contained & clear. PS Anyway there are many variations on what a DB relation is. And the question doesn't make clear whether 'table' means a value or variable. (The paraphrase, even though it distinguishes between relation value & schema, only talks about values, not variables. But we use varible & schemas to address variables. The paper introduces notions applicable to variables later re anomalies: "if there is a valid instance R of [schema] R*".)Velvety
"A superkey is a key dependency constraint." No, a superkey is a paper/paraphrase key, ie an attribute set for which KEY holds. (And so your 1 & 2 don't answer (the relational versions of) the (unclear) questions.)Velvety
@Velvety "No, a superkey is a paper/paraphrase key, ie an attribute set for which KEY holds." Well spotted. I have just updated the answer with the correct definition of Fagin’s article that I had overlooked (though he uses the name key for both reducible and irreducible superkeys).Tempa
@Velvety "The paraphrase, even though it distinguishes between relation value & schema, only talks about values, not variables." Fagin defines keys for both relation schemata and relation instances (cf. the updated definition in my answer) so I don’t think we need to introduce Date’s relation variables.Tempa

© 2022 - 2024 — McMap. All rights reserved.