Difference between 3NF and BCNF in simple terms (must be able to explain to an 8-year old)
Asked Answered
L

6

188

I have read the quote : data depends on the key [1NF], the whole key [2NF] and nothing but the key [3NF].

However, I am having trouble understanding 3.5NF or BCNF as it's called. Here is what I understand :

  • BCNF is stricter than 3NF
  • left side of any FD in the table must be a superkey (or at least a candidate key)

So why is it then, that some 3NF tables are not in BCNF? I mean, the 3NF quote explicitly says "nothing but the key" meaning that all attributes depend solely on the primary key. The primary key is, after all, a candidate key until it is chosen to be our primary key.

If anything is amiss regarding my understanding so far, please correct me and thanks for any help you can provide.

Lowrie answered 8/12, 2011 at 21:30 Comment(1)
Just where is it that you think non-textbook sources get their information from? There are a lot of poor textbooks too, but textbooks are reviewed by multiple people with academic apprenticeship & are much more likely to be not nonsense than others' interpretations of textbooks. High ratings by uninformed & misinformed people do not make something correct. I put that comment there for people who arrived at your question. That "nothing but the key" phrase is less than useless. Having a correct definition is certainly the issue, because "understanding the concept" is impossible without one.Fauman
F
178

Your pizza can have exactly three topping types:

  • one type of cheese
  • one type of meat
  • one type of vegetable

So we order two pizzas and choose the following toppings:

Pizza    Topping     Topping Type
-------- ----------  -------------
1        mozzarella  cheese
1        pepperoni   meat
1        olives      vegetable
2        mozzarella  meat
2        sausage     cheese
2        peppers     vegetable

Wait a second, mozzarella can't be both a cheese and a meat! And sausage isn't a cheese!

We need to prevent these sorts of mistakes, to make mozzarella always be cheese. We should use a separate table for this, so we write down that fact in only one place.

Pizza    Topping
-------- ----------
1        mozzarella
1        pepperoni
1        olives
2        mozzarella 
2        sausage
2        peppers

Topping     Topping Type
----------  -------------
mozzarella  cheese
pepperoni   meat
olives      vegetable
sausage     meat
peppers     vegetable

That was the explanation that an 8 year-old might understand. Here is the more technical version.

BCNF acts differently from 3NF only when there are multiple overlapping candidate keys.

The reason is that the functional dependency X -> Y is of course true if Y is a subset of X. So in any table that has only one candidate key and is in 3NF, it is already in BCNF because there is no column (either key or non-key) that is functionally dependent on anything besides that key.

Because each pizza must have exactly one of each topping type, we know that (Pizza, Topping Type) is a candidate key. We also know intuitively that a given topping cannot belong to different types simultaneously. So (Pizza, Topping) must be unique and therefore is also a candidate key. So we have two overlapping candidate keys.

I showed an anomaly where we marked mozarella as the wrong topping type. We know this is wrong, but the rule that makes it wrong is a dependency Topping -> Topping Type which is not a valid dependency for BCNF for this table. It's a dependency on something other than a whole candidate key.

So to solve this, we take Topping Type out of the Pizzas table and make it a non-key attribute in a Toppings table.

Fulllength answered 8/12, 2011 at 22:50 Comment(11)
"It's a dependency on something other than a whole candidate key." - Thank youIselaisenberg
"So in any table that has only one candidate key and is in 3NF" -- Not quite. The example you give does meet this condition. However, it is not a 3NF example because it is not 2NF. The key (1NF), the whole key (2NF), and nothing but the key (3NF). The key is (Pizza, Topping), and the column ToppingType is dependent upon the key and nothing but the key, but it is not dependent on the whole key. Hence it is not 2NF, and thus not 3NF or BCNF. It is 1NF. Making it 2NF would bypass the problem you are trying to illustrate.Traveled
@DanielBarbalace, The point of this table is that it has an alternative candidate key for this table: (Pizza, ToppingType). Since ToppingType is a subset of that candidate key, it satisfies 2NF.Fulllength
Sorry I had to downvote it. The example you showed is not in 3NF. To understand the purpose of BCNF, I must see an example where it is in 3NF but not i BCNF. Right now, I don't see the purpose of BCNF.Idyllist
Why is this NOT handled in 2NF already? From my viewpoint, the original table's primary key is Pizza + Topping, and Topping Type is dependent on Topping, so isn't that a partial dependency which should be taken care of in the 2NF stage?Shammer
@DanielBarbalace 2NF violation can be characterized as when a CK partially determines a non-prime (aka non-CK) attribute. But here all the attributes are prime. So there cannot be a 2NF violation. Also you are not quoting definitions of 1NF, 2NF or 3NF--in case you don't know.Fauman
In the pizza example, doesn't the decomposition remove the constraint that each pizza can only have exactly one topping of each type? As now the same pizza can have more than one topping of the same type. It is in BCNF though. Seems analogous to this example but with the columns in a different order.Ian
It's true that "BCNF acts differently from 3NF only when there are multiple overlapping candidate keys". But that is not helpful as a characterization of BCNF, 3NF-but-not-BCNF or 3NF vs BCNF. 3NF + overlapping CKs is a condition between 3NF & BCNF--it's necessary but not sufficient for BCNF & it doesn't preclude 3NF-but-not-BCNF. You don't clearly characterize 3NF vs BCNF. The claims & reasoning of the next paragraph aren't clear either. (Your presentations re NFs are often fuzzy.) (Like most.) PS "(Pizza, Topping Type) is a candidate key" contradicts "(Pizza, Topping) must be unique".Fauman
@Fauman I don't understand "(Pizza, Topping Type) is a candidate key" contradicts "(Pizza, Topping) must be unique". They're both CK so they're both unique.Glynisglynn
@Glynisglynn That PS by me is wrong & I don't know whether I was trying to say something else. I suspect I mistook (Pizza, Topping Type) for (Pizza, Topping, Topping Type).Fauman
@Iselaisenberg Unfortunately, that quote is wrong. It should be "on something other than a superset of a CK", not just a whole CK.Fauman
B
101

The subtle difference is that 3NF makes a distinction between key and non-key attributes (also called non-prime attributes) whereas BCNF does not.

This is best explained using Zaniolo's definition of 3NF, which is equivalent to Codd's:

A relation, R, is in 3NF iff for every nontrivial FD (X->A) satisfied by R at least ONE of the following conditions is true:

(a) X is a superkey for R, or

(b) A is a key attribute for R

BCNF requires (a) but doesn't treat (b) as a special case of its own. In other words BCNF requires that every nontrivial determinant is a superkey even its dependent attributes happen to be part of a key.

A relation, R, is in BCNF iff for every nontrivial FD (X->A) satisfied by R the following condition is true:

(a) X is a superkey for R

BCNF is therefore more strict.

The difference is so subtle that what many people informally describe as 3NF is actually BCNF. For example, you stated here that 3NF means "data depends on the key[s]... and nothing but the key[s]", but that is really an informal description of BCNF and not 3NF. 3NF could more accurately be described as "non-key data depends on the keys... and nothing but the keys".

You also stated:

the 3NF quote explicitly says "nothing but the key" meaning that all attributes depend solely on the primary key.

That's an oversimplification. 3NF and BCNF and all the Normal Forms are concerned with all candidate keys and/or superkeys, not just one "primary" key.

Barina answered 9/12, 2011 at 16:5 Comment(9)
Wow. Prof. Zaniolo actually teaches my class (CS 143, UCLA), and I stumbled upon this answer while preparing for the final exam. Great to see my prof's name, and thanks for the detailed answer!Mimi
could you give an example of a relation which is in 3NF but not in BCNF? its hard for me to imagine...Glandule
R{A,B,C} where {A,B} is a key. Given the dependency C->B, R satisfies the requirements of 3NF but not BCNF.Barina
What exactly "key" means when it is used in definition? Is it just the primary key, any of candidate keys or all the candidate keys?Flofloat
Key means candidate key. Key attribute means an attribute that is part of a candidate key, AKA a prime attribute.Barina
What does it mean prime (or non-prime) attribute ?Hessian
An attribute is prime if it is part of any candidate key; non-prime if it is not part of any candidate key.Barina
This definition doesn't make sense when you consider attribute sets. Does "key attribute" mean "a set consisting of only key attributes"?Charette
@AleksandrH An attribute that is an element of a CK is a called a key attribute or prime attribute. So "key attribute" usually means an attribute that is an element of a CK. Zaniolo defines X->A to mean X->{A}. Also when we are talking about an attribute set with exactly one element we sloppily use "attribute" to mean "attribute set". So "key attribute" could mean an attribute set that is a CK & has eactly one attribute. But this answer isn't concerned with CK cardinality, so it doesn't use it in that sense.Fauman
M
40

The difference between BCNF and 3NF

Using the BCNF definition

If and only if for every one of its dependencies X → Y, at least one of the following conditions hold:

  • X → Y is a trivial functional dependency (Y ⊆ X), or
  • X is a super key for schema R

and the 3NF definition

If and only if, for each of its functional dependencies X → A, at least one of the following conditions holds:

  • X contains A (that is, X → A is trivial functional dependency), or
  • X is a superkey, or
  • Every element of A-X, the set difference between A and X, is a prime attribute (i.e., each attribute in A-X is contained in some candidate key)

We see the following difference, in simple terms:

  • In BCNF: Every partial key (prime attribute) can only depend on a superkey,

whereas

  • In 3NF: A partial key (prime attribute) can also depend on an attribute that is not a superkey (i.e. another partial key/prime attribute or even a non-prime attribute).

Where

  1. A prime attribute is an attribute found in a candidate key, and
  2. A candidate key is a minimal superkey for that relation, and
  3. A superkey is a set of attributes of a relation variable for which it holds that in all relations assigned to that variable, there are no two distinct tuples (rows) that have the same values for the attributes in this set.Equivalently a superkey can also be defined as a set of attributes of a relation schema upon which all attributes of the schema are functionally dependent. (A superkey always contains a candidate key/a candidate key is always a subset of a superkey. You can add any attribute in a relation to obtain one of the superkeys.)

That is, no partial subset (any non trivial subset except the full set) of a candidate key can be functionally dependent on anything other than a superkey.

A table/relation not in BCNF is subject to anomalies such as the update anomalies mentioned in the pizza example by another user. Unfortunately,

  • BNCF cannot always be obtained, while
  • 3NF can always be obtained.

3NF Versus BCNF Example

An example of the difference can currently be found at "3NF table not meeting BCNF (Boyce–Codd normal form)" on Wikipedia, where the following table meets 3NF but not BCNF because "Tennis Court" (a partial key/prime attribute) depends on "Rate Type" (a partial key/prime attribute that is not a superkey), which is a dependency we could determine by asking the clients of the database, the tennis club:

Today's Tennis Court Bookings (3NF, not BCNF)

Court   Start Time  End Time    Rate Type
------- ----------  --------    ---------
1       09:30       10:30       SAVER
1       11:00       12:00       SAVER
1       14:00       15:30       STANDARD
2       10:00       11:30       PREMIUM-B
2       11:30       13:30       PREMIUM-B
2       15:00       16:30       PREMIUM-A

The table's superkeys are:

S1 = {Court, Start Time}
S2 = {Court, End Time}
S3 = {Rate Type, Start Time}
S4 = {Rate Type, End Time}
S5 = {Court, Start Time, End Time}
S6 = {Rate Type, Start Time, End Time}
S7 = {Court, Rate Type, Start Time}
S8 = {Court, Rate Type, End Time}
ST = {Court, Rate Type, Start Time, End Time}, the trivial superkey

The 3NF problem: The partial key/prime attribute "Court" is dependent on something other than a superkey. Instead, it is dependent on the partial key/prime attribute "Rate Type". This means that the user must manually change the rate type if we upgrade a court, or manually change the court if wanting to apply a rate change.

  • But what if the user upgrades the court but does not remember to increase the rate? Or what if the wrong rate type is applied to a court?

(In technical terms, we cannot guarantee that the "Rate Type" -> "Court" functional dependency will not be violated.)

The BCNF solution: If we want to place the above table in BCNF we can decompose the given relation/table into the following two relations/tables (assuming we know that the rate type is dependent on only the court and membership status, which we could discover by asking the clients of our database, the owners of the tennis club):

Rate Types (BCNF and the weaker 3NF, which is implied by BCNF)

Rate Type   Court   Member Flag
---------   -----   -----------
SAVER       1       Yes
STANDARD    1       No
PREMIUM-A   2       Yes
PREMIUM-B   2       No

Today's Tennis Court Bookings (BCNF and the weaker 3NF, which is implied by BCNF)

Member Flag     Court     Start Time   End Time
-----------     -----     ----------   --------
Yes             1         09:30        10:30
Yes             1         11:00        12:00
No              1         14:00        15:30
No              2         10:00        11:30
No              2         11:30        13:30
Yes             2         15:00        16:30

Problem Solved: Now if we upgrade the court we can guarantee the rate type will reflect this change, and we cannot charge the wrong price for a court.

(In technical terms, we can guarantee that the functional dependency "Rate Type" -> "Court" will not be violated.)

Mozarab answered 27/10, 2015 at 22:13 Comment(0)
L
11

This is an old question with valuable answers, but I was still a bit confused until I found a real life example that shows the issue with 3NF. Maybe not suitable for an 8-year old child but hope it helps.

Tomorrow I'll meet the teachers of my eldest daughter in one of those quarterly parent/teachers meetings. Here's what my diary looks like (names and rooms have been changed):

Teacher   | Date             | Room
----------|------------------|-----
Mr Smith  | 2018-12-18 18:15 | A12 
Mr Jones  | 2018-12-18 18:30 | B10 
Ms Doe    | 2018-12-18 18:45 | C21 
Ms Rogers | 2018-12-18 19:00 | A08 

There's only one teacher per room and they never move. If you have a look, you'll see that: (1) for every attribute Teacher, Date, Room, we have only one value per row. (2) super-keys are: (Teacher, Date, Room), (Teacher, Date) and (Date, Room) and candidate keys are obviously (Teacher, Date) and (Date, Room).

(Teacher, Room) is not a superkey because I will complete the table next quarter and I may have a row like this one (Mr Smith did not move!):

Teacher  | Date             | Room
---------|------------------| ----
Mr Smith | 2019-03-19 18:15 | A12

What can we conclude? (1) is an informal but correct formulation of 1NF. From (2) we see that there is no "non prime attribute": 2NF and 3NF are given for free.

My diary is 3NF. Good! No. Not really because no data modeler would accept this in a DB schema. The Room attribute is dependant on the Teacher attribute (again: teachers do not move!) but the schema does not reflect this fact. What would a sane data modeler do? Split the table in two:

Teacher   | Date
----------|-----------------
Mr Smith  | 2018-12-18 18:15
Mr Jones  | 2018-12-18 18:30
Ms Doe    | 2018-12-18 18:45
Ms Rogers | 2018-12-18 19:00

And

Teacher   | Room
----------|-----
Mr Smith  | A12
Mr Jones  | B10
Ms Doe    | C21
Ms Rogers | A08

But 3NF does not deal with prime attributes dependencies. This is the issue: 3NF compliance is not enough to ensure a sound table schema design under some circumstances.

With BCNF, you don't care if the attribute is a prime attribute or not in 2NF and 3NF rules. For every non trivial dependency (subsets are obviously determined by their supersets), the determinant is a complete super key. In other words, nothing is determined by something else than a complete super key (excluding trivial FDs). (See other answers for formal definition).

As soon as Room depends on Teacher, Room must be a subset of Teacher (that's not the case) or Teacher must be a super key (that's not the case in my diary, but thats the case when you split the table).

To summarize: BNCF is more strict, but in my opinion easier to grasp, than 3NF:

  • in most of cases, BCNF is identical to 3NF;
  • in other cases, BCNF is what you think/hope 3NF is.
Leacock answered 17/12, 2018 at 15:50 Comment(0)
H
7

All good answers. To put it in simple language [BCNF] No partial key can depend on a key.

i.e No partial subset ( i.e any non trivial subset except the full set ) of a candidate key can be functionally dependent on some candidate key.

Hex answered 13/2, 2013 at 17:38 Comment(1)
Why not? Let's say there's a relation R(A, B, C, D, E) and (A, B) and (C, D) are candidate keys. Then AB->D. Since AB is a superkey of R, so R should be in BCNF, right? (Just a question, trying to understand this.)Mentality
E
5

Answers by ‘smartnut007’, ‘Bill Karwin’, and ‘sqlvogel’ are excellent. Yet let me put an interesting perspective to it.

Well, we have prime and non-prime keys.

When we focus on how non-primes depend on primes, we see two cases:

Non-primes can be dependent or not.

  • When dependent: we see they must depend on a full candidate key. This is 2NF.
  • When not dependent: there can be no-dependency or transitive dependency

    • Not even transitive dependency: Not sure what normalization theory addresses this.
    • When transitively dependent: It is deemed undesirable. This is 3NF.

What about dependencies among primes?

Now you see, we’re not addressing the dependency relationship among primes by either 2nd or 3rd NF. Further such dependency, if any, is not desirable and thus we’ve a single rule to address that. This is BCNF.

Referring to the example from Bill Karwin's post here, you’ll notice that both ‘Topping’, and ‘Topping Type’ are prime keys and have a dependency. Had they been non-primes with dependency, then 3NF would have kicked in.

Note:

The definition of BCNF is very generic and without differentiating attributes between prime and non-prime. Yet, the above way of thinking helps to understand how some anomaly is percolated even after 2nd and 3rd NF.

Advanced Topic: Mapping generic BCNF to 2NF & 3NF

Now that we know BCNF provides a generic definition without reference to any prime/non-prime attribues, let's see how BCNF and 2/3 NF's are related.

First, BCNF requires (other than the trivial case) that for each functional dependency X -> Y (FD), X should be super-key. If you just consider any FD, then we've three cases - (1) Both X and Y non-prime, (2) Both prime and (3) X prime and Y non-prime, discarding the (nonsensical) case X non-prime and Y prime.

For case (1), 3NF takes care of.

For case (3), 2NF takes care of.

For case (2), we find the use of BCNF

Enki answered 3/10, 2016 at 11:3 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.