What is a good KISS description of Boyce-Codd normal form?
Asked Answered
G

6

24

What is a KISS (Keep it Simple, Stupid) way to remember what Boyce-Codd normal form is and how to take a unnormalized table and BCNF it?

Wikipedia's info: not terribly helpful for me.

Geny answered 11/2, 2009 at 23:21 Comment(1)
@Smith325, the acronym KISS means "Keep it Simple Stupid." He's not calling anyone stupid. :)Antiworld
B
50

Chris Date's definition is actually quite good, so long as you understand what he means:

Each attribute

Your data must be broken into separate, distinct attributes/columns/values which do not depend on any other attributes. Your full name is an attribute. Your birthdate is an attribute. Your age is not an attribute, it depends on the current date which is not part of your birthdate.

must represent a fact

Each attribute is a single fact, not a collection of facts. Changing one bit in an attribute changes the whole meaning. Your birthdate is a fact. Is your full name a fact? Well, in some cases it is, because if you change your surname your full name is different, right? But to a genealogist you have a surname and a family name, and if you change your surname your family name does not change, so they are separate facts.

about the key,

One attribute is special, it's a key. The key is an attribute that must be unique for all information in your data and must never change. Your full name is not a key because it can change. Your Social Insurance Number is not a key because they get reused. Your SSN plus birthdate is not a key, even if the combination can never be reused, because an attribute cannot be a combination of two facts. A GUID is a key. A number you increment and never reuse is a key.

the whole key,

The key alone must be sufficient [and necessary!] to identify your values; you cannot have the same data represented by different keys, nor can a subset of the key columns be sufficient to identify the fact. Suppose you had an address book with a GUID key, name and address values. It is OK to have the same name appearing twice with different keys if they represent different people and are not the "same data". If Mary Jones in accounting changes her name to Mary Smith, Mary Jones in Sales does not change her name as well. On the other hand, if Mary Smith and John Smith have the same street address and it really is the same place, this is not allowed. You have to create a new key/value pair with the street address and a new key.

You are also not allowed to use the key for this new single street address as a value in the address book since now the same street address key would be represented twice. Instead, you have to make a third key/value pair with values of the address book key and the street address key; you find a person's street address by matching their book key and address key in this group of values.

and nothing but the key

There must be nothing other than the key that identifies your values. For example, if you are allowed an address of "The Taj Mahal" (assuming there is only one) you are not allowed a city value in the same record, since if you know the address you would also know the city. This would also open up the possibility of there being more than one Taj Mahal in a different city. Instead, you have to again create a secondary Location key with unique values like the Taj, the White House in DC, and so on, and their cities. Or forbid "addresses" that are unique to a city.

So help me, Codd.

Baiss answered 12/2, 2009 at 4:15 Comment(7)
Informative answer, but do you think you could elaborate on the difference between BCNF and 3NF? This description sounds exactly like what I've read for 3NFGallegos
@Graphics, all BCNF tables are in 3NF, but the reverse is not true if a key has a dependency on another key. Bill Karwin's answer mentions this. My example is of an "address" that can exist in only certain cities. This is allowed in 3NF but not in BCNF. Detecting this requires knowing the meaning of the data, you cannot detect it from the schema. en.wikipedia.org/wiki/BCNF has an example of turning 3NF into BCNF.Baiss
First of all, I love the signature "So help me, Codd" but I do have one thing to point out something, not to say you are incorrect. From my experience Guids are overused whether it be the appearance of security through obfuscation or object mapping. In either sense I have found no need for them. You emphasized the use of Guids frequently so i merely wanted to point out that there are, I am not going to say never, often better keys. Identities, sequential Guids, INT NOT NULL, VARCHAR if it guaranteed unique. This will allow for clustered index seeks on joins and therefore be faster.Irresolution
It's not just the sig. It's the entire phrase.. "Each attribute must..... so help me Codd". It comes from Kent back before 1989.Insert
It's worth pointing out that a key is a set of n attributes, with a special case of n=1. dpxo.net/articles/whycompositekeys.htmIllfavored
The purpose of GUID keys is not obfuscation but fault-tolerance. Sequential IDs and VarChars have too many collisions; it is highly likely for a mistake in a sequential ID to match another valid ID, for keys in one table to match foreign keys in an unrelated table, merged tables to have collisions. GUIDs or sparse keys significantly reduce this. Sequential GUIDs are a useful compromise.Baiss
@AnthonyMason guids have nothing to do with security. Using a guid as a key ensures that a relational join between two keys that are not related will not product data. If you use integers as a primary key value (never do this!) you will get data on nearly any join. The greatest advantage in using guids as keys is scalability- disconnected shards can assign keys and be assured of no collisions. Virtually any type of replication schema will require a guid be added to every record. INT NOT NULL Is the worst option and is never appropriate for any enterprise data system.Scale
T
13

Here are some helpful excerpts from the Wikipedia page on Third Normal Form:

Bill Kent defines Third Normal Form this way:

Each non-key attribute "must provide a fact about the key, the whole key, and nothing but the key."

Requiring that non-key attributes be dependent on "the whole key" ensures that a table is in 2NF; further requiring that non-key attributes be dependent on "nothing but the key" ensures that the table is in 3NF.

Chris Date adapts Kent's mnemonic to define Boyce-Codd Normal Form:

"Each attribute must represent a fact about the key, the whole key, and nothing but the key." Here the requirement is concerned with every attribute in the table, not just non-key attributes.

This comes into play when a table has multiple compound candidate keys, and an attribute within one candidate keys has a dependency on a part of another candidate key. Third Normal Form wouldn't prohibit this, because it excludes key attributes. But BCNF applies the rule to key attributes as well.

As for how to make a table satisfy BCNF, you need to represent the extra dependency, with another attribute and possibly by splitting attributes into another table.

Trickster answered 12/2, 2009 at 0:6 Comment(2)
it's the best explanation of BCNF vs 3NF I've ever seenWitless
Thank @PiotrekDe! I tried to write up clear descriptions of all the normal forms in the appendix to my book SQL Antipatterns: Avoiding the Pitfalls of Database ProgrammingTrickster
A
1

I googled "boyce codd normal form" and after wikipedia this is the second result. My textbook gives a very simple definition in terms of relational database management systems:

The left side of every nontrivial FD must be a superkey.

-"Database Systems The Complete Book" by Garcia-Molina, Ullman and Widom.

Advertising answered 10/10, 2012 at 5:46 Comment(4)
ah, but what's a superkey? :-)Geny
Wow. I guess that works as a definition, but certainly not a general explanation or way to remember BCNF. Any explanation that I have to pick apart word by word to try and figure out isn't doing its job very well.Rosner
@Rosner What others explained in several paragraphs this quote explains in one sentence. The English is pretty straight forward. If you are concerning yourself with BCNF you probably already know about functional dependencies and superkeys and the rest is plain English.Advertising
You need to explain FD & superkey.Boadicea
T
0

The best informal answer I've read is that, in BCNF, every "arrow" in every functional dependency is an "arrow" out of a candidate key. I don't recall the source, but it was probably something Chris Date wrote.

Tania answered 2/2, 2013 at 16:8 Comment(1)
In every non-trivial FD. Then this needs simple definitions of FD & CK.Boadicea
S
0

Basically Boyce-Codd is "fifth normal form". It is visually recognizable by the existance of "Attributive entities" in the data model, for things like Types (e.g. roles, status, process state, location-type, phone-type, etc). The attributive entities (sub-subtypes) are lists of finite sets of values that further categorize a class level entity. So you may have a phone-type ('mobile', ' desk', 'VOIP') email account type ('business', 'personal', 'gaming'), role (project manager, data modeler, super model) etc. Another morphological clue is the existance of super-types, (aka. master-classes, super-classes, meta-entities) such as Parties (subtypes being company, person, etc.).

It's basically Taxonomy gone wild (..no the video is not that exciting) to the atomic or leaf-level; see Bill Karwin's comment above for a more technical explanation.

Boyce-Codd level models are essentially highly detailed logical models, derived from more simplistic business-based conceptual models. **They are typically NOT implemented ver batim in the PHYSICAL model, because PDM optimization for performance (or functional simplicity) may result in the super-types and attributive entities being managed as drop-down lists in UIs, or in behind the scenes logic in the application, or in database constraints and methods to enforce referential integrity. (i.e. they may end up as look-up tables in the PDM schema, or they may be handled by code and not represented in the database).

So - why do them if they may not end up in the PDM? For the same reason you build a good 3NF model before you 'optimize', so that the database structure reflects the real world and is hence more stable than the typical kludges we inherit and have to do heroic acts to make work as our business/clients requirements change.

Silkstocking answered 9/4, 2014 at 0:22 Comment(1)
5NF is 5NF & BCNF is not 5NF.Boadicea
I
-1

Often times it is easiest to listen to your gut and this will come naturally. Generally speaking, if you meet 3NF you have met BCNF. This doesn't cover detailed analysis of an ERD or have examples but there are thirteen rules according to Codd. I find it best to follow these rules but always remember there is no one correct way to do things so follow them loosely. So regarding the RDBMS, here are the rules:

http://www.87android.com/12-rules-of-relational-database-model-by-codd/

This may not answer the question directly, but if you are asking about how to get to BCNF or an easy way to remember it then you don't understand normalization well enough. That is of no concern though. Relational databases take many forms and very few are done well. The best thing you can do is know what it means to be relational, follow the rules above, and do not worry about the level of normalization. The process of normalization eliminates the duplication of data. Each level more so by moving into migration of functional dependencies. Keep that in mind and you will be fine, your gut and intellect will do the rest.

Irresolution answered 15/7, 2014 at 21:52 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.