(De)Normalization of two relations
Asked Answered
E

4

9

People who read C.J.Date's Introduction to Database System or books of similar level should not have problems with definition of normalization and denormalization.

However, memory is not what it used to be and I find myself often looking at some design and saying that it is not normalized even though I can not find which of the normal forms it is breaking.

The actual example that illustrate it is:

If we have relations

r1 (A, B, C) and r2 (A, D)

with FDs: AB->C and A->D

and r1 represent detailed data, while r2 is summary of that data (in another words each instance of D is a function of values in r1. in this example let it be subtotal of values C according to A from r1).

Example instance

r1 = 
A  B  C  
1  1  10
1  2  20
2  1  10
2  2  25

r2 =
A  D
1  30
2  35

So, even though I can not say that it breaks for example 2NF or 3NF, I seem to be stuck on the idea that the design is still denormalised in the following sense (from Codd, E.F. "Further Normalization of the Data Base Relational Model", p. 34, commenting on the reasons to normalize beyond 1NF):

  1. To free the collection of relations from undesirable insertion, update and deletion dependencies;
  2. To reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the life span of application programs;
  3. To make the relational model more informative to users;
  4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.

As I can say, that if we define D as a sum of all Cs from r1 where A from r1 is equal to A from r2 then, if we update C in r1 and we don't update D in r2, we can end up with undesirable update dependency and the data ends up in inconsistent state I find this reason to call r1 and r2 denormalized and to think of them as denormalized. (In fact whole r2 is a function of r1 and bring zero new facts into the model; r2 = f(r1))

So the questions are

  1. can we call r1 and r2 denormalized?
  2. if yes, why? if not, why? (according to which rule? or according to which definition?)

NOTE:
To those who find the question(s) interesting enough to put in an answer, I kindly ask to provide either something quotable or to put it in a form of specific assumptions and conclusions (or in another words, if you are going to put in your opinion, please follow it with some reasoning).

EDIT I accepted dportas answer. I'll try to add a bit to it here: C.J.Date can makes a clear and strict distinction:

Much of design theory has to do with reducing redundancy; normalization reduces redundancy within relvars, orthogonality reduces it across relvars.

quoted from Database in depth: relational theory for practitioners

and on the next page

just as a failure to normalize all the way implies redundancy and can lead to certain anomalies, so too can a failure to adhere to orthogonality.

Epirogeny answered 30/11, 2010 at 9:15 Comment(0)
V
5

Assuming AB is a key in r1 and A is a key in r2 then it seems that the schema is in 6NF. The Relational Database Dictionary (Date) defines denormalization as:

Replacing a set of relvars R1, R2, . . ., Rn by their join R, such that for all i the projection of R on the attributes of Ri is guaranteed to be equal to Ri (i = 1, 2, . . ., n).

Fundamentally, normalization/denormalization is about composition and nonloss decomposition using projection and join operators. In this example you have redundancy caused by a different operator: summation. I expect it would be quite possible in principle to form a theory of "normalization" for operators other than projection and join, perhaps even for non-relational functions like summation. That's not how normalization is conventionally defined however and in the absence of any sound basis for doing otherwise I think we ought to apply the technical meaning denormalization as defined by Date in the above quotation.

Vereeniging answered 30/11, 2010 at 10:27 Comment(3)
This sounds coherent +1 and I am considering accepting it. However, I am not convinced that the above definition is both sufficient and necessary. For example under the above definition of denormalization the relation r (A, B, C) with FD: A->B, A->C is denormalized (if we had relations r1 (A, B) and r2 (A, C) and r is join of r1 and r2), but I fail to see which normal form they break (which brings me to the initial question).Epirogeny
@Epirogeny : "... I fail to see which normal form they break". r (A, B, C) with FD: A->B, A->C would violate 6NF.Vereeniging
Ok, I accept the quote and definition. Also refreshed my memory a bit and indeed normalization always talks about single relvar, so under strict definition I can not speak of normalization between two relvars. (I am not completely sold on the issue of scalar operator vs. relational operator, but I don't see it as relevant any more for this question)Epirogeny
U
4

Your definition for column D in r2, "a sum of all Cs from r1 where A from r1 is equal to A from r2", is a constraint on D. More formally, where Σ is summation, π is projection and σ is selection,

(a,d) ∈ r2 ⇔ (a, d) = (a, Σ c), a ∈ πA(r1), c ∈ πCA=a(r1))

Since this constraint is neither a domain constraint nor a key constraint, r2 is not in Domain/Key Normal Form (DKNF).

DKNF is the only normal form of which I'm aware that isn't defined in terms of a single relation, chiefly because it's defined in terms of constraints rather than dependencies.

Upbringing answered 25/1, 2012 at 6:17 Comment(0)
C
2

so r2 is a function of r1 which mean r2 is a materialized view of that function of r1

and in that example the it would be a view of select A, sum(C) from r1 group by A

views are not covered in codd's work on normalization, but i think he did write about them

materializing a view is normally done for caching reasons which some might think of as a form of denormalization so there were papers on automatically deciding which view to materialize thus making it just something the database could do with view to make them faster sometimes

but as updates to views are normally not permitted although i think i read that codd said something like all views that can be update-able should be and there were papers on getting that to work in some complex cases

Carbrey answered 30/11, 2010 at 9:51 Comment(5)
It might be a view but normalization is about semantics, not implementation. In this case it seems that the relevant fact is that A determines D, regardless of whether r1 and or r2 are base relations or derived relations.Vereeniging
but in a database without views, r2 would still be a materialized view of r1 and yes the functional A->D would hold on r2 but that's not the issue as what op was asking was how are r2 and r1 related and that is via that r2 is a materialized view of a function of r1Carbrey
I have a problem with your first sentence - yes if we look at r2 as materialized view then the rest of your answer is at least connected. But in my question I don't stipulate that it is a materialized view - or in another terms what rule gets broken if I make r2 not a materialized view, but I insist that it is a relation in its own right?Epirogeny
it is a table so it's got to be a relation but when looking at the two for normalization you can't do anything to normalize them but realize that r2 is a materialized view of r1; otherwise which ends up i think with the answer is that it would be denormalized following what they said about storing computed columns which are always a function of other columns in a rowCarbrey
the example they gave was sales table (quantity, price) they stated that a product of the two as a total wasn't something you should store as it is fully determined by the other data in the row the op's example the same thing but with tables instead of rowsCarbrey
L
1

I think the pair of relations violates fifth normal form.


R2 is a projection of R1. Some argue that SUM is outside the scope of the relational model. In this case, SUM is a trivial extension of COUNT, which is within the scope of the relational model.

Latter answered 30/11, 2010 at 12:24 Comment(3)
as I said, I am not looking just for an opinion or in another words, why do you think so?Epirogeny
Edited the response based on your comment.Latter
+1 (since the answer is useful to me). Re SUM/COUNT yes, that was my feeling to, but actually the definition mentions no operators except join and projection. This seems coherent; I've came to accept the fact that normalization talks only about single relvar. So, there can not be denormalization between two tables.Epirogeny

© 2022 - 2024 — McMap. All rights reserved.