Relational Algebra - Cartesian Product vs Natural Join?

Asked 2/1, 2013 at 18:38 Answered 23/2, 2020 at 21:29

Solved sql database-design relational relation relational-algebra

I am studying for exams and am failing to find a solid criteria by which I can determine if the Cartesian Product x is to be used or if Natural Join |X| is to be used.

I had come up with a rough guide that:

"If you need to project an attribute that has the same name as an attribute in the table to be joined you must use x and state the table names to be projected: tableA.colname1 = tableB.colname1"

This however doesn't follow some of the solutions in my notes and my lecturer seems to use x with the above convention or |x| interchangeably.

Does anyone have a rule that can be followed to define use of one over the other?

Take for example this schema (only schema related to the question quoted for brevity):

takes(ID, course_id, sec_id, semester, year, grade)
student(ID, name, dept_name, tot_cred)

Q) Find the name of all students who took courses that were taught in either Spring 2011 or Autumn 2011.

My answer attempt:

π name(σ semester="Spring" ^ year=2011(takes ⋈ student)) ∪ π name(σ semester="Autumn" ^ year=2011(takes ⋈ student))

Actual answer:

π name(σ semester="Spring" ^ year=2011 ^ takes.ID=student.ID(takes x student)) ∪ π name(σ semester="Autumn" ^ year=2011 ^ takes.ID=student.ID(takes x student))

Can anyone provide a reason as to why?

In my mind the Natural Join would take care of the takes.ID=student.ID?

Pretender answered 2/1, 2013 at 18:38 Comment(3)

Real world databases almost never use a cartesian product. – Schiro 27/11, 2016 at 18:25

Joel, I disagree. I've often found cartesian product useful "in the real world". – Equanimity 27/11, 2016 at 18:52

@sqlvogel I've used them, too... but it's rare. – Schiro 12/12, 2016 at 23:31

A natural join, as I understand it, is a projected, filtered Cartesian product:

You take the Cartesian product, then
select it, so that the values in columns of the same name have the same value, and
project it, so that all columns have distinct names.

Under this assumption, your answer is isomorphic to the actual answer.

To see this, you might want to expand the natural join to the above sequence of operators, and float them around using the laws of relational algebra. You'll see that the projection disappears due to the projection to name, and the selection criterion is fused with the selection above. You'll end up with exactly the same tree as the actual answer, even though you never changed the meaning of your own answer!

I can think of one reason why your lecturer uses these concepts interchangeably: your lecturer wants you to understand that these concepts can be used interchangeably, because "the natural join is just a shortcut" (though that's debatable).

Nonmaterial answered 2/1, 2013 at 18:55 Comment(9)

So you mean like so: r ⋈ s == π r.A, r.B, r.C, r.D, s.E (σ r.B = s.B ^ r.D = s.D (r x s)) for say schema: R(A,B,C,D) and S(B,D,E)? – Pretender 2/1, 2013 at 19:5

Yes! :) At least, that's how I understand it. – Nonmaterial 2/1, 2013 at 19:6

Okay that makes sense to me, just worried as the "answers" only show 1 solution to the problem, when in fact there ARE 2 and I would be marked incorrect. – Pretender 2/1, 2013 at 19:7

There are too many possible answers, and only the smallest would be called correct. Your solution is smaller than the actual solution, so if you were allowed to use ⋈, your answer is "more correct". There are even shorter solutions possible, though. Float the projections over the set union, and you'll save yourself one operator. – Nonmaterial 2/1, 2013 at 19:10

I was thinking this: π name(σ (semester="Spring" ^ year=2011) ∨ (semester="Autumn" ^ year=2011)(takes ⋈ student)) but i'm not sure if that's allowed... – Pretender 2/1, 2013 at 19:14

Good point, but I don't know that either. That really depends on what operators you're allowed to use. If you can't use logical disjunction, you simply have to use set union. But if you can, then that's almost the shortest possible answer. – Nonmaterial 2/1, 2013 at 19:17

What IS the shortest answer by way of interest? – Pretender 2/1, 2013 at 19:25

π name(σ (semester="Spring" ∨ semester="Autumn") ^ year=2011 (takes ⋈ student)), I think. Unless you're also allowed to test for set membership, then it would be π name(σ (semester in {"Spring", "Autumn"}) ^ year=2011 (takes ⋈ student)). – Nonmaterial 2/1, 2013 at 19:27

Membership tests are not allowed, however logical disjunction has not been excluded. Thanks for your help @Tinctorius – Pretender 2/1, 2013 at 19:31

Cartesian product is just a special case of natural join where the joined relations don't have any attribute names in common. In Codd's original algebra renaming was a separate operation altogether. To obtain a true cartesian product of two relations that have some attributes in common you would have to rename those attributes before doing the (natural) join.

In the interests of brevity the renamings are sometimes left out of written examples and the product symbol used instead. Unfortunately this obscures the important point, that there is only one kind of join.

Equanimity answered 27/11, 2016 at 18:59 Comment(0)

-1

I think there are two extreme situation:

there are no duplicate rows in inner join: inner join is equal to intersect (i mean only the result). distinct inner join ~ intersect
there are no common feature in the inner join: inner join is equal to Cartesian product.

Lanti answered 23/2, 2020 at 21:29 Comment(0)

Recommended topics

Hot tags