The database designer gives each base table a predicate (sentence template parameterized by column names).
-- person PERSON is parent of person CHILD
-- shorthand Parent(PERSON, CHILD)
create table Parent(person, child)
-- person PERSON likes food FOOD
-- shorthand Likes(PERSON, FOOD)
create table Likes(person, food)
Relational algebra is designed so that the value of a relational expression (base table name or operator call) holds the rows that make a true proposition (statement) from its predicate.
/* (PERSON, CHILD) rows where
person PERSON is parent of person CHILD
i.e. where
Parent(PERSON, CHILD)
*/
Parent
The predicate of an expression that is a call to operator NATURAL JOIN is the AND of the predicates of its inputs.
/* (PERSON, CHILD, FOOD) rows where
person PERSON is parent of person CHILD AND person PERSON likes food FOOD
i.e. where
Parent(PERSON, CHILD) AND Likes(PERSON, FOOD)
*/
Parent NATURAL JOIN Likes
Ditto for UNION and OR, MINUS and AND NOT, PROJECT kept column(s) and EXISTS the dropped column(s), RESTRICT condition and AND condition, and RENAME of a column & rename of a parameter.
/* (CHILD, FOOD) rows where
there EXISTS a value for PERSON such that
person PERSON is parent of person CHILD AND person CHILD likes food FOOD
i.e. where
EXISTS PERSON:
Parent(PERSON, CHILD) AND Likes(CHILD, FOOD)
*/
PROJECT child, food (Parent NATURAL JOIN (RENAME person:=child Likes))
So every query expression's value holds the rows that make its predicate into a true statement.
Suppose we define algebraic self-join of a table as NATURAL JOIN of two tables got from an original via sequences of zero or more renamings. Per the above we NATURAL JOIN for rows that satisfy the AND of predicates.
A self-join arises when we want the rows that satisfy a result predicate that we expressed via multiple uses of a predicate that differ only in parameters/columns.
/* (PERSON, FOOD, CHILD) rows where
person PERSON and person CHILD like the same food FOOD
i.e. where
person PERSON likes food FOOD AND person CHILD likes food FOOD
i.e. where
Likes(PERSON, FOOD) AND Likes(CHILD, FOOD)
*/
Likes NATURAL JOIN (RENAME person:=child Likes)
There's nothing special about a self-join arising in a given query in a given application other than that.
SQL SELECT DISTINCT statements can be described via algebraic operators. That determines query predicates. First FROM table columns that aren't SQL NATURAL JOIN common columns are RENAMEd by prefixing a table alias (correlation name) and a dot. Then for CROSS & INNER JOIN the new tables are CROSS JOINed, but assuming each input has no duplicate column names, after dotting that's NATURAL JOIN. Then ON and WHERE RESTRICT per a condition. Then SELECT DISTINCT uses RENAME or = to give dotless returned columns and PROJECTS away unwanted dotted columns. SQL accomplishes what algebra RENAMEs do but it is via SELECT AS or implicitly dotting.
We can convert SQL to predicates directly: A FROM table contributes a predicate like its predicate but with the parameters dotted with its alias; if it's a subquery, rename conflicting uses of that alias within its predicate. CROSS & INNER JOIN give AND. ON and WHERE give AND with a condition. Each dot-free result column gives an AND that it equals its dotted version. Finally dropping all dotted columns gives EXISTS.
/* same as above */
/* (PERSON, FOOD, CHILD) rows where
person PERSON and person CHILD like the same food FOOD
i.e. where
there EXISTS values for P.* & C.* such that
PERSON = P.PERSON AND CHILD = C.PERSON AND FOOD = P.FOOD
AND person P.CHILD likes food P.FOOD
AND person C.CHILD likes food C.FOOD
AND P.FOOD = C.FOOD
i.e. where
EXISTS P.*, C.*:
PERSON = P.PERSON AND CHILD = C.PERSON AND FOOD = P.FOOD
AND Likes(P.CHILD, P.FOOD)
AND Likes(C.CHILD, C.FOOD)
AND P.FOOD = C.FOOD
*/
SELECT DISTINCT p.person AS person, c.person AS child, p.food AS food
FROM Likes p INNER JOIN Likes c
ON p.food = c.food
Again: In SQL we say there is a self-join when multiple table aliases of a JOIN are associated with the same table value; in application terms that means we can express a query meaning in terms of predicates differing in some parameters/columns; there's nothing special about applications or table meanings for this to arise.
A query with scalar subqueries in SELECT and/or WHERE can be rearrranged to only involve JOINs of the subqueries in a FROM. So we often call such a query a self-join. Correspondingly, such a query's predicate has parameters and quantifiers (names introduced by EXISTS and FORALL) appearing as or equal to arguments to the subquery predicates, and the query can be rearranged to instead AND those subquery predicates in the part of a rearranged query predicate that corresponds to the original's FROM.
Re SQL query semantics see my answer at Is there any rule of thumb to construct SQL query from a human-readable description?. And it happens to give a self-join as an example.
For another self-join see my answer at How to get matching data from another SQL table for two different columns: Inner Join and/or Union?.
Re SQL NATURAL JOINs see my answer at Inner Join vs Natural Join vs USING clause: are there any advantages?.
, col2β¦col2
(tableX) Query2 = Ο col1 = col1 ` (Ο col2 = col2'(Query1)) would"return full list of unique lines in tableX" β Desma