What is the difference between theta join and inner join?
Asked Answered
U

1

3

theta join and inner join look identical to me: they are Cartesian product followed by an arbitrary selection. Or am I missing their differences? Thanks.

The above wikipedia link says a theta-join takes a comparison & two attributes. But that is not correct. Instead, a theta-join can take any selection condition.

From Database System Concept which is supposed to follow the SQL standard and more coherent and reliable than wikipedia:

The theta join operation is a variant of the natural-join operation that allows us to combine a selection and a Cartesian product into a single operation. Consider relations r(R) and s(S), and let theta be a predicate on attributes in the schema R ∪ S. The theta join operation r join_theta s is defined as follows:

r join_theta s = sigma_theta(r × s)

....

the join operations we studied earlier that do not preserve nonmatched tuples are called inner join operations, to distinguish them from the outer-join operations.

and I still feel they are the same concept.

Difference between a theta join, equijoin and natural join doesn't explain the difference between a theta join and an inner join.

Urano answered 17/6, 2018 at 13:41 Comment(1)
The relational model (including the notion of relational algebras) predates SQL & SQL is a poor reflection of it.Edom
E
4

There is no single "relational algebra". They differ even in what a relation is. Codd originally defined theta join as taking a binary operator (the theta) & two attributes. That is what people usually mean by the term. From Codd's 1992 book The Relational Model for Database Management Version 2:

RB-14-RB-23 The Theta-Join Operator

The theta-join operator employs two R-tables as its operands. It generates as a result an R-table that contains rows of one operand (say S) concatenated with rows of the second operand (say T), but only where the specified condition is found to hold true. For brevity, this operator is often referred to as join.

The condition expressed in the join operator involves comparing each value from a column of S with each value from a column of T. The columns to be compared are indicated explicitly in the join command; these columns are called the comparand columns. This condition can involve any of the 10 comparators cited in the list [...]

(Often the term is used for an operator like that but where a value can take the place of an attribute.)

But Codd contradicts himself there, because there are "two operands" but also "the specified condition". Later he contradicts himself again, saying there are a bunch of joins collectively called the theta-joins, and equi-join is one of them:

RB-25 The Natural Join Operator

As described in the last section, an equi-join generates a result in which two of the columns are identical in values, although different in column names. These two columns are derived from the comparand columns of the operands; of course, the columns may be either simple or composite. Of the 10 types of theta-join, equi-join is the only one that yields a result in which the comparand columns are completely redundant, one with the other. The natural join behaves just like the equi-join except that one of the redundant columns, simple or composite, is omitted from the result.

The textbook you quote uses the term to refer to a different thing, where theta is a boolean expression combining comparisons. That is reminiscent of SQL INNER JOIN ON complex conditions. For Codd it is:

RB-24 The Boolean Extension of Theta-Join

Observe that your quoted textbook's theta-join is not an algebraic operator (mapping argument values to a result value)! It is a language non-terminal (in which a comparison expression does not denote a value). The textbook doesn't actually define sufficient algebra operators to implement its algebra-style query language. It's alleged set of basic algebra operators includes select but it actually only defines that as a language non-terminal since one operand is a condition expression.

(This sort of sloppiness is typical of relational database textbooks. There is a culture of fuzziness & confusion in the relational database field. Which, like the field, was begun by Codd.)

Edom answered 17/6, 2018 at 17:31 Comment(3)
Thanks. How about in Database System Concepts? It is much more standardized and modern.Urano
I don't understand. That is the textbook you quoted & that I just referred to in the answer & comment. If there was one decent textbook one could use I'd suggest it, but there isn't.Edom
"a culture of fuzziness & confusion ... begun by Codd" - that's true :) The man was a genius to invent the RM but it didn't emerge fully formed and -- gasp -- he even made mistakes. But do we still need to learn theta-join now that it has been 'superseded' by natural join? Still need to learn nulls when we know they are a disaster in practice? Other than to avoid repeating mistakes of the past, that is.Jeopardous

© 2022 - 2024 — McMap. All rights reserved.