Using the same table alias twice in a query
Asked Answered
P

2

7

My coworker, who is new to ANSI join syntax, recently wrote a query like this:

SELECT count(*)
  FROM table1 t1
       JOIN table2 t2 ON
            (t1.col_a = t2.col_a)
       JOIN table3 t3 ON
            (t2.col_b = t3.col_b)
       JOIN table3 t3 ON
             (t3.col_c = t1.col_c);

Note that table3 is joined to both table1 and table2 on different columns, but the two JOIN clauses use the same table alias for table3.

The query runs, but I'm unsure of it's validity. Is this a valid way of writing this query?

I thought the join should be like this:

SELECT count(*)
  FROM table1 t1
       JOIN table2 t2 ON
            (t1.col_a = t2.col_a)
       JOIN table3 t3 ON
            (t2.col_b = t3.col_b AND
             t3.col_c = t1.col_c);

Are the two versions functionally identical? I don't really have enough data in our database yet to be sure.

Thanks.

Percept answered 3/1, 2018 at 21:52 Comment(4)
It probably runs because both alias point to the same table, so there's no ambiguity, which is what the query analyzer / optimizer always complains about.Ursal
By the way the brackets aren't needed (they aren't part of the syntax so they are ignored). In my experience they confuse the heck out of code formatters.Palmore
I must admit I was surprised that this works. I agree with you that the construction seems perverse and hard to validate, even if it is in fact valid.Palmore
I agree that the parens are unnecessary, and I wouldn't have used them, but my coworker did. We're all Oracle programmers, so none of us are expert at the ANSI syntax, but I've been using it for a few years now, and everybody else is just starting because it just recently became an official standard for us.Percept
M
8

The first query is a join of 4 tables, the second one is a join of 3 tables. So I don't expect that both queries return the same numbers of rows.

SELECT *
  FROM table1 t1
       JOIN table2 t2 ON
            (t1.col_a = t2.col_a)
       JOIN table3 t3 ON
            (t2.col_b = t3.col_b)
       JOIN table3 t3 ON
             (t3.col_c = t1.col_c);
             

The alias t3 is only used in the ON clause. The alias t3 refers to the table before the ON keyword. I found this out by experimenting. So the pervious query is equvivalent to

SELECT *
  FROM table1 t1
       JOIN table2 t2 ON
            (t1.col_a = t2.col_a)
       JOIN table3 t3 ON
            (t2.col_b = t3.col_b)
       JOIN table3 t4 ON
             (t4.col_c = t1.col_c);

and this can be transfotmed in a traditional join

SELECT *
  FROM table1 t1,
       table2 t2,
       table3 t3,
       table3 t4
where (t1.col_a = t2.col_a)
    and  (t2.col_b = t3.col_b)
    and (t4.col_c = t1.col_c);
    

The second query is

SELECT *
  FROM table1 t1
       JOIN table2 t2 ON
            (t1.col_a = t2.col_a)
       JOIN table3 t3 ON
            (t2.col_b = t3.col_b AND
             t3.col_c = t1.col_c);
             

This can also transformed in a traditional join

SELECT *
  FROM table1 t1,
    table2 t2,
    table3 t3
where (t1.col_a = t2.col_a)
    and (t2.col_b = t3.col_b)
    AND (t3.col_c = t1.col_c);
    

These queries seem to be different. To proof their difference we use the following example:

create table table1(
    col_a number,
    col_c number
);

create table table2(
    col_a number,
    col_b number
);

create table table3(
    col_b number,
    col_c number
);

insert into table1(col_a, col_c) values(1,3);
insert into table1(col_a, col_c) values(4,3);
insert into table2(col_a, col_b) values(1,2);
insert into table2(col_a, col_b) values(4,2);
insert into table3(col_b, col_c) values(2,3);
insert into table3(col_b, col_c) values(2,5);
insert into table3(col_b, col_c) values(7,9);

commit;

We get the following output

SELECT *
  FROM table1 t1
       JOIN table2 t2 ON
            (t1.col_a = t2.col_a)
       JOIN table3 t3 ON
            (t2.col_b = t3.col_b)
       JOIN table3 t3 ON
             (t3.col_c = t1.col_c)


| COL_A | COL_C | COL_A | COL_B | COL_B | COL_C | COL_B | COL_C |
|-------|-------|-------|-------|-------|-------|-------|-------|
|     1 |     3 |     1 |     2 |     2 |     3 |     2 |     3 |
|     4 |     3 |     4 |     2 |     2 |     3 |     2 |     3 |
|     1 |     3 |     1 |     2 |     2 |     5 |     2 |     3 |
|     4 |     3 |     4 |     2 |     2 |     5 |     2 |     3 |



             
SELECT *
  FROM table1 t1
       JOIN table2 t2 ON
            (t1.col_a = t2.col_a)
       JOIN table3 t3 ON
            (t2.col_b = t3.col_b AND
             t3.col_c = t1.col_c)

| COL_A | COL_C | COL_A | COL_B | COL_B | COL_C |
|-------|-------|-------|-------|-------|-------|
|     4 |     3 |     4 |     2 |     2 |     3 |
|     1 |     3 |     1 |     2 |     2 |     3 |

The number of rows retrieved is different and so count(*) is different.

The usage of the aliases was surprising. at least for me.

The following query works because t1 in the where_clause references table2.

select *
from table1 t1 join table2 t1 on(1=1)
where t1.col_b<0;

The following query works because t1 in the where_clause references table1.

select *
from table1 t1 join table2 t1 on(1=1)
where t1.col_c<0;

The following query raises an error because both table1 and table2 contain a column col_a.

select *
from table1 t1 join table2 t1 on(1=1)
where t1.col_a<0;

The error thrown is

ORA-00918: column ambiguously defined

The following query works, the alias t1 refers to two different tables in the same where_clause.

select *
from table1 t1 join table2 t1 on(1=1)
where t1.col_b<0 and t1.col_c<0;

These and more examples can be found here: http://sqlfiddle.com/#!4/84feb/12

The smallest counter example

The smallest counter example is

table1
col_a  col_c
    1      2

table2
col_a  col_b
    1      3

table3
col_b  col_c
    3      5
    6      2    

Here the second query has an empty result set and the first query returns one row. It can be shown that the count(*) of the second query never exeeds the count(*)of the first query.

A more detailed explanation

This behaviour will became more clear if we analyze the following statement in detail.

SELECT t.col_b, t.col_c
  FROM table1 t
       JOIN table2 t ON
            (t.col_b = t.col_c) ;
        

Here is the reduced syntax for this query in Backus–Naur form derived from the syntax descriptions in the SQL Language Reference of Oracle 12.2. Note that under each syntax diagram there is a link to the Backus–Naur form of this diagram, e.g Description of the illustration select.eps. "reduced" means that I left out all the possibilities that where not used, e,g. the select is defined as

select::=subquery [ for_update_clause ] ;

Our query does not use the optional for_update_clause, so I reduced the rule to

select::=subquery

The only exemption is the optional where-clause. I didn't remove it so that this reduced rules can be used to analyze the above query even if we add a where_clause.

These reduced rule will define only a subset of all possible select statements.

select::=subquery 
subquery::=query_block
query_block::=SELECT select_list FROM join_clause [ where_clause ]
join_clause::=table_reference  inner_cross_join_clause ...  
table_reference::=query_table_expression  t_alias query_table_expression::=table
inner_cross_join_clause::=JOIN table_reference ON condition

So our select statement is a query_block and the join_clause is of type

table_reference inner_cross_join_clause

where table_reference is table1 t and inner_cross_join_clause is JOIN table2 t ON (t.col_b = t.col_c). The ellipsis ... means that there could be additional inner_cross_join_clauses, but we do not need this here.

in the inner_cross_join_clause the alias t refers to table2. Only if these references cannot be satisfied the aliasmust be searched in an outer scope. So all the following expressions in the ONcondition are valid:

t.col_b = t.col_c

Here t.col_b is table2.col_b because t refers to the alias of its inner_cross_join_clause, t.col_c is table1.col_c. t of the inner_cross_join_clause (refering to table2) has no column col_c so the outer scope will be searched and an appropriate alias will be found.

If we have the clause

t.col_a = t.col_a

the alias can be found as alias defined in the inner_cross_join_clause to which this ON-condition belongs so t will be resolved to table2.

if the select list consists of

t.col_c, t.col_b, t.col_a

instead of * then the join_clause will be searched for an alias and t.col_c will be resolved to table1.col_c (table2 does not contain a column col_c), t.col_b will be resolved to table2.col_b (table1 does not contain a col_b) but t.col_a will raise the error

ORA-00918: column ambiguously defined

because for the select_list none of the aias definition has a precedenve over the other. If our query also has a where_clause then the aliases are resolved in the same way as if they are used in the select_list.

Mosher answered 4/1, 2018 at 2:27 Comment(1)
That might be the most comprehensive answer I've ever seen to any question ever posted on Stackoverflow! Thanks!Percept
C
4

With more data, it will produce different results. Your colleagues query is same as this.

select * from table3 where t3.col_b = 'XX'
union
select * from table3 where t3.col_c = 'YY'

or

select * from table3 where t3.col_b = 'XX' or t3.col_c = 'YY'

while your query is like this.

select * from table3 where t3.col_b ='XX' and t3.col_c='YY'

First one is like data where (xx or yy) while second one is data where ( xx and yy)

Cardigan answered 3/1, 2018 at 22:3 Comment(8)
Thanks. I've been looking at the tables more closely, and it looks like the clause joining table3 to table1 is not necessary in either version, due to the way table2 joins to both of the others. The specifics of that don't really come through in the generic version I posted above.Percept
Do you have a link to documentation for this answer? I've never seen this syntax before, but I don't think it is exactly equivalent to a union all (since I get additional columns joining with the same alias) and I don't understand the more general case (since I can left outer join the same table with the same alias which confuses me). I'd like to read more about this.Rhabdomancy
Thanks for pointing out. You can check sqlfiddle.com/#!4/b4a8cc/6 for different queries. union all can produce duplicate rows depending on data conditions. So, if you use just union to eliminate duplicate, results will be same and logical operations are basically the same. and looks like assigning same alias only works in Oracle.Cardigan
Above comment was about union all and union how it produce different number of rows because of duplicate rows elimination and regarding the different columns you see when using join vs union or union all, I used union to show the different data sets each query selects from the table3, the selection criteria.Cardigan
Definitely not equivalent to a union all - it's equivalent to a Cartesian join.Brachypterous
Because union all is confusing people, I added simple select query. They produce same dataset from the table3, which has same effect as colleague's joining same table twice with different column. You can see that on sqlfiddle link. @mathguy, I don't think you are right about Cartesian Join. The colleague's query still is a inner join. To be a Cartesian, on clause shouldn't have any condition.Cardigan
It is a Cartesian join to the second instance of table3 only. First the inner join of table1, table2 and first instance of table3 is performed; the result of this three-table inner join is then cross-joined (Cartesian join) to the second instance of table3, because there are no conditions on this second instance. All conditions apply only to the first instance of table3 - even those that seem to be in the last join. The last set of join conditions simply further filter the result of the join of the "first three" tables (excluding the second instance of table3).Brachypterous
I think I kinda simplified it in sqlfiddle link provided. The result was not total rows times total rows.Cardigan

© 2022 - 2024 — McMap. All rights reserved.