Is there a difference between OUTER & FULL_OUTER in Spark SQL?
Asked Answered
I

3

14

Spark SQL documentation specifies that join() supports the following join types:

Must be one of: inner, cross, outer, full, full_outer, left, left_outer, right, right_outer, left_semi, and left_anti.

Spark SQL Join()

Is there any difference between outer and full_outer? I suspect not, I suspect they are just synonyms for each other, but wanted to get clarity.

Insurmountable answered 2/10, 2017 at 9:37 Comment(0)
Z
9

There is no difference between outer and full_outer - they are the same. See the following answer for a demonstration: What are the various join types in Spark?

Zabaglione answered 20/6, 2018 at 7:13 Comment(0)
T
6

Spark v2.4.0 join code (the _ has been suppressed):

case "inner" => Inner
case "outer" | "full" | "fullouter" => FullOuter
case "leftouter" | "left" => LeftOuter
case "rightouter" | "right" => RightOuter
case "leftsemi" => LeftSemi
case "leftanti" => LeftAnti
case "cross" => Cross

So Spark really supports: Inner, FullOuter, LeftOuter, RightOuter, LeftSemi, LeftAnti, and Cross.

Quick example, given:

+---+-----+
| id|value|
+---+-----+
|  1|   A1|
|  2|   A2|
|  3|   A3|
|  4|   A4|
+---+-----+

and:

+---+-----+
| id|value|
+---+-----+
|  3|   A3|
|  4|   A4|
|  4| A4_1|
|  5|   A5|
|  6|   A6|
+---+-----+

You get:

OUTER JOIN

+----+-----+----+-----+
|  id|value|  id|value|
+----+-----+----+-----+
|null| null|   5|   A5|
|null| null|   6|   A6|
|   1|   A1|null| null|
|   2|   A2|null| null|
|   3|   A3|   3|   A3|
|   4|   A4|   4|   A4|
|   4|   A4|   4| A4_1|
+----+-----+----+-----+

FULL_OUTER JOIN

+----+-----+----+-----+
|  id|value|  id|value|
+----+-----+----+-----+
|null| null|   5|   A5|
|null| null|   6|   A6|
|   1|   A1|null| null|
|   2|   A2|null| null|
|   3|   A3|   3|   A3|
|   4|   A4|   4|   A4|
|   4|   A4|   4| A4_1|
+----+-----+----+-----+
Thinskinned answered 1/2, 2019 at 15:28 Comment(0)
S
0

create table test1 (c1 nvarchar(1), c2 nvarchar(1));

create table test2 (c2 nvarchar(1), c3 nvarchar(1));

insert test1 values ('A','1');

insert test1 values ('B','2');

insert test2 values ('2','b');

insert test2 values ('3','c');

select * from test1;

select * from test2;

select * from test1 full outer join test2 on test1.c2 = test2.c2

select * from test1 left join test2 on test1.c2 = test2.c2

Servomechanism answered 16/8 at 20:39 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.