Spark INLINE Vs. LATERAL VIEW EXPLODE differences?

WITH sample AS (
 SELECT 1 AS id,
        array(NAMED_STRUCT('name', 'frank', 'age', 40, 'state', 'Texas'),
              NAMED_STRUCT('name', 'maria', 'age', 51, 'state', 'Georgia')
              ) AS array_of_structs
),
inline_data AS (
 SELECT id,
        INLINE(array_of_structs)
 FROM sample
)
SELECT id,
       name AS person_name,
       age AS person_age
FROM inline_data

WITH sample AS (
 SELECT 1 AS id,
        array(NAMED_STRUCT('name', 'frank', 'age', 40, 'state', 'Texas'),
              NAMED_STRUCT('name', 'maria', 'age', 51, 'state', 'Georgia')
              ) AS array_of_structs
)
SELECT id,
       person.name,
       person.age
FROM sample
LATERAL VIEW EXPLODE(array_of_structs) exploded_people as person

The EXPLODE UDTF generates one row per array element, each with a single column of type struct, so to get the person's name you need dot notation, person.name:

WITH sample AS (
 SELECT 1 AS id,
        array(NAMED_STRUCT('name', 'frank',
                           'age', 40,
                           'state', 'Texas'
                           ),
              NAMED_STRUCT('name', 'maria',
                           'age', 51,
                           'state', 'Georgia'
                           )
              )            
            AS array_of_structs
)

SELECT  id,
        person.name,
        person.age
FROM sample
LATERAL VIEW explode(array_of_structs) exploded_people as person

Result:

id,name,age
1,frank,40
1,maria,51

The INLINE UDTF generates a row set with N columns (N = number of top-level elements in the struct), so you do not need dot notation such as person.name: name and the other struct elements are already extracted by INLINE:

WITH sample AS (
 SELECT 1 AS id,
        array(NAMED_STRUCT('name', 'frank',
                           'age', 40,
                           'state', 'Texas'
                           ),
              NAMED_STRUCT('name', 'maria',
                           'age', 51,
                           'state', 'Georgia'
                           )
              )            
            AS array_of_structs
)

SELECT  id,
        name,
        age
FROM sample
LATERAL VIEW inline(array_of_structs) exploded_people as name, age, state

Result:

id,name,age
1,frank,40
1,maria,51

Both INLINE and EXPLODE are UDTFs. In Hive they require LATERAL VIEW when selected together with other columns such as id; in Spark they work fine without LATERAL VIEW. The only difference is that EXPLODE returns a dataset of array elements (structs in your case), while INLINE returns the struct elements already extracted into separate columns. With INLINE in a LATERAL VIEW you need to name all struct elements, like this: LATERAL VIEW inline(array_of_structs) exploded_people as name, age, state

From a performance perspective, INLINE and EXPLODE work the same; you can use the EXPLAIN command to check the plan. Whether struct elements are extracted in the UDTF or after it does not affect performance.
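A minimal sketch of how the two plans could be compared. It prefixes each variant with EXPLAIN and uses an inline subquery instead of the CTE to keep each statement short; the exact plan text depends on the engine and version, so none is shown here.

```sql
-- Plan for the INLINE variant: struct fields become columns directly.
EXPLAIN
SELECT id, name, age
FROM (
  SELECT 1 AS id,
         array(named_struct('name', 'frank', 'age', 40, 'state', 'Texas'),
               named_struct('name', 'maria', 'age', 51, 'state', 'Georgia')
               ) AS array_of_structs
) sample
LATERAL VIEW inline(array_of_structs) exploded_people AS name, age, state;

-- Plan for the EXPLODE variant: fields are read from the struct afterwards.
EXPLAIN
SELECT id, person.name, person.age
FROM (
  SELECT 1 AS id,
         array(named_struct('name', 'frank', 'age', 40, 'state', 'Texas'),
               named_struct('name', 'maria', 'age', 51, 'state', 'Georgia')
               ) AS array_of_structs
) sample
LATERAL VIEW explode(array_of_structs) exploded_people AS person;
```

Run both statements and compare the generate/UDTF step in each plan.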

INLINE requires you to list all struct elements (in Hive) and EXPLODE does not, so EXPLODE may be more convenient if you do not need to extract all struct elements, or if you do not need to extract elements at all. INLINE is convenient when you need to extract all or most of the struct elements.
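As an illustration of the convenience argument, here is a sketch that needs only one struct element in the output and one in a filter. The age threshold of 45 is an arbitrary value chosen for this example, not something from the question.

```sql
WITH sample AS (
 SELECT 1 AS id,
        array(named_struct('name', 'frank', 'age', 40, 'state', 'Texas'),
              named_struct('name', 'maria', 'age', 51, 'state', 'Georgia')
              ) AS array_of_structs
)
-- Only the struct alias is declared; state is never mentioned.
SELECT id,
       person.name
FROM sample
LATERAL VIEW explode(array_of_structs) exploded_people AS person
WHERE person.age > 45;  -- hypothetical filter, keeps only maria
```

The INLINE version of the same query would still have to declare name, age, state in the LATERAL VIEW alias list, even though state is unused.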

Your first code example works only in Spark. In Hive 2.1.1 it throws an exception because LATERAL VIEW is required.

In Spark this will work also:

inline_data AS (
SELECT id,
        EXPLODE(array_of_structs) as person
FROM sample
)

To get the age column from this result you need to use person.age.
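Put together with the sample CTE from the question, the Spark-only form could look like the sketch below. The output aliases person_name and person_age are taken from the question's first query.

```sql
WITH sample AS (
 SELECT 1 AS id,
        array(named_struct('name', 'frank', 'age', 40, 'state', 'Texas'),
              named_struct('name', 'maria', 'age', 51, 'state', 'Georgia')
              ) AS array_of_structs
),
inline_data AS (
 -- Spark allows the generator in the select list next to id (no LATERAL VIEW).
 SELECT id,
        explode(array_of_structs) AS person
 FROM sample
)
SELECT id,
       person.name AS person_name,
       person.age  AS person_age
FROM inline_data;
```

This should return the same two rows as the examples above, one for frank and one for maria.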
