Hive collect_list() does not collect NULL values

I am trying to collect a column that contains NULLs alongside regular values, but collect_list ignores the NULLs and collects only the non-NULL values. Is there a way to retrieve the NULLs along with the other values?

SELECT col1, col2, collect_list(col3) as col3
FROM (SELECT * FROM table_1 ORDER BY col1, col2, col3)
GROUP BY col1, col2;

Actual col3 values

0.9
NULL
NULL
0.7
0.6 

Resulting col3 values

[0.9, 0.7, 0.6]

I was hoping for a Hive solution that returns [0.9, NULL, NULL, 0.7, 0.6] after applying collect_list.

Pensile answered 12/8, 2015 at 4:56 Comment(0)

This is how the function works: collect_list skips NULLs. I've found the following workaround, though. Add a CASE WHEN expression to your query that checks for NULLs and replaces them with a placeholder so they are kept.

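SELECT col1,
    col2,
    -- Replace NULL with the string 'NULL'; cast the other values so both branches are strings
    collect_list(CASE WHEN col3 IS NULL THEN 'NULL' ELSE CAST(col3 AS STRING) END) AS col3
FROM (SELECT * FROM table_1 ORDER BY col1, col2, col3) t  -- Hive expects an alias on the subquery
GROUP BY col1, col2;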

Because the list now contains a string element ('NULL'), the whole result is an array of strings. As a final step, convert the array of strings back to an array of double values.
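
A minimal sketch of that conversion, assuming the placeholder string 'NULL' never occurs as real data. The CTE name collected and the aliases e, pos, val, and col3_value are illustrative. Rather than rebuilding an array of doubles (collecting a second time would drop the NULLs again), it converts each element back at the point where the array is read:

```sql
-- Collect col3 as strings, with 'NULL' standing in for missing values.
WITH collected AS (
  SELECT col1,
         col2,
         collect_list(CASE WHEN col3 IS NULL THEN 'NULL'
                           ELSE CAST(col3 AS STRING) END) AS col3
  FROM table_1
  GROUP BY col1, col2
)
-- Read the array back one element per row; pos is the index within the array.
SELECT c.col1,
       c.col2,
       e.pos,
       CASE WHEN e.val = 'NULL' THEN NULL
            ELSE CAST(e.val AS DOUBLE) END AS col3_value
FROM collected c
LATERAL VIEW posexplode(c.col3) e AS pos, val;
```

Each group then yields one row per collected element, with the placeholder turned back into a real NULL and every other value cast to DOUBLE.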

Robotize answered 12/8, 2015 at 14:44 Comment(2)
I used a different method that does the same thing. It's probably still an array of strings, which is an issue. What happens when I convert the NULLs to doubles? SELECT col1, col2, collect_list(coalesce(col3, "NULL")) as col3 FROM (SELECT * FROM table_1 ORDER BY col1, col2, col3) GROUP BY col1, col2 - Pensile
If you convert NULLs to doubles, the result is None (with this statement: cast(null as double)). However, collect_list ignores both NULL and None values. - Robotize

Note: if your column is of type STRING, it will not contain NULL values, even when the external file has no data for that column.

You can add a WHERE condition with a validation check like "col3 is NULL and col3 is not NULL".
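
One way to check the note above is to count NULLs and empty strings separately. This is a minimal sketch, assuming col3 is a STRING column of table_1 and that missing fields are loaded as empty strings (an assumption about the file format, not something confirmed here). The output names null_count and empty_count are illustrative:

```sql
-- Count real NULLs and empty strings in col3 separately.
SELECT SUM(CASE WHEN col3 IS NULL THEN 1 ELSE 0 END) AS null_count,
       SUM(CASE WHEN col3 = ''    THEN 1 ELSE 0 END) AS empty_count
FROM table_1;
```

If empty_count is non-zero while null_count is zero, the missing values are stored as empty strings, and collect_list will keep them as '' entries instead of skipping them.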

Underlayer answered 12/8, 2015 at 7:30 Comment(0)
