I want to group rows by a given field and get, for each group, the values of the other field collected together. Below is an example of what I am trying to achieve.
Imagine a table named 'sample_table' with two columns, F1 and F2, as below:
F1 F2
001 111
001 222
001 123
002 222
002 333
003 555
I want to write a Hive query that gives the output below:
001 [111, 222, 123]
002 [222, 333]
003 [555]
In Pig, this can be achieved very easily with something like this:
grouped_relation = GROUP sample_table BY F1;
Can somebody please suggest a simple way to do this in Hive? The only approach I can think of is to write a User Defined Function (UDF), but that could be very time-consuming.
Hive has a collect_list function, which would return duplicates. – Featly
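For illustration, here is a minimal sketch of the kind of query that comment points at. It assumes the table and column names from the question, and the alias F2_values is just a made-up name for the example. collect_list is a built-in Hive aggregate (Hive 0.13.0 and later) that keeps duplicate values; its sibling collect_set drops them. Neither promises any particular order of elements inside the resulting array.

```sql
-- Sketch only: assumes sample_table(F1, F2) as described in the question.
-- F2_values is a hypothetical alias chosen for this example.
-- collect_list keeps duplicates; swap in collect_set(F2) to remove them.
SELECT F1,
       collect_list(F2) AS F2_values
FROM sample_table
GROUP BY F1;
```

With the sample data above, this should produce one row per F1 value with an array of its F2 values, which avoids writing a custom UDF.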