HIVE Pivot and Sum
Asked Answered
A

2

5

I have a table that I am trying to figure out how to pivot and sum based on the values in a second column.

Example input:

|own|pet|qty|
|---|---|---|
|bob|dog| 2 |
|bob|dog| 3 |
|bob|dog| 1 |
|bob|cat| 1 |
|jon|dog| 1 |
|jon|cat| 1 |
|jon|cat| 1 |
|jon|cow| 4 |
|sam|dog| 3 |
|sam|cow| 1 |
|sam|cow| 2 |

Example output:

|own|dog|cat|cow|
|---|---|---|---|
|bob| 6 | 1 |   |
|jon| 1 | 2 | 4 |
|sam| 1 |   | 3 |
Androw answered 26/6, 2017 at 23:18 Comment(1)
Please provide some code snippets of what you have tried, it will be helpful for others. See stackoverflow.com/help/how-to-ask for helpful tips.Lukey
V
9

Use case and sum():

select own, sum(case when pet='dog' then qty end) as dog,
            sum(case when pet='cat' then qty end) as cat,
            sum(case when pet='cow' then qty end) as cow
  from your_table
 group by own;
Vanny answered 27/6, 2017 at 5:22 Comment(2)
This works well when the values are known, but do you know if there is a way to do this without having to specify each of the fields? Some of the datasets I am working with the data is dynamic.Androw
In HiveQL you should know columns. If you need dynamic SQL then use shell/java/etc, select distinct query values, build query and execute in HiveVanny
D
2

For dynamic data you can use MAP

select      own
           ,str_to_map(concat_ws(',',collect_list(concat(pet,':',cast(qty as string))))) as pet_qty

from       (select      own,pet
                       ,sum(qty) qty 

            from        mytable

            group by    own,pet
            ) t

group by    own
;

+-----+---------------------------------+
| own |             pet_qty             |
+-----+---------------------------------+
| bob | {"cat":"1","dog":"6"}           |
| jon | {"cat":"2","cow":"4","dog":"1"} |
| sam | {"cow":"3","dog":"3"}           |
+-----+---------------------------------+
Disherison answered 27/6, 2017 at 18:53 Comment(6)
Hi, did you take a look on this suggestion?Avlona
I like your solution. Is there a way to get it back inot column format?Fiann
If you know the pets values (cat, dog etc.) In advanceAvlona
No I have a table with more thant 4000 valuesFiann
@Duesentrieb, so what's the point turning all these values to columns?Avlona
I would like to prepare a table customer X items to calculate correlations between them and I'm not able to use spark for it. So I have to build up my table and then export this table to a local machine and calculate the correlationsFiann

© 2022 - 2024 — McMap. All rights reserved.