Pivot for redshift database
Asked Answered
M

6

22

I know this question has been asked before but any of the answers were not able to help me to meet my desired requirements. So asking the question in new thread

In redshift how can use pivot the data into a form of one row per each unique dimension set, e.g.:

id         Name               Category         count
8660     Iced Chocolate         Coffees         105
8660     Iced Chocolate         Milkshakes      10
8662     Old Monk               Beer            29
8663     Burger                 Snacks          18

to

id        Name              Cofees  Milkshakes  Beer  Snacks
8660    Iced Chocolate       105       10        0      0
8662    Old Monk             0         0        29      0
8663    Burger               0         0         0      18

The category listed above gets keep on changing. Redshift does not support the pivot operator and a case expression would not be of much help (if not please suggest how to do it)

How can I achieve this result in redshift?

(The above is just an example, we would have 1000+ categories and these categories keep's on changing)

Manchester answered 9/3, 2017 at 11:24 Comment(1)
Check out this workaround. Not ideal, but does allow for arbitrary numbers of categories. #42684679Lupercalia
A
10

i don't think there is a easy way to do that in Redshift,

also you say you have more then 1000 categories and the number is growing you need to taking in to account you have limit of 1600 columns per table,

see attached link [http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_usage.html][1]

you can use case but then you need to create case for each category

select id,
       name,
       sum(case when Category='Coffees' then count end) as Cofees,       
       sum(case when Category='Milkshakes' then count end) as Milkshakes,
       sum(case when Category='Beer' then count end) as Beer,
       sum(case when Category='Snacks' then count end) as Snacks
from my_table
group by 1,2

other option you have is to upload the table for example to R and then you can use cast function for example.

cast(data, name~ category)

and then upload the data back to S3 or Redshift

Atmospherics answered 9/3, 2017 at 13:41 Comment(2)
Thanks for reply and informing me about limitations of redshiftManchester
Amazon](500310) Invalid operation: syntax error at or near ")"Translate
J
8

We do a lot of pivoting at Ro - we built python based tool for autogenerating pivot queries. This tool allows for the same basic options as what you'd find in excel, including specifying aggregation functions as well as whether you want overall aggregates.

Joub answered 12/12, 2018 at 16:11 Comment(0)
S
6

Redshift released a Pivot/Unpivot functionality on last re:Invent 2021 (December 2021): https://docs.aws.amazon.com/redshift/latest/dg/r_FROM_clause-pivot-unpivot-examples.html

SELECT *
FROM (SELECT id, Name, Category, count FROM my_table) PIVOT (
    SUM(count) FOR Category IN ('Coffees', 'Milkshakes', 'Beer', 'Snacks')
);
Suffice answered 18/1, 2022 at 22:16 Comment(1)
Came here to answer this but found out it's already here :PLori
L
2

If you will typically want to query specific subsets of the categories from the pivot table, a workaround based on the approach linked in the comments might work.

You can populate your "pivot_table" from the original like so:

insert into pivot_table (id, Name, json_cats) (
    select id, Name,
        '{' || listagg(quote_ident(Category) || ':' || count, ',')
               within group (order by Category) || '}' as json_cats
    from to_pivot
    group by id, Name
)

And access specific categories this way:

select id, Name,
    nvl(json_extract_path_text(json_cats, 'Snacks')::int, 0) Snacks,
    nvl(json_extract_path_text(json_cats, 'Beer')::int, 0) Beer
from pivot_table

Using varchar(max) for the JSON column type will give 65535 bytes which should be room for a couple thousand categories.

Lupercalia answered 9/3, 2017 at 17:31 Comment(3)
Thanks for the help systemjack , but i think even extracting a json i need to manually write tht category name and then extract tht , Since the menu keep and changing and category are bound to change with time so have to manually edit the pivot_table query to extracted updated category and also when further restaurants are being added the count of columns wuld increase query has to be edited for a change in categoryManchester
and also as was pointed out in above answer redshift limits a column to 1600 for a single table So i think what i am trying to implement using redshift wuld not be feasible If wrng plz suggest how else to do itManchester
You can not do select * with this approach, so that's a limitation that might make it unworkable for some. You do have to know and specify the categories. It doesn't have the 1600 column issue since it's storing everything in only 3 columns. The category data is all being packed into a single string that happens to be JSON formatted.Lupercalia
M
2

@user3600910 is right with the approach however 'END' is required else '500310' invalid operation would occur.

select id,
       name,
       sum(case when Category='Coffees' then count END) as Cofees,       
       sum(case when Category='Milkshakes' then count END) as Milkshakes,
       sum(case when Category='Beer' then count END) as Beer,
       sum(case when Category='Snacks' then count END) as Snacks
from my_table
group by 1,2
Marchese answered 2/8, 2019 at 10:3 Comment(0)
R
1

The answer given above worked for me after switching count to 1

select id,
       name,
       sum(case when Category='Coffees' then 1 end) as Cofees,       
       sum(case when Category='Milkshakes' then 1 end) as Milkshakes,
       sum(case when Category='Beer' then 1 end) as Beer,
       sum(case when Category='Snacks' then 1 end) as Snacks
from my_table
group by 1,2
Romanfleuve answered 26/10, 2021 at 16:32 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.