How to implement `pivot` in clickhouse just like in dolphindb
Asked Answered
A

1

6

I want to do some pivot ops to some data. Just like following.

>>> df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
...                            'two'],
...                    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
...                    'baz': [1, 2, 3, 4, 5, 6],
...                    'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
>>> df
    foo   bar  baz  zoo
0   one   A    1    x
1   one   B    2    y
2   one   C    3    z
3   two   A    4    q
4   two   B    5    w
5   two   C    6    t
>>> df.pivot(index='foo', columns='bar', values='baz')
bar  A   B   C
foo
one  1   2   3
two  4   5   6

I know DolphinDB can do pivot in sql.

    dateValue=2007.08.01
    num=500
    syms = (exec count(*) from taq 
    where 
        date = dateValue, 
        time between 09:30:00 : 15:59:59, 
        0<bid, bid<ofr, ofr<bid*1.2
    group by symbol order by count desc).symbol[0:num]

    priceMatrix = exec avg(bid + ofr)/2.0 as price from taq 
    where 
        date = dateValue, Symbol in syms, 
        0<bid, bid<ofr, ofr<bid*1.2, 
        time between 09:30:00 : 15:59:59 
    pivot by time.minute() as minute, Symbol

but how to do pivot in clickhouse? Should I use client API to get data ? But there are too many rows, it's too difficult to deal with many rows. And if I can't use pandas, how to implement pivot operation easily?

Aleciaaleck answered 10/5, 2019 at 9:5 Comment(0)
O
7

It is the preliminary implementation that can help you make a start.

Remarks:

  • 'holes' in rows is not supported (each column should contain value)

  • the types of all column casted to common type (String)

  • introduced the field orderNum. It is the order number of source column in result (for example, 'bar'-column be 2nd)

  • the result represented as rows with one column with Array-type. The order of array items is defined by orderNum.

Prepare test data:

CREATE TABLE test.pivot_test
(    
    orderNum Int,
    s String,
    values Array(String)
) ENGINE = Memory;

INSERT INTO test.pivot_test
VALUES 
  (1, 'foo', ['one', 'one', 'one', 'two', 'two', 'two']),
  (3, 'baz', ['1', '2', '3', '4', '5', '6']),
  (4, 'zoo', ['x', 'y', 'z', 'q', 'w', 't']),
  (2, 'bar', ['A', 'B', 'C', 'A', 'B', 'C']);

/* 
The content of table test.pivot_test:

┌─orderNum─┬─s───┬─values────────────────────────────────┐
│        1 │ foo │ ['one','one','one','two','two','two'] │
│        3 │ baz │ ['1','2','3','4','5','6']             │
│        4 │ zoo │ ['x','y','z','q','w','t']             │
│        2 │ bar │ ['A','B','C','A','B','C']             │
└──────────┴─────┴───────────────────────────────────────┘
*/

Pivot-emulation:

SELECT arrayMap(x -> x.1, arraySort(x -> x.2, groupArray(value_ordernum))) as row
FROM
(
    SELECT
        (value, orderNum) AS value_ordernum,
        value_index
    FROM test.pivot_test
    ARRAY JOIN
        values AS value,
        arrayEnumerate(values) AS value_index
  /*
    The result of execution the nested query:

    ┌─value_ordernum─┬─value_index─┐
    │ ('one',1)      │           1 │
    │ ('one',1)      │           2 │
    │ ('one',1)      │           3 │
    │ ('two',1)      │           4 │
    │ ('two',1)      │           5 │
    │ ('two',1)      │           6 │
    │ ('1',3)        │           1 │
    │ ('2',3)        │           2 │
    │ ('3',3)        │           3 │
    │ ('4',3)        │           4 │
    │ ('5',3)        │           5 │
    │ ('6',3)        │           6 │
    │ ('x',4)        │           1 │
    │ ('y',4)        │           2 │
    │ ('z',4)        │           3 │
    │ ('q',4)        │           4 │
    │ ('w',4)        │           5 │
    │ ('t',4)        │           6 │
    │ ('A',2)        │           1 │
    │ ('B',2)        │           2 │
    │ ('C',2)        │           3 │
    │ ('A',2)        │           4 │
    │ ('B',2)        │           5 │
    │ ('C',2)        │           6 │
    └────────────────┴─────────────┘  
  */
)
GROUP BY value_index;

/*
The final result:

┌─row─────────────────┐
│ ['two','A','4','q'] │
│ ['one','C','3','z'] │
│ ['one','B','2','y'] │
│ ['two','B','5','w'] │
│ ['one','A','1','x'] │
│ ['two','C','6','t'] │
└─────────────────────┘
*/
Oleta answered 11/5, 2019 at 0:20 Comment(2)
how to do it by using ordinary table. foo, bar, baz, zoo as columns not valuesAleciaaleck
No way to generate dynamic columns using SQL so splitting by column not here. As I understood you will process this result on the server side so it does not be a problem to handle one column with array type instead of several ones. The problem is how to identify the items in an array - which one is 'bar' or 'foo'? When columns 'foo', 'bar' etc have some unique ID then it may define the order of an item in an array (in this case the column 'orderNum' in the example should be replaced with ID).Oleta

© 2022 - 2024 — McMap. All rights reserved.