Pandas setting multi-index on rows, then transposing to columns

If I have a simple dataframe:

print(a)

  one  two three
0   A    1     a
1   A    2     b
2   B    1     c
3   B    2     d
4   C    1     e
5   C    2     f

I can easily create a multi-index on the rows by issuing:

a.set_index(['one', 'two'])

        three
one two      
A   1       a
    2       b
B   1       c
    2       d
C   1       e
    2       f

Is there a similarly easy way to create a multi-index on the columns?

I'd like to end up with:

    one A       B       C   
    two 1   2   1   2   1   2
    0   a   b   c   d   e   f

In this case, it would be pretty simple to create the row multi-index and then transpose it, but in other examples, I'll be wanting to create a multi-index on both the rows and columns.

Revolving answered 16/8, 2016 at 17:8 Comment(2)
It looks like a.pivot(index='one', columns='two', values='three') is getting closer to what I want (extracting info from the df and turning it into columns), though I haven't quite figured out how to make the multi-index. - Revolving
I don't think you want to "set multi-index on columns"; I think you want to set it on rows, then transpose rows to columns. Please edit your question to be clearer. - Mallorymallow

Yes! It's called transposition.

a.set_index(['one', 'two']).T
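
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's dataframe and applies the same transpose. The variable name `wide` and the final `reset_index(drop=True)` (used only to get the `0` row label shown in the question's desired output) are additions for illustration.

```python
import pandas as pd

# Rebuild the dataframe from the question.
a = pd.DataFrame({'one': ['A', 'A', 'B', 'B', 'C', 'C'],
                  'two': [1, 2, 1, 2, 1, 2],
                  'three': ['a', 'b', 'c', 'd', 'e', 'f']})

# Build the row MultiIndex, then transpose so it becomes the column MultiIndex.
wide = a.set_index(['one', 'two']).T

# The single remaining row is labelled 'three'; dropping that label
# leaves a plain 0 as the row index, as in the question's target layout.
wide = wide.reset_index(drop=True)
print(wide)
```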


Let's borrow from @ragesz's post because they used a much better example to demonstrate with.

import pandas as pd

df = pd.DataFrame({'a':['foo_0', 'bar_0', 1, 2, 3], 'b':['foo_0', 'bar_1', 11, 12, 13],
    'c':['foo_1', 'bar_0', 21, 22, 23], 'd':['foo_1', 'bar_1', 31, 32, 33]})

df.T.set_index([0, 1]).T
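
Spelled out step by step, the one-liner can be sketched as follows. The level names `'first'` and `'second'` and the `reset_index(drop=True)` tidy-up are optional extras assumed here for illustration, and `wide` is just a placeholder name.

```python
import pandas as pd

df = pd.DataFrame({'a': ['foo_0', 'bar_0', 1, 2, 3],
                   'b': ['foo_0', 'bar_1', 11, 12, 13],
                   'c': ['foo_1', 'bar_0', 21, 22, 23],
                   'd': ['foo_1', 'bar_1', 31, 32, 33]})

# After the first transpose the two header rows are columns 0 and 1.
# Promote them to a row MultiIndex, then transpose back so that
# MultiIndex ends up on the columns.
wide = df.T.set_index([0, 1]).T

# Optional tidy-up: name the column levels and renumber the rows from 0.
wide.columns.names = ['first', 'second']
wide = wide.reset_index(drop=True)
print(wide)
```

The values stay as object dtype after the round trip through the transpose, so a later `wide.infer_objects()` may be useful if numeric columns are needed.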

Schmid answered 16/8, 2016 at 17:41 Comment(4)
The OP does not want to rely on transpose because they want a multi-index on both columns and rows. - Calie
Maybe a .reset_index(drop=True) at the end would be necessary, and on a new line df.columns.names = ['first', 'second'] to name the column levels. - Soutine
@Calie That's OK. You can transpose then set_index, transpose then set_index. If the OP puts up an example they'd like for both, I'm happy to show how it's done. I'd make one right now, but I have to run for a while. - Schmid
@piRSquared, I was thinking along those lines as well. - Calie

You could use pivot_table followed by a series of manipulations on the dataframe to get the desired form:

import numpy as np
import pandas as pd

# df is the question's dataframe (named `a` there)
df_pivot = pd.pivot_table(df, index=['one', 'two'], values='three', aggfunc=np.sum)

def rename_duplicates(old_list):    # Replace repeated index labels with a single space
    seen = {}
    for x in old_list:
        if x in seen:
            seen[x] += 1
            yield " " 
        else:
            seen[x] = 0
            yield x

col_group = df_pivot.unstack().stack().reset_index(level=-1)
col_group.index = rename_duplicates(col_group.index.tolist())
col_group.index.name = df_pivot.index.names[0]
col_group.T

one  A     B     C   
two  1  2  1  2  1  2
0    a  b  c  d  e  f
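
If the goal is a real MultiIndex on both rows and columns, pivot_table also accepts lists for both `index` and `columns`. The following is only a sketch on made-up data: the `grp` and `sub` key columns and the numeric `three` values are hypothetical and not part of the question's dataframe.

```python
import itertools
import pandas as pd

# Hypothetical long-format data: two row keys (grp, sub) and two column keys (one, two).
keys = list(itertools.product(['x', 'y'], [0, 1], ['A', 'B'], [1, 2]))
long_df = pd.DataFrame(keys, columns=['grp', 'sub', 'one', 'two'])
long_df['three'] = range(len(long_df))

# 'first' simply picks the single value for each key combination.
both = pd.pivot_table(long_df, index=['grp', 'sub'], columns=['one', 'two'],
                      values='three', aggfunc='first')
print(both)
```

Cells are then addressed with one tuple per axis, for example `both.loc[('x', 0), ('A', 1)]`.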
Prehension answered 16/8, 2016 at 18:29 Comment(0)

I think the short answer is no. To get multi-index columns this way, the dataframe needs two (or more) rows that can be converted into headers, just as multi-index rows need two or more columns. If you have that kind of dataframe, creating a multi-index header is not difficult. It can be done in one very long line of code that you can reuse on any other dataframe; you only need to adjust the row numbers of the header rows if they differ:

import pandas as pd

df = pd.DataFrame({'a':['foo_0', 'bar_0', 1, 2, 3], 'b':['foo_0', 'bar_1', 11, 12, 13],
    'c':['foo_1', 'bar_0', 21, 22, 23], 'd':['foo_1', 'bar_1', 31, 32, 33]})

The dataframe:

       a      b      c      d
0  foo_0  foo_0  foo_1  foo_1
1  bar_0  bar_1  bar_0  bar_1
2      1     11     21     31
3      2     12     22     32
4      3     13     23     33

Creating multi-index object:

arrays = [df.iloc[0].tolist(), df.iloc[1].tolist()]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])

df.columns = index

Multi-index header result:

first   foo_0         foo_1       
second  bar_0  bar_1  bar_0  bar_1
0       foo_0  foo_0  foo_1  foo_1
1       bar_0  bar_1  bar_0  bar_1
2           1     11     21     31
3           2     12     22     32
4           3     13     23     33

Finally, drop rows 0 and 1 (they now duplicate the header) and reset the row index:

df = df.iloc[2:].reset_index(drop=True)

The "one-line" version (the only things you have to change are the header row indexes and the dataframe itself):

idx_first_header = 0
idx_second_header = 1

df.columns = pd.MultiIndex.from_tuples(list(zip(*[df.iloc[idx_first_header].tolist(),
    df.iloc[idx_second_header].tolist()])), names=['first', 'second'])

df = df.drop([idx_first_header, idx_second_header], axis=0).reset_index(drop=True)
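
The same steps can be wrapped in a small reusable function. This is a sketch rather than part of the approach above: `promote_header_rows` is a hypothetical helper name, it uses `pd.MultiIndex.from_arrays` instead of zipping tuples, and the final `infer_objects()` call assumes you want the remaining object columns converted back to numeric dtypes.

```python
import pandas as pd

def promote_header_rows(frame, header_rows, names=None):
    """Hypothetical helper: move the given rows into a column MultiIndex."""
    # One list of labels per header row, in the order given.
    arrays = [frame.loc[row].tolist() for row in header_rows]
    # Remove the header rows from the data and renumber what is left.
    out = frame.drop(index=header_rows).reset_index(drop=True)
    out.columns = pd.MultiIndex.from_arrays(arrays, names=names)
    # The columns held mixed strings and numbers, so they are object dtype;
    # let pandas re-infer a better dtype now that the strings are gone.
    return out.infer_objects()

df = pd.DataFrame({'a': ['foo_0', 'bar_0', 1, 2, 3],
                   'b': ['foo_0', 'bar_1', 11, 12, 13],
                   'c': ['foo_1', 'bar_0', 21, 22, 23],
                   'd': ['foo_1', 'bar_1', 31, 32, 33]})

result = promote_header_rows(df, [0, 1], names=['first', 'second'])
print(result)
```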
Soutine answered 16/8, 2016 at 17:39 Comment(0)

Message From The Future

To any lost souls who come across these questions and answers from 2016: there is a much simpler solution that also works with multi-indexes:

The Setup

import itertools
import pandas as pd

id1 = ['A', 'B', 'C']
id2 = [1, 2]
identifiers = list(itertools.product(id1,id2))
identifier_names = ['one', 'two']
df = pd.DataFrame(identifiers, columns=identifier_names)
df['three'] = ['a','b','c','d','e','f']
df.set_index(identifier_names, inplace=True)
print(df)
        three
one two      
A   1       a
    2       b
B   1       c
    2       d
C   1       e
    2       f

The Solution

df = df.stack().unstack(identifier_names)
one    A     B     C   
two    1  2  1  2  1  2
three  a  b  c  d  e  f

Hope that saves somebody the 3 hours it took me to discover!

Merilee answered 4/10, 2021 at 4:37 Comment(0)
