Here is an attempt at a canonical answer on the differences between pivot
and unstack
. For a complete guide on reshaping, pandas's official documentation on reshaping and pivot tables is a must read.
pivot
and unstack
perform roughly the same operation, but they operate on different logical levels: columns and index levels, respectively.
I will use this example dataframe as input:
df = pd.DataFrame({'col1': list('ABCABC'),
'col2': list('aaabbb'),
'col3': list('uvwxyz'),
})
col1 col2 col3
0 A a u
1 B a v
2 C a w
3 A b x
4 B b y
5 C b z
Using pivot
on columns
pandas.DataFrame.pivot
operates on columns
NB. when the index
argument if left unused, it will use the current index.
df.pivot(index='col1', columns='col2', values='col3')
col2 a b
col1
A u x
B v y
C w z
Using unstack
on MultiIndexes
There are two use cases here whether the input is a Series or a DataFrame.
We will generate first a Series with MultIndex from the initial DataFrame:
series = df.set_index(['col1', 'col2'])['col3']
col1 col2
A a u
B a v
C a w
A b x
B b y
C b z
Name: col3, dtype: object
We see that the data is very similar to the original DataFrame, but col1
and col2
are now index levels, and the data itself is now one-dimensional (i.e., a Series)
Now, we can apply unstack
to pivot by default the right-most (last) index level as columns to generate a DataFrame. There are many ways to specify the index level to unstack so all these options are equivalent:
series.unstack()
series.unstack('col2') # by level name
series.unstack(1) # by level position from the left
series.unstack(-1) # by level position from the end (-1 = last)
col2 a b
col1
A u x
B v y
C w z
This means that df.pivot(index='col1', columns='col2', values='col3')
and df.set_index(['col1', 'col2'])['col3'].unstack()
are logically equivalent.
The DataFrame version of unstack
is very similar to the Series's one, with the exception that, as the data is already two-dimensional, it will create an extra level of index for the columns.
df.set_index(['col1', 'col2']).unstack(level='col2')
col3
col2 a b
col1
A u x
B v y
C w z
Here again, the same output can be obtained using pivot
, by passing a list-encapsulated column name to values
:
df.pivot(index='col1', columns='col2', values=['col3'])
col3
col2 a b
col1
A u x
B v y
C w z