why and when should use a stack() and unstack() methods?

I'm very confused about these two methods which are: stack() and unstack() I know that I should use them in the case of multi-Indexes however, I need to know the following:

1- I don't know where I should use stack or unstack
2- why I should use them

when I use "pivot" what I understand is that the pivot converts Dataframe to be the unstack form, if that is correct so, I need to know why when I use the following line code it raises an error:

data.stack(level=1) # IndexError: Too many levels: Index has only 1 level, not 2

but when I do that following it runs:

data.unstack().stack(level=1)

sometimes, I see that stack has kwargs like so, level=-1 I don't know when I have to place "-1" and what does that mean

I know that I misunderstand a lot of stuff but I'm very confused

so, any help to understand these terms, please?

thx in advance

Here is an attempt at a canonical answer on the differences between pivot and unstack. For a complete guide on reshaping, pandas's official documentation on reshaping and pivot tables is a must read.

pivot and unstack perform roughly the same operation, but they operate on different logical levels: columns and index levels, respectively.

I will use this example dataframe as input:

df = pd.DataFrame({'col1': list('ABCABC'),
                   'col2': list('aaabbb'),
                   'col3': list('uvwxyz'),
                  })

  col1 col2 col3
0    A    a    u
1    B    a    v
2    C    a    w
3    A    b    x
4    B    b    y
5    C    b    z

Using `pivot` on columns

pandas.DataFrame.pivot operates on columns

NB. when the index argument if left unused, it will use the current index.

df.pivot(index='col1', columns='col2', values='col3')

col2  a  b
col1      
A     u  x
B     v  y
C     w  z

Using `unstack` on MultiIndexes

There are two use cases here whether the input is a Series or a DataFrame.

`pandas.Series.unstack`

We will generate first a Series with MultIndex from the initial DataFrame:

series = df.set_index(['col1', 'col2'])['col3']

col1  col2
A     a       u
B     a       v
C     a       w
A     b       x
B     b       y
C     b       z
Name: col3, dtype: object

We see that the data is very similar to the original DataFrame, but col1 and col2 are now index levels, and the data itself is now one-dimensional (i.e., a Series)

Now, we can apply unstack to pivot by default the right-most (last) index level as columns to generate a DataFrame. There are many ways to specify the index level to unstack so all these options are equivalent:

series.unstack()

series.unstack('col2') # by level name

series.unstack(1) # by level position from the left

series.unstack(-1) # by level position from the end (-1 = last)

col2  a  b
col1      
A     u  x
B     v  y
C     w  z

This means that df.pivot(index='col1', columns='col2', values='col3') and df.set_index(['col1', 'col2'])['col3'].unstack() are logically equivalent.

`pandas.DataFrame.unstack`

The DataFrame version of unstack is very similar to the Series's one, with the exception that, as the data is already two-dimensional, it will create an extra level of index for the columns.

df.set_index(['col1', 'col2']).unstack(level='col2')

     col3   
col2    a  b
col1        
A       u  x
B       v  y
C       w  z

Here again, the same output can be obtained using pivot, by passing a list-encapsulated column name to values:

df.pivot(index='col1', columns='col2', values=['col3'])

     col3   
col2    a  b
col1        
A       u  x
B       v  y
C       w  z

Using `pivot` on columns

Using `unstack` on MultiIndexes

`pandas.Series.unstack`

`pandas.DataFrame.unstack`

Recommended topics

Hot tags

Using pivot on columns

Using unstack on MultiIndexes

pandas.Series.unstack

pandas.DataFrame.unstack

Recommended topics

Hot tags

Using `pivot` on columns

Using `unstack` on MultiIndexes

`pandas.Series.unstack`

`pandas.DataFrame.unstack`