Why and when should I use the stack() and unstack() methods?

I'm very confused about two methods, stack() and unstack(). I know they are meant to be used with MultiIndexes, but I need to know the following:

1- When should I use stack or unstack?
2- Why should I use them?

My understanding is that pivot converts a DataFrame to its unstacked form. If that is correct, why does the following line raise an error?

data.stack(level=1) # IndexError: Too many levels: Index has only 1 level, not 2

But when I do the following, it runs:

data.unstack().stack(level=1)

Sometimes I see stack called with a keyword argument such as level=-1. I don't know when I should pass -1 or what it means.

I know that I misunderstand a lot of stuff but I'm very confused

Any help understanding these methods would be appreciated.

thx in advance

Bilberry answered 11/9, 2021 at 0:3 Comment(3)
Have you had a look at the pandas docs regarding these methods? - Quality
Yes, I did, but I am still confused. - Bilberry
Sure, no worries. Kindly provide the reproducible data you were practising with, and the community will attempt to provide answers that should make things clearer. - Quality

Here is an attempt at a canonical answer on the differences between pivot and unstack. For a complete guide on reshaping, pandas's official documentation on reshaping and pivot tables is a must read.

pivot and unstack perform roughly the same operation, but they operate on different logical levels: columns and index levels, respectively.

I will use this example dataframe as input:

import pandas as pd

df = pd.DataFrame({'col1': list('ABCABC'),
                   'col2': list('aaabbb'),
                   'col3': list('uvwxyz'),
                  })
  col1 col2 col3
0    A    a    u
1    B    a    v
2    C    a    w
3    A    b    x
4    B    b    y
5    C    b    z

Using pivot on columns

pandas.DataFrame.pivot operates on columns

NB. When the index argument is left unused, pivot uses the current index.

df.pivot(index='col1', columns='col2', values='col3')
col2  a  b
col1      
A     u  x
B     v  y
C     w  z
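To illustrate the NB above, here is a minimal sketch that calls pivot without the index argument on the same example data. The variable name wide is only chosen for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': list('ABCABC'),
                   'col2': list('aaabbb'),
                   'col3': list('uvwxyz')})

# No index argument: the existing row index (0..5) is kept as-is,
# and only col2 is spread out into columns.
wide = df.pivot(columns='col2', values='col3')
print(wide)
```

Because every original row keeps its own row label, each row of the result holds one value and one missing value. This is usually why you want to pass index explicitly.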

Using unstack on MultiIndexes

There are two use cases here, depending on whether the input is a Series or a DataFrame.

pandas.Series.unstack

We will first generate a Series with a MultiIndex from the initial DataFrame:

series = df.set_index(['col1', 'col2'])['col3']
col1  col2
A     a       u
B     a       v
C     a       w
A     b       x
B     b       y
C     b       z
Name: col3, dtype: object

We see that the data is very similar to the original DataFrame, but col1 and col2 are now index levels, and the data itself is now one-dimensional (i.e., a Series).

Now we can apply unstack, which by default pivots the right-most (last) index level into columns and returns a DataFrame. There are several ways to specify the index level to unstack, so all of these options are equivalent:

series.unstack()
series.unstack('col2') # by level name
series.unstack(1) # by level position from the left
series.unstack(-1) # by level position from the end (-1 = last)
col2  a  b
col1      
A     u  x
B     v  y
C     w  z
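For contrast, here is a small sketch of unstacking the other level (col1) instead of the default last one. It reuses the example Series; the names by_name and by_pos are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': list('ABCABC'),
                   'col2': list('aaabbb'),
                   'col3': list('uvwxyz')})
series = df.set_index(['col1', 'col2'])['col3']

# Move the first index level (col1) into the columns instead of the last one
by_name = series.unstack('col1')  # by level name
by_pos = series.unstack(0)        # by level position from the left

print(by_name)
```

Here col2 stays as the row index and the values of col1 become the column labels, so the level argument decides which index level ends up in the columns.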

This means that df.pivot(index='col1', columns='col2', values='col3') and df.set_index(['col1', 'col2'])['col3'].unstack() are logically equivalent.
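The equivalence can be checked directly. This is a minimal sketch using pandas.testing.assert_frame_equal, which raises an AssertionError if the two frames differ; the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': list('ABCABC'),
                   'col2': list('aaabbb'),
                   'col3': list('uvwxyz')})

# Column-based reshaping
pivoted = df.pivot(index='col1', columns='col2', values='col3')

# Index-based reshaping: move the columns into the index first, then unstack
unstacked = df.set_index(['col1', 'col2'])['col3'].unstack()

# Raises AssertionError if values, labels, or axis names differ
pd.testing.assert_frame_equal(pivoted, unstacked)
```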

pandas.DataFrame.unstack

The DataFrame version of unstack is very similar to the Series version, except that, because the data is already two-dimensional, it creates an extra level in the column index.

df.set_index(['col1', 'col2']).unstack(level='col2')
     col3   
col2    a  b
col1        
A       u  x
B       v  y
C       w  z

Here again, the same output can be obtained using pivot, by passing the column name wrapped in a list to values:

df.pivot(index='col1', columns='col2', values=['col3'])
     col3   
col2    a  b
col1        
A       u  x
B       v  y
C       w  z
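To connect this back to stack from the question: stack is the reverse operation, moving a column level into the index, and its level argument refers to column levels (the default, level=-1, being the last one). The sketch below reuses the example data; long_df, wide, and back are names chosen for illustration, and on some pandas versions the stack call may emit a FutureWarning about its newer implementation.

```python
import pandas as pd

df = pd.DataFrame({'col1': list('ABCABC'),
                   'col2': list('aaabbb'),
                   'col3': list('uvwxyz')})

long_df = df.set_index(['col1', 'col2'])  # two index levels, one column level
wide = long_df.unstack('col2')            # columns now have two levels

# stack moves the col2 column level back into the index, undoing unstack.
# Row order may differ from the original, hence sort_index on both sides.
back = wide.stack('col2')
round_trip_ok = back.sort_index().equals(long_df.sort_index())

# long_df has a single column level, so asking for column level 1 fails,
# which matches the error reported in the question.
try:
    long_df.stack(level=1)
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(round_trip_ok)
print(error_message)
```

This is also why data.unstack().stack(level=1) runs in the question: after unstack, the columns have a second level for stack to move back.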
Astrea answered 11/9, 2021 at 5:1 Comment(0)
