I have this data set:

np.random.seed(0)

test = pd.DataFrame({
    'a' : np.random.randint(0, 10, size=(10,)),
    'b' : np.random.randint(0, 10, size=(10,)),
    'c' : np.random.randint(0, 10, size=(10,)),
    'd' : np.random.randint(0, 10, size=(10,)),
})

print(test)

   a  b  c  d
0  5  7  5  2
1  0  6  9  3
2  3  8  8  8
3  3  8  9  1
4  7  1  4  3
5  9  6  3  3
6  3  7  0  3
7  5  7  3  7
8  2  8  5  0
9  4  1  0  1

When I run the following code I get far more columns than I thought I should.

tp = test.pivot_table(index=[
    'a',
], columns=[
    'b',
], values=[
    'c',
], aggfunc=[
    'nunique'
])

print(tp)

  nunique                                                       
        a                   b                   c               
b       1    6    7    8    1    6    7    8    1    6    7    8
a                                                               
0     NaN  1.0  NaN  NaN  NaN  1.0  NaN  NaN  NaN  1.0  NaN  NaN
2     NaN  NaN  NaN  1.0  NaN  NaN  NaN  1.0  NaN  NaN  NaN  1.0
3     NaN  NaN  1.0  1.0  NaN  NaN  1.0  1.0  NaN  NaN  1.0  2.0
4     1.0  NaN  NaN  NaN  1.0  NaN  NaN  NaN  1.0  NaN  NaN  NaN
5     NaN  NaN  1.0  NaN  NaN  NaN  1.0  NaN  NaN  NaN  2.0  NaN
7     1.0  NaN  NaN  NaN  1.0  NaN  NaN  NaN  1.0  NaN  NaN  NaN
9     NaN  1.0  NaN  NaN  NaN  1.0  NaN  NaN  NaN  1.0  NaN  NaN

I would expect to only get the subset of c columns, not the a and b columns as well. If I run this next code:

tp1 = test.pivot_table(index=[
    'a',
], columns=[
    'b',
], values='c', aggfunc=[
    'nunique'
])

print(tp1)

  nunique               
b       1    6    7    8
a                       
0     NaN  1.0  NaN  NaN
2     NaN  NaN  NaN  1.0
3     NaN  NaN  1.0  2.0
4     1.0  NaN  NaN  NaN
5     NaN  NaN  2.0  NaN
7     1.0  NaN  NaN  NaN
9     NaN  1.0  NaN  NaN

I get what I would've expected with the previous code. I can also get the expected output if I modify 'nunique' to pd.Series.nunique:

tp2 = test.pivot_table(index=[
    'a',
], columns=[
    'b',
], values=[
    'c',
], aggfunc=[
    pd.Series.nunique
])

print(tp2)

  nunique               
        c               
b       1    6    7    8
a                       
0     NaN  1.0  NaN  NaN
2     NaN  NaN  NaN  1.0
3     NaN  NaN  1.0  2.0
4     1.0  NaN  NaN  NaN
5     NaN  NaN  2.0  NaN
7     1.0  NaN  NaN  NaN
9     NaN  1.0  NaN  NaN

Question

Is this a bug? Or is there some underlying code that causes this? Shouldn't all three versions of the code produce the same (aside from column levels) output?

Another example with different `aggfunc`

When I run similar code but use count instead of nunique I get the expected results every time:

cp = test.pivot_table(index=[
    'a',
], columns=[
    'b',
], values=[
    'c',
], aggfunc=[
    'count'
])

cp2 = test.pivot_table(index=[
    'a',
], columns=[
    'b',
], values=[
    'c',
], aggfunc=[
    pd.Series.count
])

# both return the same thing
  count               
      c               
b     1    6    7    8
a                     
0   NaN  1.0  NaN  NaN
2   NaN  NaN  NaN  1.0
3   NaN  NaN  1.0  2.0
4   1.0  NaN  NaN  NaN
5   NaN  NaN  2.0  NaN
7   1.0  NaN  NaN  NaN
9   NaN  1.0  NaN  NaN

Question

Another example with different `aggfunc`

Solution

Recommended topics

Hot tags

Question

Another example with different aggfunc

Solution

Recommended topics

Hot tags

Another example with different `aggfunc`