Pandas: Create a tuple column from multiple columns

I have the following data frame my_df:

Person       event         time
---------------------------------
John          A        2017-10-11
John          B        2017-10-12
John          C        2017-10-14
John          D        2017-10-15
Ann           X        2017-09-01
Ann           Y        2017-09-02
Dave          M        2017-10-05
Dave          N        2017-10-07
Dave          Q        2017-10-20

I want to create a new column, which is the (event, time) pair. It should look like:

Person       event         time        event_time
------------------------------------------------------
John          A        2017-10-11     (A, 2017-10-11)
John          B        2017-10-12     (B, 2017-10-12)
John          C        2017-10-14     (C, 2017-10-14)
John          D        2017-10-15     (D, 2017-10-15)
Ann           X        2017-09-01     (X, 2017-09-01)
Ann           Y        2017-09-02     (Y, 2017-09-02)
Dave          M        2017-10-05     (M, 2017-10-05)
Dave          N        2017-10-07     (N, 2017-10-07)
Dave          Q        2017-10-20     (Q, 2017-10-20)

Here is my code:

my_df['event_time'] = my_df.apply(lambda row: (row['event'] , row['time']), axis=1)

But I got the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in create_block_manager_from_arrays(arrays, names, axes)
   4309         blocks = form_blocks(arrays, names, axes)
-> 4310         mgr = BlockManager(blocks, axes)
   4311         mgr._consolidate_inplace()

/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check, fastpath)
   2794         if do_integrity_check:
-> 2795             self._verify_integrity()
   2796 

/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in _verify_integrity(self)
   3005             if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
-> 3006                 construction_error(tot_items, block.shape[1:], self.axes)
   3007         if len(self.items) != tot_items:

/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
   4279     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4280         passed, implied))
   4281 

ValueError: Shape of passed values is (128, 2), indices imply (128, 3)

Any idea what I did wrong in my code? Thanks!

Petua answered 23/10, 2017 at 17:35 Comment(0)

You can use:

my_df['event_time'] = my_df[['event','time']].apply(tuple, axis=1)

Or:

my_df['event_time'] = tuple(zip(my_df['event'], my_df['time']))

Or:

my_df['event_time'] = [tuple(x) for x in my_df[['event','time']].values.tolist()]

All return:

print (my_df)
  Person event        time       event_time
0   John     A  2017-10-11  (A, 2017-10-11)
1   John     B  2017-10-12  (B, 2017-10-12)
2   John     C  2017-10-14  (C, 2017-10-14)
3   John     D  2017-10-15  (D, 2017-10-15)
4    Ann     X  2017-09-01  (X, 2017-09-01)
5    Ann     Y  2017-09-02  (Y, 2017-09-02)
6   Dave     M  2017-10-05  (M, 2017-10-05)
7   Dave     N  2017-10-07  (N, 2017-10-07)
8   Dave     Q  2017-10-20  (Q, 2017-10-20)
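For reference, here is a minimal self-contained sketch of the zip-based variant. The small frame is rebuilt from the first rows of the question's sample and is only illustrative; it is not the asker's actual 128-row data.

```python
import pandas as pd

# Illustrative frame rebuilt from a few rows of the question's sample data
my_df = pd.DataFrame({
    "Person": ["John", "John", "Ann"],
    "event": ["A", "B", "X"],
    "time": ["2017-10-11", "2017-10-12", "2017-09-01"],
})

# Pair the two columns element-wise; each row gets one (event, time) tuple
my_df["event_time"] = list(zip(my_df["event"], my_df["time"]))

print(my_df["event_time"].tolist())
# [('A', '2017-10-11'), ('B', '2017-10-12'), ('X', '2017-09-01')]
```

Because the tuples are built in plain Python and assigned as a list, pandas does not try to expand them into separate columns.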
Tinware answered 23/10, 2017 at 17:38 Comment(3)
I tried the first approach, but got an error: ValueError: Wrong number of items passed 2, placement implies 1. What did I miss here? (Petua)
Could there be some NaNs? I am going to test it. (Tinware)
Yes, some events are 'None' but still have a timestamp. I was hoping the corresponding tuple would be (None, timestamp). (Petua)
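A small sketch of the missing-event case raised in the comments, assuming the column holds a real Python None in an object-dtype column (the two rows below are hypothetical). Under that assumption, the zip-based approach should produce (None, timestamp) directly:

```python
import pandas as pd

# Hypothetical rows: the second event is missing (None) but has a timestamp.
# dtype=object is set explicitly so that None is stored as None, not as NaN.
my_df = pd.DataFrame({
    "event": pd.Series(["A", None], dtype=object),
    "time": pd.Series(["2017-10-11", "2017-10-12"], dtype=object),
})

# zip pairs the values as they are, so the missing event stays in the tuple
my_df["event_time"] = list(zip(my_df["event"], my_df["time"]))

pairs = my_df["event_time"].tolist()
print(pairs)
```

If the missing values are stored as NaN rather than None, the tuples will contain NaN instead.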

Without apply

df.assign(event_time=list(zip(df.event,df.time)))
Out[1011]: 
  Person event        time        event_time
0   John     A  2017-10-11  (A, 2017-10-11)
1   John     B  2017-10-12  (B, 2017-10-12)
2   John     C  2017-10-14  (C, 2017-10-14)
3   John     D  2017-10-15  (D, 2017-10-15)
4    Ann     X  2017-09-01  (X, 2017-09-01)
5    Ann     Y  2017-09-02  (Y, 2017-09-02)
6   Dave     M  2017-10-05  (M, 2017-10-05)
7   Dave     N  2017-10-07  (N, 2017-10-07)
8   Dave     Q  2017-10-20  (Q, 2017-10-20)
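One point worth spelling out with a short sketch: assign returns a new DataFrame and leaves the original untouched, so the result has to be kept. The two-row frame and the name out are illustrative only.

```python
import pandas as pd

# Illustrative two-row frame
df = pd.DataFrame({
    "event": ["A", "B"],
    "time": ["2017-10-11", "2017-10-12"],
})

# assign returns a new DataFrame; df itself is left unchanged
out = df.assign(event_time=list(zip(df["event"], df["time"])))

print("event_time" in df.columns)   # False
print("event_time" in out.columns)  # True
```

To keep the new column on the same name, reassign the result: df = df.assign(...).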
Glindaglinka answered 23/10, 2017 at 17:40 Comment(0)
my_df['event_time'] = my_df.apply(lambda x: tuple(x[['event','time']]),axis = 1)

This would be my approach if you want to keep using apply with a lambda. Note that a row-wise apply is usually slower than the zip-based answers above.
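A rough way to check both the result and the relative speed, sketched on a hypothetical 2,000-row frame and assuming a reasonably recent pandas version (older releases may raise the error shown in the question for the apply variant). Timings depend on the machine, so none are quoted here.

```python
import timeit

import pandas as pd

# Hypothetical frame, only for comparing the two approaches
my_df = pd.DataFrame({
    "event": ["A", "B"] * 1000,
    "time": ["2017-10-11", "2017-10-12"] * 1000,
})

def with_apply():
    # Row-wise: the lambda is called once per row
    return my_df.apply(lambda x: tuple(x[["event", "time"]]), axis=1)

def with_zip():
    # Column-wise: iterate both columns in step, no per-row Series is built
    return list(zip(my_df["event"], my_df["time"]))

# Both approaches should yield the same tuples
same = with_apply().tolist() == with_zip()
print("same result:", same)

print("apply:", timeit.timeit(with_apply, number=3))
print("zip:  ", timeit.timeit(with_zip, number=3))
```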

Employer answered 25/1, 2022 at 15:2 Comment(0)
