Copying a column from one DataFrame to another gives NaN values?
Asked Answered
D

3

79

I have two DataFrames with the same number of rows - df1 like so:

        date     hour      var1
a   2017-05-01  00:00:00   456585
b   2017-05-01  01:00:00   899875
c   2017-05-01  02:00:00   569566
d   2017-05-01  03:00:00   458756
e   2017-05-01  04:00:00   231458
f   2017-05-01  05:00:00   986545

and df2 like so:

      MyVar1     MyVar2 
 0  6169.719338 3688.045368
 1  5861.148007 3152.238704
 2  5797.053347 2700.469871
 3  5779.102340 2730.471948
 4  6708.219647 3181.298291
 5  8550.380343 3793.580394

I want to merge the data from the date and hour columns of df1 into df2, to get a result like:

       MyVar1    MyVar2        date        hour
 0  6169.719338 3688.045368  2017-05-01  00:00:00
 1  5861.148007 3152.238704  2017-05-01  01:00:00
 2  5797.053347 2700.469871  2017-05-01  02:00:00
 3  5779.102340 2730.471948  2017-05-01  03:00:00
 4  6708.219647 3181.298291  2017-05-01  04:00:00
 5  8550.380343 3793.580394  2017-05-01  05:00:00

I tried simply assigning the columns like so:

df2['date'] = df1['date']
df2['hour'] = df1['hour']

but I get a result with NaN values in the date and hour columns instead:

       MyVar1    MyVar2      date       hour
 0  6169.719338 3688.045368  NaN        NaN
 1  5861.148007 3152.238704  NaN        NaN
 2  5797.053347 2700.469871  NaN        NaN

Why does this happen? How can I simply assign the values such that the data from the first row of df1 is shown in the first row of df2, etc.?

Diskson answered 18/8, 2017 at 2:27 Comment(0)
C
88

The culprit is unalignable indexes

Your DataFrames' indexes are different (and correspondingly, the indexes for each columns), so when trying to assign a column of one DataFrame to another, pandas will try to align the indexes, and failing to do so, insert NaNs.

Consider the following examples to understand what this means:

# Setup
A = pd.DataFrame(index=['a', 'b', 'c']) 
B = pd.DataFrame(index=['b', 'c', 'd', 'f'])                                  
C = pd.DataFrame(index=[1, 2, 3])
# Example of alignable indexes - A & B (complete or partial overlap of indexes)
A.index B.index
      a        
      b       b   (overlap)
      c       c   (overlap)
              d
              f
# Example of unalignable indexes - A & C (no overlap at all)
A.index C.index
      a        
      b        
      c        
              1
              2
              3

When there are no overlaps, pandas cannot match even a single value between the two DataFrames to put in the result of the assignment, so the output is a column full of NaNs.

If you're working on an IPython notebook, you can check that this is indeed the root cause using,

df1.index.equals(df2.index)
# False
df1.index.intersection(df2.index).empty
# True

You can use any of the following solutions to fix this issue.

Solution 1: Assign the underlying NumPy arrays (bypassing pandas index alignment)

This solution will only work if the lengths of the two DataFrames match (i.e. have same number of rows).

df2['date'] = df1['date'].to_numpy()

# (In pandas < 0.24, do df2['date'] = df1['date'].values )

To assign multiple columns easily, use,

df2[['date', 'hour']] = df1[['date', 'hour']].to_numpy()

Solution 2: Reset both DataFrames' indexes

For when the two DataFrames have different lengths. Also maybe preferable if you didn't mean to have different indices in the first place, or if you don't particularly care about preserving the index.

# Optional, if you want a RangeIndex => [0, 1, 2, ...]
# df1.index = pd.RangeIndex(len(df))
# Homogenize the index values,
df2.index = df1.index
# Assign the columns.
df2[['date', 'hour']] = df1[['date', 'hour']]

If you want to keep the existing index, but as a column, you may use reset_index() instead.

Corunna answered 18/8, 2017 at 2:33 Comment(5)
@Diskson Could be a problem with your index. Try adding ignore_index=True? Otherwise, reset both indices first and then concatenate.Corunna
@Diskson try df2['date'] = df1['date'].valuesNumber
@Diskson Ah I guessed it. The indices were not identicalCorunna
I had two dataframes where one had a range index and the other one had in integer index with vaules within that range. Looking at the dataframes, the indices seem fine but the intersection was empty. So I had to use to_numpy() instead.Dingle
Any idea how to make column names dynamic, instead of hard coded one?Solicitor
N
48

Try this ?

df2['date'] = df1['date'].values
df2['hour'] = df1['hour'].values
Number answered 18/8, 2017 at 2:39 Comment(2)
This is better than the accepted solution!Entice
when data is large(about 10^5),it's so slow,is there anyway?Tog
L
0

I know I'm late to this thread but this format has worked well for me.

df2.insert(1, value=df1['Name'], column='Name')

you can also perform functions on df1 in that value set argument. Hope this helps

Lament answered 19/10, 2022 at 2:46 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.