Merging dataframes on index with pandas

Asked 11/4, 2016 at 2:13 Answered 15/2, 2023 at 6:40

Solved python pandas dataframe join merge

100

I have two dataframes and each one has two index columns. I would like to merge them. For example, the first dataframe is the following:

                   V1
A      1/1/2012    12
       2/1/2012    14
B      1/1/2012    15
       2/1/2012     8
C      1/1/2012    17
       2/1/2012     9

The second dataframe is the following:

                   V2
A      1/1/2012    15
       3/1/2012    21
B      1/1/2012    24
       2/1/2012     9
D      1/1/2012     7
       2/1/2012    16

As a result, I would like to get the following:

                   V1   V2
A      1/1/2012    12   15
       2/1/2012    14  N/A
       3/1/2012   N/A   21
B      1/1/2012    15   24
       2/1/2012     8    9
C      1/1/2012    17  N/A
       2/1/2012     9  N/A
D      1/1/2012   N/A    7
       2/1/2012   N/A   16

I have tried a few versions using the pd.merge and .join methods, but nothing seems to work. Do you have any suggestions?

Perlie answered 11/4, 2016 at 2:13

114

You should be able to use join, which joins on the index by default. Given your desired result, you need outer as the join type.

>>> df1.join(df2, how='outer')
            V1  V2
A 1/1/2012  12  15
  2/1/2012  14 NaN
  3/1/2012 NaN  21
B 1/1/2012  15  24
  2/1/2012   8   9
C 1/1/2012  17 NaN
  2/1/2012   9 NaN
D 1/1/2012 NaN   7
  2/1/2012 NaN  16

Signature: _.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
Docstring:
Join columns with other DataFrame either on index or on a key column.
Efficiently Join multiple DataFrame objects by index at once by passing a list.
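For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the two frames from the question and runs the outer join. The level names "group" and "date" are my own assumption (the question does not name the index levels), and the dates are kept as plain strings, as they appear in the post.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frames; the level names
# "group" and "date" are assumptions, not part of the original post.
idx1 = pd.MultiIndex.from_tuples(
    [("A", "1/1/2012"), ("A", "2/1/2012"),
     ("B", "1/1/2012"), ("B", "2/1/2012"),
     ("C", "1/1/2012"), ("C", "2/1/2012")],
    names=["group", "date"],
)
idx2 = pd.MultiIndex.from_tuples(
    [("A", "1/1/2012"), ("A", "3/1/2012"),
     ("B", "1/1/2012"), ("B", "2/1/2012"),
     ("D", "1/1/2012"), ("D", "2/1/2012")],
    names=["group", "date"],
)
df1 = pd.DataFrame({"V1": [12, 14, 15, 8, 17, 9]}, index=idx1)
df2 = pd.DataFrame({"V2": [15, 21, 24, 9, 7, 16]}, index=idx2)

# Outer join on the full two-level index: keeps every (group, date) pair
# that appears in either frame.
joined = df1.join(df2, how="outer")
print(joined)
```

Note that the outer join introduces NaN, so the integer columns are upcast to float in the result.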

Gorman answered 11/4, 2016 at 5:20

You can do this with merge:

df_merged = df1.merge(df2, how='outer', left_index=True, right_index=True)

The keyword argument how='outer' keeps all index entries from both frames and fills the values missing on either side with NaN. The left_index=True and right_index=True arguments tell merge to match rows on the indices rather than on columns. If a column comes back entirely NaN after a merge, another troubleshooting step is to verify that the indices of the two frames have the same dtypes.

The merge code above produces the following output for me:

                V1    V2
A 2012-01-01  12.0  15.0
  2012-02-01  14.0   NaN
  2012-03-01   NaN  21.0
B 2012-01-01  15.0  24.0
  2012-02-01   8.0   9.0
C 2012-01-01  17.0   NaN
  2012-02-01   9.0   NaN
D 2012-01-01   NaN   7.0
  2012-02-01   NaN  16.0

Chantress answered 11/4, 2016 at 3:19

You can concatenate horizontally as well. Since concat matches on the index and performs an outer join by default, passing axis=1 to make the concatenation horizontal is all you need.

joined_df = pd.concat([df1, df2], axis=1)

An advantage of concat over merge and join¹ is that you can pass a list of dataframes and concatenate many frames in one go with minimal fuss.

joined_df = pd.concat([df1, df2, df1, df2], axis=1)

¹ It can be done with join too, but if there are duplicate column names, they have to be dealt with before the join call, whereas with concat it doesn't matter.
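As a small sketch of the points above, using three made-up single-level frames (a, b, c are hypothetical and not from the question): concat and join both accept a list, but only concat tolerates repeated column names. Passing keys= is one optional way to label where each duplicated column came from.

```python
import pandas as pd

# Hypothetical frames with partially overlapping indices.
a = pd.DataFrame({"V1": [1, 2]}, index=["x", "y"])
b = pd.DataFrame({"V2": [3, 4]}, index=["y", "z"])
c = pd.DataFrame({"V3": [5, 6]}, index=["x", "z"])

# concat aligns on the index and keeps the union of labels by default.
wide = pd.concat([a, b, c], axis=1)

# join also accepts a list, provided the column names are all distinct.
wide_join = a.join([b, c], how="outer")

# With duplicate column names, concat simply keeps both columns...
dup = pd.concat([a, a], axis=1)

# ...and keys= adds an outer column level recording the source frame.
labelled = pd.concat([a, a], axis=1, keys=["first", "second"])

print(wide)
print(labelled)
```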

Minard answered 15/2, 2023 at 6:40
