pandas equivalent of R's cbind (concatenate/stack vectors horizontally)

3

53

Suppose I have two dataframes:

import pandas as pd

test1 = pd.DataFrame([1,2,3,4,5])

test2 = pd.DataFrame([4,2,1,3,7])

I tried test1.append(test2), but that is the equivalent of R's rbind: it stacks the rows.

How can I combine the two as two columns of a dataframe similar to the cbind function in R?

Volteface answered 18/2, 2015 at 23:7 Comment(4)
Have you considered changing which answer is accepted? I think Feng Mai's answer is far more complete. - Tape
Sorry, I needed the answer in 2015, not in 2021! It would not be fair to change the accepted answer, particularly not to the person who responded 7 years ago when I needed it. - Volteface
I don't like going back 7 years. I appreciate you answering the question back then, but there is no point in getting an answer 7 years later, when I have long since left Python for C# / Java and am no longer interested in the answer. - Volteface
@cphlewis: Feng Mai's more recent answer is more complete. On SO we might not change which answer got accepted, but we can upvote the better, more recent answer. Also, this question really needs an MCVE. If the indices of test1 and test2 aren't identical, pd.concat() causes issues because it always tries to align the axes, unlike R's cbind(). - Homosexual
81
test3 = pd.concat([test1, test2], axis=1)
test3.columns = ['a','b']

(But see the detailed answer by @feng-mai, below)
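
To make the role of the axis argument concrete, here is a minimal sketch using the two frames from the question. It assumes both frames keep their default RangeIndex (0 to 4); the names stacked and side_by_side are only illustrative.

```python
import pandas as pd

test1 = pd.DataFrame([1, 2, 3, 4, 5])
test2 = pd.DataFrame([4, 2, 1, 3, 7])

# axis=0 stacks the frames on top of each other (like rbind in R).
stacked = pd.concat([test1, test2], axis=0)

# axis=1 places the frames next to each other (like cbind in R),
# because both frames share the same default index 0..4.
side_by_side = pd.concat([test1, test2], axis=1)
side_by_side.columns = ['a', 'b']  # arbitrary column names

print(stacked.shape)
print(side_by_side.shape)
print(side_by_side)
```

A DataFrame only has axes 0 (rows) and 1 (columns), so these are the only two values that apply here.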

Tape answered 18/2, 2015 at 23:11 Comment(5)
I did this, and it's adding rows, as if it's a join, which is not what I want at all. - Scallop
Is axis=2 what you want? - Tape
Just for completeness: #33088510 - Connor
axis = 0 is what you want. - Copula
@Scallop the answer from Feng Mai addresses this crucial issue and is superior, in my estimation, as it more faithfully behaves like R cbind. https://mcmap.net/q/338072/-pandas-equivalent-of-r-39-s-cbind-concatenate-stack-vectors-horizontally - Gastrectomy
20

There is a key difference between concat(axis=1) in pandas and cbind() in R:

concat aligns rows by index label, whereas cbind() in R binds rows by position and does not align on row names. If the two pandas dataframes' indexes are misaligned, the result differs from cbind (even if they have the same number of rows). You need to either make sure the indexes align or drop/reset them.

Example:

import pandas as pd

test1 = pd.DataFrame([1,2,3,4,5])
test1.index = ['a','b','c','d','e']
test2 = pd.DataFrame([4,2,1,3,7])
test2.index = ['d','e','f','g','h']

pd.concat([test1, test2], axis=1)

     0    0
a  1.0  NaN
b  2.0  NaN
c  3.0  NaN
d  4.0  4.0
e  5.0  2.0
f  NaN  1.0
g  NaN  3.0
h  NaN  7.0

pd.concat([test1.reset_index(drop=True), test2.reset_index(drop=True)], axis=1)

   0  0
0  1  4
1  2  2
2  3  1
3  4  3
4  5  7

pd.concat([test1.reset_index(), test2.reset_index(drop=True)], axis=1)      

  index  0  0
0     a  1  4
1     b  2  2
2     c  3  1
3     d  4  3
4     e  5  7
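
The reset-index approach can be wrapped in a small helper. This is only a sketch: the function name cbind, the row-count check, and the column names 'x' and 'y' are my own assumptions, not part of pandas.

```python
import pandas as pd

def cbind(*frames):
    """Bind DataFrames side by side by row position, ignoring their indexes.

    Hypothetical helper, not a pandas function.
    """
    lengths = {len(f) for f in frames}
    if len(lengths) > 1:
        # Assumption: refuse mismatched row counts instead of padding with NaN.
        raise ValueError(f"row counts differ: {sorted(lengths)}")
    # Dropping each index gives every frame the same RangeIndex,
    # so concat has nothing to misalign.
    return pd.concat([f.reset_index(drop=True) for f in frames], axis=1)

test1 = pd.DataFrame([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
test2 = pd.DataFrame([4, 2, 1, 3, 7], index=['d', 'e', 'f', 'g', 'h'])

combined = cbind(test1, test2)
combined.columns = ['x', 'y']  # both inputs were labelled 0, so relabel
print(combined)
```

The length check is a design choice: without it, frames with different row counts would still be concatenated and the shorter one padded with NaN.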
Roer answered 7/11, 2021 at 16:31 Comment(1)
This index issue is absolutely central and yours should for that reason be the accepted answer. Anybody coming from R, looking for cbind, will need to know this. - Gastrectomy
1

numpy.hstack

I know this question is really old, but for anyone who comes to it now, used pd.concat and got a dataframe with more rows than expected [1], one solution is to drop down to numpy [2], use one of its column-wise concatenation functions (np.concatenate(..., axis=1), np.hstack(...), np.c_[...], etc.) and construct a dataframe afterwards. This discards the old indices and column labels and creates a dataframe with a RangeIndex.

import numpy as np
import pandas as pd

test1 = pd.DataFrame([1,2,3,4,5], index=['a','b','c','d','e'])
test2 = pd.DataFrame([4,2,1,3,7], index=['d','e','f','g','h'])


df = pd.DataFrame(np.hstack([test1, test2]))


   0  1
0  1  4        <--- the indices are 0, 1, 2, 3, 4
1  2  2        <--- (completely new RangeIndex)
2  3  1
3  4  3
4  5  7
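
As a rough sketch of the alternatives mentioned above, the three numpy routes give the same array for this data. The explicit .to_numpy() calls and the column names 'x' and 'y' are my additions for clarity.

```python
import numpy as np
import pandas as pd

test1 = pd.DataFrame([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
test2 = pd.DataFrame([4, 2, 1, 3, 7], index=['d', 'e', 'f', 'g', 'h'])

# Convert to plain 2-D arrays first; the index labels are dropped here.
a = test1.to_numpy()
b = test2.to_numpy()

via_hstack = np.hstack([a, b])
via_concatenate = np.concatenate([a, b], axis=1)
via_c = np.c_[a, b]

# Rebuild a dataframe; column labels have to be supplied again.
df = pd.DataFrame(via_hstack, columns=['x', 'y'])
print(df)
```

One caveat: a numpy array has a single dtype, so if the dataframes hold columns of different types, the values are coerced to a common dtype on the way through.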

.set_axis

If you want to "coerce" one of the dataframes to have the indices of the other, you can assign the index using set_axis and then concatenate with pd.concat. The example below assigns the first dataframe's index to the second dataframe before concatenation, so the result keeps the first dataframe's index. Both dataframes must have the same number of rows.

df = pd.concat([test1, test2.set_axis(test1.index)], axis=1)


   0  0
a  1  4        <--- the indices are a, b, c, d, e
b  2  2        <--- which are the indices of test1
c  3  1
d  4  3
e  5  7
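
A self-contained version of the set_axis approach, as a sketch. It assumes a recent pandas release, where set_axis returns a new object rather than modifying test2 in place; the names aligned, 'x' and 'y' are illustrative.

```python
import pandas as pd

test1 = pd.DataFrame([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
test2 = pd.DataFrame([4, 2, 1, 3, 7], index=['d', 'e', 'f', 'g', 'h'])

# Relabel test2's rows with test1's index (rows are the default axis).
aligned = test2.set_axis(test1.index)

df = pd.concat([test1, aligned], axis=1)
df.columns = ['x', 'y']  # both inputs were labelled 0, so relabel

print(df)
print(list(test2.index))  # the original test2 keeps its own index
```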

[1] The cause of this issue is explained in Feng Mai's answer. It looks like it was an issue for the OP as well.
[2] Pandas is built on numpy, so if you have pandas installed, numpy is in your environment as well.

Amil answered 22/5 at 18:13 Comment(0)
