pandas equivalent of R's cbind (concatenate/stack vectors horizontally)

Asked 18/2, 2015 at 23:7 Answered 22/5 at 18:13

Solved r python-3.x pandas concatenation cbind

Suppose I have two dataframes:

import pandas as pd

test1 = pd.DataFrame([1,2,3,4,5])
test2 = pd.DataFrame([4,2,1,3,7])

I tried test1.append(test2) but it is the equivalent of R's rbind.

How can I combine the two as two columns of a dataframe similar to the cbind function in R?

Volteface answered 18/2, 2015 at 23:7 Comment(4)

have you considered changing which answer is accepted? I think Feng Mai's answer is far more complete. – Tape 23/8, 2022 at 16:51

Sorry, I needed the answer in 2015, not in 2021 !!! Not fair to change the answer - particularly not fair to the person who responded to me 7 years ago when I needed the answer – Volteface 7/9, 2022 at 2:22

I don’t like to go back 7 years ago. I appreciate you answering the question back then, but it’s no point in getting an answer 7 years later when I have long left Python for C# / Java and not interested in the answer anymore – Volteface 7/9, 2022 at 2:24

@cphlewis: Feng Mai's more recent answer is more complete. On SO we might not change which answer got accepted, but we can upvote the better more recent answer. Also, this question really needs an MCVE. If the indices of test1 and test2 aren't identical, pd.concat() causes issues due to always trying to align the misaligned axes, unlike R's cbind(). – Homosexual 10/11, 2023 at 4:41

test3 = pd.concat([test1, test2], axis=1)
test3.columns = ['a','b']

(But see the detailed answer by @feng-mai, below)
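Since the comments go back and forth on which axis to use, here is a minimal sketch of how the axis argument changes the result. It assumes both frames carry the default RangeIndex, as in the question; the names stacked_rows and side_by_side are illustrative only.

```python
import pandas as pd

test1 = pd.DataFrame([1, 2, 3, 4, 5])
test2 = pd.DataFrame([4, 2, 1, 3, 7])

# axis=0 stacks the frames vertically (like R's rbind): 10 rows, 1 column.
stacked_rows = pd.concat([test1, test2], axis=0)

# axis=1 places the frames side by side (like R's cbind): 5 rows, 2 columns.
side_by_side = pd.concat([test1, test2], axis=1)
side_by_side.columns = ['a', 'b']

print(stacked_rows.shape)   # (10, 1)
print(side_by_side.shape)   # (5, 2)
```

If axis=1 still produces extra rows, the two frames most likely have different index labels.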

Tape answered 18/2, 2015 at 23:11 Comment(5)

I did this, and it's adding rows, as if it's a join -- which is not what I want at all. – Scallop 7/11, 2016 at 19:30

Is axis=2 what you want? – Tape 7/11, 2016 at 19:31

Just for completeness: #33088510 – Connor 6/12, 2016 at 0:22

axis = 0 is what you want. – Copula 7/4, 2020 at 15:53

@Scallop the answer from Feng Mai addresses this crucial issue and is superior, in my estimation as it more faithfully behaves like R cbind. https://mcmap.net/q/338072/-pandas-equivalent-of-r-39-s-cbind-concatenate-stack-vectors-horizontally – Gastrectomy 21/8, 2022 at 21:26

There is a key difference between concat(axis = 1) in pandas and cbind() in R:

concat aligns rows by index label. An R data frame has no index in the pandas sense, and cbind() combines rows by position. If the two pandas dataframes' indexes are misaligned, the result differs from cbind (even when both have the same number of rows). You need to either make sure the indexes align or drop/reset them.

Example:

import pandas as pd

test1 = pd.DataFrame([1,2,3,4,5])
test1.index = ['a','b','c','d','e']
test2 = pd.DataFrame([4,2,1,3,7])
test2.index = ['d','e','f','g','h']

pd.concat([test1, test2], axis=1)

     0    0
a  1.0  NaN
b  2.0  NaN
c  3.0  NaN
d  4.0  4.0
e  5.0  2.0
f  NaN  1.0
g  NaN  3.0
h  NaN  7.0

pd.concat([test1.reset_index(drop=True), test2.reset_index(drop=True)], axis=1)

   0  1
0  1  4
1  2  2
2  3  1
3  4  3
4  5  7

pd.concat([test1.reset_index(), test2.reset_index(drop=True)], axis=1)      

  index  0  0
0     a  1  4
1     b  2  2
2     c  3  1
3     d  4  3
4     e  5  7
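The reset-then-concat pattern above can be wrapped in a small helper. This is only a sketch: cbind is a hypothetical function name, not part of pandas, and raising an error on differing row counts is an assumption made here rather than a copy of R's behaviour. The column names x and y are also illustrative.

```python
import pandas as pd

def cbind(*frames):
    """Hypothetical helper: bind frames column-wise by row position,
    ignoring index labels entirely."""
    lengths = {len(f) for f in frames}
    if len(lengths) > 1:
        # Assumption: refuse to combine frames with different row counts.
        raise ValueError(f"row counts differ: {sorted(lengths)}")
    # Dropping each index forces positional alignment in pd.concat.
    return pd.concat([f.reset_index(drop=True) for f in frames], axis=1)

test1 = pd.DataFrame({'x': [1, 2, 3, 4, 5]}, index=list('abcde'))
test2 = pd.DataFrame({'y': [4, 2, 1, 3, 7]}, index=list('defgh'))

result = cbind(test1, test2)
print(result.shape)  # (5, 2)
```

The original index labels are discarded; keep them as a column first (reset_index() without drop=True) if they carry information.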

Roer answered 7/11, 2021 at 16:31 Comment(1)

This index issue is absolutely central and yours should for that reason be the accepted answer. Anybody coming from R, looking for cbind, will need to know this. – Gastrectomy 21/8, 2022 at 21:24

`numpy.hstack`

I know this question is old, but for anyone who arrives here now after using pd.concat and getting a dataframe with more rows than expected¹: one solution is to drop down to numpy² and use one of its column-wise concatenation functions (np.concatenate(..., axis=1), np.hstack(...), np.c_[...], etc.), then construct a dataframe from the result. This discards the old indices and creates a dataframe with a RangeIndex.

import numpy as np
import pandas as pd

test1 = pd.DataFrame([1,2,3,4,5], index=['a','b','c','d','e'])
test2 = pd.DataFrame([4,2,1,3,7], index=['d','e','f','g','h'])


df = pd.DataFrame(np.hstack([test1, test2]))


   0  1
0  1  4        <--- the indices are 0, 1, 2, 3, 4
1  2  2        <--- (completely new RangeIndex)
2  3  1
3  4  3
4  5  7
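As a rough check that the numpy functions mentioned above are interchangeable here, the sketch below builds the same array three ways. Because column labels are lost along with the index, it also passes new ones explicitly; the labels 'a' and 'b' are illustrative, not from the original answer.

```python
import numpy as np
import pandas as pd

test1 = pd.DataFrame([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
test2 = pd.DataFrame([4, 2, 1, 3, 7], index=['d', 'e', 'f', 'g', 'h'])

left, right = test1.to_numpy(), test2.to_numpy()

# Three ways to join the underlying (5, 1) arrays column-wise.
via_hstack = np.hstack([left, right])
via_concatenate = np.concatenate([left, right], axis=1)
via_c = np.c_[left, right]

# Index and column labels do not survive the round trip through numpy,
# so supply column names when rebuilding the dataframe.
df = pd.DataFrame(via_hstack, columns=['a', 'b'])
print(df.shape)  # (5, 2)
```

One caveat: numpy arrays hold a single dtype, so frames with mixed column types are upcast to a common dtype when converted.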

`.set_axis`

If you want to "coerce" one of the dataframes to have the indices of the other, you can assign the index with set_axis and then concatenate with pd.concat. The example below assigns the first dataframe's index to the second dataframe before concatenation, so the result keeps the first dataframe's index. Both dataframes must have the same number of rows.

df = pd.concat([test1, test2.set_axis(test1.index)], axis=1)


   0  0
a  1  4        <--- the indices are a, b, c, d, e
b  2  2        <--- which are the indices of test1
c  3  1
d  4  3
e  5  7
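A short sketch of what set_axis does and does not do in this pattern. The frame named short is hypothetical and exists only to show the failure mode when the row counts differ; axis=0 is passed explicitly for clarity.

```python
import pandas as pd

test1 = pd.DataFrame([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
test2 = pd.DataFrame([4, 2, 1, 3, 7], index=['d', 'e', 'f', 'g', 'h'])
short = pd.DataFrame([9, 8, 7])  # hypothetical frame with fewer rows

# Same length: labels are replaced by position, values keep their order.
# set_axis returns a new object, so test2 itself is left unchanged.
df = pd.concat([test1, test2.set_axis(test1.index, axis=0)], axis=1)

# Different length: set_axis cannot relabel and raises ValueError.
try:
    short.set_axis(test1.index, axis=0)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(df.shape)  # (5, 2)
```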

¹ The cause of this issue was explained in Feng Mai's answer. Looks like it was an issue for the OP as well.
² Pandas is built on numpy, so if you have pandas installed, you will have numpy in your environment as well.

Amil answered 22/5 at 18:13 Comment(0)
