numpy.hstack
I know this question is really old, but for anyone who arrives here after using pd.concat and getting a dataframe with more rows than expected [1], one solution is to drop down to numpy [2], use one of its column-wise concatenation functions (np.concatenate(..., axis=1), np.hstack(...), np.c_[], etc.) and construct a dataframe afterwards. This discards the old indices and creates a dataframe with a RangeIndex.
import numpy as np
import pandas as pd
test1 = pd.DataFrame([1,2,3,4,5], index=['a','b','c','d','e'])
test2 = pd.DataFrame([4,2,1,3,7], index=['d','e','f','g','h'])
df = pd.DataFrame(np.hstack([test1, test2]))
   0  1
0  1  4   <--- the indices are 0, 1, 2, 3, 4
1  2  2   <--- (completely new RangeIndex)
2  3  1
3  4  3
4  5  7
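As a rough sketch of the claim that the numpy functions listed above are interchangeable here, the snippet below runs all three on the same data and compares the results. The names a, b, via_hstack, via_concatenate, via_c and same are made up for illustration, and the explicit .to_numpy() calls are my own choice to make the conversion to arrays visible.

```python
import numpy as np
import pandas as pd

test1 = pd.DataFrame([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
test2 = pd.DataFrame([4, 2, 1, 3, 7], index=['d', 'e', 'f', 'g', 'h'])

# Convert to plain arrays first, so the indices are dropped before stacking.
a, b = test1.to_numpy(), test2.to_numpy()

via_hstack = pd.DataFrame(np.hstack([a, b]))
via_concatenate = pd.DataFrame(np.concatenate([a, b], axis=1))
via_c = pd.DataFrame(np.c_[a, b])

# All three stack purely by position, so the resulting frames are identical.
same = via_hstack.equals(via_concatenate) and via_hstack.equals(via_c)
print(same)
```

Because the stacking is positional, both inputs need the same number of rows; otherwise numpy raises a ValueError instead of padding with NaN.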
.set_axis
If you want to "coerce" one of the dataframes to have the indices of the other, you can assign indices using set_axis and concatenate using pd.concat. Below is an example where we assign the first dataframe's index to the second dataframe before concatenation. It preserves the first dataframe's index.
df = pd.concat([test1, test2.set_axis(test1.index)], axis=1)
   0  0
a  1  4   <--- the indices are a, b, c, d, e
b  2  2   <--- which are the indices of test1
c  3  1
d  4  3
e  5  7
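The same idea works in the other direction. Here is a minimal sketch that keeps the second dataframe's index instead, assuming a recent pandas version where set_axis returns a new object rather than modifying its input; the name relabelled is only for illustration.

```python
import pandas as pd

test1 = pd.DataFrame([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
test2 = pd.DataFrame([4, 2, 1, 3, 7], index=['d', 'e', 'f', 'g', 'h'])

# Give test1 the index of test2; test1 itself is left unchanged.
relabelled = test1.set_axis(test2.index)

# Both frames now share the index d..h, so concat has nothing to realign.
df = pd.concat([relabelled, test2], axis=1)
print(df)
```

Note that set_axis needs as many labels as there are rows, so this only applies when the two dataframes have the same length.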
[1] The cause of this issue is explained in Feng Mai's answer. It looks like it was an issue for the OP as well.
[2] pandas is built on numpy, so if you have pandas installed, you will have numpy in your environment as well.
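To make the "more rows than expected" problem from the first footnote concrete, here is a small sketch using the same two dataframes as above. The names aligned, n_rows and n_missing are made up for illustration.

```python
import pandas as pd

test1 = pd.DataFrame([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
test2 = pd.DataFrame([4, 2, 1, 3, 7], index=['d', 'e', 'f', 'g', 'h'])

# pd.concat aligns on index labels: with the default outer join, the result
# has one row per label found in either frame, not one row per position.
aligned = pd.concat([test1, test2], axis=1)

n_rows = len(aligned)                        # 8 labels (a..h), not 5
n_missing = int(aligned.isna().sum().sum())  # cells with no matching label
print(n_rows, n_missing)
print(aligned)
```

Only the labels d and e exist in both inputs, so every other row has a NaN in one of the two columns.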
pd.concat() causes issues because it always tries to align the misaligned axes, unlike R's cbind(). – Homosexual