Pandas: Creating DataFrame from Series

4

72

My current code is shown below - I'm importing a MAT file and trying to create a DataFrame from variables within it:

mat = loadmat(file_path)  # load mat-file
Variables = mat.keys()    # identify variable names

df = pd.DataFrame         # Initialise DataFrame

for name in Variables:

    B = mat[name]
    s = pd.Series (B[:,1])

So within the loop, I can create a Series for each variable (they're arrays with two columns, so the values I need are in the second column).

My question is how do I append the series to the dataframe? I've looked through the documentation and none of the examples seem to fit what I'm trying to do.

Dubonnet answered 7/5, 2014 at 15:6 Comment(0)
88

Here is how to create a DataFrame where each series is a row.

For a single Series (resulting in a single-row DataFrame):

series = pd.Series([1,2], index=['a','b'])
df = pd.DataFrame([series])

For multiple series with identical indices:

cols = ['a','b']
list_of_series = [pd.Series([1,2],index=cols), pd.Series([3,4],index=cols)]
df = pd.DataFrame(list_of_series, columns=cols)

For multiple series with possibly different indices:

list_of_series = [pd.Series([1,2],index=['a','b']), pd.Series([3,4],index=['a','c'])]
df = pd.concat(list_of_series, axis=1).transpose()

To create a DataFrame where each series is a column, see the answers by others. Alternatively, one can create a DataFrame where each series is a row, as above, and then use df.transpose(). However, the latter approach is inefficient if the columns have different data types.
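
To illustrate the data-type point above, here is a minimal sketch comparing the two routes. The Series names and values are made up for the example; the idea is only that concatenating along axis=1 keeps each column's dtype, while going through rows and transposing leaves every column as generic objects when the Series have different types.

```python
import pandas as pd

# Hypothetical example data: one numeric Series and one string Series.
numbers = pd.Series([1, 2], name="n")
labels = pd.Series(["x", "y"], name="label")

# Each Series becomes a column directly; each column keeps its own dtype.
by_concat = pd.concat([numbers, labels], axis=1)

# Each Series becomes a row first, then the frame is transposed.
# The intermediate frame mixes ints and strings in every column, so the
# transposed result stores both columns as generic Python objects.
by_rows = pd.DataFrame([numbers, labels]).transpose()

print(by_concat.dtypes)
print(by_rows.dtypes)
```

Both frames hold the same values; only the column dtypes differ, which is what makes the row-then-transpose route less efficient for mixed data.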

Taipan answered 28/6, 2015 at 8:57 Comment(0)
45

No need to initialize an empty DataFrame. (Your code wasn't actually doing that anyway: pd.DataFrame without parentheses refers to the class itself, so you'd need pd.DataFrame() to create an instance.)

Instead, to create a DataFrame where each series is a column,

  1. make a list of Series, series, and
  2. concatenate them horizontally with df = pd.concat(series, axis=1)

Something like:

series = [pd.Series(mat[name][:, 1]) for name in Variables]
df = pd.concat(series, axis=1)
Julius answered 7/5, 2014 at 15:28 Comment(2)
Tom, that works great - only issue is the columns in the resulting dataframe are named numerically. How would I go about using "name" as the column name in the resulting dataframe? - Dubonnet
Sorry, answered my own question... df.columns = Variables - Dubonnet
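
As an alternative to assigning df.columns afterwards, the column names can probably be set during the concatenation itself with the keys argument of pd.concat. The sketch below uses a small made-up dict in place of the loaded MAT file (the variable names "speed" and "torque" are hypothetical). Note that a dict returned by scipy's loadmat also carries metadata entries such as `__header__`, which would need to be skipped before slicing.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dict returned by loadmat(file_path):
# each variable is a two-column array.
mat = {
    "speed": np.array([[0.0, 1.5], [1.0, 2.5], [2.0, 3.5]]),
    "torque": np.array([[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]]),
}

names = list(mat)  # variable names, in the dict's order
series = [pd.Series(mat[name][:, 1]) for name in names]

# keys= labels each concatenated Series, so the columns get the variable names.
df = pd.concat(series, axis=1, keys=names)
print(df)
```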
13

Nowadays there is a pandas.Series.to_frame method:

Series.to_frame(name=NoDefault.no_default)

Convert Series to DataFrame.

Parameters

name (object, optional): The passed name should substitute for the series name (if it has one).

Returns

DataFrame: DataFrame representation of Series.

Examples

s = pd.Series(["a", "b", "c"], name="vals")
s.to_frame()
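
To make the effect of the name parameter concrete, here is a short sketch built on the same Series as the example above. The replacement name "letters" is just an illustration.

```python
import pandas as pd

s = pd.Series(["a", "b", "c"], name="vals")

# Without arguments, the Series name becomes the column name.
df_default = s.to_frame()

# Passing name= overrides the Series name for the resulting column.
df_renamed = s.to_frame(name="letters")

print(df_default.columns.tolist())  # ['vals']
print(df_renamed.columns.tolist())  # ['letters']
```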
Genevagenevan answered 8/2, 2022 at 14:37 Comment(0)
2

I guess another way, possibly faster, to achieve this is to (1) use a dict comprehension to build the desired dict (i.e., take the second column of each array), and then (2) pass that dict directly to pd.DataFrame, without looping over each column and concatenating.

Assuming your mat looks like this (you can ignore this since your mat is loaded from file):

In [135]: mat = {'a': np.random.randint(5, size=(4,2)),
   .....: 'b': np.random.randint(5, size=(4,2))}

In [136]: mat
Out[136]: 
{'a': array([[2, 0],
        [3, 4],
        [0, 1],
        [4, 2]]), 'b': array([[1, 0],
        [1, 1],
        [1, 0],
        [2, 1]])}

Then you can do:

In [137]: df = pd.DataFrame ({name:mat[name][:,1] for name in mat})

In [138]: df
Out[138]: 
   a  b
0  0  0
1  4  1
2  1  0
3  2  1

[4 rows x 2 columns]
Mediate answered 8/5, 2014 at 1:34 Comment(3)
That's a nice solution, thanks! Is it possible to add an if statement within the dict (or list) comprehension to ignore arrays that are a different size? The dict I end up with from my MAT file has a few hundred 2x4000 arrays and a handful of random arrays of different sizes. - Dubonnet
You can add an if clause after the for clause: {name:mat[name][:,1] for name in mat if ... } - Mediate
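
A minimal sketch of such a filter, assuming the arrays worth keeping all share one known shape. The dict contents and the expected_shape value are made up for the example; the `__header__` entry imitates the kind of non-array metadata that loadmat adds.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a loaded MAT dict with entries to skip.
mat = {
    "a": np.arange(8).reshape(4, 2),
    "b": np.arange(8, 16).reshape(4, 2),
    "odd": np.arange(3),         # different shape, should be skipped
    "__header__": b"metadata",   # non-array entry, should be skipped
}

expected_shape = (4, 2)  # assumed shape of the arrays worth keeping

df = pd.DataFrame({
    name: value[:, 1]
    for name, value in mat.items()
    # getattr guards against entries that are not arrays at all
    if getattr(value, "shape", None) == expected_shape
})
print(df)
```

Checking the shape attribute through getattr means the same condition handles both wrongly sized arrays and entries that are not arrays.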
The use of dict comprehension has the drawback that the keys in standard python dicts are unordered, so the order of columns is not preserved. - Taipan
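
On the ordering concern: since Python 3.7, standard dicts preserve insertion order, so this matters mainly on older versions. Either way, the column order can be made explicit by passing a columns list to pd.DataFrame. The sketch below uses made-up arrays, and the wanted_order list is a hypothetical choice.

```python
import numpy as np
import pandas as pd

# Hypothetical data, deliberately inserted in "b", "a" order.
mat = {
    "b": np.array([[1, 0], [1, 1]]),
    "a": np.array([[2, 0], [3, 4]]),
}

wanted_order = ["a", "b"]  # assumed desired column order

# columns= selects and orders the dict's keys explicitly.
df = pd.DataFrame(
    {name: mat[name][:, 1] for name in mat},
    columns=wanted_order,
)
print(df)
```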
