Pandas convert columns type from list to np.array
T

2

15

I'm trying to apply a function to a pandas DataFrame. The function requires two np.array objects as input and fits them using a well-defined model.

The problem is that I can't apply this function to the selected columns, because each cell contains a list read from a JSON file rather than an np.array.

Now, I've tried different solutions:

# Here is where I discovered the problem

train_df['result'] = train_df.apply(my_function(train_df['col1'],train_df['col2']))

# So I tried to cast the Series before passing them to the function, in both of these ways:

X_col1_casted = train_df['col1'].dtype(np.array)
X_col2_casted = train_df['col2'].dtype(np.array)

This doesn't work.

X_col1_casted = train_df['col1'].astype(np.array)
X_col2_casted = train_df['col2'].astype(np.array)

This doesn't work either.

What I'm thinking of doing now is a longer procedure:

Starting from the uncast column Series, convert them to lists, iterate over them, apply the function to the individual elements converted with np.array(), and append the results to a temporary list. Once done, I would convert this list into a new column. (Clearly, I don't know if it will work.)

Does anyone know how to help me?

EDIT: I've added an example to be clear:

The function assumes that its inputs are two np.arrays. At the moment it receives two lists, since they are retrieved from a JSON file. The situation is this:

col1        col2    result
[1,2,3]     [4,5,6]  [5,7,9]
[0,0,0]     [1,2,3]  [1,2,3]

Clearly the function is not a sum but my own function. For the moment, assume that this sum only works on arrays and not on lists. What should I do?

Thanks in advance

Teenyweeny answered 21/9, 2016 at 13:56 Comment(5)
Use the .values attribute to convert it into an array. - Deceptive
Can you also tell me how? I need to apply it to single cell elements, not to whole columns in one shot. I need one array per row. - Teenyweeny
What do you mean by one array per row? I understood from the question that you want to convert a whole column to a numpy array. - Tychon
I've edited the question with an example. The function works per row and assumes its inputs are np.array, not lists. That's the point. I hope it's clear now. - Teenyweeny
I actually have the opposite requirement: my pandas DataFrame has numpy.ndarray values that I want to convert to lists so that they can be stored in a DynamoDB table. Does anyone have any input on how I can do that? - Dwyer
D
29

Use apply to convert each element to its equivalent array:

df['col1'] = df['col1'].apply(lambda x: np.array(x))

type(df['col1'].iloc[0])
numpy.ndarray

Data:

df = pd.DataFrame({'col1': [[1,2,3],[0,0,0]]})
df
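
To connect this back to the two-column case in the question, here is a minimal sketch. It assumes a stand-in my_function that simply adds its inputs and rejects anything that is not an np.ndarray; the asker's real function is not shown in the thread, so that body is purely hypothetical. The column names and sample data are taken from the question's example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's function: it only accepts
# NumPy arrays, as described in the question, and here just adds them.
def my_function(a, b):
    if not (isinstance(a, np.ndarray) and isinstance(b, np.ndarray)):
        raise TypeError("both inputs must be np.ndarray")
    return a + b

train_df = pd.DataFrame({
    "col1": [[1, 2, 3], [0, 0, 0]],
    "col2": [[4, 5, 6], [1, 2, 3]],
})

# Convert every cell of both columns from list to np.ndarray.
for col in ["col1", "col2"]:
    train_df[col] = train_df[col].apply(np.array)

# Call the function once per row, pairing the cells of the two columns.
train_df["result"] = [
    my_function(a, b) for a, b in zip(train_df["col1"], train_df["col2"])
]
```

Iterating with zip keeps the call explicit: each row contributes one array from col1 and one from col2, which is the "one array per row" behaviour asked for in the comments.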

Deceptive answered 21/9, 2016 at 14:21 Comment(2)
df['col1'] = df['col1'].apply(np.array) works as well. - Pollination
I came here because I wanted to get one big np.array from a column of type list (i.e. no pandas types at all). For those who want that, you can do this: np.stack(df['col1']) (e.g. necessary for keras). - Pomade
B
0

You can apply pd.Series to each list, which expands the column into a DataFrame that can then be converted to a single array, e.g.:

>>> X_train = df.col1.apply(pd.Series).to_numpy()

>>> type(X_train)
numpy.ndarray
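
As a small sketch of what this produces, the snippet below compares it with the np.stack approach mentioned in a comment on the other answer. It assumes the same sample col1 data used earlier in the thread, where every list has the same length; np.stack needs equal-length inputs.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [[1, 2, 3], [0, 0, 0]]})

# Expand each list into its own row of a DataFrame, then take the
# underlying values: one row per cell, one column per list element.
X_from_series = df["col1"].apply(pd.Series).to_numpy()

# np.stack builds the same 2-D array directly from the lists.
X_from_stack = np.stack(df["col1"].to_list())
```

Both variants give one 2-D array for the whole column rather than one array per row, so they address a different need than the per-cell conversion asked about in the question.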
Benco answered 26/10, 2023 at 15:8 Comment(0)
