How to convert a Pandas DataFrame to a Julia DataFrames.jl DataFrame

I have not been able to find a way to convert my 30,000 x 1,000 Pandas.jl DataFrame of strings into a DataFrames.jl DataFrame. I have tried the solutions from earlier Stack Overflow questions, but none of them worked. What is the best way to convert the DataFrame? Thanks for your help.

Clarence asked 31/5, 2022 at 3:30

Preparing data:

julia> import Pandas

julia> import DataFrames

julia> df_df1 = DataFrames.DataFrame(string.(rand(1:10, 10, 5)), :auto)
10×5 DataFrame
 Row │ x1      x2      x3      x4      x5
     │ String  String  String  String  String
─────┼────────────────────────────────────────
   1 │ 6       1       2       5       4
   2 │ 9       5       1       1       9
   3 │ 9       1       5       2       9
   4 │ 6       7       9       1       5
   5 │ 1       10      8       5       1
   6 │ 8       5       9       9       6
   7 │ 9       8       9       8       4
   8 │ 2       6       10      5       4
   9 │ 5       4       8       9       8
  10 │ 5       4       10      5       8

julia> pd_df = Pandas.DataFrame(df_df1)
  x1  x2  x3 x4 x5
0  6   1   2  5  4
1  9   5   1  1  9
2  9   1   5  2  9
3  6   7   9  1  5
4  1  10   8  5  1
5  8   5   9  9  6
6  9   8   9  8  4
7  2   6  10  5  4
8  5   4   8  9  8
9  5   4  10  5  8

And now the conversion you asked about:

julia> DataFrames.DataFrame([col => collect(pd_df[col]) for col in pd_df.pyo.columns])
10×5 DataFrame
 Row │ x1      x2      x3      x4      x5
     │ String  String  String  String  String
─────┼────────────────────────────────────────
   1 │ 6       1       2       5       4
   2 │ 9       5       1       1       9
   3 │ 9       1       5       2       9
   4 │ 6       7       9       1       5
   5 │ 1       10      8       5       1
   6 │ 8       5       9       9       6
   7 │ 9       8       9       8       4
   8 │ 2       6       10      5       4
   9 │ 5       4       8       9       8
  10 │ 5       4       10      5       8

Unfortunately, Pandas.jl does not correctly support the Tables.jl interface, so a workaround like this seems to be needed. I also chose to drop the Pandas Series and convert each column to a standard Julia Vector.

Deplorable answered 31/5, 2022 at 8:18

Comments (5):
Thank you very much, I'll try this! - Clarence
This code did not work for my data set; the error is below. It did work on my numerical data set, but it was significantly slower than converting the Pandas DataFrame to an array and then to a Julia DataFrame. Any suggestions? Error: <PyCall.jlwrap (in a Julia function called from Python) JULIA: MethodError: Cannot convert an object of type Nothing to an object of type String Closest candidates are: convert(::Type{T}, !Matched::PyObject) where T<:AbstractString at C:\Users\jackn\.julia\packages\PyCall\7a7w0\src\conversions.jl:92 - Clarence
Do you have missing values in your data? - Goober
Yes, I do. What do you suggest I do? - Clarence
I am not sure, as I do not use Pandas.jl. Maybe collect(Union{Nothing,String}, pd_df[col]) will work? I have just checked, and it works. - Goober
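
The suggestion in the last comment can be folded into the conversion from the answer. The following is only a sketch: the function name `pandas_to_dataframe` is hypothetical (it is not part of Pandas.jl or DataFrames.jl), and it assumes that every column holds strings and that missing entries arrive from Python as `None`, which is what the error message above suggests. The extra `replace` step, which turns `nothing` into `missing`, is an addition that was not discussed in the thread.

```julia
import Pandas
import DataFrames

# Hypothetical helper, not part of Pandas.jl or DataFrames.jl.
# Assumes all columns contain strings, with Python `None` marking missing entries.
function pandas_to_dataframe(pd_df)
    cols = [
        # Allowing `Nothing` in the element type lets `None` entries be collected
        # instead of raising the MethodError; `replace` then swaps `nothing`
        # for `missing`, the sentinel DataFrames.jl functions expect.
        col => replace(collect(Union{Nothing,String}, pd_df[col]), nothing => missing)
        for col in pd_df.pyo.columns
    ]
    return DataFrames.DataFrame(cols)
end
```

Calling `pandas_to_dataframe(pd_df)` should then give columns of type `Union{Missing, String}`. If the data marks missing entries some other way (for example with `NaN`), the element type passed to `collect` would need to be adjusted.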

TL;DR: export to Arrow or CSV, then import the file. (There might be a way to do this with PyCall, but it won't be as easy.)
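
A minimal sketch of the CSV variant of this round trip, under stated assumptions: the helper name `pandas_to_dataframe_via_csv` is hypothetical, CSV.jl is assumed to be installed and recent enough to accept a single type for its `types` keyword, and the pandas `to_csv` method is called through the wrapped Python object (`pd_df.pyo`) rather than through any Pandas.jl wrapper.

```julia
import Pandas
import DataFrames
import CSV

# Hypothetical helper; `path` defaults to a fresh temporary file.
function pandas_to_dataframe_via_csv(pd_df; path = tempname() * ".csv")
    # Use pandas' own `to_csv` on the underlying Python object,
    # leaving out the row index so it does not become an extra column.
    pd_df.pyo.to_csv(path, index = false)
    # Read every column back as String so values such as "10"
    # are not parsed as integers; empty fields are read as `missing`.
    return CSV.read(path, DataFrames.DataFrame; types = String)
end
```

This trades a write and a read of the whole table on disk for not having to convert each column through PyCall, so whether it is faster for a 30,000 x 1,000 table would need to be measured.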

Tedium answered 31/5, 2022 at 3:47

Comments (1):
This is what I am trying to avoid doing. - Clarence