How to convert Pandas DataFrame to Julia DataFrame.jl
I have not been able to find a way to convert my 30,000 × 1,000 Pandas.jl DataFrame of strings into a DataFrames.jl DataFrame. I have tried solutions from previous Stack Overflow questions, but they have not worked. What is the best way to convert the DataFrame? Thanks for your help.
Preparing data:
julia> import Pandas
julia> import DataFrames
julia> df_df1 = DataFrames.DataFrame(string.(rand(1:10, 10, 5)), :auto)
10×5 DataFrame
 Row │ x1      x2      x3      x4      x5
     │ String  String  String  String  String
─────┼────────────────────────────────────────
   1 │ 6       1       2       5       4
   2 │ 9       5       1       1       9
   3 │ 9       1       5       2       9
   4 │ 6       7       9       1       5
   5 │ 1       10      8       5       1
   6 │ 8       5       9       9       6
   7 │ 9       8       9       8       4
   8 │ 2       6       10      5       4
   9 │ 5       4       8       9       8
  10 │ 5       4       10      5       8
julia> pd_df = Pandas.DataFrame(df_df1)
   x1  x2  x3  x4  x5
0   6   1   2   5   4
1   9   5   1   1   9
2   9   1   5   2   9
3   6   7   9   1   5
4   1  10   8   5   1
5   8   5   9   9   6
6   9   8   9   8   4
7   2   6  10   5   4
8   5   4   8   9   8
9   5   4  10   5   8
And now the conversion you want to do:
julia> DataFrames.DataFrame([col => collect(pd_df[col]) for col in pd_df.pyo.columns])
10×5 DataFrame
 Row │ x1      x2      x3      x4      x5
     │ String  String  String  String  String
─────┼────────────────────────────────────────
   1 │ 6       1       2       5       4
   2 │ 9       5       1       1       9
   3 │ 9       1       5       2       9
   4 │ 6       7       9       1       5
   5 │ 1       10      8       5       1
   6 │ 8       5       9       9       6
   7 │ 9       8       9       8       4
   8 │ 2       6       10      5       4
   9 │ 5       4       8       9       8
  10 │ 5       4       10      5       8
(Unfortunately, Pandas.jl does not correctly support the Tables.jl interface, so a workaround like this seems to be needed. I also decided to drop the Pandas Series and convert each column to a standard Julia Vector.)
Thank you very much, I'll try this! – Clarence
This code did not work for my data set; the error is below. It did work on my numerical data set, but it was significantly slower than converting the pandas DataFrame to an array and then to a Julia DataFrame. Any suggestions?
<PyCall.jlwrap (in a Julia function called from Python) JULIA: MethodError: Cannot `convert` an object of type Nothing to an object of type String
Closest candidates are: convert(::Type{T}, !Matched::PyObject) where T<:AbstractString at C:\Users\jackn\.julia\packages\PyCall\7a7w0\src\conversions.jl:92 – Clarence
Clarence, do you have missing values in your data? – Goober
Yes, I do. What do you suggest I do? – Clarence
I am not sure; I do not use Pandas.jl. Maybe collect(Union{Nothing,String}, pd_df[col]) will work? I have just checked: it works. – Goober
TL;DR: export to Arrow or CSV, then import. (There might be a way to do this with PyCall, but it won't be as easy.)
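If a text round trip is acceptable, the export-and-import suggestion can be sketched without an intermediate file: pandas' to_csv returns the CSV text as a string when no path is given, and CSV.jl can parse it from an in-memory buffer. This is a minimal sketch, assuming CSV.jl is installed next to Pandas.jl and DataFrames.jl; pandas_via_csv is a hypothetical helper name, not part of any of these packages. It is still a full CSV round trip, so for a 30,000 × 1,000 table the whole CSV string is held in memory at once.

```julia
import Pandas
import DataFrames
import CSV

# Hypothetical helper (not part of Pandas.jl, DataFrames.jl or CSV.jl) sketching
# the "export to CSV, then import" route without writing a file to disk.
function pandas_via_csv(pd_df)
    # pd_df.pyo is the wrapped Python DataFrame; calling pandas' to_csv with no
    # path makes it return the CSV text as a string. index=false drops the
    # 0-based pandas row index.
    csv_text = pd_df.pyo.to_csv(index=false)
    # types=String keeps every column as text (otherwise CSV.jl would parse
    # "6" as an integer); empty fields, such as former None values, are read
    # back as missing.
    return CSV.read(IOBuffer(csv_text), DataFrames.DataFrame; types=String)
end

# Usage, reusing the preparation step from the answer above
df_df1 = DataFrames.DataFrame(string.(rand(1:10, 10, 5)), :auto)
jl_df = pandas_via_csv(Pandas.DataFrame(df_df1))
```

An Arrow-based variant would swap in pandas' to_feather (which needs pyarrow) and Arrow.Table from Arrow.jl; that variant is not shown or tested here.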
This is what I am trying to avoid doing. – Clarence
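Goober's collect(Union{Nothing,String}, pd_df[col]) keeps Python None values as Julia nothing. In DataFrames.jl the usual marker for absent data is missing (functions such as dropmissing and skipmissing look for it), so a follow-up step that swaps one for the other may be useful. A minimal sketch, assuming each column has already been collected as in that comment; nothing_to_missing is a hypothetical helper name, not part of Pandas.jl or DataFrames.jl:

```julia
# Hypothetical helper, not part of Pandas.jl or DataFrames.jl.
# Replaces every `nothing` (how Python's None arrives in Julia) with `missing`,
# the value DataFrames.jl functions such as dropmissing treat as absent data.
# The comprehension lets Julia narrow the element type, so the result is
# Vector{Union{Missing,String}} when gaps exist and Vector{String} otherwise.
nothing_to_missing(v::AbstractVector) = [x === nothing ? missing : x for x in v]

# A column shaped like what collect(Union{Nothing,String}, pd_df[col]) returns
col = Union{Nothing,String}["6", nothing, "9"]
clean = nothing_to_missing(col)
println(eltype(clean))  # Union{Missing, String}
```

With Pandas.jl loaded, the helper would slot into the answer's comprehension as DataFrames.DataFrame([col => nothing_to_missing(collect(Union{Nothing,String}, pd_df[col])) for col in pd_df.pyo.columns]); that combination is untested here. It also assumes the gaps arrive as None: pandas often stores missing text as NaN instead, and a NaN value would likely make the Union{Nothing,String} conversion fail.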