Here's one option:
First, extract the column names from you data
object:
julia> colnames = map(Symbol, data[:columns]);
12-element Array{Symbol,1}:
:Open
:High
:Low
:Close
:Volume
Symbol("Ex-Dividend")
Symbol("Split Ratio")
Symbol("Adj. Open")
Symbol("Adj. High")
Symbol("Adj. Low")
Symbol("Adj. Close")
Symbol("Adj. Volume")
Then pour all your columns into an DataFrame:
julia> y = DataFrame(Any[Array(data[c]) for c in colnames], colnames)
6×12 DataFrames.DataFrame
│ Row │ Open │ High │ Low │ Close │ Volume │ Ex-Dividend │ Split Ratio │
├─────┼───────┼───────┼───────┼───────┼──────────┼─────────────┼─────────────┤
│ 1 │ 28.75 │ 28.87 │ 28.75 │ 28.75 │ 2.0939e6 │ 0.0 │ 1.0 │
│ 2 │ 27.38 │ 27.38 │ 27.25 │ 27.25 │ 785200.0 │ 0.0 │ 1.0 │
│ 3 │ 25.37 │ 25.37 │ 25.25 │ 25.25 │ 472000.0 │ 0.0 │ 1.0 │
│ 4 │ 25.87 │ 26.0 │ 25.87 │ 25.87 │ 385900.0 │ 0.0 │ 1.0 │
│ 5 │ 26.63 │ 26.75 │ 26.63 │ 26.63 │ 327900.0 │ 0.0 │ 1.0 │
│ 6 │ 28.25 │ 28.38 │ 28.25 │ 28.25 │ 217100.0 │ 0.0 │ 1.0 │
│ Row │ Adj. Open │ Adj. High │ Adj. Low │ Adj. Close │ Adj. Volume │
├─────┼───────────┼───────────┼──────────┼────────────┼─────────────┤
│ 1 │ 0.428364 │ 0.430152 │ 0.428364 │ 0.428364 │ 1.17258e8 │
│ 2 │ 0.407952 │ 0.407952 │ 0.406015 │ 0.406015 │ 4.39712e7 │
│ 3 │ 0.378004 │ 0.378004 │ 0.376216 │ 0.376216 │ 2.6432e7 │
│ 4 │ 0.385453 │ 0.38739 │ 0.385453 │ 0.385453 │ 2.16104e7 │
│ 5 │ 0.396777 │ 0.398565 │ 0.396777 │ 0.396777 │ 1.83624e7 │
│ 6 │ 0.420914 │ 0.422851 │ 0.420914 │ 0.420914 │ 1.21576e7 │
Thanks to @Matt B. for the suggestions to simplify the code.
The problem with the above is that the column types are Any
inside the dataframe. To make it a bit more efficient here are a few functions which get the job done:
# first, guess the Julia equivalent of type of the object
function guess_type(x::PyCall.PyObject)
string_dtype = x[:dtype][:name]
julia_string = string(uppercase(string_dtype[1]), string_dtype[2:end])
return eval(parse("$julia_string"))
end
# convert an individual column, falling back to Any array if the guess was wrong
function convert_column(x)
y = try Array{guess_type(x)}(x) catch Array(x) end
return y
end
# put everything together into a single function
function convert_pandas(df)
colnames = map(Symbol, data[:columns])
y = DataFrame(Any[convert_column(df[c]) for c in colnames], colnames)
return y
end
The above, when applied to your data
gives the same column names as before, but with correct Float64
column types:
y = convert_pandas(data);
showcols(y)
9147×12 DataFrames.DataFrame
│ Col # │ Name │ Eltype │ Missing │
├───────┼─────────────┼─────────┼─────────┤
│ 1 │ Open │ Float64 │ 0 │
│ 2 │ High │ Float64 │ 0 │
│ 3 │ Low │ Float64 │ 0 │
│ 4 │ Close │ Float64 │ 0 │
│ 5 │ Volume │ Float64 │ 0 │
│ 6 │ Ex-Dividend │ Float64 │ 0 │
│ 7 │ Split Ratio │ Float64 │ 0 │
│ 8 │ Adj. Open │ Float64 │ 0 │
│ 9 │ Adj. High │ Float64 │ 0 │
│ 10 │ Adj. Low │ Float64 │ 0 │
│ 11 │ Adj. Close │ Float64 │ 0 │
│ 12 │ Adj. Volume │ Float64 │ 0 │
PyObject
with the Pandas.jl library. That doesn't get you into the JuliaStats ecosystem, but it simplifies using Pandas analysis functions from within Julia. – Diazomethane