Read CSV into array

4

17

In Julia, using CSV.jl, it is possible to read a DataFrame from a .csv file:

using CSV

df = CSV.read("data.csv", delim=",")

However, how can I instead read a CSV file into a Vector{Float64} data type?

Grammer answered 28/1, 2019 at 20:44 Comment(0)
14

You can use the DelimitedFiles module from stdlib:

julia> using DelimitedFiles

julia> s = """
       1,2,3
       4,5,6
       7,8,9"""
"1,2,3\n4,5,6\n7,8,9"

julia> b = IOBuffer(s)
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=17, maxsize=Inf, ptr=1, mark=-1)

julia> readdlm(b, ',', Float64)
3×3 Array{Float64,2}:
 1.0  2.0  3.0
 4.0  5.0  6.0
 7.0  8.0  9.0

The example reads from an IOBuffer so that it is fully reproducible, but you can also read data from a file. The docstring of readdlm has more details about the available options.
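
As a minimal sketch of the file-based variant, the snippet below writes the same three rows to a temporary file and reads them back with readdlm. The temporary path from mktemp is only an assumption to keep the example self-contained; with real data you would pass your own file name, such as the data.csv from the question.

```julia
using DelimitedFiles

# Write sample data to a temporary file so the example is self-contained.
# With real data, pass your own path (for example "data.csv") to readdlm.
path, io = mktemp()
write(io, "1,2,3\n4,5,6\n7,8,9\n")
close(io)

# Parse every field as Float64; the result is a 3×3 Matrix{Float64}.
m = readdlm(path, ',', Float64)
```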

Note that you will get a Matrix{Float64}, not a Vector{Float64}, but I understand that this is what you wanted. If not, you can convert the matrix into a vector by calling the vec function on it after reading the data in.
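
A short sketch of that conversion, using a smaller 2×3 input chosen here only for illustration. One thing to keep in mind is that vec follows Julia's column-major storage, so the elements come out column by column rather than in file order; transposing first with permutedims is one way to get row-by-row order.

```julia
using DelimitedFiles

m = readdlm(IOBuffer("1,2,3\n4,5,6"), ',', Float64)  # 2×3 Matrix{Float64}

# vec flattens the matrix in column-major order.
v_cols = vec(m)               # [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]

# For file (row-by-row) order, transpose first.
v_rows = vec(permutedims(m))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

If the file has a single column, readdlm returns an n×1 matrix, and vec turns it into the Vector{Float64} asked for in the question.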

EDIT

This is how you can read back a Matrix using CSV.jl:

julia> using CSV, DataFrames, Tables

julia> df = DataFrame(rand(2,3))
2×3 DataFrame
│ Row │ x1        │ x2       │ x3       │
│     │ Float64   │ Float64  │ Float64  │
├─────┼───────────┼──────────┼──────────┤
│ 1   │ 0.0444818 │ 0.570981 │ 0.608709 │
│ 2   │ 0.47577   │ 0.675344 │ 0.500577 │

julia> CSV.write("test.csv", df)
"test.csv"

julia> CSV.File("test.csv") |> Tables.matrix
2×3 Array{Float64,2}:
 0.0444818  0.570981  0.608709
 0.47577    0.675344  0.500577
Glabrescent answered 28/1, 2019 at 20:50 Comment(2)
Perhaps a more straightforward example: `using DelimitedFiles; mat = readdlm("data.csv", ',')` - Hamamatsu
Right, I wanted the initial example to be reproducible without writing anything to disk. - Pontiac
4

You can convert your DataFrame to a Matrix of a certain type. If there is no missing data this should work. If there is missing data, simply omit the type in convert.

arr = convert(Matrix{Float64}, df)

You can call vec on the result to get a vector if that is really what you want.

Depending on the file, I would go with readdlm as suggested in the previous answer.

Crazed answered 28/1, 2019 at 20:55 Comment(4)
This approach is also nice. Note that you can even simply write Matrix{Float64}(df). - Pontiac
Unfortunate that CSV can't directly create an array as its output without first going through a DataFrame. I think CSV is Julia's fastest CSV reader, but not necessarily the fastest way to produce an array. Producing an array is a common scenario. - Hamamatsu
I have added an EDIT to my answer showing how to get a Matrix without an intermediate DataFrame. - Pontiac
You could just take the output of CSV.File (a CSV rows object) and use it as input to the Matrix() constructor, which is exactly what the pipe version does. I'm curious about the performance of the CSV rows object, which is an array of named tuples. Named tuples aren't speed demons, I've found. It seems like we should just port the nice speed improvements of CSV's parsing into readdlm. Then we'd have two fast approaches: one leaning toward arrays and the other toward DataFrames. Both are desirable. - Hamamatsu
3

To summarize Bogumil's answer, you can use:

using DelimitedFiles
data = readdlm("data.csv", ',', Float64)
Lennalennard answered 9/4, 2021 at 23:42 Comment(0)
2

You can ask CSV.read to use a Matrix as its destination in one go with:

julia> import CSV

julia> s = """
       1,2,3
       4,5,6
       7,8,9""";

julia> CSV.read(IOBuffer(s), CSV.Tables.matrix; header=false)
3×3 Matrix{Int64}:
 1  2  3
 4  5  6
 7  8  9

Note that there is an open issue about allowing the built-in Matrix type itself to be used as the "sink", which would make this slightly more discoverable.

Segarra answered 5/12, 2023 at 21:53 Comment(0)
