How do I load multiple CSV files into DataFrames in Julia?
D

4

5

I already know how to load a single CSV into a DataFrame:

using CSV
using DataFrames    
df = DataFrame(CSV.File("C:\\Users\\username\\Table_01.csv"))

How would I do this when I have several CSV files, e.g. Table_01.csv, Table_02.csv, Table_03.csv? Would I create a bunch of empty DataFrames and use a for loop to fill them? Or is there an easier way in Julia? Many thanks in advance!

Dander answered 16/6, 2020 at 12:18 Comment(1)
jlhub.com/julia/manual/en/function/map - Hylotheism
E
5

If you want multiple data frames (not a single data frame holding the data from multiple files) there are several options.

Let me start with the simplest approach using broadcasting:

dfs = DataFrame.(CSV.File.(["Table_01.csv", "Table_02.csv", "Table_03.csv"]))

or

dfs = @. DataFrame(CSV.File(["Table_01.csv", "Table_02.csv", "Table_03.csv"]))

or (something a bit more advanced, using function composition):

(DataFrame∘CSV.File).(["Table_01.csv", "Table_02.csv", "Table_03.csv"])

or using chaining:

CSV.File.(["Table_01.csv", "Table_02.csv", "Table_03.csv"]) .|> DataFrame

Another option is map, as suggested in the comment:

map(DataFrame∘CSV.File, ["Table_01.csv", "Table_02.csv", "Table_03.csv"])

or just use a comprehension:

[DataFrame(CSV.File(f)) for f in ["Table_01.csv", "Table_02.csv", "Table_03.csv"]]

(I am listing the options to show different syntactic possibilities in Julia)
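If you also want to keep track of which data frame came from which file, one possible variation is to collect them in a Dict keyed by file name. This is only a sketch: the temporary directory and the sample tables written with CSV.write are made up here so the snippet runs on its own, and CSV.read(f, DataFrame) assumes a reasonably recent CSV.jl.

```julia
using CSV, DataFrames

# Hypothetical sample data: write three small CSV files into a temp directory.
dir = mktempdir()
files = [joinpath(dir, "Table_0$i.csv") for i in 1:3]
for (i, f) in enumerate(files)
    CSV.write(f, DataFrame(A = [i, i + 1], B = ["x", "y"]))
end

# One DataFrame per file, keyed by the file name.
dfs = Dict(basename(f) => CSV.read(f, DataFrame) for f in files)

dfs["Table_02.csv"]  # look up a single table by its file name
```

A Dict loses the file order, so a plain vector (as in the options above) is probably still the better fit when order matters.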

Epiglottis answered 16/6, 2020 at 12:39 Comment(3)
Thanks! I prefer broadcasting, for example, because I can still use keywords there, but it is good to learn about the other possibilities. - Dander
And I guess for renaming the columns of the dfs, I'd use something like for i in 1:length(dfs) rename!(dfs[i], [Symbol("A"), Symbol("B"), Symbol("C")]) end - Dander
You do not need to use Symbol. Just write foreach(df -> rename!(df, ["A", "B", "C"]), dfs). - Daggerboard
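For anyone who wants to try the renaming suggestion from the comments without any CSV files at hand, here is a small self-contained sketch. The three in-memory data frames and their column names x1, x2, x3 are invented for illustration; passing strings to rename! assumes a DataFrames.jl version that accepts string column names.

```julia
using DataFrames

# Hypothetical stand-ins for data frames that were loaded from CSV files.
dfs = [DataFrame(x1 = 1:2, x2 = 3:4, x3 = 5:6) for _ in 1:3]

# Rename the columns of every data frame in place.
foreach(df -> rename!(df, ["A", "B", "C"]), dfs)

names(dfs[1])  # column names of the first data frame after renaming
```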
P
3

This is how I have done it, but there might be an easier way.

using DataFrames, Glob
import CSV

function readcsvs(path)
    files = glob("*.csv", path) # Vector of file names; Glob lets you use the * wildcard.
    numfiles = length(files)    # Number of files to read.
    tempdfs = Vector{DataFrame}(undef, numfiles) # Preallocate an uninitialized vector of DataFrames.
    for i in 1:numfiles
        tempdfs[i] = CSV.read(files[i], DataFrame) # Read each CSV into its own DataFrame (current CSV.jl requires the sink argument).
    end
    # Join the temporary data frames into one on their shared key column.
    masterdf = outerjoin(tempdfs..., on="Column In Common")
    return masterdf
end
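To see what the outerjoin step does without reading any files, here is a minimal sketch with three made-up in-memory tables. The column names id, a, b and c are hypothetical; in the function above the key is whatever column your files share.

```julia
using DataFrames

# Hypothetical tables that share the key column :id but cover different ids.
df1 = DataFrame(id = [1, 2], a = ["a1", "a2"])
df2 = DataFrame(id = [2, 3], b = ["b2", "b3"])
df3 = DataFrame(id = [3, 4], c = ["c3", "c4"])

# outerjoin keeps every id that appears in any table and fills gaps with missing.
joined = outerjoin(df1, df2, df3; on = :id)
sort!(joined, :id)
```

Note that if the tables had other column names in common besides the key, the join would need something like makeunique = true to disambiguate them.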
Palais answered 16/6, 2020 at 12:46 Comment(0)
M
0

A simple solution where you don't have to explicitly enter filenames:

using CSV, Glob, DataFrames
path = raw"C:\..." # directory of your files (a raw string lets you write Windows backslashes without escaping them)
files = glob("*.csv", path) # all CSV files in the folder (* matches any sequence of characters)
dfs = DataFrame.(CSV.File.(files)) # creates a vector of data frames

# add an index column to be able to later discern the different sources
for i in 1:length(dfs)
    dfs[i][!, :sample] .= i # I called the new col sample
end

# finally, if you want, reduce your collection of dfs via vertical concatenation
df = reduce(vcat, dfs)
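As a possible shortcut for the index column, recent DataFrames.jl releases let vcat add it for you through the source keyword, so the explicit loop is not strictly needed. A sketch with two made-up in-memory tables (this assumes your DataFrames.jl version supports source):

```julia
using DataFrames

# Hypothetical stand-ins for the data frames loaded from the CSV files.
dfs = [DataFrame(A = [1, 2]), DataFrame(A = [3, 4])]

# Concatenate vertically and record the index of the source data frame in :sample.
df = reduce(vcat, dfs; source = :sample)
```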
Mortimer answered 28/7, 2020 at 9:9 Comment(0)
Q
0

An example of the open, write, and close process for many files. Reading works similarly.

    function main()
        f_max = 365
        # 100 x f_max matrix; every entry holds the same random value
        data = fill(rand(), 100, f_max)

        # one output file name per column
        filenames = ["./testdata$(i).dat" for i in 1:f_max]

        # open all files for writing
        files = [open(file, "w") for file in filenames]

        # write column i to file i as raw binary Float64 values
        for i in 1:f_max
            write(files[i], data[:, i])
        end

        # close all files
        for i in 1:f_max
            close(files[i])
        end
    end

    main()
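Since the reading side is only mentioned, here is a rough sketch of a round trip, assuming the files hold raw binary Float64 values as written above. It uses a temporary directory and only three files to keep it small (both are choices made for this example), and the do-block form of open so each file is closed automatically.

```julia
# Hypothetical small setup: three files in a temp directory, 100 values each.
dir = mktempdir()
filenames = [joinpath(dir, "testdata$(i).dat") for i in 1:3]
data = rand(100, 3)

# Write column i to file i as raw binary Float64 values.
for (i, name) in enumerate(filenames)
    open(name, "w") do io
        write(io, data[:, i])
    end
end

# Read each file back into the matching column of a new matrix.
readback = zeros(Float64, 100, 3)
for (i, name) in enumerate(filenames)
    open(name, "r") do io
        col = Vector{Float64}(undef, 100)  # buffer sized to the known column length
        read!(io, col)
        readback[:, i] = col
    end
end
```

The reader has to know the element type and the number of values per file in advance, because a raw binary dump stores neither.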
Quoth answered 23/11, 2023 at 13:42 Comment(0)
