Trying to merge multiple csv files in R
W

10

18

I'm attempting to merge multiple CSV files using R. All of the CSV files have the same fields and sit in a shared folder that contains only these CSV files. I've attempted to do it using the following code:

multmerge=function(mypath) {
    filenames=list.files(path=mypath, full.names=TRUE)
    datalist= lapply(filenames, function (x) read.csv(file=x, header=True))
    Reduce(function(x,y) merge(x,y), datalist)}

I am entering my path as something like "Y:/R Practice/specdata". I do get an output when I apply the function to my 300 or so CSV files, but the result gives me my column names, and beneath them it has <0 rows> (or 0-length row.names). Please let me know if you have any suggestions on why this isn't working and how I can fix it.

Wicked answered 14/5, 2015 at 16:10 Comment(2)
Are there any warnings when you output the files? Are you using the proper encoding for them?Hatley
No there are no warnings. Everything processes fine. It's just that the output is nothing but my column namesWicked
D
45

For a shorter, faster solution:

library(dplyr)
library(readr)
df <- list.files(path = "yourpath", full.names = TRUE) %>%
  lapply(read_csv) %>%
  bind_rows()
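
If read_csv throws parsing warnings, one hedged variant (a sketch, relying on readr's cols() and col_character() helpers) is to read every column as character and convert the types afterwards:

df <- list.files(path = "yourpath", full.names = TRUE) %>%
  lapply(read_csv, col_types = cols(.default = col_character())) %>%  # extra arguments are passed through to read_csv
  bind_rows()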
Dasya answered 14/5, 2015 at 16:50 Comment(6)
Thanks for the reply. I attempted your solution. Maybe I'm not using it correctly, but after copying and pasting your code into my console I come up with "There were 50 or more warnings (use warnings() to see the first 50)". All of the problems are with parsing if that means anything to you. Thanks.Wicked
Note that df is a default function in R. Maybe rename to dVerticillaster
The parsing problem can be solved by changing the function parameters. I could not find how to do it inside the dplyr code, so I created a different function that I used inside the dplyr block: readmeta <- function(path){ read_tsv(path, col_types = cols(.default = col_character())) }Quita
Succinct version of the other stuff out there.Viol
May be obvious, but if you are wondering where the final merged CSV is, it is actually inside the dataframe df. You need to write it out to a fresh CSV with write.csv(df, "newmergedfile.csv")Theoretical
If you want to merge multiple CSV column-wise, instead of row-wise, you can adapt it this way: list.files() %>% lapply(read_csv) %>% data.frameJennifer
F
3

Your code worked for me, but you need to change header = True to header = TRUE (R is case-sensitive, and the logical constant is TRUE, not True).

Fortnightly answered 14/5, 2015 at 16:24 Comment(2)
So do you have any idea what the output that I got means? Also, thanks for replying.Wicked
I created a folder full of .csv files with the same column names as you described, then ran your code just with the change noted above, and it worked to combine all the .csv files. So I couldn't recreate your error. @Maiasaura's code is a much nicer solution though.Fortnightly
A
3

Another option that has proved to work for my setup:

library(data.table)

multmerge = function(path){
  filenames = list.files(path = path, full.names = TRUE)
  rbindlist(lapply(filenames, fread))
}


path <- "Dropbox/rstudio-share/dataset/MB"
DF <- multmerge(path)

If you need more granular control over your CSV files during the loading process, you can replace fread with a function like so:

multmerge = function(path){
  filenames = list.files(path = path, full.names = TRUE)
  rbindlist(lapply(filenames, function(x){ read.csv(x, stringsAsFactors = FALSE, sep = ';') }))
}
Age answered 23/10, 2017 at 8:35 Comment(0)
O
2

If all your csv files have exactly the same fields (column names) and you want simply to combine them vertically, you should use rbind instead of merge:

> a
             A         B
[1,]  2.471202 38.949232
[2,] 16.935362  6.343694
> b
            A          B
[1,] 0.704630  0.1132538
[2,] 4.477572 11.8869057
> rbind(a, b)
             A          B
[1,]  2.471202 38.9492316
[2,] 16.935362  6.3436939
[3,]  0.704630  0.1132538
[4,]  4.477572 11.8869057
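
The same idea works for data frames read from disk; a minimal sketch, with hypothetical file names:

a <- read.csv("file1.csv")  # hypothetical file names
b <- read.csv("file2.csv")
combined <- rbind(a, b)     # stacks rows; column names must match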
Oas answered 28/12, 2016 at 4:43 Comment(0)
P
2

I tried working with the same function but included all = TRUE in the merge function, and it worked just fine.

The code I used is as follows:

multmerge = function(mypath){
  filenames = list.files(path = mypath, full.names = TRUE)
  datalist = lapply(filenames, function(x){ read.csv(file = x, header = TRUE) })
  Reduce(function(x, y) { merge(x, y, all = TRUE) }, datalist)
}

full_data = multmerge("path_name for your csv folder")
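
For context, a minimal sketch (with made-up data frames) of why the default merge() can collapse to <0 rows>: merge() joins on every shared column, so only rows that match exactly across files survive, while all = TRUE performs a full outer join and keeps everything.

x <- data.frame(id = 1:2, value = c("a", "b"))
y <- data.frame(id = 3:4, value = c("c", "d"))
merge(x, y)              # 0 rows: no identical (id, value) pair in both
merge(x, y, all = TRUE)  # 4 rows: the full outer join keeps every row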

Hope this helps. Cheers!

Precious answered 24/3, 2019 at 9:56 Comment(0)
R
2

For anyone who has many CSVs with the same header (I had ~1000), here is a quicker approach, which avoids parsing the CSVs individually. It assumes every file shares the exact same header line.

filenames <- list.files(path=mypath, full.names=TRUE)

#read the files in as plain text
csv_list <- lapply(filenames, readLines)

#remove the header line from all but the first file
csv_list[-1] <- lapply(csv_list[-1], function(x) x[-1])

#unlist to create a single character vector
csv_list <- unlist(csv_list)

#write the combined lines as one single csv
writeLines(text=csv_list,
           con="all_my_csvs_combined.csv")

#read the csv back in as one single file
all_my_csvs_combined <- read.csv("all_my_csvs_combined.csv")

Ramer answered 22/4, 2021 at 15:44 Comment(0)
G
1

Let me give you the best solution I have found so far:

library(pacman)
p_load(doParallel, data.table, magrittr, stringr)  # magrittr supplies %>%

# get the csv file names in the working directory
dir() %>% str_subset("\\.csv$") -> fn

# set up a parallel backend with one worker per core
cl <- detectCores() %>% makeCluster()
registerDoParallel(cl)

# read and bind
system.time({
  big_df = foreach(i = fn,
                   .packages = "data.table") %dopar% {
                     fread(i, colClasses = "character")
                   } %>%
    rbindlist(fill = TRUE)
})

# end of parallel work
stopCluster(cl)

This should be faster as long as you have more cores in your computer. If you are dealing with big data, it is the preferred approach.

Gest answered 8/1, 2020 at 5:22 Comment(4)
I get Error in stopImplicitCluster(cl) : unused argument (cl)Ditch
Also, why so many packages in the p_load?Ditch
Thank you for the comments, I've edited to revise (lots of packages in p_load because this is extracted from my own working code and I need them for other tasks). About the error, do you have reproducible code so I can reproduce your results?Gest
I used parallel::stopCluster(cl) instead which removed the error. I was combining the csvs which starts with CCG in the following zipfile https://files.digital.nhs.uk/32/3986A6/Daily.zip, downloaded onto desktop. Then was running your code against that unzipped (i as the file list)Ditch
F
0

To combine many CSV files:

  1. Set your working directory to the folder that contains the CSV files.
  2. Run:
    library(plyr)
    veri <- ldply(list.files(), read.csv, header=TRUE, sep='\t')  # sep='\t' assumes tab-delimited files; use sep=',' for comma-separated ones
    View(veri)
    
Firewater answered 22/4, 2021 at 5:13 Comment(0)
C
0

I used the rbind() function. Just separate the data frames you want to merge by commas.

After reading each of the CSV files into its own data frame (read_csv() comes from the readr package):

df1 <- read_csv("name_of_file.csv")
df2 <- read_csv("name_of_file2.csv")
df4 <- read_csv("name_of_file4.csv")

you can then use rbind() to concatenate them into a single data frame:

all_of_the_marbles <- rbind(df1, df2, df4)
Chorography answered 1/9, 2021 at 15:38 Comment(1)
Please add further details to expand on your answer, such as working code or documentation citations.Darkle
V
0

The following code is similar to the answer by @Michael but uses only base R functions; rbindlist is replaced with do.call("rbind", ...). Note that full.names=TRUE is needed so read.csv can find the files when the folder is not the working directory.

multmerge <- function(path) {
  files <- list.files(path, pattern="\\.csv$", full.names=TRUE)
  lst <- lapply(files, function(x){ read.csv(x, header=TRUE, stringsAsFactors=FALSE) })
  do.call("rbind", lst)
}
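
A hypothetical call, reusing the folder from the question:

full_data <- multmerge("Y:/R Practice/specdata")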
Visionary answered 19/9, 2022 at 15:59 Comment(0)
