Read the file created/modified last in different directories in R

I want to read the most recently modified (or created) CSV file in each of several directories and then put them all into a single pre-existing data frame (df_total).

I have two kinds of directories to read. In the first kind, the CSV files sit directly in the folder:

A:/LogIIS/FOLDER01/files.csv

In the others, the folder contains several subfolders, each holding CSV files, as in the examples below:

A:/LogIIS/FOLDER02/FOLDER_A/files.csv

A:/LogIIS/FOLDER02/FOLDER_B/files.csv

A:/LogIIS/FOLDER02/FOLDER_C/files.csv

A:/LogIIS/FOLDER03/FOLDER_A/files.csv

A:/LogIIS/FOLDER03/FOLDER_B/files.csv

A:/LogIIS/FOLDER03/FOLDER_C/files.csv

A:/LogIIS/FOLDER03/FOLDER_D/files.csv
Simmonds answered 11/5, 2017 at 17:1
Thach: How do you define "last"? By creation date? Modification date? Alphabetical order (which has no standard, by the way)? Or something else? (I guess it's the last modification date, since these look like web logs.)
Marlborough: See file.mtime.
Simmonds: Sorry, I mean the latest .csv files by date.

Something like this...

#get a vector of all CSV file paths, searching subfolders too
#("\\.csv$" matches any name ending in .csv; "files.csv" in the question is a placeholder)
files <- list.files(path = "A:/LogIIS", pattern = "\\.csv$", full.names = TRUE, recursive = TRUE)

#get the directory name of each file (for grouping)
dirs <- dirname(files)

#find the last file in each directory (i.e. latest modified time)
lastfiles <- tapply(files, dirs, function(v) v[which.max(file.mtime(v))])

You can then loop through these and read them in.

If you just want the latest file overall, this will be files[which.max(file.mtime(files))].
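A self-contained sketch of the steps above, run against a throwaway folder tree created with tempfile() instead of A:/LogIIS. The folder names mirror the question, but the file names (old.csv, new.csv), their contents and the timestamps are hypothetical; Sys.setFileTime() is used only so that the newest file in each folder, and overall, is known in advance.

```r
# Build a hypothetical stand-in for A:/LogIIS covering both layouts from the question
root <- tempfile("LogIIS_")
rel_dirs <- c("FOLDER01", "FOLDER02/FOLDER_A", "FOLDER03/FOLDER_B")
t0 <- as.POSIXct("2017-05-11 12:00:00", tz = "UTC")
for (i in seq_along(rel_dirs)) {
  d <- file.path(root, rel_dirs[i])
  dir.create(d, recursive = TRUE)
  old_f <- file.path(d, "old.csv")
  new_f <- file.path(d, "new.csv")
  write.csv(data.frame(dir = rel_dirs[i], version = "old"), old_f, row.names = FALSE)
  write.csv(data.frame(dir = rel_dirs[i], version = "new"), new_f, row.names = FALSE)
  Sys.setFileTime(old_f, t0)
  # stagger the newer files so there is also a single newest file overall
  Sys.setFileTime(new_f, t0 + 3600 * i)
}

# Same steps as the answer: list CSVs, group by folder, keep the newest per folder
files <- list.files(path = root, pattern = "\\.csv$", full.names = TRUE, recursive = TRUE)
dirs <- dirname(files)
lastfiles <- tapply(files, dirs, function(v) v[which.max(file.mtime(v))])
print(unname(lastfiles))

# Newest file across all folders
latest_overall <- files[which.max(file.mtime(files))]
print(latest_overall)
```

Because list.files(recursive = TRUE) descends any depth, FOLDER01 (files directly inside) and FOLDER02/FOLDER_A (one level deeper) are handled by the same code.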

Marlborough answered 11/5, 2017 at 18:5
Simmonds: Thanks Andrew, but the code only gets the file paths; it doesn't read the CSV files, which is what I need.
Marlborough: Once you have lastfiles, you can read them all into a single data frame with df <- do.call(rbind, lapply(lastfiles, read.csv, ...)), where ... stands for any other arguments you need for read.csv, depending on the nature of your files. See ?read.csv for details.
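A minimal sketch of that last step, under stated assumptions: the files are semicolon-separated (a hypothetical format, chosen only to show an extra argument being forwarded through lapply's ... to read.csv), all files share the same columns, and df_total already holds earlier rows. The file names, columns and values are made up for the demo.

```r
# Two hypothetical semicolon-separated log files standing in for lastfiles
tmp <- tempfile("logs_")
dir.create(tmp)
lastfiles <- file.path(tmp, c("FOLDER_A.csv", "FOLDER_B.csv"))
writeLines(c("date;hits", "2017-05-10;12"), lastfiles[1])
writeLines(c("date;hits", "2017-05-11;7"), lastfiles[2])

# A pre-existing df_total with the same columns (hypothetical contents)
df_total <- data.frame(date = "2017-05-09", hits = 3L, stringsAsFactors = FALSE)

# sep = ";" is passed on by lapply to every read.csv call
new_rows <- do.call(rbind, lapply(lastfiles, read.csv, sep = ";", stringsAsFactors = FALSE))

# Append the newly read rows to the existing data frame
df_total <- rbind(df_total, new_rows)
print(df_total)
```

rbind on data frames matches columns by name, so it fails if a file has different column names; in that case the columns need to be aligned before stacking.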

Here is a tidyverse-friendly solution that returns the most recently modified file in a folder:

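library(tidyverse)  # enframe (tibble), pmap_df (purrr), bind_cols/filter/pull (dplyr), %>%

list.files("data/", full.names = TRUE) %>% 
  enframe(name = NULL) %>%                 # one-column tibble named "value"
  bind_cols(pmap_df(., file.info)) %>%     # add size, mtime, ctime, ... for each path
  filter(mtime == max(mtime)) %>%          # keep the newest file (several if tied)
  pull(value)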
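The pipeline above looks at a single folder. For the question's case (the newest file in each of several folders), a possible dplyr sketch groups by dirname() and uses slice_max(), which needs dplyr 1.0.0 or later; with_ties = FALSE returns exactly one file per folder even when timestamps tie. The temporary folders and file names below are hypothetical.

```r
library(dplyr)

# Hypothetical throwaway tree: two folders, an older and a newer CSV in each
root <- tempfile("LogIIS_")
for (rel in c("FOLDER02/FOLDER_A", "FOLDER02/FOLDER_B")) {
  d <- file.path(root, rel)
  dir.create(d, recursive = TRUE)
  write.csv(data.frame(x = 1), file.path(d, "old.csv"), row.names = FALSE)
  write.csv(data.frame(x = 2), file.path(d, "new.csv"), row.names = FALSE)
  Sys.setFileTime(file.path(d, "old.csv"), as.POSIXct("2019-08-16 10:00:00", tz = "UTC"))
  Sys.setFileTime(file.path(d, "new.csv"), as.POSIXct("2019-08-16 11:00:00", tz = "UTC"))
}

# Newest CSV per directory
latest_per_dir <- data.frame(path = list.files(root, pattern = "\\.csv$",
                                               full.names = TRUE, recursive = TRUE),
                             stringsAsFactors = FALSE) %>%
  mutate(dir = dirname(path), mtime = file.mtime(path)) %>%
  group_by(dir) %>%
  slice_max(mtime, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  pull(path)

print(latest_per_dir)
```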
Jaehne answered 16/8, 2019 at 22:43

Consider building a data frame of the files, since file.info returns the file system metadata for each path, such as ctime (the creation time on Windows; on most Unix-alikes it is the last status change time):

# LIST ALL CSV FILES UNDER THE ROOT FOLDER (INCLUDING SUBFOLDERS)
files <- list.files("A:/LogIIS", pattern = "\\.csv$", full.names = TRUE, recursive = TRUE)

# DATAFRAME OF FILE, DIR, AND METADATA
filesdf <- cbind(file = files,
                 dir = dirname(files),
                 data.frame(file.info(files), row.names = NULL),
                 stringsAsFactors = FALSE)

# SORT BY DIR AND CREATED TIME (DESC)
filesdf <- with(filesdf, filesdf[order(dir, -xtfrm(ctime)), ])

# KEEP THE LATEST FILE PER DIR (THE FIRST ROW OF EACH DIR AFTER SORTING)
latestfiles <- filesdf[!duplicated(filesdf$dir), ]

# READ EACH LATEST FILE AND STACK THEM INTO ONE DATA FRAME
df_total <- do.call(rbind, lapply(latestfiles$file, read.csv))
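To see what file.info() actually returns, here is a small sketch on two temporary files (the names are hypothetical). It returns one row per path, with the paths as row names; extra_cols = FALSE limits the result to size, isdir, mode, mtime, ctime and atime. Since the meaning of ctime differs between Windows and Unix-alikes, ordering by mtime is a portable alternative when "latest" means "last modified".

```r
# Two hypothetical files with known modification times
tmp <- tempfile("meta_")
dir.create(tmp)
paths <- file.path(tmp, c("first.csv", "second.csv"))
writeLines("a,b", paths[1])
writeLines("a,b", paths[2])
Sys.setFileTime(paths[1], as.POSIXct("2017-05-11 18:00:00", tz = "UTC"))
Sys.setFileTime(paths[2], as.POSIXct("2017-05-11 18:30:00", tz = "UTC"))

# Core metadata columns only
info <- file.info(paths, extra_cols = FALSE)
print(info[, c("size", "mtime", "ctime")])

# Newest file by modification time (row names hold the paths)
newest <- rownames(info)[which.max(info$mtime)]
print(basename(newest))
```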
Yesseniayester answered 11/5, 2017 at 18:43
Simmonds: Thanks for the answer, @Parfait. But my goal, after reading the directories, is to read the CSV files and then stack them into a single data frame. Sorry if my question wasn't clear.
Yesseniayester: I can't quite understand your second sentence, but this worked on my end when reading from various directories. In fact, I have to thank you for the question, as I'm now using such a script in my library. I didn't know how far file.info goes!

Here is a pipe-friendly way to get the most recent file in a folder. It uses an anonymous function, which in my view is slightly more readable than a one-liner. file.mtime is faster than file.info(fpath)$ctime.

library(magrittr)  # provides %>%

dir(path = "your_path_goes_here", full.names = TRUE) %>% # optionally add pattern = "^your_pattern" to filter file names
  (function(fpath){
    ftime <- file.mtime(fpath) # use file.info(fpath)$ctime for the file CREATED time (Windows)
    return(fpath[which.max(ftime)]) # returns the most recent file path
  })
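Since R 4.1.0 the same idea works with the native pipe |> and the \(x) lambda shorthand, without loading magrittr; the function must be wrapped in parentheses and called with (). A minimal sketch on a temporary folder (the file names and times are hypothetical):

```r
# Hypothetical folder with three files and known modification times
fdir <- tempfile("logs_")
dir.create(fdir)
paths <- file.path(fdir, c("a.csv", "b.csv", "c.csv"))
for (i in seq_along(paths)) {
  writeLines("x", paths[i])
  Sys.setFileTime(paths[i], as.POSIXct("2021-12-13 09:00:00", tz = "UTC") + i * 60)
}

# Native pipe into an anonymous function that keeps the newest path
latest <- list.files(fdir, pattern = "\\.csv$", full.names = TRUE) |>
  (\(fpath) fpath[which.max(file.mtime(fpath))])()

print(basename(latest))
```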
Tunis answered 13/12, 2021 at 11:2
