I am using furrr, which is built on top of future.
I have a very simple question. I have a list of files, say list('/mydata/file1.csv.gz', '/mydata/file2.csv.gz'), and I am processing them in parallel with a simple function that loads each file, does some filtering, and writes the result to disk.
In essence, my function is
processing_func <- function(file){
  mydata <- readr::read_csv(file)
  mydata <- mydata %>% dplyr::filter(var == 1)
  data.table::fwrite(mydata, 'myfolder/processed.csv.gz')
  rm()
  gc()
}
and so I am simply running
listfiles %>% furrr::future_map(~processing_func(.x))
This works, but despite my gc() and rm() calls, the RAM keeps filling up until the session crashes.
What is the conceptual issue here? Why would residual objects remain in memory when I explicitly discard them?
Thanks!
rm() is not doing anything for you. You need to tell rm() what to remove, for example rm(mydata). – Premonitionn
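A minimal sketch of what this comment suggests, reusing the function from the question (the hard-coded output path is kept as in the original):

processing_func <- function(file){
  mydata <- readr::read_csv(file)
  mydata <- mydata %>% dplyr::filter(var == 1)
  data.table::fwrite(mydata, 'myfolder/processed.csv.gz')
  rm(mydata)  # name the object to drop; a bare rm() removes nothing
  gc()        # then ask R to return the freed memory to the OS
  invisible(NULL)
}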
If n instances of R cause you to run out of memory, try n-1 or n-2 instances of R. Doing things in parallel can decrease run-time, but it always increases CPU and memory usage. (Or is there something else I'm missing in your workflow?) – Subaxillary