Lock file when writing to it from parallel processes in R

I use parSapply() from the parallel package in R to perform calculations on a huge amount of data. Even in parallel it takes hours to execute, so I decided to write results to a file from the workers regularly using write.table(), because the process crashes from time to time when it runs out of memory or for some other random reason, and I want to be able to continue the calculations from where they stopped. However, I noticed that some lines of the resulting CSV files are cut off in the middle, probably because several processes wrote to the file at the same time. Is there a way to place a lock on the file while write.table() executes, so that other workers can't access it, or is the only way out to write to a separate file from each worker and then merge the results?

Felicific answered 6/12, 2013 at 13:25 Comment(2)
I think basically no. I've encountered the same problem when attempting to write results from many different R sessions on a cluster to the same results file. What I do instead is write all the results to separate files and run a quick script at the end that reads those files in and combines them into a single file in a single R session, which avoids the problem of too many concurrent writes. I also delete all the intermediate files.Adalbert
yep, file locking is an OS thingyBlackwood

It is now possible to create file locks from R using the filelock package (GitHub).

To make this work with parSapply() you would need to edit your workers so that when the file is locked they do not simply fail, but either retry or Sys.sleep() for a short time before trying again, as in the sketch below. However, I am not certain how this will affect your performance.
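
A minimal sketch of what such a retrying writer might look like, assuming the filelock package; the helper name append_row_locked and the .lock file path are illustrative, not from the original answer:

library(filelock)

# Append one row to a shared CSV, taking an exclusive lock first so that
# concurrent workers never interleave partial lines.
append_row_locked <- function(row, csv_path,
                              lock_path = paste0(csv_path, ".lock")) {
  repeat {
    # Try for up to 500 ms; lock() returns NULL if the lock is not acquired
    lck <- lock(lock_path, exclusive = TRUE, timeout = 500)
    if (!is.null(lck)) break
    Sys.sleep(0.1)  # back off briefly, then retry
  }
  on.exit(unlock(lck))  # release the lock even if write.table() errors
  write.table(row, file = csv_path, append = TRUE, sep = ",",
              col.names = FALSE, row.names = FALSE)
}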

Instead, I recommend that each worker write to its own file, which eliminates the need for a lock file without reducing your performance. Afterwards you can read those files back in and weave them into your final results file, as in the sketch below. If size is an issue, you can use disk.frame to work with files that are larger than your system RAM.
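
A rough sketch of the per-worker approach, assuming each worker names its output file after its process ID; the file names and the toy computation are placeholders, not part of the original answer:

library(parallel)

cl <- makeCluster(4)

# Each worker appends to its own file, keyed by its process ID, so no two
# processes ever write to the same file concurrently.
res <- parSapply(cl, seq_len(1000), function(i) {
  out <- sprintf("results_%d.csv", Sys.getpid())
  r <- i^2  # placeholder for the real computation
  write.table(data.frame(i, r), file = out, append = TRUE, sep = ",",
              col.names = FALSE, row.names = FALSE)
  r
})

stopCluster(cl)

# Weave the per-worker files into one results file afterwards.
parts <- lapply(list.files(pattern = "^results_[0-9]+\\.csv$"),
                read.csv, header = FALSE)
write.csv(do.call(rbind, parts), "results_final.csv", row.names = FALSE)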

Fayre answered 3/12, 2020 at 10:39 Comment(0)

The old Unix technique is to use mkdir as a lock: creating a directory is an atomic operation, so it succeeds for exactly one process at a time. It looks like this:

# Make sure no other process is writing to the file by trying to create a
# lock directory: mkdir fails if the directory already exists, so repeat
# until it succeeds and we hold the lock.
repeat {
  if (system2(command = "mkdir", args = "lockdir", stderr = NULL) == 0) break
}

write.table(MyTable, file = filename, append = TRUE)

# Release the lock by removing the directory.
system2(command = "rmdir", args = "lockdir")

Bubal answered 8/6, 2022 at 9:22 Comment(0)
