I have a batch process running under java JDK 1.7. It is running on a system with RHEL, 2.6.18-308.el5 #1 SMP.
This process gets a list of metadata objects from a database. From this metadata it extracts a path to a file. This file may or may not actually exist.
The process uses the ExecutorService (Executors.newFixedThreadPool()
) to launch multiple threads. Each thread runs a Callable that launches a process that reads that file and writes another file if that input file exists (and logs the result) and does nothing if the file does not exist (except log that result).
I find the behavior is indeterminate. Although the actual existence of the each of the files is constant throughout, running this process does not give consistent results. It usually gives correct results but occasionally finds that a few files which really do exist do not. If I run the same process again, it will find the files that it previously said did not exist.
Why might this be happening, and is there an alternative way of doing that would be more reliable? Is it a mistake to be writing files in a multithreaded process while other threads are attempting to read the directory? Would a smaller Thread Pool help (currently 30)?
UPDATE: Here is the actual code of the unix process called by the worker threads in this scenario:
public int convertOutputFile(String inputFile, String outputFile)
throws IOException
{
List<String> args = new LinkedList<String>();
args.add("sox");
args.add(inputFile);
args.add(outputFile);
args.addAll(2, this.outputArguments);
args.addAll(1, this.inputArguments);
long pStart = System.currentTimeMillis();
int status = -1;
Process soxProcess = new ProcessBuilder(args).start();
try {
// if we don't wait for the process to complete, player won't
// find the converted file.
status = soxProcess.waitFor();
if (status == 0) {
logger.debug(String.format("SoX conversion process took %d ms.",
System.currentTimeMillis() - pStart));
} else {
logger.error("SoX conversion process returned an error status of " + status);
}
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return status;
}
UPDATE #2:
I have tried the experiment of switching from java.io.File.exists() to java.nio.Files.exists() and this seems to provide more reliability. I have yet to see the failure condition over multiple attempts, where as before it occurred approximately 10% of the time. So I guess I'm looking to know whether the nio version is somehow more robust in how it handles the underlying File System. This finding was later proven false. nio is no help here.
UPDATE #3: Upon further review I still find the same failure condition occurring. So switching to nio is not a panacea. I've obtained better results by reducing the thread pool size of the executor service to 1. This seems to be more reliable and there is that way no chance of one thread reading the directory while another thread is launching a process that writes to the same directory.
One further possibility that I have not yet investigated is whether I would be better served by putting my output files in a different directory than the input files. I put them in the same directory because it was easier to code, but that may be confusing things, since the output file creation is affecting the same directory as the input directory scan.
UPDATE #4: Recoding so that the output files are written to a different directory than the input files (whose existence is being checked for) does not particularly help things. The only change that helps things is having an ExecutorService thread pool size of 1, in other words, not multithreading this operation.
ps --forest aux | grep java
, there would not be 30 jvm processes, just one. – DefoliateRuntime.exec
, you are most definitely not spawning processes, but threads instead. Multiprocessing != multithreading. – Defoliate