If I need to run many serial programs "in parallel" (because the problem is simple but time consuming - I need to read in many different data sets for the same program), the solution is simple if I only use one node. All I do is keep submitting serial jobs with an ampersand after each command, e.g. in the job script:
./program1 &
./program2 &
./program3 &
./program4
which will naturally run each serial program on a different processor. This works well on a login server or standalone workstation, and of course for a batch job asking for only one node.
But what if I need to run 110 different instances of the same program to read 110 different data sets? If I submit to multiple nodes (say 14) with a script which submits 110 ./program# commands, will the batch system run each job on a different processor across the different nodes, or will it try to run them all on the same, 8 core node?
I have tried to use a simple MPI code to read different data, but various errors result, with about 100 out of the 110 processes succeeding, and the others crashing. I have also considered job arrays, but I'm not sure if my system supports it.
I have tested the serial program extensively on individual data sets - there are no runtime errors, and I do not exceed the available memory on each node.