I am aware there are many questions about running parallel for loops in Julia using @threads, @distributed, and other methods. I have tried to implement the solutions from those questions with no luck. The structure of what I'd like to do is as follows.
for index in list_of_indices
    data = h5read("data_set_$index.h5", "data")  # each file holds one independent data set
    result = perform_function(data)
    save(result)
end
The data sets are independent, and no iteration of the loop depends on any other, so it seems this should be parallelizable.
I have tried, e.g.:
- @threads for index in list_of_indices ... — this gives a segmentation fault.
- @distributed for index in list_of_indices ... — this runs without error, but never actually performs the function on my data.
I assume I'm missing something about how parallel processes work, and any insight would be appreciated.
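For context, here is the Distributed pattern as I understand it from the docs, sketched with a stand-in computation so it runs without my data files (process_index is a placeholder name; the h5read/perform_function lines in the comment show where my real code would go):

```julia
using Distributed

addprocs(2)  # spawn two worker processes

# Every worker needs its own copy of the packages and functions it will use;
# without @everywhere, the workers cannot find them.
@everywhere function process_index(index)
    # In the real code this would be:
    #   data = h5read("data_set_$index.h5", "data")
    #   result = perform_function(data)
    # A stand-in computation keeps the sketch self-contained.
    return index^2
end

# pmap farms the indices out to the workers and collects results in order.
results = pmap(process_index, [1, 2, 3])  # returns [1, 4, 9]

rmprocs(workers())
```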
Here is an MWE:
Assume we have files data_1.h5, data_2.h5, and data_3.h5 in the working directory. (I don't know how to make this more self-contained, because I think the problem arises from asking multiple threads to read files.)
using Distributed
using HDF5

list = [1, 2, 3]

# Julia must be started with multiple threads (e.g. `julia -t 4`) for @threads to parallelize.
Threads.@threads for index in list
    data = h5read("data_$index.h5", "data")
    println(data)
end
The error I get is
signal (11): Segmentation fault
signal (6): Aborted
Allocations: 1587194 (Pool: 1586780; Big: 414); GC: 1
Segmentation fault (core dumped)
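In case it helps diagnose: one workaround I have been experimenting with is serializing the file reads behind a lock and only parallelizing the computation, since I suspect concurrent reads are the problem. A sketch, using an in-memory Dict as a stand-in for the HDF5 files (fake_files and the sum computation are placeholders):

```julia
using Base.Threads

# Stand-in for the HDF5 files: one small dataset per index.
fake_files = Dict(i => fill(Float64(i), 4) for i in 1:3)

io_lock = ReentrantLock()              # guards the (possibly non-thread-safe) file reads
results = Vector{Float64}(undef, 3)

Threads.@threads for index in 1:3
    # Only one thread "reads a file" at a time; in the real code this
    # lock would wrap the h5read call.
    data = lock(io_lock) do
        fake_files[index]
    end
    results[index] = sum(data)         # the independent computation runs in parallel
end

results  # [4.0, 8.0, 12.0]
```

This only pays off if perform_function dominates the runtime, since the reads themselves are forced to run one at a time.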